Generation And Grounding Of Natural Language Descriptions For Visual Data