VISION-LANGUAGE MODEL FOR ROBOT GRASPING