+
+ In real-world settings, vision language models (VLMs) should robustly handle naturalistic, noisy visual content as well as domain-specific language and concepts.
+ For example, K-12 educators using digital learning platforms may need to examine and provide feedback across many images of students' math work.
+ To assess the potential of VLMs to support educators in settings like this one, we introduce DrawEduMath,
+ an English-language dataset of 2,030 images of students' handwritten responses to K-12 math problems.
+
+ Teachers provided detailed annotations, including free-form descriptions of each image and 11,661 question-answer (QA) pairs.
+ These annotations capture a wealth of pedagogical insights, ranging from students' problem-solving strategies to the composition of their drawings, diagrams, and writing. We evaluate VLMs on teachers' QA pairs,
+ as well as on 4,362 synthetic QA pairs derived by language models (LMs) from teachers' descriptions.
+ We show that even state-of-the-art VLMs leave much room for improvement on DrawEduMath questions.
+ We also find that synthetic QAs, though imperfect, can yield model rankings similar to those produced by teacher-written QAs.
+
+ We release DrawEduMath to support the evaluation of VLMs' abilities to reason mathematically over images gathered with educational contexts in mind.
+