arXiv AI recent: Measuring Whether LLM Tutors Teach or Solve: A Diagnostic for Educational Impact
Researchers introduced a diagnostic to assess whether LLM tutoring benchmarks distinguish learning-supportive behavior from mere answer production. Using MathTutorBench data, they found a...
The study used public MathTutorBench leaderboard results and TutorBench sample data. The correlation between solving-oriented and pedagogy-oriented performance was 0.421 across eight models.
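To make the reported figure concrete, here is a minimal sketch of how a correlation between two per-model score lists could be computed. It assumes a Pearson coefficient, which the summary does not specify, and the score lists are hypothetical placeholders, not the actual MathTutorBench leaderboard values. The names `pearson`, `solving_scores`, and `pedagogy_scores` are illustrative only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder scores for eight models (NOT the study's data).
solving_scores = [0.62, 0.71, 0.55, 0.80, 0.67, 0.74, 0.59, 0.77]
pedagogy_scores = [0.48, 0.52, 0.61, 0.57, 0.44, 0.66, 0.50, 0.53]

r = pearson(solving_scores, pedagogy_scores)
print(f"correlation across {len(solving_scores)} models: {r:.3f}")
```

With only eight data points, any such coefficient carries wide uncertainty, so a value like the reported 0.421 is better read as a rough indication that the two kinds of performance are only partly aligned than as a precise estimate.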