1. mlphys101 - Exploring the performance of Large-Language Models in multilingual undergraduate physics education
- Author
- Völschow, M., Buczek, P., Carreno-Mosquera, P., Mousavias, C., Reganova, S., Roldan-Rodriguez, E., Steinbach, P. (0000-0002-4974-230X), and Strube, A.
- Abstract
Large-Language Models such as ChatGPT have the potential to revolutionize academic teaching in physics, much as the electronic calculator, the home computer, and the internet did. AI models are patient, produce answers tailored to a student's needs, and are accessible whenever needed. Those involved in academic teaching face a number of questions: How reliable are publicly accessible models in answering questions, how does a question's language affect a model's performance, and how well do the models handle more difficult tasks beyond retrieval? To address these questions, we benchmark a number of publicly available models on the mlphys101 dataset, a new set of 823 university-level MC5 (five-option multiple-choice) questions and answers released alongside this work. While the original questions are in English, we employ GPT-4 to translate them into various other languages, followed by revision and refinement by native speakers. Our findings indicate that state-of-the-art models perform well on questions involving the recall of facts, definitions, and basic concepts, but struggle with multi-step quantitative reasoning. This aligns with existing literature highlighting the challenges LLMs face in mathematical and logical reasoning tasks. We conclude that the most advanced current LLMs are a valuable addition to the academic curriculum and that LLM-powered translation is a viable method to increase the accessibility of materials, but their utility for more difficult quantitative tasks remains limited. The dataset is currently available here in English only and will be removed once the mlphys101 publication has been accepted and released to the public.
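To illustrate the kind of evaluation the abstract describes, below is a minimal sketch of scoring a model on a single MC5 item. The question dict, field names, prompt format, and model identifier are illustrative assumptions, not the authors' actual evaluation harness or the mlphys101 data schema; it assumes the OpenAI Python client and an `OPENAI_API_KEY` in the environment.

```python
# Sketch: score a model on one five-option multiple-choice (MC5) physics item.
# The item below is a made-up example in the spirit of the dataset.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = {
    "stem": "A ball is dropped from rest. Neglecting air resistance, "
            "how far does it fall in 2 s? (g = 9.8 m/s^2)",
    "options": {"A": "4.9 m", "B": "9.8 m", "C": "19.6 m",
                "D": "39.2 m", "E": "78.4 m"},
    "answer": "C",  # s = (1/2) g t^2 = 0.5 * 9.8 * 2^2 = 19.6 m
}

# Build a plain-text prompt listing the stem and the five options.
prompt = (
    question["stem"] + "\n"
    + "\n".join(f"{key}) {text}" for key, text in question["options"].items())
    + "\nAnswer with a single letter (A-E)."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # deterministic decoding for reproducible scoring
)

# Take the first character of the reply as the predicted option letter.
predicted = response.choices[0].message.content.strip()[:1].upper()
print("correct" if predicted == question["answer"] else "incorrect", predicted)
```

Aggregating this check over all 823 items, per language and per model, yields the accuracy comparisons the abstract summarizes; translated variants would be evaluated with the same loop, swapping in the translated stem and options.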
- Published
- 2024