SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading
MLA
Dinh, Tu Anh, et al. SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading. 2024. EBSCOhost, widgets.ebscohost.com/prod/customlink/proxify/proxify.php?count=1&encode=0&proxy=&find_1=&replace_1=&target=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&scope=site&db=edsarx&AN=edsarx.2406.10421&authtype=sso&custid=ns315887.
APA
Dinh, T. A., Mullov, C., Bärmann, L., Li, Z., Liu, D., Reiß, S., Lee, J., Lerzer, N., Ternava, F., Gao, J., Röddiger, T., Waibel, A., Asfour, T., Beigl, M., Stiefelhagen, R., Dachsbacher, C., Böhm, K., & Niehues, J. (2024). SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading.
Chicago
Dinh, Tu Anh, Carlos Mullov, Leonard Bärmann, Zhaolin Li, Danni Liu, Simon Reiß, Jueun Lee, et al. 2024. “SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading.” http://widgets.ebscohost.com/prod/customlink/proxify/proxify.php?count=1&encode=0&proxy=&find_1=&replace_1=&target=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&scope=site&db=edsarx&AN=edsarx.2406.10421&authtype=sso&custid=ns315887.