Unveiling the power of language models in chemical research question answering

Authors :
Xiuying Chen
Tairan Wang
Taicheng Guo
Kehan Guo
Juexiao Zhou
Haoyang Li
Zirui Song
Xin Gao
Xiangliang Zhang
Source :
Communications Chemistry, Vol 8, Iss 1, Pp 1-11 (2025)
Publication Year :
2025
Publisher :
Nature Portfolio, 2025.

Abstract

While the abilities of language models have been thoroughly evaluated in general domains and in biomedicine, academic chemistry remains less explored. Chemical QA tools play a crucial role in both education and research by translating complex chemical information into an understandable format. To address this gap, we introduce ScholarChemQA, a large-scale QA dataset constructed from chemical papers. Specifically, the questions are taken from paper titles that contain a question mark, and the multi-choice answers are reasoned out from the corresponding abstracts. The dataset reflects typical real-world challenges, including an imbalanced data distribution and a substantial amount of unlabeled data that is potentially useful. Correspondingly, we introduce ChemMatch, a model specifically designed to answer chemical questions effectively by fully leveraging our collected data. Experiments show that Large Language Models (LLMs) still have significant room for improvement in the field of chemistry. Moreover, ChemMatch significantly outperforms recent similar-scale baselines. Code is available at https://github.com/iriscxy/chemmatch.
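The title-based question selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the `QACandidate` type, the `build_candidates` function, the dictionary keys, and the sample records are all hypothetical, and the sketch assumes a title qualifies when it ends with a question mark. Answers are left unlabeled, since the abstract does not describe how labels are assigned.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QACandidate:
    question: str          # the paper title, used verbatim as the question
    context: str           # the abstract the answer would be reasoned from
    answer: Optional[str]  # None until a label is assigned


def build_candidates(papers):
    """Keep papers whose title ends in '?' and pair each title with its abstract.

    Assumption: each paper is a dict with 'title' and 'abstract' keys.
    """
    candidates = []
    for paper in papers:
        title = paper.get("title", "").strip()
        abstract = paper.get("abstract", "").strip()
        if title.endswith("?") and abstract:
            candidates.append(QACandidate(title, abstract, None))
    return candidates


# Hypothetical records, for illustration only.
papers = [
    {"title": "Is water a suitable solvent for this reaction?", "abstract": "We study ..."},
    {"title": "A new route to substituted pyridines", "abstract": "We report ..."},
]
candidates = build_candidates(papers)
```

Only the first sample record is kept, because the second title is not phrased as a question. Records kept this way but never labeled would correspond to the unlabeled portion of the data that the abstract mentions.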

Subjects

Subjects :
Chemistry
QD1-999

Details

Language :
English
ISSN :
2399-3669
Volume :
8
Issue :
1
Database :
Directory of Open Access Journals
Journal :
Communications Chemistry
Publication Type :
Academic Journal
Accession number :
edsdoj.95a5fcaae37240aa86febd91a75a9e08
Document Type :
article
Full Text :
https://doi.org/10.1038/s42004-024-01394-x