1. BioNumQA-BERT
- Author
Ruibang Luo, Hing-Fung Ting, Tak-Wah Lam, and Ye Wu
- Subjects
Question answering, Language model, Language representation, Natural language processing, Artificial intelligence, Computer science, Source code, Encoding, Generalization
- Abstract
Biomedical question answering (QA) is playing an increasingly significant role in medical knowledge translation. However, current biomedical QA datasets and methods have limited capacity, as they commonly neglect the role of numerical facts in biomedical QA. In this paper, we constructed BioNumQA, a novel biomedical QA dataset in which research questions are answered using relevant numerical facts, for biomedical QA model training and testing. To leverage the new dataset, we designed a new method called BioNumQA-BERT by introducing a novel numerical encoding scheme into the popular biomedical language model BioBERT to represent the numerical values in the input text. Our experiments show that BioNumQA-BERT significantly outperformed other state-of-the-art models, including DrQA, BERT, and BioBERT (39.0% vs 29.5%, 31.3%, and 33.2%, respectively, in strict accuracy). To improve the generalization ability of BioNumQA-BERT, we further pretrained it on a large biomedical text corpus and achieved 41.5% strict accuracy. BioNumQA and BioNumQA-BERT establish a new baseline for biomedical QA. The dataset, source code, and pretrained model of BioNumQA-BERT are available at https://github.com/LeaveYeah/BioNumQA-BERT.
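The abstract states that BioNumQA-BERT introduces a numerical encoding scheme into BioBERT to represent numeric values in the input text, but does not describe the scheme itself. As a purely illustrative sketch of the general idea, one common approach is to normalize each numeric literal into explicit sign, digit, and exponent tokens before subword tokenization, so that magnitudes are not fragmented by the tokenizer. The function name `encode_numbers` and the `[NUM]`/`[EXP]` markers below are assumptions for illustration, not the authors' actual design:

```python
import math
import re

def encode_numbers(text):
    """Rewrite each numeric literal as sign + mantissa digits + exponent.

    Illustrative preprocessing only; BioNumQA-BERT's actual encoding
    scheme may differ. E.g. "2.5" -> "[NUM] + 2 5 0 0 [EXP] 0".
    """
    def repl(match):
        value = float(match.group())
        if value == 0:
            return "[NUM] + 0 [EXP] 0"
        # Decompose into mantissa in [1, 10) and integer exponent.
        exp = math.floor(math.log10(abs(value)))
        mantissa = abs(value) / (10 ** exp)
        # Keep four mantissa digits, emitted as separate space-joined tokens.
        digits = " ".join(f"{mantissa:.3f}".replace(".", ""))
        sign = "-" if value < 0 else "+"
        return f"[NUM] {sign} {digits} [EXP] {exp}"

    # Match integers and decimals, with optional leading minus sign.
    return re.sub(r"-?\d+(?:\.\d+)?", repl, text)
```

Applied to a sentence such as "300 patients received 2.5 mg", every number becomes a uniform token sequence whose exponent makes its magnitude directly visible to the model.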
- Published
2021