Testing QA Systems' Ability in Processing Synonym Commonsense Knowledge
- Source :
- IV
- Publication Year :
- 2020
- Publisher :
- IEEE, 2020.
Abstract
- Synonymy is an essential element of commonsense knowledge that we apply to make sense of, and form sound judgements about, what we read. To investigate the ability of machine comprehension models to handle synonym commonsense knowledge, we developed a novel approach to automatically generate a dataset based on the Stanford Question Answering Dataset (SQuAD 2.0). The new dataset consists of additional distracting sentences and questions generated using synonym commonsense knowledge. We formulated new questions by replacing noun entities in the original SQuAD 2.0 questions with their synonyms. This approach follows the two fundamental principles of the SQuAD 2.0 dataset, relevancy and plausibility: incorrect answers are more challenging when they are relevant and plausible. It improves the robustness and level of abstraction of the question set. To improve the synonym selection strategy for the Word Sense Disambiguation (WSD) problem, we designed a new algorithm, the Multiple Source Adapted Lesk Algorithm (MSALA). Rather than using only WordNet as the source of glosses for the adapted Lesk algorithm, we used both the lexical database WordNet and the commonsense database ConceptNet. This fusion provides a rich hierarchy of semantic relations for the MSALA algorithm. Using this method, we generated 11,000 questions and evaluated the performance of a state-of-the-art question answering system, BERT. Our results show that the accuracy of the BERT-Base model dropped from 74.98% to 63.24%. This accuracy drop of more than 10 percentage points reveals the limitations of BERT in handling synonym commonsense knowledge.
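The core idea of MSALA described above, scoring each candidate sense by gloss–context overlap and summing scores across multiple gloss sources, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the toy gloss dictionaries below are hypothetical stand-ins for real WordNet and ConceptNet lookups, and the overlap measure is the simple word-intersection count of the basic Lesk algorithm.

```python
# Sketch of multi-source Lesk-style sense selection (the idea behind MSALA).
# The gloss dictionaries here are toy examples; the paper draws real glosses
# from WordNet and ConceptNet.

def lesk_overlap(context, gloss):
    # Basic Lesk score: number of distinct lowercased words shared
    # between the sentence context and the sense gloss.
    return len(set(context.lower().split()) & set(gloss.lower().split()))

def best_sense(word, context, gloss_sources):
    # Sum the overlap score for each candidate sense across all gloss
    # sources, then pick the sense with the highest combined score.
    scores = {}
    for source in gloss_sources:
        for sense, gloss in source.get(word, {}).items():
            scores[sense] = scores.get(sense, 0) + lesk_overlap(context, gloss)
    return max(scores, key=scores.get) if scores else None

# Hypothetical glosses standing in for WordNet and ConceptNet entries.
wordnet_like = {
    "bank": {
        "bank.n.01": "sloping land beside a river",
        "bank.n.02": "financial institution that accepts deposits",
    }
}
conceptnet_like = {
    "bank": {"bank.n.02": "a bank is used for keeping money deposits"}
}

context = "she opened a deposits account at the bank"
print(best_sense("bank", context, [wordnet_like, conceptnet_like]))
# -> bank.n.02 (the ConceptNet-style gloss breaks the tie left by
#    WordNet-style glosses alone)
```

In this toy run the two WordNet-style glosses tie, and only the additional ConceptNet-style gloss disambiguates the sense, which mirrors the paper's motivation for fusing the two sources.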
Details
- Database :
- OpenAIRE
- Journal :
- 2020 24th International Conference Information Visualisation (IV)
- Accession number :
- edsair.doi...........cfb3e6e2335e6bc3b6793c03c8f4c79d
- Full Text :
- https://doi.org/10.1109/iv51561.2020.00059