1. Discovering Content through Text Mining for a Synthetic Biology Knowledge System.
- Author
- McInnes BT, Downie JS, Hao Y, Jett J, Keating K, Nakum G, Ranjan S, Rodriguez NE, Tang J, Xiang D, Young EM, and Nguyen MH
- Subjects
- Natural Language Processing, Data Mining methods, Synthetic Biology
- Abstract
- Scientific articles contain a wealth of information about experimental methods and results describing biological designs. Because article text is unstructured and subject to multiple sources of ambiguity and variability, extracting this information is a difficult task. In this paper, we describe the development of the synthetic biology knowledge system (SBKS) text processing pipeline. The pipeline uses natural language processing techniques to extract and correlate information from the literature for synthetic biology researchers. Specifically, we apply named entity recognition, relation extraction, concept grounding, and topic modeling to published literature and link articles to elements within our knowledge system. Our results show the efficacy of each component on synthetic biology literature and suggest directions for further advancement of the pipeline.
- Published
- 2022