1. Menagerie: A text-mining tool to support animal-human translation in neurodegeneration research
- Author
- Amanda P. Beck, Dongwook Shin, Charles Sneiderman, Brent C. Vander Wyk, Halil Kilicoglu, Natalie Zatz, and Caroline J. Zeiss
- Subjects
0301 basic medicine; Biomedical Research; Computer science; Psychological intervention; Disease; Monkeys; computer.software_genre; Biochemistry; Field (computer science); Levodopa; Translational Research, Biomedical; 0302 clinical medicine; Outcome Assessment, Health Care; Medicine and Health Sciences; Data Mining; Function (engineering); media_common; Mammals; Movement Disorders; Multidisciplinary; Drugs; Eukaryota; Neurodegenerative Diseases; Parkinson Disease; Neurochemistry; Animal Models; 3. Good health; Neurology; Experimental Organism Systems; Vertebrates; Medicine; Neurochemicals; Macaque; Natural language processing; Research Article; Primates; media_common.quotation_subject; Science; Research and Analysis Methods; External validity; 03 medical and health sciences; Text mining; Old World monkeys; Genetics; Animals; Humans; Generalizability theory; Animal Models of Disease; Pharmacology; business.industry; Organisms; Biology and Life Sciences; 030104 developmental biology; Test set; Amniotes; Genetics of Disease; Animal Studies; Artificial intelligence; business; computer; 030217 neurology & neurosurgery; Neuroscience
- Abstract
Discovery studies in animals constitute a cornerstone of biomedical research, but suffer from a lack of generalizability to human populations. We propose that large-scale interrogation of these data could reveal patterns of animal use that could narrow the translational divide. We describe a text-mining approach that extracts translationally useful data from PubMed abstracts. The program comprises six modules: species, model, genes, interventions/disease modifiers, overall outcome, and functional outcome measures. Existing National Library of Medicine natural language processing tools (SemRep, GNormPlus and the Chemical annotator) underpin the program and are further augmented by various rules, term lists, and machine learning models. Evaluation of the program using a 98-abstract test set achieved F1 scores ranging from 0.75 to 0.95 across all modules, and exceeded F1 scores obtained from comparable baseline programs. Next, the program was applied to a larger data set of 14,481 abstracts (2008-2017). Expected and previously identified patterns of species and model use for the field were obtained. As previously noted, the majority of studies reported promising outcomes. Longitudinal patterns of intervention type or gene mentions were demonstrated, and patterns of animal model use characteristic of the Parkinson's disease field were confirmed. The primary function of the program is to overcome low external validity of animal model systems by aggregating evidence across a diversity of models that capture different aspects of a multifaceted cellular process. Some aspects of the tool are generalizable, whereas others are field-specific. In the initial version presented here, we demonstrate proof of concept within a single disease area, Parkinson's disease. However, the program can be expanded in modular fashion to support a wider range of neurodegenerative diseases.
- Published
- 2019
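
The abstract reports per-module F1 scores on a 98-abstract test set. As a rough illustration of how such a score can be computed for one module, the sketch below compares a set of gold annotations with a set of extracted annotations. It is not the authors' evaluation code: the function names (`precision_recall_f1`, `score_modules`), the exact-match criterion, and the toy annotations are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple


def precision_recall_f1(gold: Set[str], predicted: Set[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for exact-match set comparison."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def score_modules(
    gold: Dict[str, Set[str]], predicted: Dict[str, Set[str]]
) -> Dict[str, float]:
    """Compute one F1 score per module (hypothetical module names as keys)."""
    return {
        module: precision_recall_f1(gold[module], predicted.get(module, set()))[2]
        for module in gold
    }


if __name__ == "__main__":
    # Toy annotations, invented for illustration only.
    gold = {
        "species": {"mouse", "rat", "macaque"},
        "genes": {"SNCA", "LRRK2"},
    }
    predicted = {
        "species": {"mouse", "macaque", "zebrafish", "human"},
        "genes": {"SNCA", "LRRK2"},
    }
    for module, f1 in score_modules(gold, predicted).items():
        print(f"{module}: F1 = {f1:.2f}")
```

Exact set matching is the simplest possible criterion; a real evaluation of extracted entities would likely also need to decide how to treat partial matches and synonyms.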