Applying a text mining framework to the extraction of numerical parameters from scientific literature in the biotechnology domain
- Source :
- Advances in Distributed Computing and Artificial Intelligence Journal, Vol 1, Iss 1, Pp 1-8 (2013; also indexed as 2012); Repositório Científico de Acesso Aberto de Portugal (RCAAP)
- Publication Year :
- 2013
- Publisher :
- Ediciones Universidad de Salamanca, 2013.
- Abstract :
- Scientific publications are the main vehicle for disseminating information in the field of biotechnology for wastewater treatment. New research paradigms and the application of high-throughput technologies have considerably increased the rate of publication. As a result, manual curation becomes harder, more error-prone and more time-consuming, which leads to a probable loss of information and inefficient knowledge acquisition. Research outputs therefore hardly reach engineers, which hampers the calibration of the mathematical models used to optimize the stability and performance of biotechnological systems. In this context, we have developed a data curation workflow, based on text mining techniques, to extract numerical parameters from scientific literature, and applied it to the biotechnology domain. The workflow was built to process wastewater-related articles with the main goal of identifying physico-chemical parameters mentioned in the text. This work describes the implementation of the workflow, identifies achievements and current limitations in the overall process, and presents the results obtained for a corpus of 50 full-text documents.
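The abstract does not say how the workflow recognizes parameters, so the following is only a minimal sketch of one common approach: a regular expression that pairs a number with a known unit and keeps a little preceding text as context. The unit list, the `Measurement` type, and the `extract_measurements` function are all hypothetical illustrations and are not taken from the paper.

```python
import re
from typing import List, NamedTuple

# Hypothetical unit vocabulary (an assumption; the record does not list
# the units or parameters the authors actually targeted).
UNITS = ["mg/L", "g/L", "°C", "h", "d"]


class Measurement(NamedTuple):
    value: float   # numeric value found in the text
    unit: str      # unit that immediately follows the value
    context: str   # a few characters of text preceding the value


# Longest units first so that "mg/L" is tried before shorter alternatives.
_unit_alt = "|".join(re.escape(u) for u in sorted(UNITS, key=len, reverse=True))

# A number (optionally with a decimal part), optional whitespace, then a unit.
# The lookbehind avoids starting inside a word or in the middle of a decimal;
# the lookahead avoids matching "h" in words such as "hours".
PATTERN = re.compile(rf"(?<![\w.])(\d+(?:\.\d+)?)\s*({_unit_alt})(?!\w)")


def extract_measurements(text: str, window: int = 30) -> List[Measurement]:
    """Return every number-plus-unit pair found in `text`, in order."""
    results = []
    for match in PATTERN.finditer(text):
        start = max(0, match.start() - window)
        context = text[start:match.start()].strip()
        results.append(
            Measurement(float(match.group(1)), match.group(2), context)
        )
    return results


if __name__ == "__main__":
    sentence = (
        "The reactor was operated at 35 °C with a COD of 500 mg/L "
        "and an HRT of 12 h."
    )
    for measurement in extract_measurements(sentence):
        print(measurement)
```

A pattern like this only finds values written next to a listed unit; linking each value to the parameter it describes (for example COD or HRT) would need further steps such as dictionary lookup or entity recognition, which the abstract does not detail.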
- Subjects :
- Text mining
Computer science
Process (engineering)
Context (language use)
02 engineering and technology
Scientific literature
Field (computer science)
lcsh:QA75.5-76.95
030218 nuclear medicine & medical imaging
Domain (software engineering)
03 medical and health sciences
0302 clinical medicine
0202 electrical engineering, electronic engineering, information engineering
General Environmental Science
Science & Technology
Data curation
business.industry
General Engineering
Procedure optimization
Data science
Knowledge acquisition
6. Clean water
Biotechnology
Workflow
General Earth and Planetary Sciences
020201 artificial intelligence & image processing
Biotechnology applications
lcsh:Electronic computers. Computer science
business
Details
- Language :
- English
- ISSN :
- 2255-2863
- Volume :
- 1
- Issue :
- 1
- Database :
- OpenAIRE
- Journal :
- Advances in Distributed Computing and Artificial Intelligence Journal
- Accession number :
- edsair.doi.dedup.....50312924da885d320bc14f6129b1a5cb