
Automatic extraction and structuration of soil–environment relationship information from soil survey reports

Authors :: De-sheng WANG
Jun-zhi LIU
A-xing ZHU
Shu WANG
Can-ying ZENG
Tian-wu MA
Source :: Journal of Integrative Agriculture, Vol 18, Iss 2, Pp 328-339 (2019)
Publication Year :: 2019
Publisher :: Elsevier, 2019.
Abstract: In addition to soil samples, conventional soil maps, and experienced soil surveyors, text about soils (e.g., soil survey reports) is an important potential data source for extracting soil–environment relationships. Because the words describing soil–environment relationships are often mixed with unrelated words, the first step is to extract the relevant words and organize them in a structured form. This paper applies natural language processing (NLP) techniques to automatically extract and structure soil–environment relationship information from soil survey reports. The method includes two steps: (1) construction of a knowledge frame and (2) information extraction using either a rule-based method or a statistics-based method, depending on the type of information. The rule-based method was applied to information written in a uniform style, which covers slope, elevation, accumulated temperature, annual mean temperature, annual precipitation, and frost-free period. The statistics-based method was applied to information written in diverse styles, namely landform and parent material. Soil species from China's soil survey reports were selected as the experimental dataset. Precision (P), recall (R), and F1-measure (F1) were used to evaluate the performance of the method. For the rule-based method, P was 1 (100%), R exceeded 92%, and F1 exceeded 96% for all variables involved. For the method based on conditional random fields (CRFs), P, R, and F1 were 84.15%, 83.13%, and 83.64%, respectively, for parent material, and 88.33%, 76.81%, and 82.17%, respectively, for landform. To explore how text type affects the performance of the CRF-based method, CRF models were trained and validated separately on the descriptive texts of soil types and of typical profiles.
For parent material, the maximum F1 was 90.7% on the descriptive texts of soil types but only 75% on the descriptive texts of soil profiles. For landform, the maximum F1 on the descriptive texts of soil types (85.33%) was similar to that on the descriptive texts of soil profiles (85.71%). These results suggest that NLP techniques are effective for extracting and structuring soil–environment relationship information from text data sources.
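The two-step method in the abstract (a knowledge frame, then rule-based filling of the uniformly written variables) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class `SoilEnvironmentFrame`, the helper `rule`, the `RULES` table, and `extract_rule_based` are hypothetical names, and the regular expressions target invented English phrasings, whereas the source reports are written in Chinese and the paper's actual rules are not given in this record.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Optional, Tuple

Range = Tuple[float, float]  # (low, high); low == high for a single value


@dataclass
class SoilEnvironmentFrame:
    """Hypothetical knowledge frame: one slot per variable named in the abstract."""
    soil_species: str
    slope_deg: Optional[Range] = None
    elevation_m: Optional[Range] = None
    accumulated_temp_c: Optional[Range] = None
    annual_mean_temp_c: Optional[Range] = None
    annual_precip_mm: Optional[Range] = None
    frost_free_days: Optional[Range] = None
    # Free-text slots; per the abstract these come from the CRF-based step, not from rules.
    landform: list[str] = field(default_factory=list)
    parent_material: list[str] = field(default_factory=list)


NUM = r"-?\d+(?:\.\d+)?"
VALUE = rf"({NUM})(?:\s*[-–~]\s*({NUM}))?"  # a single value or a "low-high" range


def rule(keyword: str, unit: str) -> re.Pattern:
    # keyword, then the shortest run of non-digits, then a value or range, then the unit
    return re.compile(rf"{re.escape(keyword)}[^0-9]*?{VALUE}\s*{unit}", re.IGNORECASE)


# Hypothetical English rules (assumption); the paper's Chinese-language rules are not reproduced.
RULES = {
    "slope_deg": rule("slope", r"(?:°|degrees?)"),
    "elevation_m": rule("elevation", r"m\b"),
    "accumulated_temp_c": rule("accumulated temperature", "°C"),
    "annual_mean_temp_c": rule("annual mean temperature", "°C"),
    "annual_precip_mm": rule("annual precipitation", "mm"),
    "frost_free_days": rule("frost-free period", r"d(?:ays)?\b"),
}


def extract_rule_based(soil_species: str, text: str) -> SoilEnvironmentFrame:
    """Fill each numeric slot with the first match of its rule; unmatched slots stay None."""
    frame = SoilEnvironmentFrame(soil_species)
    for slot, pattern in RULES.items():
        match = pattern.search(text)
        if match:
            low = float(match.group(1))
            high = float(match.group(2)) if match.group(2) else low
            setattr(frame, slot, (low, high))
    return frame


if __name__ == "__main__":
    report = ("Found on slopes of 5-15° at an elevation of 1200–1500 m. "
              "Annual mean temperature is 8.5 °C, accumulated temperature is 3200 °C, "
              "annual precipitation is 650 mm, and the frost-free period is 160 days.")
    print(extract_rule_based("example species", report))
```

Fixed patterns work here only because, as the abstract notes, these variables are written in a uniform style; landform and parent material are phrased too variably for such rules, which is why the paper uses CRFs for them.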

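The P, R, and F1 figures reported for the CRF-based method can be made concrete with a small evaluation sketch. It assumes, without confirmation from this record, that CRF output is a BIO tag sequence and that a predicted landform or parent-material mention counts as correct only on an exact span-and-label match; the labels `LF` and `PM` and the functions `bio_spans`, `precision_recall_f1`, and `f1_from_pr` are hypothetical. Training the CRF itself is not shown.

```python
from __future__ import annotations

from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (first token, one past last token, label)


def bio_spans(tags: List[str]) -> Set[Span]:
    """Decode a BIO tag sequence (e.g. CRF output) into labelled spans."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final open span
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if start is not None:
                spans.add((start, i, label))
            start, label = None, None
            if tag != "O":
                # "B-X" opens a span; a stray "I-X" is leniently treated as "B-X"
                start, label = i, tag[2:]
    return spans


def precision_recall_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match scores: a prediction counts only if start, end, and label all agree."""
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def f1_from_pr(p: float, r: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


if __name__ == "__main__":
    gold = bio_spans(["B-LF", "I-LF", "O", "B-PM", "I-PM"])
    pred = bio_spans(["B-LF", "I-LF", "O", "B-PM", "O"])
    print(precision_recall_f1(gold, pred))     # (0.5, 0.5, 0.5)
    # Consistency check of the CRF figures reported in the abstract (values in %):
    print(round(f1_from_pr(84.15, 83.13), 2))  # 83.64 (parent material)
    print(round(f1_from_pr(88.33, 76.81), 2))  # 82.17 (landform)
```

The last two lines reproduce the reported F1 values from the reported P and R, confirming the abstract's figures are internally consistent. For the rule-based method, P = 1 reduces F1 to 2R/(1 + R).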
Subjects :: soil–environment relationship
text
natural language processing
extraction
structuration
Agriculture (General)
S1-972

Details

Language :: English
ISSN :: 2095-3119
Volume :: 18
Issue :: 2
Database :: Directory of Open Access Journals
Journal :: Journal of Integrative Agriculture
Publication Type :: Academic Journal
Accession number :: edsdoj.5019c5ffa9fa47c68fa656ae85368e80
Document Type :: article
Full Text :: https://doi.org/10.1016/S2095-3119(18)62071-4
