Descriptor: "corpus construction" / Journal: BMC Medical Informatics & Decision Making - Searchworks@Jio Institute Digital Library Search Results

Search keyword "corpus construction": 3 results in total

1. BertSRC: transformer-based semantic relation classification.

Author: Lee, Yeawon, Son, Jinseok, and Song, Min
Abstract: The relationship between biomedical entities is complex, and many of them have not yet been identified. For many biomedical research areas including drug discovery, it is of paramount importance to identify the relationships that have already been established through a comprehensive literature survey. However, manually searching through literature is difficult as the amount of biomedical publications continues to increase. Therefore, the relation classification task, which automatically mines meaningful relations from the literature, is spotlighted in the field of biomedical text mining. By applying relation classification techniques to the accumulated biomedical literature, existing semantic relations between biomedical entities that can help to infer previously unknown relationships are efficiently grasped. To develop semantic relation classification models, which is a type of supervised machine learning, it is essential to construct a training dataset that is manually annotated by biomedical experts with semantic relations among biomedical entities. Any advanced model must be trained on a dataset with reliable quality and meaningful scale to be deployed in the real world and can assist biologists in their research. In addition, as the number of such public datasets increases, the performance of machine learning algorithms can be accurately revealed and compared by using those datasets as a benchmark for model development and improvement. In this paper, we aim to build such a dataset. Along with that, to validate the usability of the dataset as training data for relation classification models and to improve the performance of the relation extraction task, we built a relation classification model based on Bidirectional Encoder Representations from Transformers (BERT) trained on our dataset, applying our newly proposed fine-tuning methodology. 
In experiments comparing performance among several models based on different deep learning algorithms, our model with the proposed fine-tuning methodology showed the best performance. The experimental results show that the constructed training dataset is an important information resource for the development and evaluation of semantic relation extraction models. Furthermore, relation extraction performance can be improved by integrating our proposed fine-tuning methodology. Therefore, this can lead to the promotion of future text mining research in the biomedical field. [ABSTRACT FROM AUTHOR]
Published: 2022

2. Constructing fine-grained entity recognition corpora based on clinical records of traditional Chinese medicine.

Author: Zhang, Tingting, Wang, Yaqiang, Wang, Xiaofeng, Yang, Yafei, and Ye, Ying
Subjects: *CHINESE medicine, *MEDICAL records, *CORPORA, *NAMED-entity recognition
Abstract: Background: In this study, we focus on building a fine-grained entity annotation corpus with the corresponding annotation guideline of traditional Chinese medicine (TCM) clinical records. Our aim is to provide a basis for the fine-grained corpus construction of TCM clinical records in future. Methods: We developed a four-step approach that is suitable for the construction of TCM medical records in our corpus. First, we determined the entity types included in this study through sample annotation. Then, we drafted a fine-grained annotation guideline by summarizing the characteristics of the dataset and referring to some existing guidelines. We iteratively updated the guidelines until the inter-annotator agreement (IAA) exceeded a Cohen's kappa value of 0.9. Comprehensive annotations were performed while keeping the IAA value above 0.9. Results: We annotated the 10,197 clinical records in five rounds. Four entity categories involving 13 entity types were employed. The final fine-grained annotated entity corpus consists of 1104 entities and 67,799 tokens. The final IAAs are 0.936 on average (for three annotators), indicating that the fine-grained entity recognition corpus is of high quality. Conclusions: These results will provide a foundation for future research on corpus construction and named entity recognition tasks in the TCM clinical domain. [ABSTRACT FROM AUTHOR]
Published: 2020
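The abstract above uses Cohen's kappa as the stopping criterion for guideline revision (agreement above 0.9) and reports an average over three annotators. The following is a minimal sketch of that measure, not the authors' code. The label values (`symptom`, `herb`, `O`), the toy annotations, and the choice to average kappa over annotator pairs are all assumptions made for illustration; the abstract does not say how the three-annotator average was computed.

```python
from collections import Counter
from itertools import combinations


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(annotators):
    """Average kappa over all annotator pairs (an assumed aggregation)."""
    pairs = list(combinations(annotators, 2))
    return sum(cohens_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical token-level labels from three annotators.
ann_1 = ["symptom", "herb", "symptom", "O", "O", "herb"]
ann_2 = ["symptom", "herb", "O", "O", "O", "herb"]
ann_3 = ["symptom", "herb", "symptom", "O", "O", "herb"]

kappa_12 = cohens_kappa(ann_1, ann_2)
mean_kappa = mean_pairwise_kappa([ann_1, ann_2, ann_3])
# Under the procedure described, guidelines would be revised again
# while this flag is False.
meets_threshold = mean_kappa > 0.9
```

Because kappa subtracts chance agreement, it is stricter than raw percent agreement: the first two toy annotators agree on five of six tokens but do not come close to the 0.9 threshold.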

3. Developing a cardiovascular disease risk factor annotated corpus of Chinese electronic medical records.

Author: Su, Jia, He, Bin, Guan, Yi, Jiang, Jingchi, and Yang, Jinfeng
Subjects: *CARDIOVASCULAR diseases risk factors, *ELECTRONIC health records, *DATA mining, *PERIODIC health examinations, *ANNOTATIONS, *CARDIOVASCULAR diseases, *INFORMATION retrieval, *NATURAL language processing
Abstract: Background: Cardiovascular disease (CVD) has become the leading cause of death in China, and most of the cases can be prevented by controlling risk factors. The goal of this study was to build a corpus of CVD risk factor annotations based on Chinese electronic medical records (CEMRs). This corpus is intended to be used to develop a risk factor information extraction system that, in turn, can be applied as a foundation for the further study of the progress of risk factors and CVD. Results: We designed a light annotation task to capture CVD risk factors with indicators, temporal attributes and assertions that were explicitly or implicitly displayed in the records. The task included: 1) preparing data; 2) creating guidelines for capturing annotations (these were created with the help of clinicians); 3) proposing an annotation method including building the guidelines draft, training the annotators and updating the guidelines, and corpus construction. Meanwhile, we proposed some creative annotation guidelines: (1) the under-threshold medical examination values were annotated for our purpose of studying the progress of risk factors and CVD; (2) possible and negative risk factors were concerned for the same reason, and we created assertions for annotations; (3) we added four temporal attributes to CVD risk factors in CEMRs for constructing long term variations. Then, a risk factor annotated corpus based on de-identified discharge summaries and progress notes from 600 patients was developed. Built with the help of clinicians, this corpus has an inter-annotator agreement (IAA) F1-measure of 0.968, indicating a high reliability. Conclusion: To the best of our knowledge, this is the first annotated corpus concerning CVD risk factors in CEMRs and the guidelines for capturing CVD risk factor annotations from CEMRs were proposed. The obtained document-level annotations can be applied in future studies to monitor risk factors and CVD over the long term. [ABSTRACT FROM AUTHOR]
Published: 2017
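The abstract above reports inter-annotator agreement as an F1-measure. One common way to compute such a figure, shown here as a sketch under stated assumptions rather than the authors' procedure, is to treat one annotator's annotations as the reference and the other's as predictions, then count exact matches. The tuple layout `(document, risk factor, assertion)`, the risk factor names, and the toy data are hypothetical; only the existence of possible and negative assertions comes from the abstract.

```python
def pairwise_f1(annotations_a, annotations_b):
    """F1 agreement between two annotators' sets of annotations.

    With A as reference and B as prediction, precision is matched/|B|
    and recall is matched/|A|; their harmonic mean simplifies to
    2 * matched / (|A| + |B|), which is symmetric in A and B.
    """
    set_a, set_b = set(annotations_a), set(annotations_b)
    if not set_a and not set_b:
        return 1.0  # nothing annotated by either side counts as agreement
    matched = len(set_a & set_b)
    return 2 * matched / (len(set_a) + len(set_b))


# Hypothetical document-level annotations: (document, risk factor, assertion).
annotator_a = {
    ("doc1", "hypertension", "present"),
    ("doc1", "smoking", "negative"),
    ("doc2", "diabetes", "possible"),
}
annotator_b = {
    ("doc1", "hypertension", "present"),
    ("doc1", "smoking", "present"),  # disagrees on the assertion
    ("doc2", "diabetes", "possible"),
}

f1 = pairwise_f1(annotator_a, annotator_b)
```

Exact matching on the whole tuple means a disagreement on any attribute, such as the assertion in the smoking example, counts as a miss for both annotators.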
