Descriptor: "BERT" / Region: china - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"BERT"' showing total 24 results

Start Over Descriptor "BERT" Region china

24 results on '"BERT"'

1. The Impact of Data Preparation and Model Complexity on the Natural Language Classification of Chinese News Headlines.

Author: Wagner, Torrey, Guhl, Dennis, and Langhals, Brent
Subjects: *NATURAL languages, *LANGUAGE models, *LINGUISTIC complexity, *CHINESE language, *HEADLINES, *NATURAL language processing
Abstract: Given the emergence of China as a political and economic power in the 21st century, there is increased interest in analyzing Chinese news articles to better understand developing trends in China. Because of the volume of the material, automating the categorization of Chinese-language news articles by headline text or titles can be an effective way to sort the articles into categories for efficient review. A 383,000-headline dataset labeled with 15 categories from the Toutiao website was evaluated via natural language processing to predict topic categories. The influence of six data preparation variations on the predictive accuracy of four algorithms was studied. The simplest model (Naïve Bayes) achieved 85.1% accuracy on a holdout dataset, while the most complex model (Neural Network using BERT) demonstrated 89.3% accuracy. The most useful data preparation steps were identified, and another goal examined the underlying complexity and computational costs of automating the categorization process. It was discovered the BERT model required 170x more time to train, was slower to predict by a factor of 18,600, and required 27x more disk space to save, indicating it may be the best choice for low-volume applications when the highest accuracy is needed. However, for larger-scale operations where a slight performance degradation is tolerated, the Naïve Bayes algorithm could be the best choice. Nearly one in four records in the Toutiao dataset are duplicates, and this is the first published analysis with duplicates removed. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

2. Automated Scoring of Translations with BERT Models: Chinese and English Language Case Study.

Author: Cui, Yizhuo and Liang, Maocheng
Subjects: DEEP learning, LANGUAGE models, CHINESE language, ENGLISH language, TRANSLATING & interpreting, ARTIFICIAL intelligence
Abstract: With the wide application of artificial intelligence represented by deep learning in natural language-processing tasks, the automated scoring of translations has also advanced and improved. This study aims to determine if the BERT-assist system can reliably assess translation quality and identify high-quality translations for potential recognition. It takes the Han Suyin International Translation Contest as a case study, which is a large-scale and influential translation contest in China, with a history of over 30 years. The experimental results show that the BERT-assist system is a reliable second rater for massive translations in terms of translation quality, as it can effectively sift out high-quality translations with a reliability of r = 0.9 or higher. Thus, the automated translation scoring system based on BERT can satisfactorily predict the ranking of translations according to translation quality and sift out high-quality translations potentially shortlisted for prizes. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

3. Using pre-trained models and graph convolution networks to find the causal relations among events in the Chinese financial text data.

Author: Hu, Kai, Li, Qing, Xie, Jie, Pu, Yingyan, and Guo, Ya
Subjects: CHINESE language, PARSING (Computer grammar), SOFTWARE as a service, SPACE exploration, RESEARCH personnel, CONTRACTING out, RANDOM graphs
Abstract: Nowadays, information explosion happens in every field. In the stock market of China, automatically understanding the market dynamics is extremely important. However, the information and datasets in Chinese are overwhelming for researchers in the field. How to extract useful information and understand the underlying logic in the Chinese corpus are the research hotspot. Causal relation identification is one of the most central tasks. Many works have made important progress in finding the causal relations in open-domain text, however, there is still space for further explorations in the specific domain of the financial field. In this paper, we propose to use the graph convolution network (GCN) to help represent the dependency relations among the entities in the logical networks provided by the Chinese dependency parsing tool, language technology platform(LTP). The motivation for using the GCN method to help represent the dependency relations is that the causal relations are highly correlated with language structures. Besides, we also choose to use the domain-specific pre-trained model FinBERT because this pre-trained model is specific to the financial field. Results show that the GCN-based method and pertained models of FinBERT in our proposed model play a key role in outperforming the baseline model of the traditional sequential labeling method and the start of art method from F1 of 0.4573 and 0.5506 to 0.6254. Our approach also wins third place in the Eastern District of software service outsourcing competition in China in the year 2021. We believe the proposed methods can contribute as at least an alternative option in future relation extraction tasks. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

4. CnGeoPLM: Contextual knowledge selection and embedding with pretrained language representation model for the geoscience domain.

Author: Ma, Kai, Zheng, Shuai, Tian, Miao, Qiu, Qinjun, Tan, Yongjian, Hu, Xinxin, Li, HaiYan, and Xie, Zhong
Subjects: *LANGUAGE models, *NATURAL language processing, *KNOWLEDGE graphs
Abstract: Several recent efforts have been devoted to enhancing pretrained language models (PLMs) by utilizing extra heterogeneous knowledge in knowledge graphs (KGs) and have achieved consistent improvements on various knowledge-driven natural language processing (NLP) tasks. However, most of these knowledge-enhanced PLMs, regardless of the domain-specific knowledge required by PLMs, lack a Chinese corpus to support large-scale PLMs, especially in domain-specific PLMs. Herein, we harvest and process public geological reports (mainly various reports from the National Geological Archives of China), abstracts of papers from 34 geological journals, and abstracts of geological dissertations from 161 universities or scientific institutions. This paper introduces a large-scale Chinese geological corpus (GeoCorpus) containing approximately 232 MB of training data with a total of 243 million characters. Based on GeoCorpus, we also introduce a Chinese pretrained language representation model in the geological domain (CnGeoPLM). The model utilizes the generic BERT to complete parameter initialization and combines the GeoCorpus for the second stage of training. We evaluate and test CnGeoPLM and Bidirectional Encoder Representations from Transformers (BERT) on three tasks of geological named entity recognition (GeoNER), geological entity relationship extraction (GeoRE), and geological entity clustering (GeoClu). In the results of both the GeoNER and GeoRE experiments, CnGeoPLM has different degrees of improvement in evaluation metrics compared to BERT. In the GeoClu experiment, CnGeoPLM is significantly more effective and discriminative in entity clustering effect compared to BERT. These results show that CnGeoPLM has a stronger representation ability in geological NLP tasks than BERT. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

5. Automatic medical specialty classification based on patients' description of their symptoms.

Author: Mao, Chao, Zhu, Quanjing, Chen, Rong, and Su, Weifeng
Subjects: *MEDICAL coding, *MEDICAL specialties & specialists
Abstract: In China, patients usually determine their medical specialty before they register the corresponding specialists in the hospitals. This process usually requires a lot of medical knowledge for the patients. As a result, many patients do not register the correct specialty for the first time if they do not receive help from the hospitals. In this study, we try to automatically direct the patients to the appropriate specialty based on the symptoms they described. As far as we know, this is the first study to solve the problem. We propose a neural network-based model based on a hybrid model integrated with an attention mechanism. To prove the actual effect of this hybrid model, we utilized a data set of more than 40,000 items, including eight departments, such as Otorhinolaryngology, Pediatrics, and other common departments. The experiment results show that the hybrid model achieves more than 93.5% accuracy and has a high generalization capacity, which is superior to traditional classification models. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

6. GeoBERT: Pre-Training Geospatial Representation Learning on Point-of-Interest.

Author: Gao, Yunfan, Xiong, Yun, Wang, Siqi, and Wang, Haofen
Subjects: ARTIFICIAL intelligence, NATURAL language processing, HOME prices, SURGICAL gloves, INFORMATION technology
Abstract: Thanks to the development of geographic information technology, geospatial representation learning based on POIs (Point-of-Interest) has gained widespread attention in the past few years. POI is an important indicator to reflect urban socioeconomic activities, widely used to extract geospatial information. However, previous studies often focus on a specific area, such as a city or a district, and are designed only for particular tasks, such as land-use classification. On the other hand, large-scale pre-trained models (PTMs) have recently achieved impressive success and become a milestone in artificial intelligence (AI). Against this background, this study proposes the first large-scale pre-training geospatial representation learning model called GeoBERT. First, we collect about 17 million POIs in 30 cities across China to construct pre-training corpora, with 313 POI types as the tokens and the level-7 Geohash grids as the basic units. Second, we pre-train GeoEBRT to learn grid embedding in self-supervised learning by masking the POI type and then predicting. Third, under the paradigm of "pre-training + fine-tuning", we design five practical downstream tasks. Experiments show that, with just one additional output layer fine-tuning, GeoBERT outperforms previous NLP methods (Word2vec, GloVe) used in geospatial representation learning by 9.21% on average in F1-score for classification tasks, such as store site recommendation and working/living area prediction. For regression tasks, such as POI number prediction, house price prediction, and passenger flow prediction, GeoBERT demonstrates greater performance improvements. The experiment results prove that pre-training on large-scale POI data can significantly improve the ability to extract geospatial information. In the discussion section, we provide a detailed analysis of what GeoBERT has learned from the perspective of attention mechanisms. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

7. An Information Retrieval-Based Joint System for Complex Chinese Knowledge Graph Question Answering.

Author: Xiao, Yuliang, Zhang, Lijuan, Huang, Jie, Zhang, Lei, and Wan, Jian
Subjects: KNOWLEDGE graphs, CHINESE language, SEMANTIC computing, KNOWLEDGE base, INFORMATION retrieval
Abstract: Knowledge graph-based question answering is an intelligent approach to deducing the answer to a natural language question from structured knowledge graph information. As one of the mainstream knowledge graph-based question answering approaches, information retrieval-based methods infer the correct answer by constructing and ranking candidate paths, which achieve excellent performance in simple questions but struggle to handle complex questions due to rich entity information and diverse relations. In this paper, we construct a joint system with three subsystems based on the information retrieval methods, where candidate paths can be efficiently generated and ranked, and a new text-matching method is introduced to capture the semantic correlation between questions and candidate paths. Results of the experiment conducted on the China Conference on Knowledge Graph and Semantic Computing 2019 Chinese Knowledge Base Question Answering dataset verify the superiority and efficiency of our approach. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

8. A Comparative Study of Natural Language Processing Algorithms Based on Cities Changing Diabetes Vulnerability Data.

Author: Wang, Siting, Song, Fuman, Qiao, Qinqun, Liu, Yuanyuan, Chen, Jiageng, and Ma, Jun
Subjects: NATURAL language processing, THEMATIC analysis, TYPE 2 diabetes, DIABETES
Abstract: (1) Background: Poor adherence to management behaviors in Chinese Type 2 diabetes mellitus (T2DM) patients leads to an uncontrolled prognosis of diabetes, which results in significant economic costs for China. It is imperative to quickly locate vulnerability factors in the management behavior of patients with T2DM. (2) Methods: In this study, a thematic analysis of the collected interview materials was conducted to construct the themes of T2DM management vulnerability. We explored the applicability of the pre-trained models based on the evaluation metrics in text classification. (3) Results: We constructed 12 themes of vulnerability related to the health and well-being of people with T2DM in Tianjin. We considered that Bidirectional Encoder Representation from Transformers (BERT) performed better in this Natural Language Processing (NLP) task with a shorter completion time. With the splitting ratio of 6:3:1 and batch size of 64 for BERT, the test accuracy was 97.71%, the completion time was 10 min 24 s, and the macro-F1 score was 0.9752. (4) Conclusions: Our results proved the applicability of NLP techniques in this specific Chinese-language medical environment. We filled the knowledge gap in the application of NLP technologies in diabetes management. Our study provided strong support for using NLP techniques to rapidly locate vulnerability factors in T2DM management. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

9. A novel deep learning approach to extract Chinese clinical entities for lung cancer screening and staging.

Author: Huanyao Zhang, Danqing Hu, Huilong Duan, Shaolei Li, Nan Wu, Xudong Lu, Zhang, Huanyao, Hu, Danqing, Duan, Huilong, Li, Shaolei, Wu, Nan, and Lu, Xudong
Subjects: *RESEARCH, *RESEARCH methodology, *LUNG tumors, *EARLY detection of cancer, *MEDICAL cooperation, *EVALUATION research, *COMPARATIVE studies, *RESEARCH funding, *ALGORITHMS
Abstract: Background: Computed tomography (CT) reports record a large volume of valuable information about patients' conditions and the interpretations of radiology images from radiologists, which can be used for clinical decision-making and further academic study. However, the free-text nature of clinical reports is a critical barrier to use this data more effectively. In this study, we investigate a novel deep learning method to extract entities from Chinese CT reports for lung cancer screening and TNM staging.Methods: The proposed approach presents a new named entity recognition algorithm, namely the BERT-based-BiLSTM-Transformer network (BERT-BTN) with pre-training, to extract clinical entities for lung cancer screening and staging. Specifically, instead of traditional word embedding methods, BERT is applied to learn the deep semantic representations of characters. Following the long short-term memory layer, a Transformer layer is added to capture the global dependencies between characters. Besides, pre-training technique is employed to alleviate the problem of insufficient labeled data.Results: We verify the effectiveness of the proposed approach on a clinical dataset containing 359 CT reports collected from the Department of Thoracic Surgery II of Peking University Cancer Hospital. The experimental results show that the proposed approach achieves an 85.96% macro-F1 score under exact match scheme, which improves the performance by 1.38%, 1.84%, 3.81%,4.29%,5.12%,5.29% and 8.84% compared to BERT-BTN, BERT-LSTM, BERT-fine-tune, BERT-Transformer, FastText-BTN, FastText-BiLSTM and FastText-Transformer, respectively.Conclusions: In this study, we developed a novel deep learning method, i.e., BERT-BTN with pre-training, to extract the clinical entities from Chinese CT reports. The experimental results indicate that the proposed approach can efficiently recognize various clinical entities about lung cancer screening and staging, which shows the potential for further clinical decision-making and academic research. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

10. Investigation of causal public opinion indexes for price fluctuation in vegetable marketing.

Author: Li, Youzhu, Yao, Jinyu, Song, Jingjing, Feng, Yixin, Dong, Heng, Zhao, Jingliang, Lian, Yulong, Shi, Feng, and Xia, Jingbo
Subjects: *PRICE fluctuations, *PUBLIC opinion, *PRICE indexes, *PRODUCE markets, *PRICES
Abstract: Though it is believed that public opinion helps to predict vegetable prices, an intelligent scheme for predicting vegetable prices by online public opinion is missing. This research proposes an intelligent vegetable price prediction model, TPCDR, which includes five modules, i.e., topic matching, public opinion computing, causing index selection, de-noise, and regression of price prediction. To investigate the correlation between vegetable price and public opinion in a fine-grained manner, this research considers four types of economic causing indexes, i.e., natural environment, supply, demand, and economic policy. TPCDR model integrates Granger causal testing and linear regression and discovers the significant causal factors raised from the natural environment and supply. Meanwhile, through vegetable price forecasting experiments, our model outperforms other baseline methods. This result suggests that the TPCDR pipeline is reasonable and reliable to model the association between public opinion and vegetable price. • An intelligent model for the association between public opinion and vegetable price. • Four types of economic causing indexes considered. Causal relation obtained. • Constructing a corpus for online public opinion with vegetable prices in China. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

11. A BERT Fine-tuning Model for Targeted Sentiment Analysis of Chinese Online Course Reviews.

Author: Zhang, Huibing, Dong, Junchao, Min, Liang, and Bi, Peng
Subjects: *SENTIMENT analysis, *ONLINE education, *CONSUMERS' reviews, *RANDOM fields, *USER-generated content
Abstract: Accurate analysis of targeted sentiment in online course reviews helps in understanding emotional changes of learners and improving the course quality. In this paper, we propose a fine-tuned bidirectional encoder representation from transformers (BERT) model for targeted sentiment analysis of course reviews. Specifically, it consists of two parts: binding corporate rules — conditional random field (BCR-CRF) target extraction model and a binding corporate rules — double attention (BCR-DA) target sentiment analysis model. Firstly, based on a large-scale Chinese review corpus, intra-domain unsupervised training of a BERT pre-trained model (BCR) is performed. Then, a Conditional Random Field (CRF) layer is introduced to add grammatical constraints to the output sequence of the semantic representation layer in the BCR model. Finally, a BCR-DA model containing double attention layers is constructed to express the sentiment polarity of the course review targets in a classified manner. Experiments are performed on Chinese online course review datasets of China MOOC. The experimental results show that the F1 score of the BCR-CRF model reaches above 92%, and the accuracy of the BCR-DA model reaches above 72%. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

12. Research on the identification and evolution of health industry policy instruments in China.

Author: Jin J and Du H
Subjects: China, Manufacturing Industry, Environmental Policy, Health Policy, Industry
Abstract: The application of health industry policies could be discovered more quickly and comprehensively through the automated identification of policy tools, which could provide references for the formulation, implementation, and optimization of subsequent policies in each province. This study applies the Bidirectional Encoder Representation from Transformer (BERT) model to identify policy tools automatically, utilizes Focal Loss to reduce the unbalance of a dataset, and analyzes the evolution of policy tools in each province, which contains time, space, and topic. The research demonstrates that the BERT model can improve the accuracy of classification, that supply and environment policy tools are more prevalent than demand tools, and that policy instruments are organized similarly in four major economic regions. Moreover, the policy's attention to topics related to healthcare, medicine, and pollution has gradually shifted to other topics, and the extent of policy attention continues to be concentrated on the health service industry, with less attention paid to the manufacturing industry from the keywords of the various topics., Competing Interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest., (Copyright © 2024 Jin and Du.)
Published: 2024
Full Text: View/download PDF

13. Stacking-BERT model for Chinese medical procedure entity normalization.

Author: Li L, Zhai Y, Gao J, Wang L, Hou L, and Zhao J
Subjects: Humans, China, Language, Semantics, Electronic Health Records
Abstract: Medical procedure entity normalization is an important task to realize medical information sharing at the semantic level; it faces main challenges such as variety and similarity in real-world practice. Although deep learning-based methods have been successfully applied to biomedical entity normalization, they often depend on traditional context-independent word embeddings, and there is minimal research on medical entity recognition in Chinese Regarding the entity normalization task as a sentence pair classification task, we applied a three-step framework to normalize Chinese medical procedure terms, and it consists of dataset construction, candidate concept generation and candidate concept ranking. For dataset construction, external knowledge base and easy data augmentation skills were used to increase the diversity of training samples. For candidate concept generation, we implemented the BM25 retrieval method based on integrating synonym knowledge of SNOMED CT and train data. For candidate concept ranking, we designed a stacking-BERT model, including the original BERT-based and Siamese-BERT ranking models, to capture the semantic information and choose the optimal mapping pairs by the stacking mechanism. In the training process, we also added the tricks of adversarial training to improve the learning ability of the model on small-scale training data. Based on the clinical entity normalization task dataset of the 5th China Health Information Processing Conference, our stacking-BERT model achieved an accuracy of 93.1%, which outperformed the single BERT models and other traditional deep learning models. In conclusion, this paper presents an effective method for Chinese medical procedure entity normalization and validation of different BERT-based models. In addition, we found that the tricks of adversarial training and data augmentation can effectively improve the effect of the deep learning model for small samples, which might provide some useful ideas for future research.
Published: 2023
Full Text: View/download PDF

14. An imConvNet-based deep learning model for Chinese medical named entity recognition.

Author: Zheng Y, Han Z, Cai Y, Duan X, Sun J, Yang W, and Huang H
Subjects: Humans, Language, Data Mining, China, Deep Learning, Names
Abstract: Background: With the development of current medical technology, information management becomes perfect in the medical field. Medical big data analysis is based on a large amount of medical and health data stored in the electronic medical system, such as electronic medical records and medical reports. How to fully exploit the resources of information included in these medical data has always been the subject of research by many scholars. The basis for text mining is named entity recognition (NER), which has its particularities in the medical field, where issues such as inadequate text resources and a large number of professional domain terms continue to face significant challenges in medical NER., Methods: We improved the convolutional neural network model (imConvNet) to obtain additional text features. Concurrently, we continue to use the classical Bert pre-training model and BiLSTM model for named entity recognition. We use imConvNet model to extract additional word vector features and improve named entity recognition accuracy. The proposed model, named BERT-imConvNet-BiLSTM-CRF, is composed of four layers: BERT embedding layer-getting word embedding vector; imConvNet layer-capturing the context feature of each character; BiLSTM (Bidirectional Long Short-Term Memory) layer-capturing the long-distance dependencies; CRF (Conditional Random Field) layer-labeling characters based on their features and transfer rules., Results: The average F1 score on the public medical data set yidu-s4k reached 91.38% when combined with the classical model; when real electronic medical record text in impacted wisdom teeth is used as the experimental object, the model's F1 score is 93.89%. They all show better results than classical models., Conclusions: The suggested novel model (imConvNet) significantly improves the recognition accuracy of Chinese medical named entities and applies to various medical corpora., (© 2022. The Author(s).)
Published: 2022
Full Text: View/download PDF

15. Revealing Public Opinion towards the COVID-19 Vaccine with Weibo Data in China: BertFDA-Based Model.

Author: Zhu J, Weng F, Zhuang M, Lu X, Tan X, Lin S, and Zhang R
Subjects: Humans, COVID-19 Vaccines, Pandemics prevention & control, Public Opinion, China epidemiology, COVID-19 epidemiology, COVID-19 prevention & control, Social Media
Abstract: The COVID-19 pandemic has created unprecedented burdens on people's health and subjective well-being. While countries around the world have established models to track and predict the affective states of COVID-19, identifying the topics of public discussion and sentiment evolution of the vaccine, particularly the differences in topics of concern between vaccine-support and vaccine-hesitant groups, remains scarce. Using social media data from the two years following the outbreak of COVID-19 (23 January 2020 to 23 January 2022), coupled with state-of-the-art natural language processing (NLP) techniques, we developed a public opinion analysis framework (BertFDA). First, using dynamic topic clustering on Weibo through the latent Dirichlet allocation (LDA) model, a total of 118 topics were generated in 24 months using 2,211,806 microblog posts. Second, by building an improved Bert pre-training model for sentiment classification, we provide evidence that public negative sentiment continued to decline in the early stages of COVID-19 vaccination. Third, by modeling and analyzing the microblog posts from the vaccine-support group and the vaccine-hesitant group, we discover that the vaccine-support group was more concerned about vaccine effectiveness and the reporting of news, reflecting greater group cohesion, whereas the vaccine-hesitant group was particularly concerned about the spread of coronavirus variants and vaccine side effects. Finally, we deployed different machine learning models to predict public opinion. Moreover, functional data analysis (FDA) is developed to build the functional sentiment curve, which can effectively capture the dynamic changes with the explicit function. This study can aid governments in developing effective interventions and education campaigns to boost vaccination rates.
Published: 2022
Full Text: View/download PDF

16. Disease prediction based on multi-type data fusion from Chinese electronic health record.

Author: Liang Z, Zhang Z, Chen H, and Zhang Z
Subjects: Humans, China, Delivery of Health Care, Hospitals, Electronic Health Records
Abstract: Disease prediction by using a variety of healthcare data to assist doctors in disease diagnosis is becoming a more and more important research topic recently. This paper proposes a disease prediction model that fuses multiple types of encoded representations of Chinese electronic health records (EHRs). The model framework utilizes a multi-head self-attention mechanism, which combines textual and numerical features to enhance text representations. The BiLSTM-CRF and TextCNN models are used, respectively, to extract entities and then obtain the embedding representations of them. The representations of text and entities in it are combined together for formulating representations of EHRs. The experimental results on EHRs data collected from a Three Grade Class B Hospital General in Gansu Province, China, show that our model achieved an F1 score of 91.92%, which outperforms the previous baseline methods.
Published: 2022
Full Text: View/download PDF

17. SiBERT: A Siamese-based BERT network for Chinese medical entities alignment.

Author: Ma Z, Zhao L, Li J, Xu X, and Li J
Subjects: China, Electronic Health Records, Semantics, Vocabulary, Controlled, Algorithms, Deep Learning
Abstract: Entity alignment aims at associating semantically similar entities in knowledge graphs from different sources. It is widely used in the integration and construction of professional medical knowledge. The existing deep learning methods lack term-level embedding representation, which limits the performance of entity alignment and causes a massive computational overhead. To address these problems, we propose a Siamese-based BERT (SiBERT) for Chinese medical entities alignment. SiBERT generates term-level embedding based on word embedding sequences to enhance the features of entities in similarity calculation. The process of entity alignment contains three steps. Specifically, the SiBERT is firstly pre-trained with synonym dictionary in the public domain, and transferred to the task of medical entity alignment. Secondly, four different categories of entities (disease, symptom, treatment, and examination) are labeled based on the standard terms selected from standard terms dataset. The entities and their standard terms form term pairs to train SiBERT. Finally, combined with the entity alignment algorithm, the most similar standard term is selected as the final result. To evaluate the effectiveness of our method, we conduct extensive experiments on real-world datasets. The experimental results illustrate that SiBERT network is superior to other compared algorithms both in alignment accuracy and computational efficiency., (Copyright © 2022 Elsevier Inc. All rights reserved.)
Published: 2022
Full Text: View/download PDF

18. Research on the Construction of Typhoon Disaster Chain Based on Chinese Web Corpus.

Author: Liu, Hongliang, Luo, Nianxue, and Zhao, Qiansheng
Subjects: TYPHOONS, HISTORICAL literacy, EMERGENCY management, DISASTERS, DATA mining, MULTISENSOR data fusion
Abstract: China is one of the countries most affected by typhoon disasters. It is of great significance to study the mechanism of typhoon disasters and construct a typhoon disaster chain for emergency management and disaster reduction. The evolution process of typhoon disaster based on expert knowledge and historical disaster data has been summarized in previous studies, which relied too much on artificial experience while less in-depth consideration was given to the disaster exposure, the social environment, as well as the spatio-temporal factors. Hence, problems, such as incomplete content and inconsistent expression of typhoon disaster knowledge, have arisen. With the development of computer technology, massive Web corpus with numerous Web news and various improvised content on the social media platform, and ontology that enables consistent expression new light has been shed on the knowledge discovery of typhoon disaster. With the Chinese Web corpus as its source, this research proposes a method to construct a typhoon disaster chain so as to obtain disaster information more efficiently, explore the spatio-temporal trends of disasters and their impact on human society, and then comprehensively comprehend the process of typhoon disaster. First, a quintuple structure (Concept, Property, Relationship, Rule and Instance) is used to design the Typhoon Disaster Chain Ontology Model (TDCOM) which contains the elements involved in a typhoon disaster. Then, the information extraction process, regarded as a sequence labeling task in the present study, is combined with the BERT model so as to extract typhoon event-elements from the customized corpus. Finally, taking Typhoon Mangkhut as an example, the typical typhoon disaster chain is constructed by data fusion and structured expression. The results show that the methods presented in this research can provide scientific support for analyzing the evolution process of typhoon disasters and their impact on human society. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

19. Use of BERT (Bidirectional Encoder Representations from Transformers)-Based Deep Learning Method for Extracting Evidences in Chinese Radiology Reports: Development of a Computer-Aided Liver Cancer Diagnosis Framework.

Author: Liu, Honglei, Zhang, Zhiqiang, Xu, Yan, Wang, Ni, Huang, Yanqun, Yang, Zhenghan, Jiang, Rui, and Chen, Hui
Subjects: LIVER tumors, NATURAL language processing, INFORMATION retrieval, COMPUTER-aided diagnosis, MEDICAL specialties & specialists
Abstract: Background: Liver cancer is a substantial disease burden in China. As one of the primary diagnostic tools for detecting liver cancer, dynamic contrast-enhanced computed tomography provides detailed evidences for diagnosis that are recorded in free-text radiology reports.Objective: The aim of our study was to apply a deep learning model and rule-based natural language processing (NLP) method to identify evidences for liver cancer diagnosis automatically.Methods: We proposed a pretrained, fine-tuned BERT (Bidirectional Encoder Representations from Transformers)-based BiLSTM-CRF (Bidirectional Long Short-Term Memory-Conditional Random Field) model to recognize the phrases of APHE (hyperintense enhancement in the arterial phase) and PDPH (hypointense in the portal and delayed phases). To identify more essential diagnostic evidences, we used the traditional rule-based NLP methods for the extraction of radiological features. APHE, PDPH, and other extracted radiological features were used to design a computer-aided liver cancer diagnosis framework by random forest.Results: The BERT-BiLSTM-CRF predicted the phrases of APHE and PDPH with an F1 score of 98.40% and 90.67%, respectively. The prediction model using combined features had a higher performance (F1 score, 88.55%) than those using APHE and PDPH (84.88%) or other extracted radiological features (83.52%). APHE and PDPH were the top 2 essential features for liver cancer diagnosis.Conclusions: This work was a comprehensive NLP study, wherein we identified evidences for the diagnosis of liver cancer from Chinese radiology reports, considering both clinical knowledge and radiology findings. The BERT-based deep learning method for the extraction of diagnostic evidence achieved state-of-the-art performance. The high performance proves the feasibility of the BERT-BiLSTM-CRF model in information extraction from Chinese radiology reports. The findings of our study suggest that the deep learning-based method for automatically identifying evidences for diagnosis can be extended to other types of Chinese clinical texts. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

20. A tag based joint extraction model for Chinese medical text.

Author: Liu X, Liu Y, Wu H, and Guan Q
Subjects: China, Humans, Medicine, Chinese Traditional, Models, Molecular, Medical Informatics
Abstract: Information extraction in medical field is an important method to structure medical knowledge and discover new knowledge. Traditional methods handle this task in a pipelined manner regarding the entity recognition and relation extraction as two sub-tasks, which, however, neglects the relevance between the two of them. In recent years, the research on the joint extraction model has achieved encouraging results in the general field, yet scholarship focusing on the joint extraction model applied to medical field is insufficient. In this paper, we construct a joint extraction model based on tagging scheme for Chinese medical texts. Firstly, we design a series of pretreatment procedures for Chinese medical data to obtain effective Chinese word sequence. Then, we propose the BIOH12D1D2 tagging scheme to convert the joint extraction task into a tagging problem and to solve the overlapping entity problem. After that, we use the encoder-decoder model to obtain the tag prediction sequence. And in decoding layer, the Bert pre-training model is adopted to extract token features to enhance the feature representation ability of our model. Lastly, the joint extraction model gains a F1 value by 0.7 on CHIP-2020, which increases by 0.364 compared with the baseline., (Copyright © 2021 Elsevier Ltd. All rights reserved.)
Published: 2021
Full Text: View/download PDF

21. A novel deep learning approach to extract Chinese clinical entities for lung cancer screening and staging.

Author: Zhang H, Hu D, Duan H, Li S, Wu N, and Lu X
Subjects: Algorithms, China, Early Detection of Cancer, Humans, Deep Learning, Lung Neoplasms diagnostic imaging
Abstract: Background: Computed tomography (CT) reports record a large volume of valuable information about patients' conditions and the interpretations of radiology images from radiologists, which can be used for clinical decision-making and further academic study. However, the free-text nature of clinical reports is a critical barrier to use this data more effectively. In this study, we investigate a novel deep learning method to extract entities from Chinese CT reports for lung cancer screening and TNM staging., Methods: The proposed approach presents a new named entity recognition algorithm, namely the BERT-based-BiLSTM-Transformer network (BERT-BTN) with pre-training, to extract clinical entities for lung cancer screening and staging. Specifically, instead of traditional word embedding methods, BERT is applied to learn the deep semantic representations of characters. Following the long short-term memory layer, a Transformer layer is added to capture the global dependencies between characters. Besides, pre-training technique is employed to alleviate the problem of insufficient labeled data., Results: We verify the effectiveness of the proposed approach on a clinical dataset containing 359 CT reports collected from the Department of Thoracic Surgery II of Peking University Cancer Hospital. The experimental results show that the proposed approach achieves an 85.96% macro-F1 score under exact match scheme, which improves the performance by 1.38%, 1.84%, 3.81%,4.29%,5.12%,5.29% and 8.84% compared to BERT-BTN, BERT-LSTM, BERT-fine-tune, BERT-Transformer, FastText-BTN, FastText-BiLSTM and FastText-Transformer, respectively., Conclusions: In this study, we developed a novel deep learning method, i.e., BERT-BTN with pre-training, to extract the clinical entities from Chinese CT reports. The experimental results indicate that the proposed approach can efficiently recognize various clinical entities about lung cancer screening and staging, which shows the potential for further clinical decision-making and academic research., (© 2021. The Author(s).)
Published: 2021
Full Text: View/download PDF

22. A Coarse-to-Fine Model for Geolocating Chinese Addresses.

Author: Qian, Chunyao, Yi, Chao, Cheng, Chengqi, Pu, Guoliang, and Liu, Jiashu
Subjects: *DELIVERY of goods
Abstract: Address geolocation aims to associate address texts to the geographic locations. In China, due to the increasing demand for LBS applications such as take-out services and express delivery, automatically geolocating the unstructured address information is the key issue that needs to be solved first. Recently, a few approaches have been proposed to automate the address geolocation by directly predicting geographic coordinates. However, such point-based methods ignore the hierarchy information in addresses which may cause poor geolocation performance. In this paper, we propose a hierarchical region-based approach for geolocating Chinese addresses. We model the address geolocation as a Sequence-to-Sequence (Seq2Seq) learning task, that is, the input sequence is a textual address, and the output sequence is a GeoSOT grid code which exactly represents multi-level regions covered by the address. A novel coarse-to-fine model, which combines BERT and LSTM, is designed to learn the task. The experimental results demonstrate that our model correctly understands the Chinese addresses and achieves the highest geolocation accuracy among all the baselines. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

23. Exploration of text matching methods in Chinese disease Q&A systems: A method using ensemble based on BERT and boosted tree models.

Author: Wu Z, Liang J, Zhang Z, and Lei J
Subjects: China, Language, Natural Language Processing
Abstract: Background: Text matching is one of the basic tasks in the field of natural language processing. Owing to the particularity of Chinese language and medical texts, text matching has greater application and research value in the medical field. In 2019, at the China Health Information Processing Conference (CHIP), 30,000 sets of real disease Q&A data in Chinese on diabetes, hypertension, hepatitis B, AIDS, and breast cancer were released for public evaluation. A total of 90 teams participated in the evaluation., Purpose: To explore the best method of text matching of Chinese medical Q&A data by participating in an evaluation competition., Method: After analyzing the Chinese medical Q&A data provided by the competition, we used the bidirectional encoder representations from transformers (BERT) model and a boosted tree model to compare the effects. At the same time, we analyzed the importance of the features extracted through feature engineering. Finally, we integrated the BERT and boosted tree models, and proved the effectiveness of the ensemble through a correlation analysis., Results: The final F1 score of the ensemble model is 0.90825, ranking first among the 90 participating teams. The highest F1 score of the single BERT model is 0.87443, whereas the highest F1 score of the boosted tree single model is only 0.86915. The F1 score of the BERT multi-model ensemble is 0.87473 (an average increase of 0.756% compared to the single model), and the F1 score of the boosted tree multi-model ensemble is 0.86720 (an average decrease of 0.03% compared to the single model). In the feature importance experiment, the out-degree and in-degree of the Q&A sentence are of utmost importance. In the correlation experiment, the correlation coefficients between models of the same type are all as high as 0.9, which shows a high similarity. The correlation coefficient between different types of models is approximately 0.7, which shows a certain degree of discrimination. With the ensemble of the two types of models, the F1 score reached 0.90825, which is 3.88% higher than that of the optimal single model., Conclusion: In our study, the proposed model ensemble method was shown to effectively improve the performance of a single model. It achieves good results in Chinese medical Q&A tasks and has a good generalization property., (Copyright © 2021 Elsevier Inc. All rights reserved.)
Published: 2021
Full Text: View/download PDF

24. Chinese clinical named entity recognition with variant neural structures based on BERT methods.

Author: Li X, Zhang H, and Zhou XH
Subjects: China, Electronic Health Records, Neural Networks, Computer, Text Messaging
Abstract: Clinical Named Entity Recognition (CNER) is a critical task which aims to identify and classify clinical terms in electronic medical records. In recent years, deep neural networks have achieved significant success in CNER. However, these methods require high-quality and large-scale labeled clinical data, which is challenging and expensive to obtain, especially data on Chinese clinical records. To tackle the Chinese CNER task, we pre-train BERT model on the unlabeled Chinese clinical records, which can leverage the unlabeled domain-specific knowledge. Different layers such as Long Short-Term Memory (LSTM) and Conditional Random Field (CRF) are used to extract the text features and decode the predicted tags respectively. In addition, we propose a new strategy to incorporate dictionary features into the model. Radical features of Chinese characters are used to improve the model performance as well. To the best of our knowledge, our ensemble model outperforms the state of the art models which achieves 89.56% strict F1 score on the CCKS-2018 dataset and 91.60% F1 score on CCKS-2017 dataset., Competing Interests: Declaration of Competing Interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper., (Copyright © 2020 Elsevier Inc. All rights reserved.)
Published: 2020
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

24 results on '"BERT"'

1. The Impact of Data Preparation and Model Complexity on the Natural Language Classification of Chinese News Headlines.

2. Automated Scoring of Translations with BERT Models: Chinese and English Language Case Study.

3. Using pre-trained models and graph convolution networks to find the causal relations among events in the Chinese financial text data.

4. CnGeoPLM: Contextual knowledge selection and embedding with pretrained language representation model for the geoscience domain.

5. Automatic medical specialty classification based on patients' description of their symptoms.

6. GeoBERT: Pre-Training Geospatial Representation Learning on Point-of-Interest.

7. An Information Retrieval-Based Joint System for Complex Chinese Knowledge Graph Question Answering.

8. A Comparative Study of Natural Language Processing Algorithms Based on Cities Changing Diabetes Vulnerability Data.

9. A novel deep learning approach to extract Chinese clinical entities for lung cancer screening and staging.

10. Investigation of causal public opinion indexes for price fluctuation in vegetable marketing.

11. A BERT Fine-tuning Model for Targeted Sentiment Analysis of Chinese Online Course Reviews.

12. Research on the identification and evolution of health industry policy instruments in China.

13. Stacking-BERT model for Chinese medical procedure entity normalization.

14. An imConvNet-based deep learning model for Chinese medical named entity recognition.

15. Revealing Public Opinion towards the COVID-19 Vaccine with Weibo Data in China: BertFDA-Based Model.

16. Disease prediction based on multi-type data fusion from Chinese electronic health record.

17. SiBERT: A Siamese-based BERT network for Chinese medical entities alignment.

18. Research on the Construction of Typhoon Disaster Chain Based on Chinese Web Corpus.

19. Use of BERT (Bidirectional Encoder Representations from Transformers)-Based Deep Learning Method for Extracting Evidences in Chinese Radiology Reports: Development of a Computer-Aided Liver Cancer Diagnosis Framework.

20. A tag based joint extraction model for Chinese medical text.

21. A novel deep learning approach to extract Chinese clinical entities for lung cancer screening and staging.

22. A Coarse-to-Fine Model for Geolocating Chinese Addresses.

23. Exploration of text matching methods in Chinese disease Q&A systems: A method using ensemble based on BERT and boosted tree models.

24. Chinese clinical named entity recognition with variant neural structures based on BERT methods.

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

24 results on '"BERT"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources