Descriptor: "Data mining" / Region: china - Searchworks@Jio Institute Digital Library Search Results

Your search for "Data mining" returned 906 results.


1. THE TRENDS OF POTENTIAL USER RESEARCH FROM 2014-2023 BASED ON BIBLIOMETRIC AND BERTOPIC.

Author: Liu Kun, Alli, Hassan, and Azlin Abd Rahman, Khairul Aidil
Subjects: SMALL business, TEXT mining, BIBLIOMETRICS, MARKET share, CHINA-United States relations, CIVIL service, DATA mining, GREEN business
Abstract: Not available in this record. Source: Environmental & Social Management Journal / Revista de Gestão Social e Ambiental.
Published: 2024

2. Network analysis of three-dimensional hard-soft tissue relationships in the lower 1/3 of the face: skeletal Class I-normodivergent malocclusion versus Class II-hyperdivergent malocclusion.

Author: Wang, Tianyi, Nie, Kaichen, Fan, Yi, Chen, Gui, Xu, Kaiyuan, Han, Bing, Pei, Yuru, Song, Guangying, and Xu, Tianmin
Subjects: FACIAL anatomy, FACE, MALOCCLUSION, ORTHODONTICS, TEETH, THREE-dimensional imaging, RESEARCH funding, COMPUTED tomography, RETROSPECTIVE studies, CEPHALOMETRY, INCISORS, HUMAN body, COMPARATIVE studies, MAXILLA, CHIN, TOOTH cervix
Abstract: Background: The determining effect of facial hard tissues on soft tissue morphology in orthodontic patients has yet to be explained. The aim of this study was to clarify the hard-soft tissue relationships of the lower 1/3 of the face in skeletal Class II-hyperdivergent patients compared with those in Class I-normodivergent patients using network analysis. Methods: Fifty-two adult patients (42 females, 10 males; age, 26.58 ± 5.80 years) were divided into two groups: Group 1, 25 subjects, skeletal Class I normodivergent pattern with straight profile; Group 2, 27 subjects, skeletal Class II hyperdivergent pattern with convex profile. Pretreatment cone-beam computed tomography and three-dimensional facial scans were taken and superimposed, on which landmarks were identified manually, and their coordinate values were used for network analysis. Results: (1) In sagittal direction, Group 2 correlations were generally weaker than Group 1. In both the vertical and sagittal directions of Group 1, the most influential hard tissue landmarks to soft tissues were located between the level of cemento-enamel junction of upper teeth and root apex of lower teeth. In Group 2, the hard tissue landmarks with the greatest influence in vertical direction were distributed more forward and downward than in Group 1. (2) In Group 1, all the correlations for vertical-hard tissue to sagittal-soft tissue position and sagittal-hard tissue to vertical-soft tissue position were positive. However, Group 2 correlations between vertical-hard tissue and sagittal-soft tissue positions were mostly negative. Between sagittal-hard tissue and vertical-soft tissue positions, Group 2 correlations were negative for mandible, and were positive for maxilla and teeth. Conclusion: Compared with Class I normodivergent patients with straight profile, Class II hyperdivergent patients with convex profile had more variations in soft tissue morphology in sagittal direction. 
In vertical direction, the most relevant hard tissue landmarks on which soft tissue predictions should be based were distributed more forward and downward in Class II hyperdivergent patients with convex profile. Class II hyperdivergent pattern with convex profile was an imbalanced phenotype concerning sagittal and vertical positions of maxillofacial hard and soft tissues. [ABSTRACT FROM AUTHOR]
Published: 2024

3. Uncovering emotion sequence patterns in different interaction groups using deep learning and sequential pattern mining.

Author: Huang, Changqin, Yu, Jianhui, Wu, Fei, Wang, Yi, and Chen, Nian‐Shing
Subjects: *DATA mining, *RESEARCH funding, *CLUSTER analysis (Statistics), *DATA analysis, *CONCEPTUAL models, *MASSIVE open online courses, *EDUCATIONAL outcomes, *UNIVERSITIES & colleges, *KRUSKAL-Wallis Test, *GROUP dynamics, *EMOTIONS, *LEARNING, *INTERNET, *DESCRIPTIVE statistics, *DISCUSSION, *PSYCHOLOGY, *DEEP learning, *ONLINE education, *COLLEGE teacher attitudes, *STATISTICS, *INTERPERSONAL relations, *LEARNING strategies, *STUDENT attitudes, *ALGORITHMS, *ACHIEVEMENT
Abstract: Background: Investigating emotion sequence patterns in the posts of discussion forums in massive open online courses (MOOCs) holds a vital role in shaping online interactions and impacting learning achievement. While the majority of research focuses on the relationship between emotions and interactions in MOOC forum discussions, research on identifying the crucial difference in emotion sequence patterns among different interaction groups remains in its infancy. Objectives: This research utilizes deep learning and sequential pattern mining to investigate whether there are differences in emotion sequence patterns across different groups of learners who exhibit various types of interactions in online discussion forums. Methods: Data from a comprehensive array of sources, including log files, discussion texts and scores from 498 learners in online discussion forums, were collected for this study. The agglomerative hierarchical algorithm is used to classify learners into groups with different levels of interactions. Additionally, we implement and evaluate multiple deep learning models for detecting different emotions from online discussions. Relevant emotion sequence patterns were identified using sequence pattern analysis and the identified emotion sequence patterns were compared across different groups with different levels of interactions. Results and Conclusions: Using an agglomerative hierarchical algorithm, we classified learners into three distinct groups characterized by different levels of interactions: high, average and low level. Leveraging the bi‐directional long short‐term memory model for emotion detection yielded the highest predictive performance, with an impressive F‐measure of 94.01%, a recall rate of 93.83% and an accuracy score of 95.01%. The results also revealed that learners in the low‐level interaction group experienced more emotion transition from boredom to frustration than the other two groups. 
Therefore, the aggregation of students into groups and the utilization of their MOOC log data offer educators the capability to provide adaptive emotional feedback, customize assessments and offer more personalized attention as needed. Lay Description: What is currently known about the subject matter: Emotions are dynamic over time when learners experience cognitive disequilibrium/equilibrium. Online interactions are critical components, which influence learners' emotional state, cognitive processes and learning achievement. It is not clear what are differences in emotion sequence patterns across the groups with different interaction types. What the paper adds: An agglomerative hierarchical algorithm was implemented to cluster learners into three groups by analysing behavioural data. We explore possibilities for automated classification of emotions using deep learning approaches. Learners in the low‐level interaction group experienced more emotion transition from boredom to frustration. Implications for practitioners: A considerable amount of effort should be expended to identify and respond to learners who experience boredom and frustration emotions. Designing interventions or scaffolding to facilitate learners' interaction and promote favourable emotions. Educators could provide more personalized support based on learners' online interaction cluster. [ABSTRACT FROM AUTHOR]
Published: 2024
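Illustrative note (not part of the catalog record): the abstract compares how often emotion transitions such as boredom to frustration occur in each interaction group. A minimal sketch of that kind of count is given below. The sequences and the helper name `transition_counts` are hypothetical, and this is a plain adjacent-transition tally, not the sequential pattern mining procedure used in the paper.

```python
from collections import Counter


def transition_counts(sequences):
    """Count adjacent emotion-to-emotion changes across all sequences."""
    counts = Counter()
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            if prev != nxt:  # only count steps where the emotion changes
                counts[(prev, nxt)] += 1
    return counts


# Hypothetical per-learner emotion sequences for one interaction group.
low_group = [
    ["boredom", "frustration", "frustration", "confusion"],
    ["neutral", "boredom", "frustration"],
]

counts = transition_counts(low_group)
# Share of all observed transitions that go from boredom to frustration.
share = counts[("boredom", "frustration")] / sum(counts.values())
```

Computing the same share for each group would allow a rough comparison of the kind the abstract reports.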

4. A data-driven precision teaching intervention mechanism to improve secondary school students' learning effectiveness.

Author: Wang, Yu-Jie, Gao, Chang-Lei, and Ye, Xin-Dong
Subjects: PRECISION teaching, DATA mining, EFFECTIVE teaching, TEACHING models, SECONDARY education
Abstract: The continuous development of Educational Data Mining (EDM) and Learning Analytics (LA) technologies has provided more effective technical support for accurate early warning and interventions for student academic performance. However, the existing body of research on EDM and LA needs more empirical studies that provide feedback interventions, and more attention should be paid to primary and secondary school students. This study proposed a data-driven precision teaching intervention mechanism combining EDM and LA technologies. The proposed mechanism aims to assist teachers in predicting students' academic performance and implementing corresponding interventions. This approach enables early warnings and reminders for students in crisis, and offers teaching assistance and support tailored to students at different levels. A quasi-experimental design was employed to examine the impact of the data-driven precision teaching intervention mechanism on secondary school students' learning outcomes. A total of 142 seventh-grade students participated in the intervention experiment, with an experimental group (50) receiving the data-driven precision teaching intervention, control group 2 (48) receiving a group intervention stratified by teacher experience, and control group 1 (44) receiving a traditional group intervention. Posttest data were collected after three rounds of intervention. Compared to the two control groups, students in the experimental group demonstrated superior academic achievement, intrinsic motivation, self-efficacy, and meta-cognitive awareness. These findings indicate that the data-driven precision teaching intervention approach positively impacted students' academic development, and effectively promoted their personalized learning. The findings provide pedagogical insights into the application of EDM in conjunction with LA prediction and actionable interventions. [ABSTRACT FROM AUTHOR]
Published: 2024

5. A Surface Water Extraction Method Integrating Spectral and Temporal Characteristics.

Author: Yebin Zou and Rui Shu
Subjects: REMOTE sensing, DATA mining, LANDSAT satellites, LAND cover, SURFACE area
Abstract: Remote sensing has been applied to observe large areas of surface water to obtain higher-resolution and long-term continuous observation records of surface water. However, limitations remain in the detection of large-scale and multi-temporal surface water mainly due to the high variability in water surface signatures in space and time. In this study, we developed a surface water remote sensing information extraction model that integrates spectral and temporal characteristics to extract surface water from multi-dimensional data of long-term Landsat scenes to explore the spatiotemporal changes in surface water over decades. The goal is to extract open water in vegetation, clouds, terrain shadows, and other land cover backgrounds from medium-resolution remote sensing images. The average overall accuracy and average kappa coefficient of the classification were verified to be 0.91 and 0.81, respectively. Experiments applied to China's inland arid area have shown that the method is effective under complex surface environmental conditions. [ABSTRACT FROM AUTHOR]
Published: 2024
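Illustrative note (not part of the catalog record): the abstract reports an average overall accuracy of 0.91 and an average kappa coefficient of 0.81. The sketch below shows how these two metrics are conventionally derived from a confusion matrix, using Cohen's kappa. The matrix values and the function name `accuracy_and_kappa` are hypothetical and are not data from the paper.

```python
def accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] is the number of samples of reference class i
    that were assigned to class j.
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(n_classes)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix)
        for i in range(n_classes)
    ) / total**2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical water / non-water validation counts.
confusion = [[45, 5],
             [5, 45]]
overall_accuracy, kappa = accuracy_and_kappa(confusion)
```

Kappa is lower than overall accuracy here because it discounts the agreement expected by chance.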

6. A Persistent Scatterer Point Selection Method for Deformation Monitoring of Under-Construction Cross-Sea Bridges Using Statistical Theory and GMM-EM Algorithm.

Author: Li, Jianyong, Xu, Zidong, Zhang, Xuedong, Ma, Weiyu, and He, Shuguang
Subjects: *EARTHWORK, *ELASTIC deformation, *BRIDGE design & construction, *POINT set theory, *BUILDING sites, *DATA mining
Abstract: Using traditional algorithms to identify persistent scatterer (PS) points is challenging during bridge construction because of short-term changes at construction sites, such as earthworks, as well as the erection and dismantling of temporary structures. To address this issue, this study proposes a PS point selection method based on statistical theory and Gaussian Mixture Model-Expectation Maximization (GMM-EM) algorithm. This method adopts amplitude information as an incoherence evaluation indicator. Furthermore, the statistical median of the amplitude dispersion index and amplitude mean is screened twice to extract a set of candidate points, including PS points that exhibit stable backscattering over long durations. Temporal coherence is simultaneously used as the coherence evaluation indicator. Another candidate point set is obtained by extracting high-coherence PS points using the GMM-EM algorithm. These sets of candidate points are then combined to obtain a final PS points set. In the experiment, the deformation monitoring of the under-construction Shenzhen-Zhongshan Cross-Sea Bridge in China was selected as a case study, with 28 Sentinel-1A images used as the data source for PS selection and deformation information extraction. The results show that the proposed method enhanced the density and quality of PS points on the under-construction cross-sea bridge compared to existing PS selection methods, thus offering higher reliability. Deformation analysis further revealed fluctuating deformation trends at characteristic points of the Shenzhen-Zhongshan Cross-Sea Bridge, indicating the occurrence of elastic deformation during its construction. [ABSTRACT FROM AUTHOR]
Published: 2024
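Illustrative note (not part of the catalog record): the abstract screens candidate points using the amplitude dispersion index and the amplitude mean. The sketch below computes the amplitude dispersion index in its usual form, the standard deviation of a pixel's amplitude over time divided by its mean, and then applies one assumed reading of the median screening: keep pixels whose index is at or below the median index and whose mean amplitude is at or above the median mean. The pixel values, the function names and that screening rule are assumptions; the paper's exact two-pass procedure and its GMM-EM step are not reproduced.

```python
from statistics import mean, median, pstdev


def amplitude_dispersion(amplitudes):
    """Amplitude dispersion index: temporal std dev divided by temporal mean."""
    return pstdev(amplitudes) / mean(amplitudes)


def select_candidates(stack):
    """Keep pixels with low dispersion and high mean amplitude (assumed rule)."""
    adi = {pid: amplitude_dispersion(a) for pid, a in stack.items()}
    avg = {pid: mean(a) for pid, a in stack.items()}
    adi_median = median(adi.values())
    avg_median = median(avg.values())
    return sorted(
        pid for pid in stack
        if adi[pid] <= adi_median and avg[pid] >= avg_median
    )


# Hypothetical amplitude time series, one list per pixel.
pixels = {
    "p1": [10.0, 10.0, 10.0, 10.0],  # bright and stable
    "p2": [8.0, 12.0, 8.0, 12.0],    # bright but fluctuating
    "p3": [1.0, 3.0, 1.0, 3.0],      # dark and fluctuating
    "p4": [2.0, 2.0, 2.0, 2.0],      # stable but dark
}
candidates = select_candidates(pixels)
```

Only the bright, stable pixel survives both conditions in this toy stack.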

7. A Remote Sensing Water Information Extraction Method Based on Unsupervised Form Using Probability Function to Describe the Frequency Histogram of NDWI: A Case Study of Qinghai Lake in China.

Author: Liu, Shiqi, Qiu, Jun, and Li, Fangfang
Subjects: DATA mining, REMOTE sensing, GREENHOUSE gases, PEARSON correlation (Statistics), BODIES of water, HISTOGRAMS
Abstract: With escalating human activities and the substantial emissions of greenhouse gases, global warming intensifies. This phenomenon has led to increased occurrences of various extreme hydrological events, precipitating significant changes in lakes and rivers across the Qinghai-Tibet Plateau. Therefore, accurate extraction of information about water bodies and their delineation are crucial for lake monitoring. This paper proposes a methodology based on the Normalized Difference Water Index (NDWI) and Gumbel distribution to determine optimal segmentation thresholds. Focusing on Qinghai Lake, this study utilizes multispectral characteristics from the US Landsat satellite for analysis. Comparative assessments with seven alternative methods are conducted to evaluate accuracy. Employing the proposed approach, information about water bodies in Qinghai Lake is extracted over 38 years, from 1986 to 2023, revealing trends in area variation. Analysis indicates a rising trend in Qinghai Lake's area following a turning point in 2004. To investigate this phenomenon, Pearson correlation analysis of temperature and precipitation over the past 38 years is used; it reveals that precipitation has only a slight impact on area and that there is a positive correlation between temperature and area. In conclusion, this study employs remote sensing data and statistical analysis to comprehensively investigate mechanisms driving changes in Qinghai Lake's water surface area, providing insights into ecological shifts in lake systems against the backdrop of global warming, thereby offering valuable references for understanding and addressing these changes. [ABSTRACT FROM AUTHOR]
Published: 2024
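Illustrative note (not part of the catalog record): the abstract segments water using a threshold on the Normalized Difference Water Index. The sketch below uses the common green and near-infrared form of the index, NDWI = (Green - NIR) / (Green + NIR), and applies a fixed threshold. The reflectance values, the function names and the threshold of 0.0 are placeholders; the paper derives its threshold from a Gumbel fit to the NDWI histogram, which is not reproduced here.

```python
def ndwi(green, nir):
    """Normalized Difference Water Index for one pixel."""
    total = green + nir
    # Guard against division by zero for empty or no-data pixels.
    return (green - nir) / total if total else 0.0


def water_mask(green_band, nir_band, threshold):
    """Mark a pixel as water when its NDWI exceeds the threshold."""
    return [
        [ndwi(g, n) > threshold for g, n in zip(green_row, nir_row)]
        for green_row, nir_row in zip(green_band, nir_band)
    ]


# Hypothetical 2x2 reflectance grids (green and near-infrared bands).
green = [[0.30, 0.10],
         [0.25, 0.08]]
nir = [[0.10, 0.30],
       [0.05, 0.32]]

mask = water_mask(green, nir, threshold=0.0)  # placeholder threshold
```

Water reflects more green than near-infrared light, so the left column of this toy grid is flagged as water.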

8. Decoding hotline's information with text-mining: A protocol for improving tobacco control in Shanghai.

Author: Tong Zhao, Zi-an He, Jiaqi Shao, Aksara Regmi, Lili Shi, and Yuyang Cai
Subjects: *SMOKING laws, *SMOKING prevention, *MEDICAL protocols, *SMOKING cessation, *POLICY sciences, *DATA mining, *HELPLINES, *GOVERNMENT policy, *TOBACCO, *HUMAN services programs, *DATA analysis, *RESOURCE allocation, *RESEARCH funding, *CONTENT analysis, *PUBLIC opinion, *THEMATIC analysis, *GOVERNMENT programs, *STATISTICS, *HEALTH promotion, *QUALITY assurance, *GOVERNMENT regulation
Abstract: Tobacco consumption in China remains the primary cause of preventable mortality, with Shanghai being particularly affected by issues related to secondhand smoke exposure. This study explores the role of the public service hotline 12345, a grassroots initiative in Shanghai, in capturing public sentiment and assessing the effectiveness of anti-smoking regulations. Our research aims to accurately and deeply understand the implementation and feedback of smoking control policies: by identifying high-frequency points and prominent issues in smoking control work based on the smoking control work order data received by the health hotline 12320. The results of this study will assist government enforcement agencies in improving smoking monitoring and clarify the direction for improving smoking control measures. Text-mining techniques were employed to analyze a dataset comprising 78011 call sheets, all related to tobacco control and collected from the hotline between 1 January 2015 and 31 December 2019. This methodological approach aims to uncover prevalent themes and sentiments in the public discourse on smoking and its regulation, as reflected in the hotline interactions. Our study identified hotspots and the issues of greatest concern to citizens. Additionally, it provided recommendations to enforcement agencies to enhance their capabilities, optimize the allocation of human resources for smoking control monitoring, reduce enforcement costs and support for anti-smoking campaigns, thereby contributing to more effective tobacco control policies in the region. [ABSTRACT FROM AUTHOR]
Published: 2024

9. Application of data elements in the coupling of finance and technology on the digital electronic platform.

Author: Xie, Wenjun and Wang, Renxiang
Subjects: SUSTAINABLE development, HIGH technology industries, COMMUNICATION infrastructure, ELECTRONIC commerce, DATA mining
Abstract: With the continuous development of Internet technology, all areas of social economy have undergone profound changes. The circulation and application of data elements can significantly promote the construction of infrastructure such as big data centers and mobile base stations, and stimulate the market demand for digital consumption such as digital production, e-commerce and digital trade. Data mining technology will provide sustainable impetus for economic development. There is a clearer analysis of the role of data elements in the platform economy, but there is a lack of research in macroeconomics, especially in the coupling of finance and technology. This paper developed a coupling coordination and a coupling efficiency index system, using different methods to measure outcomes and outline the path to achieving high "quantity" and "quality" of finance-technology coupling development. Based on the data mining technology in e-commerce, the fuzzy set qualitative comparative analysis method is used to analyze the complex mechanisms and driving paths of data elements affecting the coordination degree and coupling efficiency of finance-technology coupling performance of China. We empirically document that data mining and data management are necessary to improve coupling coordination; in the absence of other conditions, data mining or data management can produce high coupling coordination, but not sufficient to improve coupling efficiency. [ABSTRACT FROM AUTHOR]
Published: 2024

10. Unstructured Document Information Extraction Method with Multi-Faceted Domain Knowledge Graph Assistance for M2M Customs Risk Prevention and Screening Application.

Author: Tian, Fengchun, Wang, Haochen, Wan, Zhenlong, Liu, Ran, Liu, Ruilong, Lv, Di, and Lin, Yingcheng
Subjects: KNOWLEDGE graphs, DATA mining, TEXT recognition, BLENDED learning, SYSTEM identification, CONTENT marketing
Abstract: As a crucial national security defense line, the existing risk prevention and screening system of customs falls short in terms of intelligence and diversity for risk identification factors. Hence, the urgent issues to be addressed in the risk identification system include intelligent extraction technology for key information from Customs Unstructured Accompanying Documents (CUADs) and the reliability of the extraction results. In the customs scenario, OCR is employed for M2M interactions, but current models have difficulty adapting to diverse image qualities and complex customs document content. We propose a hybrid mutual learning knowledge distillation (HMLKD) method for optimizing a pre-trained OCR model's performance against such challenges. Additionally, current models lack effective incorporation of domain-specific knowledge, resulting in insufficient text recognition accuracy for practical customs risk identification. We propose a customs domain knowledge graph (CDKG) developed using CUAD knowledge and propose an integrated CDKG post-OCR correction method (iCDKG-PostOCR) based on CDKG. The results on real data demonstrate that the accuracies improve for code text fields to 97.70%, for character type fields to 96.55%, and for numerical type fields to 96.00%, with a confidence rate exceeding 99% for each. Furthermore, the Customs Health Certificate Extraction System (CHCES) developed using the proposed method has been implemented and verified at Tianjin Customs in China, where it has showcased outstanding operational performance. [ABSTRACT FROM AUTHOR]
Published: 2024

11. Understanding Current Demand for BIM Professionals in China through Recruitment Data Mining.

Author: Zhou, Simin, Yu, Rui, Pan, Min, Zuo, Jian, Tu, Bocun, and Dong, Na
Subjects: *DATA mining, *BUILDING information modeling, *JOB hunting, *DIGITAL transformation, *TEXT mining
Abstract: Building information modeling (BIM) is critical to the digital transformation and upgrading of the construction industry. With its continuous promotion, the demand for BIM professionals is increasing, which can be seen from the large number of job recruitment ads on the Internet. Mining the massive recruitment information is conducive to understanding current demand for BIM professionals. After 5,033 pieces of BIM-related recruitment information in China being collected and preprocessed, statistical analysis was applied to reveal the demand for BIM professionals at the macro level. Then cluster analysis was carried out to classify different posts, with their corresponding required skills being visualized through a word cloud. Subsequently, based on the extracted keywords, an index system including 11 first-level indicators and 61 second-level indicators was constructed to comprehensively evaluate the competencies of BIM professionals. Finally, correlation analysis was introduced to quantify the relationships between different skills and posts. Through the recruitment data mining, it is possible to understand the overall demand for BIM professionals, available BIM posts, and their requirements which can provide reference for both BIM professionals training and BIM job hunting. Furthermore, it sheds light on the application of text mining in construction industry. This study utilized text mining to extract information from a large quantity of online BIM recruitment data, in order to analyze the current demand for BIM professionals in China. The research findings provide valuable insights into three main subjects. First, for BIM practitioners it provides valuable insights into the current market trends for BIM positions and the essential skills needed, enabling them to engage in targeted learning efforts. Second, for employers it enhances recruitment efficiency by offering a clear picture of the requirement for different positions. 
Third, for industry and educational institutions the competency index system facilitates the development of well-founded BIM professionals cultivation programs, promoting systematic training for aspiring BIM professionals. [ABSTRACT FROM AUTHOR]
Published: 2024

12. Data mining of reference intervals for serum creatinine: an improvement in glomerular filtration rate estimating equations based on Q-values.

Author: Ma, Yao, Yong, Zhenzhu, Wei, Lu, Yuan, Haichuan, Wan, Lihong, Pei, Xiaohua, Zhang, Feng, Wen, Guohua, Jin, Cheng, Gu, Yan, Zhang, Qun, Zhao, Weihong, and Zhu, Bei
Subjects: *GLOMERULAR filtration rate, *DATA mining, *KIDNEY physiology, *CREATININE, *KIDNEY function tests
Abstract: Glomerular filtration rate (GFR) estimating equations based on rescaled serum creatinine (SCr/Q) have shown better performance, where Q represents the median SCr for age- and sex-specific healthy populations. However, there remains a scarcity of investigations in China to determine this value. We aimed to develop Chinese age- and sex-specific reference intervals (RIs) and Q-values for SCr and to validate the equations incorporating new Q-values. We included 117,345 adults from five centers for establishing RIs and Q-values, and 3,692 participants with reference GFR (rGFR, 99mTc-DTPA renal dynamic imaging measurement) for validation. Appropriate age partitioning was determined using the decision tree method. Lower and upper reference limits and medians were calculated using the refineR algorithm, and Q-values were determined accordingly. We evaluated the full age spectrum (FAS) and European Kidney Function Consortium (EKFC) equations incorporating different Q-values considering bias, precision (interquartile range, IQR), and accuracy (percentage of estimates within ±20 % [P20] and ±30 % [P30] of rGFR). RIs for males were: 18–79 years, 55.53–92.50 μmol/L; ≥80 years, 54.41–96.43 μmol/L. RIs for females were: 18–59 years, 40.42–69.73 μmol/L; 60–79 years, 41.16–73.69 μmol/L; ≥80 years, 46.50–73.20 μmol/L. Q-values were set at 73.82 μmol/L (0.84 mg/dL) for males and 53.80 μmol/L (0.61 mg/dL) for females. After validation, we found that the adjusted equations exhibit less bias, improved precision and accuracy, and increased agreement of GFR categories. We determined Chinese age- and sex-specific RIs and Q-values for SCr. The adjustable Q-values provide an effective alternative to obtain valid equations for estimating GFR. [ABSTRACT FROM AUTHOR]
Published: 2024
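Illustrative note (not part of the catalog record): the abstract rescales serum creatinine by a sex-specific Q-value (SCr/Q) and reports accuracy as P20 and P30, the percentage of estimates within ±20% and ±30% of the reference GFR. The sketch below computes both quantities. The Q-values are those stated in the abstract, while the GFR numbers and the function names are hypothetical. The FAS and EKFC equations themselves are not reproduced.

```python
# Q-values in μmol/L as stated in the abstract.
Q_VALUES_UMOL_L = {"male": 73.82, "female": 53.80}


def rescaled_scr(scr_umol_l, sex):
    """Serum creatinine divided by the sex-specific median (SCr/Q)."""
    return scr_umol_l / Q_VALUES_UMOL_L[sex]


def percent_within(estimates, references, tolerance):
    """Percentage of estimates within ±tolerance (a fraction) of the reference."""
    hits = sum(
        abs(est - ref) <= tolerance * ref
        for est, ref in zip(estimates, references)
    )
    return 100.0 * hits / len(references)


# Hypothetical estimated and reference GFR values (mL/min/1.73 m^2).
egfr = [95.0, 60.0, 45.0, 110.0]
rgfr = [100.0, 80.0, 40.0, 100.0]

p20 = percent_within(egfr, rgfr, 0.20)
p30 = percent_within(egfr, rgfr, 0.30)
```

A person whose creatinine equals the population median has SCr/Q of exactly 1, which is what makes the rescaled value comparable across age and sex groups.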

13. Sequence‐to‐sequence transfer transformer network for automatic flight plan generation.

Author: Yang, Yang, Qian, Shengsheng, Zhang, Minghua, and Cai, Kaiquan
Subjects: FLIGHT planning (Aeronautics), ARTIFICIAL neural networks, AIR traffic, TRANSFORMER models, AIR travel, MACHINE translating
Abstract: In this work, a machine translation framework is proposed to tackle the flight plan generation in the air transport field. Diverging from the traditional human expert‐based way, a novel sequence‐to‐sequence transfer transformer network for automatic flight plan generation with enhanced operational acceptability is presented. It allows the user to translate the departure and arrival airport pairs, denoted as source sentences, into the flyable waypoint sequences, denoted as the corresponding target sentences. The approach leverages deep neural networks to autonomously learn air transport specialized knowledge and human expert insights from industry legacy data. Moreover, a multi‐head attention mechanism is adopted to model the complex correlation between airport pairs. Besides, we introduce an innovative waypoint embedding layer to learn effective embeddings for waypoint sequences. Additionally, an extensive flight plan dataset is constructed utilizing real‐world data in China spanning from July to September 2019. Employing the proposed model, rigorous training and testing procedures are conducted on this dataset, yielding remarkably favourable outcomes based on the automatic evaluation metrics BLEU and METEOR, which outperform other popular approaches. More importantly, the proposed approach achieves high performance in the operational validation and visualization, showing its application potential for real‐world air traffic operation. [ABSTRACT FROM AUTHOR]
Published: 2024

14. Agricultural price prediction based on data mining and attention-based gated recurrent unit: a case study on China's hog.

Author: Guo, Yan, Tang, Dezhao, Cai, Qiqi, Tang, Wei, Wu, Jinghua, and Tang, Qichao
Subjects: *FARM produce prices, *AGRICULTURAL prices, *BOX-Jenkins forecasting, *DATA mining, *RECURRENT neural networks, *AGRICULTURAL forecasts, *SUSTAINABLE agriculture, *FORECASTING
Abstract: Under the influence of the coronavirus disease and other factors, agricultural product prices show non-stationary and non-linear characteristics, making it increasingly difficult to forecast accurately. This paper proposes an innovative combinatorial model for Chinese hog price forecasting. First, the price is decomposed using the Seasonal and Trend decomposition using the Loess (STL) model. Next, the decomposed data are trained with the Long Short-term Memory (LSTM) and Seasonal Autoregressive Integrated Moving Average (SARIMA) models. Finally, the prepared data and the multivariate influence factors after Factor analysis are predicted using the gated recurrent neural network and attention mechanisms (AttGRU) to obtain the final prediction values. Compared with other models, the STL-FA-AttGRU model produced the lowest errors and achieved more accurate forecasts of hog prices. Therefore, the model proposed in this paper has the potential for other price forecasting, contributing to the development of precision and sustainable agriculture. [ABSTRACT FROM AUTHOR]
Published: 2024

15. Lightweight Attentive Graph Neural Network with Conditional Random Field for Diagnosis of Anterior Cruciate Ligament Tear.

Author: Wang, Jiaoju, Luo, Jiewen, Liang, Jiehui, Cao, Yangbo, Feng, Jing, Tan, Lingjie, Wang, Zhengcheng, Li, Jingming, Hounye, Alphonse Houssou, Hou, Muzhou, and He, Jinshen
Subjects: KNEE radiography, PEARSON correlation (Statistics), ANTERIOR cruciate ligament injuries, RESEARCH funding, DATA mining, KRUSKAL-Wallis Test, MAGNETIC resonance imaging, NATURAL language processing, DESCRIPTIVE statistics, CHI-squared test, ARTIFICIAL neural networks, DEEP learning, DATA analysis software
Abstract: Anterior cruciate ligament (ACL) tears are prevalent orthopedic sports injuries and are difficult to precisely classify. Previous works have demonstrated the ability of deep learning (DL) to provide support for clinicians in ACL tear classification scenarios, but it requires a large quantity of labeled samples and incurs a high computational expense. This study aims to overcome the challenges brought by small and imbalanced data and achieve fast and accurate ACL tear classification based on magnetic resonance imaging (MRI) of the knee. We propose a lightweight attentive graph neural network (GNN) with a conditional random field (CRF), named the ACGNN, to classify ACL ruptures in knee MR images. A metric-based meta-learning strategy is introduced to conduct independent testing through multiple node classification tasks. We design a lightweight feature embedding network using a feature-based knowledge distillation method to extract features from the given images. Then, GNN layers are used to find the dependencies between samples and complete the classification process. The CRF is incorporated into each GNN layer to refine the affinities. To mitigate oversmoothing and overfitting issues, we apply self-boosting attention, node attention, and memory attention for graph initialization, node updating, and correlation across graph layers, respectively. Experiments demonstrated that our model provided excellent performance on both oblique coronal data and sagittal data with accuracies of 92.94% and 91.92%, respectively. Notably, our proposed method exhibited comparable performance to that of orthopedic surgeons during an internal clinical validation. This work shows the potential of our method to advance ACL diagnosis and facilitates the development of computer-aided diagnosis methods for use in clinical practice. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

16. QTL Mapping and Data Mining to Identify Genes Associated with Soybean Epicotyl Length Using Cultivated Soybean and Wild Soybean.

Author: Chen, Lin, Ma, Shengnan, Li, Fuxin, Li, Lanxin, Yu, Wenjun, Yu, Lin, Tang, Chunshuang, Liu, Chunyan, Xin, Dawei, Chen, Qingshan, and Wang, Jinhui
Subjects: *SOYBEAN, *LOCUS (Genetics), *DATA mining, *DATA mapping, *REGULATOR genes, *OILSEED plants
Abstract: Soybean (Glycine max) plants first emerged in China, and they have since been established as an economically important oil crop and a major source of daily protein for individuals throughout the world. Seed emergence height is the first factor that ensures seedling adaptability to field management practices, and it is closely related to epicotyl length. In the present study, the Suinong 14 and ZYD00006 soybean lines were used as parents to construct chromosome segment substitution lines (CSSLs) for quantitative trait loci (QTL) identification. Seven QTLs were identified using two years of epicotyl length measurement data. The insertion region of the ZYD00006 fragment was identified through whole genome resequencing, with candidate gene screening and validation being performed through RNA-Seq and qPCR, and Glyma.08G142400 was ultimately selected as an epicotyl length-related gene. Through combined analyses of phenotypic data from the study population, Glyma.08G142400 expression was found to be elevated in those varieties exhibiting longer epicotyl length. Haplotype data analyses revealed that epicotyl data were consistent with haplotype typing. In summary, the QTLs found to be associated with the epicotyl length identified herein provide a valuable foundation for future molecular marker-assisted breeding efforts aimed at improving soybean emergence height in the field, with the Glyma.08G142400 gene serving as a regulator of epicotyl length, offering new insight into the mechanisms that govern epicotyl development. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

17. Out-of-set association analysis of lung cancer drugs and symptoms based on clinical case data mining.

Author: Hong, Mei, Zhao, Yi-Dong, Zhong, Tao-Li, Lu, Ming, Sun, Wen-Hao, Chen, Tian-Yuan, Hong, Nan, Zhu, Yao, and Yu, Da-Hai
Subjects: *LUNG cancer, *DATA mining, *ANTINEOPLASTIC agents, *COUGH, *CHINESE medicine, *TEXT mining
Abstract: BACKGROUND: There are 1.8 million lung cancer deaths worldwide, accounting for 18% of global cancer deaths, including 710,000 in China, accounting for 23.8% of all cancer deaths in China. OBJECTIVE: To explore the out-of-set association rules of lung cancer symptoms and drugs through text mining of traditional Chinese medicine (TCM) treatment of lung cancer, and form medical case analysis to analyze the experience of TCM syndrome differentiation in its treatment. METHODS: The medical records of all patients diagnosed with lung cancer in Nanjing Chest Hospital from January to December 2018 were collected, and the out-of-set association analysis was performed using the MedCase v5.2 TCM clinical scientific research auxiliary platform based on the frequent pattern growth enhanced association analysis algorithm. RESULTS: In terms of TCM treatment of lung cancer, the clinical symptoms with high correlation included cough, expectoration, chest distress, and white phlegm; and the drugs with high correlation included Pinellia ternata, licorice root, white Atractylodes rhizome, and Radix Ophiopogonis; with the prescriptions based on Erchen and Maimendong decoctions. CONCLUSION: This analytical study of the medical cases of TCM treatment for lung cancer was performed using data mining techniques, and the out-of-set association rules between clinical symptoms and drugs were analyzed, including the understanding of lung cancer in TCM. Moreover, the essence of experience in drug use was gathered, providing significant scientific guidance for the clinical treatment of lung cancer. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

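18. Analysis of syndrome differentiation and treatment patterns in non-interventional traditional Chinese medicine treatment of malignant lung tumors based on real-world data.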

Author: 卢思玮 and 程淼
Subjects: TREATMENT of lung tumors, CHINESE medicine, DATA mining, CLUSTER analysis (Statistics), QI (Chinese philosophy), ACADEMIC medical centers, FATIGUE (Physiology), HERBAL medicine, DESCRIPTIVE statistics, MEDICAL records, ACQUISITION of data, ANOREXIA nervosa, ACUPUNCTURE points, LUNG tumors, DATA analysis software, COUGH, DYSPNEA, SLEEP disorders, CONSTIPATION, THERAPEUTICS, SYMPTOMS
Abstract: Copyright of Chinese Journal of Cancer Biotherapy is the property of Editorial Office of Chinese Journal of Cancer Biotherapy and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Published: 2024
Full Text: View/download PDF

19. Mining the Micro-Trajectory of Two-Wheeled Non-Motorized Vehicles Based on the Improved YOLOx.

Author: Zhou, Dan, Zhao, Zhenzhong, Yang, Ruixin, Huang, Shiqian, and Wu, Zhilong
Subjects: *TRAFFIC safety, *DATA mining, *TRAFFIC accidents, *FEATURE extraction, *CITIES & towns, *INTELLIGENT transportation systems
Abstract: Two-wheeled non-motorized vehicles (TNVs) have become the primary mode of transportation for short-distance travel among residents in many underdeveloped cities in China due to their convenience and low cost. However, this trend also brings corresponding risks of traffic accidents. Therefore, it is necessary to analyze the driving behavior characteristics of TNVs through their trajectory data in order to provide guidance for traffic safety. Nevertheless, the compact size, agile steering, and high maneuverability of these TNVs pose substantial challenges in acquiring high-precision trajectories. These characteristics complicate the tracking and analysis processes essential for understanding their movement patterns. To tackle this challenge, we propose an enhanced You Only Look Once Version X (YOLOx) model, which incorporates a median pooling-Convolutional Block Attention Mechanism (M-CBAM). This model is specifically designed for the detection of TNVs, and aims to improve accuracy and efficiency in trajectory tracking. Furthermore, based on this enhanced YOLOx model, we have developed a micro-trajectory data mining framework specifically for TNVs. Initially, the paper establishes an aerial dataset dedicated to the detection of TNVs, which then serves as a foundational resource for training the detection model. Subsequently, an augmentation of the Convolutional Block Attention Mechanism (CBAM) is introduced, integrating median pooling to amplify the model's feature extraction capabilities. Next, additional detection heads are integrated into the YOLOx model to elevate the detection rate of small-scale targets, particularly focusing on TNVs. Concurrently, the Deep Sort algorithm is utilized for the precise tracking of vehicle targets. The process culminates with the reconstruction of trajectories, which is achieved through a combination of video stabilization, coordinate mapping, and filtering denoising techniques.
The experimental results derived from our self-constructed dataset reveal that the enhanced YOLOx model demonstrates superior detection performance in comparison to other analogous methods. The comprehensive framework accomplishes an average trajectory recall rate of 85% across three test videos. This significant achievement provides a reliable method for data acquisition, which is essential for investigating the micro-level operational mechanisms of TNVs. The results of this study can further contribute to the understanding and improvement of traffic safety on mixed-use roads. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

20. Extraction of Information on Transversity GPDs from and Production on EIC of China.

Author: Xie, Ya-Ping, Goloskokov, S. V., and Chen, Xurong
Subjects: *DATA mining, *MESONS, *FACTORIZATION
Abstract: The Generalized Parton Distributions (GPDs) are applied to study the hard Pseudoscalar Meson Production (PMP) at high energies. The PMP amplitudes are obtained within the GPDs factorization. They are expressed in terms of GPDs' convolution functions, which are most essential in PMP reactions. We show that these convolution functions can be extracted from the PMP data at the future EIC of China (EicC). Predictions of and production at typical EicC energies are performed. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

21. Constructing a Mathematical Model of Product Color Design Based on Data Mining: Case Study of a Thermos Cup.

Author: Liu, Wei, Li, Jin, Rong, Hui, and Zhou, Ziqian
Subjects: COLOR in design, DATABASE design, DATA mining, CONSUMER preferences, PRODUCT design, MULTIPLE regression analysis
Abstract: In product design, color is the first element that acts on the human visual senses and significantly influences consumer decisions. This study aimed to analyze consumers' color preferences for products and explore the mathematical patterns of product color design. Firstly, sales data and images of popular thermos cups from Tmall and Jingdong (JD), two prominent e-commerce platforms in China, were obtained through data mining. Subsequently, this research focused on single-color thermos cups with high sales as the research subject, extracting the hue (H), saturation (S), and value (V) for each cup from the product images. Furthermore, a 3D scatter plot of HSV values was generated using Origin Pro, visually representing the consumers' color preferences. Finally, this study examined the relationships among HSV values of the popular product colors through multiple regression analysis and constructed a mathematical model for HSV. This method enables manufacturers to gain valuable insights into consumer color preferences, facilitating digital color design and enhancing design efficiency and accuracy. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
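
The abstract of entry 21 describes extracting hue (H), saturation (S), and value (V) for each single-color cup from its product image. A minimal sketch of that step, assuming the cup's pixels have already been isolated as RGB triples, is given below. The function name `mean_hsv` and the sample pixels are hypothetical and are not taken from the article; the averaging-then-converting approach is one plausible choice, not necessarily the authors' method.

```python
import colorsys


def mean_hsv(pixels):
    """Average RGB pixels (0-255 per channel) and convert the mean color to HSV.

    Returns hue in degrees, saturation in percent, and value in percent.
    """
    n = len(pixels)
    # Normalize the per-channel means to the 0.0-1.0 range that colorsys expects.
    r = sum(p[0] for p in pixels) / n / 255.0
    g = sum(p[1] for p in pixels) / n / 255.0
    b = sum(p[2] for p in pixels) / n / 255.0
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return round(h * 360, 1), round(s * 100, 1), round(v * 100, 1)


# Hypothetical pixels sampled from a pure red cup (illustrative data only).
sample_pixels = [(255, 0, 0), (255, 0, 0)]
print(mean_hsv(sample_pixels))
```

The resulting (H, S, V) triples, one per product, could then serve as the variables in a 3D scatter plot or a multiple regression of the kind the abstract mentions.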

22. The development of new occupation practitioners in China's first-tier cities: A comparative analysis.

Author: Zhang, Yuxiang, Chen, Anhang, Li, Linzhen, and Zhang, Huiqin
Subjects: *CITIES & towns, *GEOGRAPHICAL perception, *COMPARATIVE studies, *DATABASES, *DATA mining
Abstract: Owing to the increasingly complex economic environment and difficult employment situation, a large number of new occupations have emerged in China, leading to job diversification. Currently, the overall development status of new occupations in China and the structural characteristics of new occupation practitioners in different cities are still unclear. This study first constructed a development index system for new occupation practitioners from five dimensions (group size, cultural appreciation, salary level, occupation perception, and environmental perception). Data for comparing and analyzing the development status of new occupation practitioners were derived from big data mining of China's mainstream recruitment platforms and from a questionnaire survey of new occupation practitioners in four first-tier cities and 15 new first-tier cities in China. The results show that the development level of new occupation practitioners in the four first-tier cities is the highest, and the two new first-tier cities, Chengdu and Hangzhou, have outstanding performance. The cities with the best development level of new occupation practitioners in Eastern, Central, and Western China are Shanghai, Wuhan, and Chengdu, respectively. Most new occupation practitioners in China are confident about the future of their careers. However, more than half of the 19 cities are uncoordinated in the five dimensions of the development of new occupation practitioners, especially those cities with middle development levels. A good policy environment and social environment have not yet been established to ensure the sustainable development of new occupation practitioners. Finally, we proposed the following countermeasures and suggestions: (1) Establish a classified database of new occupation talents. (2) Implement a talent industry agglomeration strategy. (3) Pay attention to the coordinated development of new occupation practitioners in cities. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

23. Performance assessment and fitness analysis of athletes using decision tree and data mining techniques.

Author: Yu, Qiuqi
Subjects: *DATA mining, *DECISION trees, *APRIORI algorithm, *ARCHITECTURE students, *DATABASES, *PHYSICAL fitness
Abstract: Recently, the rise in student numbers has led to the establishment of many new colleges and universities in China. As a result, there has been a significant increase in data collection on students' athletic skills. To manage these data, educational institutions are implementing information management systems. However, tracking sports results remains challenging since sports information may not always be collected during sports teaching. This work presents a systematic strategy to address this problem by evaluating student athletes' abilities using an Apriori algorithm, decision tree (DT), and association rule. It covers the processes for collecting data, preprocessing, selecting features, and evaluating models. The efficiency of the DT algorithm in solving classification problems, including student achievement analysis, is emphasized. The association rule algorithm is applied to figure out the correlation between students' physical fitness and their involvement in physical education. The Apriori algorithm is introduced to reduce the amount of data needed in merging item sets. Lastly, the overall architecture of the college students' physical fitness analysis system is presented. It covers the insertion of sports test scores, the calculation of total scores, and the application of DT analysis for evaluating student achievements. The process involves standardizing database information, selecting a training instance set, and determining attributes based on information gain. The efficiency of the system is evaluated in terms of accuracy, precision, recall, and F1 score. In comparison with previous works, our recommended system can track and analyze students' athletic capability, fitness, and physical ability to create personalized workout routines and monitor their health in real-time. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
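
The abstract of entry 23 mentions determining decision tree attributes based on information gain. The sketch below shows the standard entropy-based definition of information gain on a tiny, made-up table. The field names (`trains_weekly`, `dorm`, `passed`) and the rows are hypothetical illustrations, not data or code from the article.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical student records (illustrative only).
rows = [
    {"trains_weekly": "yes", "dorm": "A", "passed": "yes"},
    {"trains_weekly": "yes", "dorm": "B", "passed": "yes"},
    {"trains_weekly": "no", "dorm": "A", "passed": "no"},
    {"trains_weekly": "no", "dorm": "B", "passed": "no"},
]

for attr in ("trains_weekly", "dorm"):
    print(attr, information_gain(rows, attr, "passed"))
```

In this toy table, splitting on `trains_weekly` separates the classes completely while `dorm` tells us nothing, so a tree builder that ranks attributes by information gain would split on `trains_weekly` first.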

24. Integrated framework to integrate Spark-based big data analytics and for health monitoring and recommendation in sports using XGBoost algorithm.

Author: Zhao, Yin, Ramos, Ma. Finipina, and Li, Bin
Subjects: *BIG data, *SUPPORT vector machines, *ATHLETIC ability, *TECHNOLOGICAL innovations, *K-nearest neighbor classification, *FEATURE selection
Abstract: In recent years, technological advancements have been replicated in various industries, including sports medicine. Recent developments, such as big data analytics and data mining, which have revolutionized medical services in sports, are apparent in this transformation. This technological shift is motivated by the need to enhance athletic performance, prevent injuries, and offer individualized health advice. Modern lifestyles have simultaneously increased people's attention to their health, creating a demand for better medical services. However, China's ability to provide superior medical care needs to be improved due to a lack of medical resources and an ever-increasing patient population. To address these challenges, this research paper presents an integrated framework that leverages Spark-based big data analytics and the XGBoost algorithm. The framework aims to provide a robust sports medical service encompassing real-time health monitoring and data-driven insights. Powered by the formidable distributed computing platform Spark, it adeptly manages extensive sports data generated during training and events, facilitating instant health evaluations. Incorporating the XGBoost algorithm for data mining amplifies health prediction and recommendation capabilities. Renowned for its predictive prowess, XGBoost excels in discerning intricate sports data patterns and trends. Its proficiency in tackling intricate feature selection and modeling tasks ensures precision and actionable insights. Empirical findings underscore substantial enhancements in sports medical services. When applied to chronic disease datasets, the XGBoost algorithm garnered an impressive 93% trust rate. In contrast to conventional methods like K-Nearest Neighbors (KNN), Random Forest (RF), Decision Trees (DT), Support Vector Machines (SVM), Naïve Bayes (NB), and Logistic Regression (LR), the proposed framework consistently outperforms these established techniques.
This remarkable performance underscores the transformative potential of the integrated framework in revolutionizing sports medical services. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

25. A framework for enterprise assessment of carbon performance using support vector machines.

Author: Shou, Yijun
Subjects: *SUPPORT vector machines, *ENERGY consumption, *ASSOCIATION rule mining, *CARBON offsetting, *FEATURE selection, *K-nearest neighbor classification, *DATA mining
Abstract: In recent years, the escalating global concerns surrounding climate change have placed a growing emphasis on achieving dual objectives: reducing carbon emissions and achieving carbon neutrality. Businesses and organizations are under mounting pressure to align their operations with these crucial environmental goals. This paper introduces the concept of an enterprise carbon performance evaluation index system (ECPIS) to strike a balance between economic development and environmental protection and enhance overall enterprise management and development strategies. The ECPIS framework is constructed using machine learning and advanced data mining techniques, particularly support vector machines (SVM). Its core purpose is to provide enterprises with a systematic tool to gauge, analyze, and enhance their carbon performance, addressing dual carbon objectives. ECPIS development hinges on data mining techniques, extracting insights from diverse data sources to construct a comprehensive system that accommodates these dual carbon goals' intricacies. Its methodology includes data collection, preprocessing, feature selection, and data mining algorithms to unveil vital patterns and relationships within data. It conforms to international standards, establishing a tailored carbon performance index system aligned with China's national conditions. It validates carbon-related enterprise data and employs data mining's association rules to uncover pertinent carbon performance information. The results obtained from ECPIS are auspicious, boasting an experiential accuracy rate of 97.5%. This level of accuracy surpasses that achieved by other algorithms like K-Nearest Neighbors (KNN), Random Forest (RF), Decision Trees (DT), Naïve Bayes (NB), and Logistic Regression (LR). ECPIS stands out by considering various factors, including carbon emissions reduction, energy consumption, supply chain efficiency, and financial performance indicators. 
This multifaceted approach enables enterprises to gain a comprehensive understanding of their carbon performance and identify areas for improvement. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

26. Expressway ETC Transaction Data Anomaly Detection Based on TL-XGBoost.

Author: Zou, Fumin, Shi, Rouyue, Luo, Yongyu, Hu, Zerong, Zhong, Huan, and Wang, Weihai
Subjects: INTRUSION detection systems (Computer security), TRAFFIC safety, EXPRESS highways, DATA mining
Abstract: China's widely adopted expressway ETC system provides a feasible foundation for realizing co-operative vehicle–infrastructure integration, and the accuracy of ETC data, which forms the basis of this scheme, will directly affect the safety of driving. Therefore, this study focuses on the abnormal data in an expressway ETC system. This study combines road network topology data and capture data to mine the abnormal patterns of ETC data, and it designs an abnormal identification model for expressway transaction data based on TL-XGBoost. This model categorizes expressway ETC abnormal data into four distinct classes: missing detections, opposite lane detection, duplicated detection and reverse trajectory detection. ETC transaction data from a southeastern Chinese province were used for experimentation. The results validate the model's effectiveness, achieving an accuracy of 98.14%, a precision of 97.59%, a recall of 95.44%, and an F1-score of 96.49%. Furthermore, this study conducts an analysis and offers insights into the potential causes of anomalies in expressway ETC data. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

27. Mining the English application learning patterns of college students based on time series clustering.

Author: Niu, Lili
Subjects: *MOBILE learning, *ENGLISH language, *TIME series analysis, *COLLEGE students, *MOBILE operating systems, *MEMORIZATION
Abstract: As a convenient learning tool in the We Media era, mobile apps have received increasing attention from college students because of their timeliness and practicality. With the increasing number of English learning apps, many such apps provide college students with new ways to obtain learning resources and diversified learning modes. Research on mobile-assisted language learning at home and abroad has developed over nearly 20 years, basically following the route from theory to application in practice, but there have been few process studies on learners' individual language skill learning behaviors based on mobile platform data. In this study, the time series clustering method was adopted, and the learning behavior of college students in an English vocabulary learning app in China was selected for data mining. Firstly, taking the "single-day memorization amount" as the measurement index, the memorization records of college students over the whole use cycle were extracted and processed into trajectory data, and the KmL algorithm was used to cluster the trajectories of the memorization amount in the time series. According to the intra-class average trajectory, the characteristics of learning behavior changes among the different college students are summarized, and two learning modes are depicted. Secondly, the experimental analysis found that adopting the English learning model three weeks before an exam can effectively stimulate college students and improve their willingness to learn and continue using the app. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

28. Bioactive Flavonoid Constituents from Callicarpa rubella.

Author: Ding, Hui-Lian and Wang, Kui-Wu
Subjects: *ADVANCED glycation end-products, *RUBELLA, *DATA mining
Abstract: This article discusses the bioactive flavonoid constituents found in Callicarpa rubella, a plant traditionally used in China for medicinal purposes. The study isolated eleven flavonoids from the plant and identified their chemical structures. The inhibitory activity of these compounds against the formation of advanced glycation end products (AGEs) was tested in vitro. The article provides detailed information on the extraction and isolation process of these compounds, as well as NMR data for specific compounds. The results suggest that certain chemical groups are important for the compounds' activity, while others reduce their biological activity. [Extracted from the article]
Published: 2023
Full Text: View/download PDF

29. Trend analysis of traffic management based on literature data mining and graph analysis tools.

Author: Ding, Xiaoe, Liu, Wenke, Wang, Chengcheng, Kong, Delan, Tang, Wei, Xu, Run, and Zhang, Changyong
Subjects: DATABASE management, INTELLIGENT transportation systems, DEEP learning, DATA mining, TREND analysis, BIBLIOMETRICS, MACHINE learning
Abstract: Studies on traffic management are crucial for the development of intelligent transportation systems and smart cities. However, work that identifies the development stages of the traffic management field through bibliometric analysis is still lacking. In this study, CiteSpace and VOSviewer software are used to explore the "traffic management" field by summarizing its development process and predicting future research trends. A total of 3,028 relevant documents over the past 40 years were collected from Web of Science. Results show that (1) studies on traffic management were mainly published by researchers from USA (30.55%), China (20.90%), and some European countries; (2) the key traffic management research contents can be classified into four categories, that is, background requirements, traffic problems, method models, and control strategies; (3) the evolution process can be divided into four stages, that is, budding stage (1990–1994), development stage (1995–2003), calm stage (2004–2010), and maturation stage (2011–); (4) machine learning, deep learning and other intelligent algorithms have played more important roles in recent years, and connected vehicle management is also a potential development trend. Results suggest that cooperative vehicle‐infrastructure systems or machine learning‐based studies might be the hotspots on traffic management studies. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

30. Demand Analysis of English Language Service Value Based on Data Mining Ecosystem.

Author: Chang'en Shao
Subjects: ECOSYSTEM services, LANGUAGE services, ECONOMIC demand, VALUE (Economics), ENGLISH language, DATA mining
Abstract: INTRODUCTION: Language is a bridge between people, an indispensable component of information exchange and communication, and an essential part of social culture. As the most widely used language in the world, English occupies an essential position in the social language, generating value demands related to it, ranging from the needs of individuals for their development to the needs of social industries or fields for their development to the national language strategy. OBJECTIVES: With the implementation of the "One Belt, One Road" policy, more and more cultures of the countries along the route are coming into China's view, and Chinese culture should also be promoted. This paper argues that only through the correct use of language can Chinese civilization and other civilizations seek common ground while setting aside their differences, appreciate each other, and eliminate cultural clashes as different civilizations collide and fuse. METHODS: Based on the data mining ecosystem, this study examines the demand for the value of English language services in today's environment, explores the explicit and implicit benefits of English language services, and analyzes the economic, social, and cultural benefits they encompass at different levels. RESULTS: The research suggests that we need to focus on the value of multiform English language services, strengthen English language industry research and studies, and conduct a scientific English language economic program. CONCLUSION: Language services and language economy are two closely related concepts. Analyzing language services from multiple perspectives can reveal their explicit and implicit economic, social and cultural benefits. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

31. Global–Local Information Fusion Network for Road Extraction: Bridging the Gap in Accurate Road Segmentation in China.

Author: Wang, Xudong, Cai, Yujie, He, Kang, Wang, Sheng, Liu, Yan, and Dong, Yusen
Subjects: *INFORMATION networks, *CONVOLUTIONAL neural networks, *DATA mining, *DEEP learning, *BRIDGES, *ENVIRONMENTAL monitoring
Abstract: Road extraction is crucial in urban planning, rescue operations, and military applications. Compared to traditional methods, using deep learning for road extraction from remote sensing images has demonstrated unique advantages. However, previous convolutional neural networks (CNN)-based road extraction methods have had limited receptivity and failed to effectively capture long-distance road features. On the other hand, transformer-based methods have good global information-capturing capabilities, but face challenges in extracting road edge information. Additionally, existing excellent road extraction methods lack validation for the Chinese region. To address these issues, this paper proposes a novel road extraction model called the global–local information fusion network (GLNet). In this model, the global information extraction (GIE) module effectively integrates global contextual relationships, the local information extraction (LIE) module accurately captures road edge information, and the information fusion (IF) module combines the output features from both global and local branches to generate the final extraction results. Further, a series of experiments on two different Chinese road datasets with geographic robustness demonstrate that our model outperforms the state-of-the-art deep learning models for road extraction tasks in China. On the CHN6-CUG dataset, the overall accuracy (OA) and intersection over union (IoU) reach 97.49% and 63.27%, respectively, while on the RDCME dataset, OA and IoU reach 98.73% and 84.97%, respectively. These research results hold significant implications for road traffic, humanitarian rescue, and environmental monitoring, particularly in the context of the Chinese region. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
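
Entry 31 reports overall accuracy (OA) and intersection over union (IoU) for road segmentation. The sketch below computes both from binary masks using their usual definitions; the function name and the tiny masks are hypothetical, and the article's own evaluation code is not shown in the abstract.

```python
def overall_accuracy_and_iou(pred, truth):
    """Compute OA and road-class IoU for two binary masks (1 = road, 0 = background)."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # road predicted, road present
            elif p and not t:
                fp += 1  # road predicted, background present
            elif t:
                fn += 1  # background predicted, road present
            else:
                tn += 1  # background predicted, background present
    oa = (tp + tn) / (tp + fp + fn + tn)
    union = tp + fp + fn
    # If neither mask contains road pixels, treat the overlap as perfect.
    iou = tp / union if union else 1.0
    return oa, iou


# Hypothetical 1x4 masks (illustrative only).
predicted = [[1, 1, 0, 0]]
reference = [[1, 0, 0, 0]]
print(overall_accuracy_and_iou(predicted, reference))
```

Because OA counts correctly labeled background pixels and IoU does not, OA tends to sit well above IoU when roads cover only a small share of the image. That may be one reason the reported OA values exceed the IoU values on both datasets, although the abstract does not say so.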

32. Prediction of PM 2.5 Concentration Using Spatiotemporal Data with Machine Learning Models.

Author: Ma, Xin, Chen, Tengfei, Ge, Rubing, Xv, Fan, Cui, Caocao, and Li, Junpeng
Subjects: *MACHINE learning, *REGRESSION trees, *STANDARD deviations, *SUPPORT vector machines, *CITIES & towns, *RANDOM forest algorithms
Abstract: Among the critical global crises curbing world development and sustainability, air quality degradation has been a long-lasting and increasingly urgent one, and it has been sufficiently proven to pose severe threats to human health and social welfare. A higher level of model prediction accuracy can play a fundamental role in air quality assessment and enhancing human well-being. In this paper, four types of machine learning models (random forest, ridge regression, support vector machine, and extremely randomized trees) were adopted to predict PM2.5 concentration in ten cities in the Jing-Jin-Ji region of north China based on multi-source spatiotemporal data including air quality and meteorological data in time series. Data were fed into the models using the rolling prediction method, which improved prediction accuracy in our experiments. Lastly, the comparative experiments show that at the city level, RF and ExtraTrees models have better predictive results with lower mean absolute error (MAE), root mean square error (RMSE), and higher index of agreement (IA) compared to other selected models. At the seasonal level, all four models have the best prediction performance in winter and the worst in summer, and RF models have the best prediction performance with the IA ranging from 0.93 to 0.98 with an MAE of 5.91 to 11.68 μg/m3. Consequently, the demonstration of how each model performs differently in each city and each season is expected to shed light on environmental policy implications. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
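
The three error measures named in this abstract (MAE, RMSE, IA) can be sketched in a few lines. The abstract does not define IA, so the code assumes Willmott's index of agreement; the function names and sample values are hypothetical and unrelated to the paper's data.

```python
import math


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def index_of_agreement(obs, pred):
    """Willmott's index of agreement (assumed definition of IA)."""
    mean_obs = sum(obs) / len(obs)
    numerator = sum((p - o) ** 2 for o, p in zip(obs, pred))
    denominator = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in zip(obs, pred)
    )
    return 1 - numerator / denominator


# Invented PM2.5-like values, for illustration only.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
scores = (mae(observed, predicted),
          rmse(observed, predicted),
          index_of_agreement(observed, predicted))
```

Lower MAE and RMSE and an IA closer to 1 indicate a better fit, which is how the abstract ranks the models.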

33. Understanding unmet medical needs through medical crowdfunding in China.

Author: Wu, Junhong and Peng, Yi
Subjects: *HEALTH services accessibility, *PUBLIC health, *MEDICAL care, *LEUKEMIA, *FUNDRAISING, *CONTENT mining, *CANCER patients, *DESCRIPTIVE statistics, *HEALTH insurance, *CROWDSOURCING, *MEDICAL needs assessment, *DATA mining
Abstract: Online medical crowdfunding has gained popularity in recent years in China. The objective of this study was to identify unmet medical needs in the public healthcare system through analysis of Chinese medical crowdfunding data. The study used text information extraction and statistical analysis based on large-scale data. From 19 June 2011 to 15 March 2020, data from 30,704 medical crowdfunding projects were collected from Tencent GongYi, which is one of the largest Chinese medical crowdfunding platforms. Text mining methods were used to extract data on the medical conditions and locations of the applicants of medical crowdfunding. In addition, 125 medical crowdfunding projects initiated by leukaemia patients in Chongqing and Nanyang were further investigated through manual data extraction, and the factors impacting the fundraising goals were explored using a generalised linear model. The most common conditions using medical crowdfunding to raise funds were as follows: cancer (31.87%), chronic conditions (18.14%), accidental injury (7.80%) and blood system–related conditions (7.75%). Treatments for cancer and blood system–related conditions are expensive and have serious long-term impacts on the lives of patients. Results showed that the cities of Nanyang and Chongqing had the largest number of crowdfunding projects. This study found that the medical conditions that prompted individuals to apply for crowdfunding were those with long treatment cycles, complexities and expensive medical or non-medical costs. Furthermore, discrepancies in health insurance policies between different regions and residents seeking treatments outside their insurance locations were also important factors that triggered medical crowdfunding applications. Adjusting health insurance policies accordingly may improve the efficiency of utilising health insurance resources and reduce the financial burden on patients. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

34. Corporate social responsibility of Internet enterprises based on data mining.

Author: Wenzhong Zhu, Yabo Shang, Sitong He, and Wen-Tsao Pan
Subjects: *SOCIAL responsibility of business, *DATA mining, *GREY relational analysis, *INTERNET speed, *INTERNET
Abstract: In the age of the Internet economy, Internet enterprises have attracted tremendous public attention, especially in China. In this paper, data mining through regression analysis, grey relational analysis, decision tree analysis and cluster analysis is implemented to further study the relationship between corporate social responsibility (CSR) and corporate financial performance (CFP) of Internet enterprises in China. This study collects and analyzes data of 20 Internet enterprises in China from the year of 2011 to 2016 and draws the following conclusions: (1) the relationship between CSR and CFP of the Internet enterprises is negative; (2) from the stakeholder perspective, CSR to shareholders, creditors and government is positively related to CFP; CSR to customers, suppliers and employees is not positively related to CFP; (3) through decision tree analysis, it is found that what affects the overall CSR performance of the Internet enterprises the most is CSR to customers and suppliers, while what affects the CFP of the Internet enterprises the most is CSR to creditors; (4) through cluster analysis, 20 enterprises can be divided into three types. This study has theoretical, methodological, practical and educational implications for future related research, business practitioners and educational institutions. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

35. Cause analysis of construction collapse accidents using association rule mining.

Author: Shao, Lijia, Guo, Shengyu, Dong, Yimeng, Niu, Hongying, and Zhang, Pan
Subjects: ASSOCIATION rule mining, BRIDGE failures, GOVERNMENT websites, MACHINING, DATA mining, PUBLIC records
Abstract: Purpose: The construction collapse is one of the most serious accidents since it has several attributes (e.g. accident type and consequence) and its occurrence involves various kinds of causal factors (e.g. human factors). The impact of causal factors on construction collapse accidents and the interrelationships among causal factors remain poorly explored. Thus, the purpose of this paper is to use association rule mining (ARM) for cause analysis of construction collapse accidents. Design/methodology/approach: An accident analytic framework is developed to determine the accident attributes and causal factors, and then ARM is introduced as the method for data mining. The data are from 620 historical accident records on government websites of China from 2010 to 2020. Through the generated association rules, the impact of causal factors and the interrelationships among causal factors are explored. Findings: Collapse accident is easily caused by human factors, material and machine condition and management factors. Furthermore, the results show a close interrelationship between many causal factors and construction scheme and organization. The earthwork collapse is greatly related to environmental condition and the scaffolding collapse is greatly related to material and machine condition. Practical implications: This study found relevant knowledge about the key causes for different types of construction collapses. Besides, several suggestions are further provided for construction units to prevent construction collapse accidents. Originality/value: This study uses data mining methods to extract knowledge about the causes of collapse accidents. The impact of causal factors on various types of construction collapse accidents and the interrelationships among causal factors are explained from historical accident data. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
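
Association rule mining, the method named in this abstract, scores a rule "antecedent -> consequent" by its support, confidence, and lift. The sketch below computes these for one rule; the function name and the four toy records are invented for illustration and are not drawn from the 620 accident records.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(records)
    n_a = sum(1 for r in records if antecedent <= set(r))       # antecedent present
    n_c = sum(1 for r in records if consequent <= set(r))       # consequent present
    n_ac = sum(1 for r in records if (antecedent | consequent) <= set(r))
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical accident records: each is a set of attribute and cause labels.
records = [
    {"scaffolding collapse", "material and machine condition"},
    {"scaffolding collapse", "material and machine condition", "management factors"},
    {"earthwork collapse", "environmental condition"},
    {"earthwork collapse", "management factors"},
]
metrics = rule_metrics(
    records, {"material and machine condition"}, {"scaffolding collapse"}
)
```

A lift above 1 means the cause and the accident type co-occur more often than they would if independent, which is the kind of evidence the findings above rely on.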

36. Plant Population Classification Based on PointCNN in the Daliyabuyi Oasis, China.

Author: Li, Dinghao, Shi, Qingdong, Peng, Lei, and Wan, Yanbo
Subjects: PLANT classification, PLANT populations, DEEP learning, DATA mining, TAMARISKS, TREE height
Abstract: Populus euphratica and Tamarix chinensis hold significant importance in wind prevention, sand fixation, and biodiversity conservation. The precise extraction of these species can offer technical assistance for vegetation studies. This paper focuses on the Populus euphratica and Tamarix chinensis located within Daliyabuyi, utilizing PointCNN as the primary research method. After decorrelating and stretching the images, deep learning techniques were applied, successfully distinguishing between various vegetation types, thereby enhancing the precision of vegetation information extraction. On the validation dataset, the PointCNN model showcased a high degree of accuracy, with the respective regular accuracy rates for Populus euphratica and Tamarix chinensis being 92.106% and 91.936%. In comparison to two-dimensional deep learning models, the classification accuracy of the PointCNN model is superior. Additionally, this study extracted individual tree information for the Populus euphratica, such as tree height, crown width, crown area, and crown volume. A comparative analysis with the validation data attested to the accuracy of the extracted results. Furthermore, this research concluded that the batch size and block size in deep learning model training could influence classification outcomes. In summary, compared to 2D deep learning models, the point cloud deep learning approach of the PointCNN model exhibits higher accuracy and reliability in classifying and extracting information for poplars and tamarisks. These research findings offer valuable references and insights for remote sensing image processing and vegetation study domains. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

37. A Data-Driven Approach for the Ultra-Supercritical Boiler Combustion Optimization Considering Ambient Temperature Variation: A Case Study in China.

Author: Wang, Zhi, Yao, Guojia, Xue, Wenyuan, Cao, Shengxian, Xu, Shiming, and Peng, Xianyong
Subjects: COAL combustion, BOILERS, COMBUSTION, CLOSED loop systems, STEAM flow, THERMAL efficiency, DATA scrubbing, BOILER efficiency
Abstract: To reduce coal consumption, nitrogen oxide (NOx), and carbon emissions for coal-fired units, combustion optimization has become not only a hot issue for scientists but also a practical engineering problem for engineers. A data-driven multiple linear regression (MLR) model is proposed to solve the time-consuming problems of boiler online combustion optimization systems. Firstly, a whole year's worth of historical operating data from a coal-fired boiler in a power station is preprocessed, including data resampling, data cleaning, steady-state selection, and cluster analysis. In order to meet the applicable conditions of the linear model, the historical operating data are divided into different sub-datasets (combination mode of coal mills, main steam flow, ambient temperature, lower heating value of coal). Secondly, the multi-objective optimization strategy of economic, carbon, and NOx emission indexes is employed to select operating optimum data packets, and a new dataset is established that is better than the average value of the optimization target in each sub-dataset. On this basis, a stepwise regression algorithm (SRA) is used to select the specific manipulated variables (MVs) that are significant to the multiple optimization targets from 47 candidate MVs in each sub-dataset (different partitions have different types of MVs), and an MLR prediction model is developed. In order to further realize combustion optimization control, the MVs are optimized by employing the MLR model. According to the deviation between the optimal value and the real-time value of the MVs, a boiler combustion closed-loop control system is developed, which is connected with the DCS using the sum of the deviation signal and the corresponding original one. Then, a boiler combustion application test was carried out under some working conditions to verify the feasibility and effectiveness of the approach. The update time of the system signals running on industrial computers is less than 1 s and suitable for online applications. Finally, a full-scale test of the combustion optimization online control system (OCS) is executed. The results show that the boiler thermal efficiency increased by 0.39% based on standard coal, the NOx emissions reduced by 2.85% and the decarbonization effect is significant. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

38. Energy Efficiency State Identification for Cogeneration Units Based on Benchmark Value.

Author: Li, Xin, Gu, Yujiong, and Wang, Zijie
Subjects: ENERGY policy, GAUSSIAN mixture models, ENERGY consumption, COGENERATION of electric power & heat, ENTHALPY, CONSUMPTION (Economics), DATA mining
Abstract: In China, cogeneration units predominantly employ a flexible operation mechanism. However, it is possible that this could lead to a decline in performance and an increase in energy consumption. This paper introduces a methodology that utilizes the data mining technique to ascertain the benchmark value section of the energy efficiency status index for cogeneration units. The equal interval division method is utilized for the purpose of categorizing the operating conditions. The Gaussian mixture model is utilized to ascertain the benchmark value section in relation to the fluctuating operating conditions by estimating the probability of historical data. The methodology is verified by utilizing historical data from a functioning cogeneration unit. The findings suggest that the unit's total heat consumption can be decreased by 32.5–50 kJ·(kW·h)−1 when compared to the design-based approach. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
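
The equal interval division step mentioned in this abstract can be sketched as a simple binning routine. This shows only the partitioning of operating conditions, not the Gaussian mixture model stage; the function name, the choice of unit load as the binned variable, and the sample values are assumptions.

```python
def equal_interval_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width intervals over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = []
    for v in values:
        idx = int((v - lo) / width) if width else 0
        bins.append(min(idx, n_bins - 1))  # the maximum falls in the last bin
    return bins


# Hypothetical unit loads in MW, invented for illustration.
loads = [150.0, 200.0, 250.0, 300.0, 350.0]
condition_ids = equal_interval_bins(loads, 4)
```

Each resulting bin would then hold the historical records for one operating condition, to which a density model could be fitted separately.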

39. Text Analysis and Visualization Research on the Hetu Dangse During the Qing Dynasty of China.

Author: Zhiyu Wang, Jingyu Wu, Guang Yu, and Zhiping Song
Subjects: *HISTORY of archives, *SUPPORT vector machines, *DATABASES, *MATERIALS management, *MACHINE learning, *ARTIFICIAL intelligence, *DOCUMENTATION, *CATALOGS, *COOPERATIVE cataloging databases, *RESEARCH funding, *DATA analysis, *STATISTICAL correlation, *DATA analysis software, *ALGORITHMS, *DATA mining
Abstract: In traditional historical research, interpreting historical documents subjectively and manually causes problems such as one-sided understanding, selective analysis, and one-way knowledge connection. In this study, we aim to use machine learning to automatically analyze and explore historical documents from a text analysis and visualization perspective. This technology solves the problem of large-scale historical data analysis that is difficult for humans to read and intuitively understand. In this study, we use the historical documents of the Qing Dynasty Hetu Dangse, preserved in the Archives of Liaoning Province, as data analysis samples. China's Hetu Dangse is the largest Qing Dynasty thematic archive with Manchu and Chinese characters in the world. Through word frequency analysis, correlation analysis, co-word clustering, word2vec model, and SVM (Support Vector Machines) algorithms, we visualize historical documents, reveal the relationships between functions of the government departments in the Shengjing area of the Qing Dynasty, achieve the automatic classification of historical archives, improve the efficient use of historical materials as well as build connections between historical knowledge. Through this, archivists can be guided practically in historical materials' management and compilation. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

40. A data-driven method for enhancing the image-based automatic inspection of IC wire bonding defects.

Author: Chen, Junlong, Zhang, Zijun, and Wu, Feng
Subjects: CONVOLUTIONAL neural networks, X-ray imaging, DATA mining, ALGORITHMS
Abstract: Visually inspecting integrated circuit (IC) wire bonding defects is important to ensuring the product quality after the packaging process. The availability of IC X-ray images offers an unprecedented opportunity to study image-based automatic IC wire inspection. In this paper, a data-driven method consisting of data pre-processing, feature engineering, and classification is developed to address this problem. The data pre-processing is composed of a chip identification algorithm for locating and separating IC chip image patches from the raw images as well as a wire segmentation algorithm for obtaining the wire region. Next, geometric features extracted from the segmented wires are fed into classification models for identifying defects. Five data mining methods are utilised to develop classification models. The vision detection system (VDS) and convolutional neural networks (CNN) are considered as benchmarks. In computational studies, the effectiveness of the developed method is validated by using X-ray images collected from a semiconductor back-end factory in Mainland China. A comparative analysis is conducted to determine the most suitable classifier for the developed method in the chip classification and the SVM model is finally selected. Advantages of the developed method are verified by benchmarking against the VDS and CNN. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

41. Wufengshan Expressway: a review of systems on China's first intelligent highway.

Author: Tang, Zimu, Peng, Xing, Su, Xiaolong, Zhou, Man, and An, Lin
Subjects: *GREENHOUSE gas mitigation, *VEHICULAR ad hoc networks, *EXPRESS highways, *ARTIFICIAL intelligence, *ROADS, *INTERNET of things
Abstract: Wufengshan Expressway, which opened in June 2021, is China's first intelligent highway. Its smart systems include 5G communications (the fifth-generation technology standard for broadband cellular networks), wireless chargers, artificial intelligence, the 'internet of things' and the 'internet of vehicles'. The use of these technologies has improved operational efficiency, safety and management. It has also helped to reduce greenhouse gas emissions and protect the environment. This paper introduces the various systems installed on the new road and will serve as a reference for future intelligent highways. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

42. Water productivity maximization and ecosystem monitoring to estimate tourism economic value.

Author: Maozheng Fu, Zhenrong Luo, Liying Feng, and Xiaoping Que
Subjects: VALUE (Economics), WATER security, WATER supply, RESOURCE exploitation, RESTORATION ecology, WATER consumption
Abstract: Water supply from a common pool resource based on productivity indicators for different uses is one of the goals of planning in dry areas. Productivity indicators are defined based on time, geographical location and hydrological conditions in the form of food security, economic benefits and ecosystem restoration. This study was conducted in order to evaluate the contrast between economic criteria and food security in the exploitation of water resources in Lu'an city in Anhui province of China. Probabilistic modeling based on the prediction of uncertain values using the Latin hypercube technique was used for hydrological variables and water resources. The method of data mining and trend analysis of dependent variables was also simulated to estimate economic values in the water cycle. Statistical information of 32 years from 1991 to 2022 has been collected and used as an annual average per population. The results showed that the economic value of water consumption in the tourism industry has increased compared to agriculture. The total water provided for food security is equal to 6.5 m3 per person, the excess of which can be allocated to other uses through weighting indicators based on ecosystem and quality. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
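
The Latin hypercube technique named in this abstract draws exactly one sample from each of n equal-probability strata per variable. A minimal sketch on the unit hypercube follows; the function name, the seed, and the dimensions are assumptions, and mapping the samples onto the study's hydrological distributions is not shown.

```python
import random


def latin_hypercube(n_samples, n_dims, seed=None):
    """Return n_samples points in [0, 1)^n_dims, one per stratum in each dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each of the n_samples equal-width strata.
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(points)  # decouple the strata across dimensions
        columns.append(points)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


samples = latin_hypercube(5, 2, seed=0)
```

Compared with plain random sampling, this stratification covers the range of each uncertain variable evenly with fewer draws.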

43. Metabolic profiling of the flower of Citrus aurantium L. var. amara Engl. in rats using ultra‐high‐performance liquid chromatography coupled to quadrupole time‐of‐flight tandem mass spectrometry with data mining strategy.

Author: Zhou, Huixian, Huang, Xinxin, Tan, Ting, and Luo, Yun
Subjects: *TANDEM mass spectrometry, *TIME-of-flight mass spectrometry, *DATA mining, *LIQUID chromatography, *QUADRUPOLE ion trap mass spectrometry, *ORAL drug administration, *BIOTRANSFORMATION (Metabolism), *POLLINATION
Abstract: Rationale: The flower of Citrus aurantium L. var. amara Engl. (FCAVA), an edible tea and herbal medicine with anti‐obesity effect, has attracted great attention in China. The structural elucidation of chemical components in FCAVA has been realized in our previous work. It is well known that metabolic profiling provided a structural basis to discover potential anti‐obesity ingredients in FCAVA. Nevertheless, there are no reports about in vivo metabolic profiles of FCAVA. Therefore, it is necessary to comprehensively identify in vivo substances of FCAVA. Methods: The identification of in vivo substances of FCAVA remains a challenge due to the strong interference of complex chemical components, biological matrices and metabolite isomers. In this work, ultra‐high‐performance liquid chromatography coupled to quadrupole time‐of‐flight tandem mass spectrometry (UHPLC‐QTOF‐MS/MS) analysis with a data mining strategy was established and applied for the metabolic profiling of FCAVA in rats. The data mining strategy, including diagnostic product ions and neutral loss filtering, improved structural elucidation of xenobiotics in rats after oral administration of FCAVA. Results: A total of 228 xenobiotics, including 80 prototypes (10 unambiguous confirmed with reference standards) and 148 metabolites, were tentatively characterized in rat plasma, urine and fecal samples. Among them, 35 xenobiotics were found in plasma, 124 in urine and 156 in feces. The main biotransformation pathway of FCAVA metabolism was deglycosylation, methylation, glucuronidation and sulfation. The main compounds absorbed into the blood were neohesperidin and naringin, which have been reported to show significant anti‐obesity effect. Conclusions: Collectively, this present study would be conducive to the discovery of active ingredients of FCAVA for the treatment of obesity and the development of quality control of FCAVA. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

44. Exploring Acute Pancreatitis Clinical Pathways Using a Novel Process Mining Method.

Author: Yang, Xue, Huang, Wei, Zhao, Weiling, Zhou, Xiaobo, Shi, Na, and Xia, Qing
Subjects: DATA science, COMPUTER assisted instruction, MEDICAL protocols, RESEARCH funding, PANCREATITIS, ELECTRONIC health records, SENSITIVITY & specificity (Statistics), ACUTE diseases, DATA mining
Abstract: Mining process models of medical behavior from electronic medical records is an effective way to optimize clinical pathways. However, clinical medical behavior is an extremely complex field with high nonlinearity and variability, and thus we need to adopt a more effective method. In this study, we developed a fuzzy process mining method for complex clinical pathways. Firstly, we designed a multi-level expert classification system with fuzzy values to preserve finer details. Secondly, we categorized medical events into long-term and temporary events for more specific data processing. Subsequently, we utilized electronic medical record (EMR) data of acute pancreatitis spanning 9 years, collected from a large general hospital in China, to evaluate the effectiveness of our method. The results demonstrated that our modeling process was simple and understandable, allowing for a more comprehensive representation of medical intricacies. Moreover, our method exhibited high patient coverage (>0.94) and discrimination (>0.838). These findings were corroborated by clinicians, affirming the accuracy and effectiveness of our approach. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

45. Patients' perspectives on irritable bowel syndrome: a qualitative analysis based on social media in China.

Author: Sun, Shaopeng, Chen, Jiajia, Li, Heng, Lou, Yijie, Chen, Lixia, and Lv, Bin
Subjects: *IRRITABLE colon, *PATIENTS' attitudes, *SOCIAL media, *DATA mining, *GROUNDED theory
Abstract: Aim: To explore the perspectives, experience, and concerns of patients with irritable bowel syndrome (IBS) in China. Methods: We used data mining to investigate posts shared in Baidu Tieba concerned with IBS; we collected the data through the crawler code, and mined the cleaned data's themes based on Latent Dirichlet allocation (LDA) and the Grounded theory. Results: We found 5746 network posts related to IBS. LDA analysis generated 20 topics, and grounded theory analysis established eight topics. Combining the two methods, we finally arranged the topics according to five concepts: difficulty in obtaining disease information; serious psychosocial problems; dissatisfied with the treatment; lack of social support; and low quality of life. Conclusion: Social media research improved patient-centric understanding of patients' experiences and perceptions. Our study may facilitate doctor-patient communication and assist in the formulation of medical policies. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

46. Spatio-Temporal Information Extraction and Geoparsing for Public Chinese Resumes.

Author: Li, Xiaolong, Zhang, Wu, Wang, Yanjie, Tan, Yongbin, and Xia, Jing
Subjects: *DATA mining, *NATURAL language processing
Abstract: As an important carrier of individual information, the resume is an important data source for studying the spatio-temporal evolutionary characteristics of individual and group behaviors. This study focuses on spatio-temporal information extraction and geoparsing from resumes to provide basic technical support for spatio-temporal research based on resume text. Most current studies on resume text information extraction are oriented toward recruitment work, such as the automated information extraction, classification, and recommendation of resumes. These studies ignore the spatio-temporal information of individual and group behaviors implied in resumes. Therefore, this study takes the public resumes of teachers in key universities in China as the research data, proposes a set of spatio-temporal information extraction solutions for electronic resumes of public figures, and designs a spatial entity geoparsing method, which can effectively extract and spatially locate spatio-temporal information in the resumes. To verify the effectiveness of the proposed method, text information extraction models such as BiLSTM-CRF, BERT-CRF, and BERT-BiLSTM-CRF are selected to conduct comparative experiments, and the spatial entity geoparsing method is verified. The experimental results show that the precision of the selected models on the named entity recognition task is 96.23% and the precision of the designed spatial entity geoparsing method is 97.91%. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

47. Understanding early experiences of Chinese frontline nurses during the COVID‐19 pandemic: A text mining and thematic analysis of social media information.

Author: Luo, Yunting, Feng, Xianqiong, Wang, Dandan, Zheng, Mingyue, and Reinhardt, Jan D.
Subjects: *SENTIMENT analysis, *POSITIVE psychology, *COVID-19, *NURSES' attitudes, *WORK, *SOCIAL media, *RESEARCH methodology, *SELF-control, *EXPRESSIVE arts therapy, *HOSPITAL nursing staff, *EXPERIENTIAL learning, *CHI-squared test, *RESEARCH funding, *TEXT messages, *THEMATIC analysis, *WRITTEN communication, *EMOTIONS, *COVID-19 pandemic, *DATA mining, *OPTIMISM
Abstract: This study aims to explore the early experiences of frontline nurses at the beginning of the COVID‐19 pandemic in China as expressed through social media posts. This study used an explanatory sequential mixed‐method design. Text mining was used for sentiment analysis. The chi‐square test was used to compare the differences in the composition ratio of sentiment classification of posts in different months. Word frequency was statistically analyzed. Further thematic analysis was also performed. The primary sentiments of the posts were discovered to be positive and neutral. The number of posts containing positive emotions was the lowest in January, peaked in March, and gradually declined in April 2020. The following nurse‐oriented narrative themes were developed: "To see and be seen," "Moving forward amid adversity and support," and "Returning to everyday life and constructing meaning." The sentiments of Chinese nurses in response to the pandemic fluctuated, with positive emotions in the early stage that could not be sustained. This study recommends that nurses be encouraged to engage in expressive writing while adhering to ethical guidelines. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
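
The chi-square comparison of sentiment composition across months described in this abstract can be sketched as Pearson's statistic on a months-by-sentiment contingency table. The function name and the counts are invented for illustration; obtaining a p-value from the statistic and degrees of freedom is left out.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if sentiment were independent of month.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts: rows are two months, columns are positive / other posts.
counts = [[10, 20], [20, 10]]
statistic, dof = chi_square_statistic(counts)
```

A large statistic relative to the chi-square distribution with the given degrees of freedom would indicate that the sentiment mix differs between months.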

48. Analysis of the cluster efficacy and prescription characteristics of traditional Chinese medicine intervention for non-small cell lung cancer based on a clustering algorithm.

Author: Hong, Mei, Sun, Wen-Hao, Lu, Ming, Zhong, Tao-Li, Chen, Tian-Yuan, Zhao, Yi-Dong, Hong, Nan, Zhu, Yao, and Ding, Yi-Yan
Subjects: *NON-small-cell lung carcinoma, *CHINESE medicine, *CLUSTER analysis (Statistics), *DATA scrubbing, *LUNG cancer
Abstract: BACKGROUND: In recent years, malignant tumors have gradually become one of the main causes of death for Chinese residents, of which lung cancer ranks first in both the incidence and mortality in China. OBJECTIVE: To mine the text of traditional Chinese medicine (TCM) clinical medical cases after data cleaning, analyze it, and study the experience of TCM doctors in treating non-small cell lung cancer (NSCLC). METHODS: The applied approach was based on the data mining methods of decentralized and hierarchical system clustering of data from a drug and prescription database. This study involved 215 patients, 287 cases, and 147 types of clinical drugs. RESULTS: The data analysis of the clinical treatment of NSCLC in TCM showed that Erchen Decoction was the main method used in the clinical treatment of NSCLC. Junjian recipes were close to each other, with Banzhilian, Lobelia, Shanci Mushroom, and Hedyotis diffusa used for their anticancer and detoxifying effects. CONCLUSION: This study analyzed the core TCM prescription for NSCLC by collecting the empirical essence and characteristics of specific medications. It has some guiding scientific significance for the clinical treatment of lung cancer. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

49. Data mining from process monitoring of typical polluting enterprise.

Author: Zhao, Wenya, Zhang, Peili, Chen, Da, Wang, Hao, Gu, Binghua, and Zhang, Jue
Subjects: DATA mining, WASTE gases, CHEMICAL oxygen demand, ENVIRONMENTAL management, FLUE gases, ELECTRIC conductivity, ELECTRICAL conductivity measurement, FECAL contamination
Abstract: With the increasing volume of environmental monitoring data, extracting valuable insights from multivariate time series sensor data can facilitate comprehensive information utilization and support informed decision-making in environmental management. However, there is a dearth of comprehensive research on multivariate data analysis for process monitoring in typical polluting enterprises. In this study, an artificial neural network model based on back-propagation algorithm (BP-ANN) was developed to predict the wastewater and exhaust gas emissions using IoT data obtained from process monitoring of a typical polluting enterprise located in Taizhou, Zhejiang Province, China. The results indicate that the model constructed has a high predictive coefficient of determination (R2) with values of 0.8510, 0.9565, 0.9561, 0.9677, and 0.9061 for chemical oxygen demand (COD), potential of hydrogen (pH), electrical conductivity (EC), flue gas emission (FGE), and non-methane hydrocarbon concentration (NMHC) respectively. For the first time, the variable importance measure (VIM)–assisted BP-ANN was employed to investigate the internal and external correlations between wastewater and exhaust gas treatment, thereby enhancing the interpretability of mapping features in the BP-ANN model. The predicted errors for pH and FGE have been demonstrated to fall within the range of − 0.62 ~ 0.30 and − 0.21 ~ 0.15 m3/s, respectively, with average relative errors of 1.05% and 9.60%, which is advantageous in detecting anomalous data and forecasting pollution indicator values. Our approach successfully addresses the challenge of segregating data analysis for wastewater disposal and exhaust gas disposal in the process monitoring of polluting enterprises, while also unearthing potential variables that significantly contribute to the BP-ANN model, thereby facilitating the selection and extraction of characteristic variables. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
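
The coefficient of determination (R2) and average relative error quoted in this abstract can be sketched as below. The abstract does not give the exact formulas, so the standard definitions are assumed; the function names and the pH-like sample values are hypothetical.

```python
def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


def mean_relative_error(obs, pred):
    """Average of |pred - obs| / |obs|; observations must be non-zero."""
    return sum(abs(p - o) / abs(o) for o, p in zip(obs, pred)) / len(obs)


# Invented pH-like readings and predictions, for illustration only.
observed = [7.0, 7.5, 8.0]
predicted = [7.1, 7.4, 8.2]
r2 = r_squared(observed, predicted)
mre = mean_relative_error(observed, predicted)
```

An R2 near 1 together with a small relative error is what the abstract reports for pH, while a larger relative error, as for FGE, signals less reliable individual predictions even when R2 is high.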

50. Short-term Prediction Method of Reservoir Downstream Water Level Under Complicated Hydraulic Influence.

Author: Huang, Jingwei, Qin, Hui, Zhang, Yongchuan, Hou, Dongkai, Zhu, Sipeng, and Ren, Pingan
Subjects: WATER levels, RANDOM forest algorithms, BACKWATER, DATA mining, FORECASTING
Abstract: The downstream water level of a reservoir is influenced by its own discharge, changes in external hydraulic conditions, and the value of the previous period's downstream water level, and is very sensitive to hourly changes. However, the influence mechanisms of this change and an accurate prediction method have yet to be investigated. In this study, the downstream water level of Xiangjiaba reservoir in China's Jinsha river was used as a case study to analyze the impact of backwater effects caused by river rising during the flood season and the effect of sharp fluctuations caused by the peak regulation flow during the non-flood season. Moreover, an accurate prediction method at short-term two hourly scale is proposed. This study quantified the backwater effect caused by the rising tributaries of Hengjiang and Minjiang rivers. The random forest algorithm (RF) was used to downscale and rank multidimensional feature data, build different model factor sets, and build a downstream water level prediction model using five different methods. The results showed that the data mining model had the best fit and good prediction ability for the downstream water level of the Xiangjiaba reservoir under the influence of complicated hydraulic factors during the flood season, and can effectively control the fluctuation error during the peak regulation period. The research findings can be applied to other similar basins to improve the reservoir's short-term refined operational levels. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
