28,070 results for "DATA mining"
Search Results
2. An Overview of TikTok's Technology and Issues: Data collection and security concerns pose problems for Congress.
- Subjects
- *SMARTPHONES, *ARTIFICIAL intelligence, *DATA mining, *ADMINISTRATIVE law
- Abstract
The article discusses TikTok, a globally popular video-sharing smartphone application. Topics include how the app builds its feed through a "recommendation engine" that uses artificial intelligence (AI) technologies and data mining practices, and how the administration is still considering options to curtail TikTok's ability to operate in the US.
- Published
- 2024
3. Rainfall prediction using machine learning techniques.
- Author
- Shabu, S. L. Jany, Refonaa, J., Devi, D., Aishwarya, D., Babu, K. Krishna, and Reddy, K. Purshotham
- Subjects
- *RAINFALL, *CLIMATOLOGY, *DATA mining, *CROP yields, *RAIN forests
- Abstract
India is an agricultural nation, and its economy depends to a great extent on rain-fed crop production. Rainfall forecasts are therefore vital for farmers when estimating crop yields. Rainfall prediction is the ability to forecast the weather with the help of science and technology, and knowing how much rain will fall is essential for the efficient use of water resources, agricultural production, and water planning. Various data mining techniques can be used to predict and estimate rainfall. This article highlights some of the most popular rainfall prediction algorithms: Naive Bayes, the K-Nearest Neighbour algorithm, and decision trees are among the algorithms compared in this work. From a comparative perspective, it is possible to analyse how accurately each predicts rainfall. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
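The classifier comparison this abstract describes can be illustrated with a tiny k-nearest-neighbour rain/no-rain example; the feature values and labels below are invented for illustration, not the paper's data:

```python
from collections import Counter

def knn_predict(train, labels, query, k=3):
    """Classify `query` by majority vote among the k nearest training
    points (squared Euclidean distance on numeric features)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train, labels)
    )
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical (humidity %, pressure drop hPa) observations with rain labels.
X = [(90, 8), (85, 6), (88, 7), (40, 1), (35, 0), (45, 2)]
y = ["rain", "rain", "rain", "dry", "dry", "dry"]

print(knn_predict(X, y, (87, 7)))  # a humid, falling-pressure day → rain
```

The same loop could swap in a Naive Bayes or decision-tree model to reproduce the kind of side-by-side comparison the abstract reports.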
4. Prediction of arrhythmia from MIT-BIH database using support vector machine (SVM) and naive bayes (NB) classifiers.
- Author
- Vinutha, K. and Thirunavukkarasu, Usharani
- Subjects
- *SUPPORT vector machines, *DATABASES, *ARRHYTHMIA, *NAIVE Bayes classification, *DATA mining, *STATISTICS
- Abstract
The primary purpose of this research is to use the Support Vector Machine (SVM) classifier and the Naive Bayes (NB) classifier to predict arrhythmia from the MIT-BIH database. With an alpha of 0.05, a 95% confidence interval (CI), 80% power, and an enrollment ratio of 1, the proposed research employed the SVM and NB machine learning algorithms to predict arrhythmia using the MIT-BIH dataset of 65 normal and 65 abnormal ECG signals downloaded from IEEE DataPort in .xlsx format. We used the data mining programme WEKA 3.8.5 to classify people with and without arrhythmia, and IBM SPSS version 21 for the statistical analysis. When comparing the SVM and NB classifiers, a statistically significant difference (p=0.010) was found. Using WEKA's 10-fold cross-validation for training and testing, the SVM classifier outperformed the NB classifier, with a classification accuracy of 88.50% versus 80.39%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
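The 10-fold cross-validation protocol the abstract relies on can be sketched in plain Python. The stand-in nearest-mean classifier and toy signals below are assumptions for illustration, not the paper's SVM/NB models or the MIT-BIH data:

```python
import random

def kfold_accuracy(X, y, fit, predict, k=10, seed=0):
    """Average accuracy over k folds: train on k-1 folds, test on the held-out one."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accs = []
    for f in folds:
        train = [i for i in idx if i not in f]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = [predict(model, X[i]) for i in f]
        accs.append(sum(p == y[i] for p, i in zip(preds, f)) / len(f))
    return sum(accs) / len(accs)

# Stand-in "classifier": nearest class mean on one feature (not the paper's SVM/NB).
def fit(X, y):
    means = {}
    for cls in set(y):
        vals = [x[0] for x, lab in zip(X, y) if lab == cls]
        means[cls] = sum(vals) / len(vals)
    return means

def predict(model, x):
    return min(model, key=lambda c: abs(model[c] - x[0]))

# Toy separable data: "normal" signals near 1.0, "arrhythmia" near 5.0.
X = [(1.0 + 0.1 * i,) for i in range(30)] + [(5.0 + 0.1 * i,) for i in range(30)]
y = ["normal"] * 30 + ["arrhythmia"] * 30

print(round(kfold_accuracy(X, y, fit, predict), 2))  # → 1.0
```

Replacing the `fit`/`predict` pair with two different models and comparing the two averages mirrors the WEKA experiment described above.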
5. Prediction of arrhythmia from MIT-BIH database using J48 and k-nearest neighbours (KNN) classifiers.
- Author
- Vinutha, K. and Thirunavukkarasu, Usharani
- Subjects
- *DATABASES, *MACHINE learning, *ARRHYTHMIA, *K-nearest neighbor classification, *VENTRICULAR arrhythmia, *DATA mining, *STATISTICS
- Abstract
The primary goal of this research is to use the J48 and K-Nearest Neighbor (KNN) classifiers to predict arrhythmia using the MIT-BIH database. With an alpha of 0.05, a 95% confidence interval (CI), 80% power, and an enrollment ratio of 1, the proposed study used the J48 and KNN machine learning algorithms to predict arrhythmia using data from the MIT-BIH dataset, consisting of healthy (n=65) and arrhythmia (n=65) ECG signals collected from IEEE DataPort in .xlsx format. WEKA 3.8.5, a data mining tool, was used to distinguish between those with arrhythmia and those without, and IBM SPSS version 21 was used for the statistical analysis. There was no discernible difference (p=0.025) between the J48 and KNN classifiers. Using WEKA's 10-fold cross-validation to train, test, and verify the classifiers, we find that the J48 classifier is more accurate at classifying the data (89.80%) than the KNN classifier (87.64%). [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. Anemia classification in Gujarat using data mining.
- Author
- Thakor, Kajal, Parikh, Swapnil, Gandhi, Ankita, and Dabhi, Vipul
- Subjects
- *DATA mining, *ERYTHROCYTES, *CHILDBEARING age, *ANEMIA, *HEART failure, *PYTHON programming language
- Abstract
Web mining makes it possible to collect online data from huge databases. Anemia can be identified by low hemoglobin levels or a shortage of red blood cells. Its most frequent cause is undernutrition, which most often affects young children, pregnant women, and women of reproductive age; without treatment or attention, it may result in heart failure or an enlarged heart. In this study, we collect information regarding anemia in Gujarat from the internet and organize it according to several criteria. The dataset is created by scraping the web with Python's Beautiful Soup library, and the extracted data can be further categorized to reach a conclusion. The anemia-related information identified by this research also helps to save lives. Data mining technologies forecast future trends, support organisations, and display facts in a way that is simple for people to grasp. For a variety of stakeholders, including policymakers, programme planners, health service providers, academicians, research scholars, and ordinary citizens, the data can be processed, analysed, and described by a specially trained algorithm, enabling them to make decisions based on the available evidence. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
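The scraping-and-categorising step can be sketched with Python's standard library instead of Beautiful Soup; the HTML snippet, district names, and the 12 g/dL haemoglobin cutoff below are illustrative assumptions:

```python
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the text of every <td> cell, row by row (a minimal stand-in
    for the Beautiful Soup scraping step described in the abstract)."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_td = [], [], False
    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_td = True
    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag == "td":
            self._in_td = False
    def handle_data(self, data):
        if self._in_td and data.strip():
            self._row.append(data.strip())

# Hypothetical snippet of a district-level haemoglobin table (g/dL).
html = """
<table>
  <tr><td>Ahmedabad</td><td>12.6</td></tr>
  <tr><td>Surat</td><td>9.8</td></tr>
</table>
"""
scraper = TableScraper()
scraper.feed(html)
# Categorise: haemoglobin below an assumed 12 g/dL cutoff flagged as anaemic.
flagged = {district: float(hb) < 12.0 for district, hb in scraper.rows}
print(flagged)
```

A real pipeline would fetch pages over HTTP and handle messier markup, but the parse-then-categorise flow is the same.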
7. Data mining decision tree algorithm C4.5 classification of student personality characteristics.
- Author
- Selvida, Desilia, Pulungan, Annisa Fadhillah, and Elveny, Marischa
- Subjects
- *CLASSIFICATION algorithms, *ABILITY grouping (Education), *DATA mining, *DECISION trees, *ERROR rates, *ALGORITHMS
- Abstract
The C4.5 algorithm still has weaknesses in predicting or classifying data when the amount of data is large, so it is necessary to improve its performance by selecting the split attribute using the average gain value. The C4.5 algorithm is a Decision Tree method that uses entropy in the classification process. The resulting classification assigns 8 of the 100 student records tested, producing information on the Sanguine, Choleric, Melancholic, and Phlegmatic personality types. The C4.5 Decision Tree classification algorithm achieves an accuracy rate of 86.36%, with an error rate of 13.64%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
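C4.5 chooses splits by entropy-based information gain; a minimal sketch of that computation follows (the questionnaire attributes and temperament labels are invented for illustration):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, the impurity measure C4.5 splits on."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction from splitting on attribute index `attr`."""
    total = entropy(labels)
    by_value = {}
    for row, lab in zip(rows, labels):
        by_value.setdefault(row[attr], []).append(lab)
    remainder = sum(len(ls) / len(labels) * entropy(ls) for ls in by_value.values())
    return total - remainder

# Hypothetical questionnaire answers: (talkative?, organised?) -> temperament.
rows = [("yes", "no"), ("yes", "no"), ("no", "yes"), ("no", "yes")]
labels = ["Sanguine", "Sanguine", "Melancholic", "Melancholic"]

print(information_gain(rows, labels, 0))  # perfect split → 1.0
```

C4.5 proper normalises this gain by the split's own entropy (gain ratio) and builds the tree recursively; the gain computation above is the core of both.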
8. Implementation of data mining to predict student graduation using C4.5 algorithm method.
- Author
- Anggraeni, Dewi, Rizaldi, Nasution, Akmal, and Kholiq, Abdul
- Subjects
- *DATA mining, *DECISION trees, *STUDENT interests, *UNIVERSITIES & colleges, *GRADUATION (Education), *GRADUATE students
- Abstract
Students graduating on time are an essential indicator for a higher education institution in supporting campus accreditation. Several factors influence whether students graduate on time, namely the student's school of origin and the student's interests. This study aims to predict student graduation based on the student's school of origin and interests, so that higher education institutions have a basis for decisions to be taken in the future. The method used to analyze the student data and the supporting criteria for predicting graduation is the C4.5 algorithm, applied as a data mining decision tree classifier. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Employee recruitment data mining application using the naïve bayes algorithm.
- Author
- Apridonal, Yori, Sembiring, Muhammad Ardiansyah, Sari, Rahayu Mayang, Meri, Mufrida, and Linda, Roza
- Subjects
- *EMPLOYEE recruitment, *DATA mining, *EMPLOYEE handbooks, *NEW employees, *RESEARCH personnel, *DIMENSIONS
- Abstract
In a company, employees are the main drivers of the business, and the role of an employee is very important in running the company's business processes. Kumala Galindo Lestari Company recruits employees who are deemed to meet the qualifications required by the company. However, there are several obstacles in the recruitment process: the file-selection and data-collection stages for prospective new employees before testing are still manual, and hiring must be right on target, because a wrong hiring decision will lead to performance that does not meet management's expectations. The concept of data mining makes it easier to overcome these problems: classification methods can find models that distinguish concepts or data classes in order to estimate the class of an object, and the Naive Bayes algorithm can predict future outcomes based on previous experience. In this study, the researchers took data on 41 prospective employees, using 4 criteria: Graduate, Salary Request, Work Experience, and Classification. The results of this research are expected to help Kumala Galindo Lestari Company determine appropriate and effective employee recruitment. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
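A categorical Naive Bayes classifier in the spirit of the abstract can be sketched as follows; the applicant records, criteria values, and smoothing choice are illustrative assumptions, not the paper's data:

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Fit a categorical Naive Bayes model: class priors plus
    per-attribute value counts for each class."""
    priors = Counter(labels)
    counts = defaultdict(Counter)   # (class, attr_index) -> value counts
    for row, lab in zip(rows, labels):
        for i, v in enumerate(row):
            counts[(lab, i)][v] += 1
    return priors, counts

def predict_nb(model, row):
    """Pick the class maximising prior * product of smoothed likelihoods."""
    priors, counts = model
    total = sum(priors.values())
    best, best_p = None, -1.0
    for cls, prior in priors.items():
        p = prior / total
        for i, v in enumerate(row):
            c = counts[(cls, i)]
            # Add-one smoothing, with one extra bucket for unseen values.
            p *= (c[v] + 1) / (sum(c.values()) + len(c) + 1)
        if p > best_p:
            best, best_p = cls, p
    return best

# Hypothetical applicants: (degree, experience) -> hiring decision.
rows = [("bachelor", "none"), ("master", "2yr"), ("master", "5yr"), ("bachelor", "none")]
labels = ["reject", "hire", "hire", "reject"]

model = train_nb(rows, labels)
print(predict_nb(model, ("master", "2yr")))  # → hire
```

With real criteria values such as degree, salary request, and work experience, the same counting-and-multiplying scheme produces the "predict from previous experience" behaviour the abstract describes.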
10. The application of the K-NN imputation method for handling missing values in a dataset.
- Author
- Syahrizal, Muhammad, Aripin, Soeb, Utomo, Dito Putro, Mesran, M., Sarwandi, S., and Hasibuan, Nelly Astuti
- Subjects
- *MISSING data (Statistics), *MACHINE learning, *K-nearest neighbor classification, *DATA mining, *DATA warehousing
- Abstract
One of the things that is highly expected when collecting data is to produce complete data. In research, incomplete data will affect the results obtained, because the research process cannot be carried out to its fullest. A dataset is a collection of data that has been stored over a long time and has grown into a large pile of data, and not infrequently the dataset used in research is incomplete. The missing-value problem can be solved using data mining techniques. Data mining is the process of extracting information from a collection of data already stored in a data warehouse; classification is the process of finding a common identity among different entities and assigning them to appropriate classes, and classifying large and complex data manually would be difficult and time-consuming. The K-Nearest Neighbor Imputation algorithm is a supervised learning method that aims to find new data patterns by connecting existing data patterns with new data. The authors conclude that applying data mining to recover missing data with the K-Nearest Neighbor Imputation (KNNI) method generates new knowledge, in the form of a comparison between the factors that influence the search for the missing data, and that the results of the KNNI method form a sequence of mutually supporting activities in the process. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. A comparative analysis of data normalization on data mining classification performance.
- Author
- Utomo, Dito Putro, Mesran, M., Sarwandi, S., Aripin, Soeb, Syahrizal, Muhammad, and Pristiwanto, P.
- Subjects
- *DATA mining, *DATA analysis, *ELECTRONIC data processing, *COMPARATIVE studies, *CLASSIFICATION
- Abstract
Data are a collection of information in the form of facts, and information from various origins is stored as data. Data processing is an important step and is commonly performed using data mining. However, data processing often faces barriers that keep it from running well, because the data stored in a dataset are sometimes not in a normal form. One of the problems encountered in raw data is the considerable distance between values, which impedes data processing. This problem can be solved using normalization, also generally referred to as simplification. Algorithms such as min-max normalization and the Z-score can be used for this purpose. Testing of the min-max normalization and Z-score algorithms revealed that the former performed better than the latter, judged by the magnitude of the increase in accuracy obtained with each: min-max normalization yielded an increase of 0.41%, while Z-score normalization yielded an increase of 0.14%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
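A minimal sketch of the two normalization schemes being compared, using only Python's standard library (the example values are arbitrary):

```python
from statistics import mean, pstdev

def min_max(xs, lo=0.0, hi=1.0):
    """Rescale values linearly into the range [lo, hi]."""
    a, b = min(xs), max(xs)
    return [lo + (x - a) * (hi - lo) / (b - a) for x in xs]

def z_score(xs):
    """Centre on the mean and divide by the population standard deviation."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]

ages = [20, 30, 40, 50, 60]
print(min_max(ages))   # → [0.0, 0.25, 0.5, 0.75, 1.0]
print(z_score(ages))   # centred values, roughly -1.41 … 1.41
```

Min-max bounds every feature to a fixed interval, while Z-scores preserve relative spread; which helps a given classifier more is exactly the empirical question the study tests.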
12. A comparison of the performance of data mining classification algorithms on medical datasets with the application of data normalization.
- Author
- Mesran, M., Syahrizal, Muhammad, Sarwandi, S., Aripin, Soeb, Utomo, Dito Putro, and Karim, Abdul
- Subjects
- *DATA mining, *CLASSIFICATION algorithms, *MEDICAL coding, *RESEARCH personnel, *MEDICAL research, *ALGORITHMS
- Abstract
Medical research has evolved considerably, and in the past decade a growing number of researchers have conducted medical research using medical datasets and computers. Data processing on such datasets usually uses data mining techniques, one of which is classification, whose results generally depend on the model formed. It is therefore useful to compare classification algorithms in data mining to find out which has the best performance. Data processing on medical datasets faces various obstacles, including the distance between scattered values; normalization is a process of simplifying the data contained in a dataset. Based on the test results, the K-NN algorithm performed better than the Naïve Bayes algorithm: K-NN obtained 95.30% accuracy before normalization and 95.44% after, while Naïve Bayes obtained 86.28% before normalization and 95.44% after. It can be summarized that normalization benefited the Naïve Bayes algorithm more, with an accuracy increase of 6.21%, whereas the K-NN algorithm increased by only 0.14%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
13. Research on factors influencing the consumer repurchase intention: Data mining of consumers' online reviews based on machine learning.
- Author
- Zhang, Jianming, Zheng, Hao, Liu, Jie, and Shen, Wei
- Subjects
- *CONSUMER behavior, *CONSUMERS' reviews, *DATA mining, *THEORY of reasoned action, *QUALITY of service, *NATURAL language processing
- Abstract
Fierce market competition makes it necessary for enterprises not only to consider how to increase consumers' purchase intention but also to study how to maintain the high customer loyalty that drives continuous purchases. Taking the smartphone brands on the Jingdong platform (hereafter referred to as JD) as an example, the study collected 60,000 reviews and used NLP technology for data mining to extract factors that may affect consumers' willingness to repurchase. Based on the Theory of Reasoned Action (TRA), a questionnaire was designed for empirical research. The results showed that four factors, product attributes, service quality, brand image, and price, significantly affect consumers' repurchase intention, with service quality having the strongest effect among them; implications of the research are discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
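A crude sketch of pulling candidate repurchase factors out of review text follows; the factor lexicons and example reviews here are invented, and the study's actual NLP pipeline is not described in the abstract:

```python
from collections import Counter
import re

# Hypothetical factor lexicons mapping the four factors to cue words.
FACTORS = {
    "product attributes": {"battery", "screen", "camera"},
    "service quality": {"delivery", "support", "refund"},
    "brand image": {"brand", "reputation"},
    "price": {"price", "cheap", "expensive"},
}

def factor_mentions(reviews):
    """Count how many reviews mention each factor at least once,
    a crude stand-in for the NLP-based factor extraction described above."""
    counts = Counter()
    for review in reviews:
        words = set(re.findall(r"[a-z]+", review.lower()))
        for factor, lexicon in FACTORS.items():
            if words & lexicon:
                counts[factor] += 1
    return counts

reviews = [
    "Great screen and battery, but delivery was slow.",
    "Support resolved my refund quickly.",
    "Too expensive for this brand.",
]
counts = factor_mentions(reviews)
print(counts.most_common())
```

Frequency counts like these only surface candidate factors; the abstract's causal claims come from the subsequent TRA-based questionnaire study, not from the text mining itself.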
14. Efficient truncated randomized SVD for mesh-free kernel methods.
- Author
- Noorizadegan, A., Chen, C.-S., Cavoretto, R., and De Rossi, A.
- Subjects
- *RADIAL basis functions, *POISSON'S equation, *PARTIAL differential equations, *INTERPOLATION algorithms, *SINGULAR value decomposition, *DATA mining, *SCIENTIFIC community
- Abstract
This paper explores the utilization of randomized SVD (rSVD) in the context of kernel matrices arising from radial basis functions (RBFs) for the purpose of solving interpolation and Poisson problems. We propose a truncated version of rSVD, called trSVD, which yields a stable solution with a reduced condition number in comparison to the non-truncated variant, particularly when manipulating the scale or shape parameter of RBFs. Notably, trSVD exhibits exceptional proficiency in capturing the most significant singular values, enabling the extraction of critical information from the data. When compared to the conventional truncated SVD (tSVD), trSVD achieves comparable accuracy while demonstrating improved efficiency. Furthermore, we explore the potential of trSVD by employing scale parameter strategies, such as leave-one-out cross-validation and effective condition number. Then, we apply trSVD to solve a 2D Poisson equation, thereby showcasing its efficacy in handling partial differential equations. In summary, this study offers an efficient and accurate solver for RBF problems, demonstrating its practical applicability. The code implementation is provided to the scientific community for their access and reference. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
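The truncated randomized SVD described above can be sketched with NumPy; this is a generic rSVD-with-truncation sketch under standard assumptions (Gaussian sketching plus QR), not the authors' trSVD code, and the test matrix is hypothetical:

```python
import numpy as np

def truncated_rsvd(A, k, oversample=5, seed=0):
    """Randomized SVD truncated to the k leading singular triplets:
    sketch the range of A with a Gaussian test matrix, orthonormalise,
    then take an exact SVD of the small projected matrix."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], k + oversample))
    Q, _ = np.linalg.qr(A @ Omega)            # orthonormal basis for range(A)
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    U = Q @ U_small
    return U[:, :k], s[:k], Vt[:k, :]         # truncation step

# Hypothetical rank-2 matrix standing in for a kernel/RBF system matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 40))
U, s, Vt = truncated_rsvd(A, k=2)
print(np.allclose(U @ np.diag(s) @ Vt, A, atol=1e-8))  # rank-2 matrix recovered
```

Discarding the trailing singular values is what improves the conditioning relative to the non-truncated variant, at the cost of the (here negligible) energy they carry.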
15. Analysis of the medication rules and mechanisms of action of traditional Chinese medicine compound prescriptions in the treatment of primary osteoporosis.
- Author
- 张景涛, 胡珉华, 刘世涛, 李树源, 江泽欣, 曾文星, 马路遥, and 周琦石
- Abstract
BACKGROUND: Traditional Chinese medicine compound prescriptions have a long history in the treatment of primary osteoporosis, and their curative effect is established, but the medication rules and mechanisms are not clear. OBJECTIVE: Using data mining and network pharmacology, to explore and verify the rules of drug use and the molecular mechanisms of modern traditional Chinese medicine in the treatment of primary osteoporosis. METHODS: Relevant documents indexed in CNKI, WanFang, VIP, and PubMed were used as data sources, and the relevant data were counted and extracted with Microsoft Excel 2019, IBM SPSS 25.0, and other software. The high-frequency drugs obtained from the statistics were analyzed by association rule analysis and cluster analysis, and combining the two results yielded the core drug combination of traditional Chinese medicine compound prescriptions for the treatment of primary osteoporosis. The therapeutic mechanism of this combination was explained by network pharmacology and verified by molecular docking. RESULTS AND CONCLUSION: In total, 151 articles were included and 207 prescriptions were selected, involving 285 Chinese herbs. (1) Ten groups of important drug combinations were obtained through the two analyses, among which the core drug combination with the highest confidence and lift was "Drynaria-Eucommia-Angelica." The key components of this combination in the treatment of primary osteoporosis were quercetin, kaempferol, naringenin, and others. The core targets were the SRC proto-oncogene, phosphoinositide-3-kinase regulatory subunit 1, and the RELA proto-oncogene. The main pathways were the cancer signaling pathway, the JAK-STAT signaling pathway, the VEGF signaling pathway, and the NF-κB signaling pathway. (2) The key active components were docked with the core targets, and the two showed good binding.
To conclude, Chinese herbal compound therapy for primary osteoporosis can use a variety of active components to exert its efficacy through multiple signaling pathways acting on multiple targets, which provides a theoretical basis for the research and development of new drugs for the follow-up treatment of primary osteoporosis. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
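The association-rule step, ranking herb pairs by support and confidence over the prescription corpus, can be sketched as follows; the mini-corpus and thresholds are invented for illustration and are not the paper's 207-prescription dataset:

```python
from itertools import combinations

def association_rules(prescriptions, min_support=0.4, min_conf=0.8):
    """Enumerate pair rules X -> Y with support(X and Y) and
    confidence = support(X and Y) / support(X)."""
    n = len(prescriptions)
    items = sorted({h for p in prescriptions for h in p})
    rules = []
    for a, b in combinations(items, 2):
        both = sum(1 for p in prescriptions if a in p and b in p) / n
        for x, yv in ((a, b), (b, a)):
            sx = sum(1 for p in prescriptions if x in p) / n
            if both >= min_support and both / sx >= min_conf:
                rules.append((x, yv, round(both, 2), round(both / sx, 2)))
    return rules

# Hypothetical mini-corpus of prescriptions (herb names transliterated).
scripts = [
    {"Drynaria", "Eucommia", "Angelica"},
    {"Drynaria", "Eucommia"},
    {"Drynaria", "Eucommia", "Epimedium"},
    {"Drynaria", "Angelica"},
    {"Eucommia", "Epimedium"},
]
rules = association_rules(scripts)
for rule in rules:
    print(rule)
```

Lift, the other measure the study ranks by, would divide each rule's confidence by support(Y); rules whose lift exceeds 1 indicate herbs co-prescribed more often than chance.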
16. Detection Algorithms for Simple Two-Group Comparisons Using Spontaneous Reporting Systems.
- Author
-
Noguchi, Yoshihiro and Yoshimura, Tomoaki
- Subjects
- *
MEDICAL sciences , *SIGNAL detection , *DRUG efficacy , *ALGORITHMS , *DATA mining , *MEDICATION safety - Abstract
Medical science has often used adult males as the standard to establish pathological conditions, their transitions, diagnostic methods, and treatment methods. However, it has recently become clear that sex differences exist in how risk factors contribute to the same disease, and these differences also exist in the efficacy of the same drug. Furthermore, the elderly and children have lower metabolic functions than adult males, and the results of clinical trials on adult males cannot be directly applied to these patients. Spontaneous reporting systems have become an important source of information for safety assessment, thereby reflecting drugs' actual use in specific populations and clinical settings. However, spontaneous reporting systems only register drug-related adverse events (AEs); thus, they cannot accurately capture the total number of patients using these drugs. Therefore, although various algorithms have been developed to exploit disproportionality and search for AE signals, there is no systematic literature on how to detect AE signals specific to the elderly and children or sex-specific signals. This review describes signal detection using data mining, considering traditional methods and the latest knowledge, and their limitations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
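The disproportionality analyses this review surveys are built on a 2x2 contingency table; as one illustration, here is a sketch of a common detection statistic, the reporting odds ratio (ROR) with its 95% confidence interval, using made-up counts (the review covers many algorithms, not only this one):

```python
from math import log, sqrt, exp

def reporting_odds_ratio(a, b, c, d):
    """ROR with a 95% CI from the 2x2 table used in disproportionality analysis:
        a = reports with the drug AND the adverse event,
        b = drug without the event, c = event without the drug, d = neither."""
    ror = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(ROR)
    lo, hi = exp(log(ror) - 1.96 * se), exp(log(ror) + 1.96 * se)
    return ror, lo, hi

# Hypothetical counts from a spontaneous-reporting database.
ror, lo, hi = reporting_odds_ratio(20, 80, 100, 9800)
print(round(ror, 1), round(lo, 1), round(hi, 1))
# A common screening criterion: flag a signal when the lower CI bound exceeds 1.
print(lo > 1.0)
```

Restricting the four counts to a subgroup (for example, reports on children, or on one sex) gives the subgroup-specific signal detection the review discusses.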
17. High‐resolution reservoir prediction method based on data‐driven and model‐based approaches.
- Author
-
ZeYang, Liu, Wei, Song, XiaoHong, Chen, WenJin, Li, Zhichao, Li, and GuoChang, Liu
- Subjects
- *
DEEP learning , *SHALE oils , *NATURAL gas prospecting , *PETROLEUM prospecting , *OIL fields , *DATA mining - Abstract
The Jiyang depression in the southeastern part of the Bohai Bay Basin contains a relatively large-scale set of shale oil in the Paleogene Shahejie Formation, but its complex internal components lead to narrow frequency bands, low resolution, and difficulty in extracting reservoir information. Impedance is important information for reservoir characterization, so how to predict high-resolution impedance from the available information is particularly important. Deep learning, known for its effectiveness in addressing non-linear problems, has found extensive applications in various fields of oil and gas exploration. However, the challenges of overfitting and poor generalization persist due to the limited availability of training datasets. Besides, existing methods often use networks to solve a single problem, when in fact deep learning can deal with a series of problems intelligently. To partially solve the above problems, an intelligent reservoir prediction network framework is proposed in this paper. Physical information is introduced to combine data-driven and model-based approaches, thus solving the problem of constructing training datasets. The processing part accomplishes high-resolution processing of the seismic records, addressing the narrow bandwidth and low resolution, and initial model constraints are introduced to obtain more stable inversion results. Finally, the well data are compared and analysed to identify and predict the lithology and complete the intelligent prediction of unconventional reservoirs. The results are compared with a traditional model-driven inversion method, revealing that the approach presented in this paper exhibits higher resolution in predicting dolomite. This contributes to the establishment of a robust data foundation for reservoir evaluation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Global shutter CMOS vision sensors and event cameras for on‐chip dynamic information.
- Author
-
Jaklin, Marko, García‐Lesta, Daniel, López, Paula, and Brea, Victor M.
- Subjects
- *
IMAGE sensors , *CAMERAS , *INTELLIGENT sensors , *DATA mining - Abstract
The on‐chip extraction of dynamic information from a scene can be addressed with either frame‐based CMOS vision sensors, also called smart image sensors, or dynamic vision sensors, also known as event cameras. When implemented with a pinned photodiode (PPD) as a 4‐transistor active pixel sensor (4T‐APS), the former brings the benefits of low temporal noise and low dark current, but without high dynamic range (HDR). The latter comes with the benefits of HDR and a fast event detection rate at low power consumption; the drawback is background activity noise, which requires additional hardware or algorithms to keep it low. This paper analyses the mismatch and noise of a global shutter 4T‐APS implementation with local HDR through an overflow capacitor and correlated double sampling (CDS) to provide low-noise events through frame differencing. The aim is to narrow the gap with dynamic vision sensors in terms of event rate and dynamic range. We show that our solution would be competitive with event cameras in scenarios with slow-moving objects and a relatively wide dynamic range (85 dB). [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
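The frame-differencing idea, comparing two global-shutter frames and emitting an event wherever the brightness change exceeds a threshold, can be sketched as follows (the toy 2x2 frames and threshold are made up, and real sensors do this in analog hardware, not software):

```python
def frame_events(prev, curr, threshold=10):
    """Emit (x, y, polarity) events where the brightness change between
    two frames exceeds a threshold, mimicking event-camera output."""
    events = []
    for y, (row_p, row_c) in enumerate(zip(prev, curr)):
        for x, (p, c) in enumerate(zip(row_p, row_c)):
            if abs(c - p) > threshold:
                events.append((x, y, 1 if c > p else -1))
    return events

prev = [[100, 100], [100, 100]]
curr = [[100, 130], [60, 100]]
print(frame_events(prev, curr))  # → [(1, 0, 1), (0, 1, -1)]
```

The threshold plays the role the CDS noise floor plays on-chip: changes below it are suppressed, which is how frame differencing keeps the event stream sparse.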
19. Multiscale Characteristics and Connection Mechanisms of Attraction Networks: A Trajectory Data Mining Approach Leveraging Geotagged Data.
- Author
-
Jiang, Hongqiang, Wei, Ye, Mei, Lin, and Wang, Zhaobo
- Subjects
- *
SCALE-free network (Statistical physics) , *GEOTAGGING , *DATA mining , *URBAN tourism , *TOURIST attractions , *MATTHEW effect - Abstract
Urban tourism is considered a complex system, and multiscale exploration of the organizational patterns of attraction networks has become a topical issue in urban tourism; exploring the multiscale characteristics and connection mechanisms of attraction networks is therefore important for understanding the linkages between attractions and for future destination planning. This paper uses geotagging data to compare the links between attractions in Beijing, China during four periods: the pre-Olympic period (2004–2007), the Olympic Games and subsequent "heat" period (2008–2013), the post-Olympic period (2014–2019), and the COVID-19 (Corona Virus Disease 2019) pandemic period (2020–2021). The aim is to better understand the evolution and patterns of attraction networks at different scales in Beijing and to provide insights for tourism planning in the destination. The results show that the macro-, meso-, and microscale characteristics of attraction networks have inherent logical relationships that can explain the commonalities and differences in the development of tourism networks. At the macroscale, the degree Matthew effect of the attraction network is significant in all four periods and exhibits a morphologically monocentric structure, suggesting that new entrants are more likely to be associated with attractions that already have high value. The mesoscale links attractions according to tourists' common purposes; community segmentation of the attraction networks in the four periods suggests that a functionally polycentric structure describes their clustering effect, that the weak links between clusters result from attractions being bound by incomplete information and distance, and that the functionally polycentric structure makes the network of clusters generally more efficient.
The pattern structure at the microscale reveals the topological transformation relationship of the regional collaboration pattern, and the attraction network structure in the four periods has a very similar importance-profile structure, suggesting that the attraction network follows the same construction rules and evolution mechanism; this aids in understanding the attraction network pattern at both macro and micro scales. Important approaches and practical implications for planners and managers are presented. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. Pipelined biomedical event extraction rivaling joint learning.
- Author
-
Wu, Pengchao, Li, Xuefeng, Gu, Jinghang, Qian, Longhua, and Zhou, Guodong
- Subjects
- *
DATA mining , *MACHINE learning , *IDENTIFICATION , *SEMANTIC computing - Abstract
• An approach for pipelined event extraction consisting of trigger identification, argument role recognition, and event construction. • BERT-based models are applied to three sub-tasks in biomedical event extraction. • N-ary relation extraction can effectively determine the validity of a candidate Binding event. • A pipelined biomedical event extraction approach rivaling joint learning. Biomedical event extraction is an information extraction task that obtains events from biomedical text; its targets include the type, the trigger, and the respective arguments involved in an event. Traditional biomedical event extraction usually adopts a pipelined approach, which contains trigger identification, argument role recognition, and finally event construction, either using specific rules or by machine learning. In this paper, we propose an n-ary relation extraction method based on the BERT pre-training model to construct Binding events, in order to capture the semantic information about an event's context and its participants. The experimental results show that our method achieves promising results on the GE11 and GE13 corpora of the BioNLP shared task, with F1 scores of 63.14% and 59.40%, respectively. This demonstrates that, by significantly improving the performance on Binding events, the overall performance of the pipelined event extraction approach rivals or even exceeds that of current joint learning methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
21. Smart Control of DCT Proportional Solenoid Valve Based on Data Mining.
- Author
-
Yang, Qing, Wu, Guangqiang, and Zhang, Subin
- Subjects
- *
SOLENOIDS , *DATA mining , *MACHINE learning , *VALVES , *AUTOMATIC automobile transmissions , *FEEDFORWARD neural networks , *PRESSURE control , *MAGNETIC materials - Abstract
High-performance clutch control is essential for a dual-clutch transmission (DCT) system to ensure good shifting smoothness. The control performance of a clutch in a DCT driven by a proportional solenoid valve depends on the output pressure control of the solenoid valve, while the output pressure of the solenoid valve is directly controlled by the energized current. Therefore, the relationship between the working current and the output pressure of the solenoid valve has a significant impact on clutch control and accordingly affects the driving performance of the vehicle. However, the pressure-to-current (P/I) relationship of a proportional solenoid valve has a nonlinear hysteresis characteristic caused by magnetic materials, oil viscous friction, operating temperature, etc., which has negative effects on the accuracy and stability of solenoid valve pressure control. To cope with this problem, a machine learning model, long short-term memory (LSTM), for the P/I relationship of the solenoid valve is built from mined data in this paper and used as feedforward compensation for closed-loop control of the solenoid valve. The test results demonstrate that the machine learning model can effectively predict the output pressure in the rising and falling phases of the same working current. Besides, this smart control method, which has better applicability in engineering, can effectively improve the control performance of the proportional solenoid valve and further improve clutch control and vehicle driving performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. Supporting Better Insights of Data Science Pipelines with Fine-grained Provenance.
- Author
-
Chapman, Adriane, Lauro, Luca, Missier, Paolo, and Torlone, Riccardo
- Subjects
- *
DATA science , *RELATIONAL databases , *PYTHON programming language , *DATA mining , *DATABASES , *MACHINE learning , *IMAGE recognition (Computer vision) - Published
- 2024
- Full Text
- View/download PDF
23. The development of reference working cycles for agricultural tractors.
- Author
-
Angelucci, Leonardo and Mattetti, Michele
- Subjects
- *
FARM tractors , *AUTOMOBILE power trains , *HIDDEN Markov models , *TRACTOR industry , *ENERGY shortages , *ENERGY consumption , *AGRICULTURAL equipment - Abstract
Climate change and the current energy crisis are creating new challenges for agriculture, and new technological solutions must be developed to increase agricultural machinery efficiency. Researchers and machinery manufacturers identified electrified powertrains as a possible solution to meet this demand. The development of field-effective electrified powertrains is challenging mostly due to the wide variability of operating conditions of agricultural tractors. While the automotive industry adopted reference driving cycles for the design and evaluation of hybrid powertrains, the tractor industry has not been able to easily record external load in real-world conditions as it requires dedicated systems that cannot be used under prolonged field usage. This study aims to provide a methodology for estimating a reference working cycle from a multi-year dataset using technologies available in current commercial tractors. Data were collected on a tractor used for 3 years of agricultural work. Data were first clustered into work states, then, for each state, signal features from on-tractor sensors were used to extract key factors to compute the reference work state. With an optimisation solver and a hidden Markov model, the reference working cycle that synthesised the real-world tractor use was calculated. This cycle was compared with established cycles for non-road mobile machinery. The new reference cycle better represented real-world tractor usage as it also captured low engine-load operations, which are frequent in farming and mostly associated with machine setup. The new reference working cycle permits a reliable estimation of fuel consumption in real-world farming. • Reference working cycles are necessary for designing field-effective drivetrains. • A reference working cycle aiming to synthesise real-world usage was calculated. • The reference cycle makes the engine operate in its operating domain.
• The cycle better reproduces real-world usage than established cycles for NRMM. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
24. Multivariate analysis and data mining help predict asthma exacerbations.
- Author
-
Mihaicuta, Stefan, Udrescu, Lucretia, Militaru, Adrian, Nadasan, Valentin, Tiotiu, Angelica, Bikov, Andras, Ursoniu, Sorin, Birza, Romina, Popa, Alina Mirela, and Frent, Stefan
- Subjects
- *
DATA mining , *MULTIVARIATE analysis , *ASTHMA , *OCCUPATIONAL exposure , *DATA analysis - Abstract
Work-related asthma has become a highly prevalent occupational lung disorder. Our study aims to evaluate occupational exposure as a predictor for asthma exacerbation. We performed a retrospective evaluation of 584 consecutive patients diagnosed and treated for asthma between October 2017 and December 2019 in four clinics from Western Romania. We evaluated the enrolled patients for their asthma control level by employing the Asthma Control Test (ACT < 20 represents uncontrolled asthma), the medical record of asthma exacerbations, occupational exposure, and lung function (i.e. spirometry). Then, we used statistical and data mining methods to explore the most important predictors for asthma exacerbations. We identified essential predictors by calculating the odds ratios (OR) for exacerbation in a logistic regression model. The average age was 45.42 ± 11.74 years (19–85 years), and 422 (72.26%) participants were females. 42.97% of participants had exacerbations in the past year, and 31.16% had a history of occupational exposure. In a multivariate model analysis adjusted for age and gender, the most important predictors for exacerbation were uncontrolled asthma (OR 4.79, p <.001), occupational exposure (OR 4.65, p <.001), and lung function impairment (FEV1 < 80%) (OR 1.15, p =.011). The ensemble machine learning experiments on combined patient features, harnessed by our data mining approach, reveal that the best predictor is occupational exposure, followed by ACT. Machine learning ensemble methods and statistical analysis concordantly indicate that occupational exposure and ACT < 20 are strong predictors for asthma exacerbation. [ABSTRACT FROM AUTHOR]
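Odds ratios like those reported above come from 2x2-table arithmetic of the following form. The counts below are invented purely to illustrate the computation and its approximate confidence interval; they are not taken from the study.

```python
import math

def odds_ratio(a, b, c, d):
    """OR = (a/b) / (c/d) for a 2x2 table:
    a = exposed with outcome, b = exposed without,
    c = unexposed with outcome, d = unexposed without."""
    return (a / b) / (c / d)

def or_ci95(a, b, c, d, z=1.96):
    """Approximate 95% CI computed on the log-odds scale."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

# Invented counts: 120 exposed patients with exacerbation, 60 without, etc.
or_value = odds_ratio(120, 60, 50, 120)
low, high = or_ci95(120, 60, 50, 120)
```

In a logistic regression the same quantity appears as exp(coefficient), adjusted for the other covariates in the model.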
- Published
- 2024
- Full Text
- View/download PDF
25. Identification of key TE associated with myocarditis based on RNA and single-cell sequencing data mining.
- Author
-
Sixing Chen, Fei Jiang, Jinqiu Wu, Zhi Li, Xiongwei Fan, Xiushan Wu, Yongqing Li, Fang Li, Zhigang Jiang, and Yuequn Wang
- Subjects
- *
RNA sequencing , *MYOCARDITIS , *DATA mining , *HEART cells , *IMMUNE complexes , *COXSACKIEVIRUSES , *CARDIOMYOPATHIES - Abstract
Cardiomyopathy is a severe cardiac condition characterized by complex immune regulatory mechanisms. While the role of immune genes is recognized, the specifics of their regulation in cardiomyopathy are not fully understood. Recent studies highlight the significance of transposable elements (TEs) in various diseases, particularly their potential to modulate immune responses. This paper utilizes publicly available databases to explore the role of TEs in myocarditis: RNA-Seq data and single-cell sequencing data were analyzed, with a focus on the mouse model of experimental autoimmune myocarditis (EAM). The RNA-Seq analysis revealed substantial upregulation of a range of immune genes in cardiac tissue. Further investigation using single-cell sequencing of cardiac immune cells identified specific expression of certain transposable elements (TEs) across different types of immune cells in the heart. Additionally, there was an overall increase in the expression of the ERVB7-1.LTR-MM transposon across various cells in the EAM model, suggesting a widespread impact of this transposon on the immune response in this disease context. The findings of this study highlight the intricate interplay between transposable elements and the immune system in cardiomyopathy, providing new insights into the molecular mechanisms underlying this condition. The discovery of specific TEs expression in cardiac immune cells and the overall increase in ERVB7-1.LTR-MM expression across the EAM model underscore the potential of these elements in modulating immune responses and contribute to our understanding of cardiomyopathy's pathogenesis. These observations open avenues for further research into the role of TEs in cardiac diseases and may lead to novel therapeutic strategies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. An information extraction method based on improved mixed text density web pages.
- Author
-
Zhou, Yuan, Yin, Xiaojun, and Yan, Jingchen
- Subjects
- *
DATA mining , *WEBSITES , *SEMIDEFINITE programming , *DATA extraction , *ARTIFICIAL joints , *RELAXATION techniques , *DENSITY - Abstract
To improve the effect of web page information extraction, this paper proposes an improved information extraction method for mixed-text-density web pages. Under various power constraints of the relay node itself, this paper proposes a design scheme of joint beamforming and artificial noise based on safety and rate maximisation. Furthermore, with the help of semidefinite relaxation techniques and first‐order approximations, it can be approximated as a semidefinite programming problem that is easy to solve. In addition, this paper uses an iterative algorithm based on a continuous convex approximation to process data to improve the accuracy of web page data extraction. The experimental results show that the information extraction method based on improved mixed-text-density web pages proposed in this paper has a good information extraction effect. [ABSTRACT FROM AUTHOR]
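The core text-density idea behind content extraction methods of this kind can be sketched as follows: score each block of a page by how much visible text it carries relative to its markup, and keep the densest block. The density definition and the `<div>`-based block splitting below are simplifying assumptions, not the paper's exact algorithm.

```python
import re

def text_density(block):
    """Characters of visible text per tag in the block (tag count floored at 1)."""
    tags = len(re.findall(r"<[^>]+>", block)) or 1
    text = re.sub(r"<[^>]+>", "", block)
    return len(text.strip()) / tags

def densest_block(html):
    """Split on <div> boundaries and return the block with maximal text density."""
    blocks = [b for b in re.split(r"(?i)</?div[^>]*>", html) if b.strip()]
    return max(blocks, key=text_density).strip()

# Navigation is tag-heavy and text-light; the article body dominates the score.
page = ("<div><a href='#'>nav</a><a href='#'>more</a></div>"
        "<div>The actual article body, long enough to dominate the density score.</div>")
main = densest_block(page)
```

Real implementations compute densities over a parsed DOM tree and smooth them across neighbouring nodes, but the ranking principle is the same.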
- Published
- 2024
- Full Text
- View/download PDF
27. ERIM: An ensemble of rare itemset mining and its application in the automotive industry.
- Author
-
Akdas, Devrim Naz, Birant, Derya, and Yildirim Taser, Pelin
- Subjects
- *
AUTOMOBILE industry , *ANOMALY detection (Computer security) , *BASIC needs , *ARTIFICIAL intelligence , *A priori - Abstract
Discovering previously unknown anomalies that are rare and dramatically differ from the majority of the data is a critical need for the automotive industry. Rare itemset mining (RIM), one of the pattern‐based methods, has been used for anomaly detection as it provides successful analysis results. However, several aspects still need to be explored, such as improving the mining process by identifying more targeted, valuable and reliable rare itemsets. Motivated by this fact, this study proposes a novel approach, named ensemble of rare itemset mining (ERIM), which investigates weak rare itemsets (WRIs) using different algorithms and aggregates these rules to obtain strong rare itemsets (SRIs). This study also combines four different RIM algorithms (Apriori Rare, Apriori Inverse, CORI and RP‐Growth) as base learners for the first time. The proposed ERIM approach is a general methodology that can be applied to any field, but, in this study, it was used in the automotive industry as a case study. In the experiments, ERIM was applied to a real‐world gear manufacturing dataset to discover anomalies in machine downtimes. The experimental results were evaluated in terms of the number and length of itemsets, along with illustrative samples. The results showed that the proposed ERIM approach gives more reliable common knowledge by jointly considering the relation between WRIs discovered by the base learners. The findings indicated that the proposed ERIM technique was successful in detecting anomalies whose support values are below 7.12. Furthermore, it is clear from the experimental results that the ERIM discovered the highest number of SRIs, 1403, each of which is a 3‐itemset. Finally, the results showed that our method performed 43.37% better on average than state‐of‐the‐art methods on the same dataset. [ABSTRACT FROM AUTHOR]
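The ensemble idea (aggregate weak rare itemsets from several base learners into strong rare itemsets) can be sketched as follows. The naive rarity filter below is a stand-in for the actual Apriori Rare, Apriori Inverse, CORI, and RP-Growth learners, and the transactions are invented.

```python
from itertools import combinations

def rare_itemsets(transactions, max_support):
    """Itemsets appearing in at least one but at most max_support transactions."""
    items = sorted({i for t in transactions for i in t})
    found = set()
    for size in (1, 2):                      # 1- and 2-itemsets for brevity
        for combo in combinations(items, size):
            count = sum(1 for t in transactions if set(combo) <= t)
            if 1 <= count <= max_support:
                found.add(combo)
    return found

transactions = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"a", "b"}]

# Two "learners" with different rarity thresholds stand in for the ensemble.
wri_1 = rare_itemsets(transactions, max_support=1)
wri_2 = rare_itemsets(transactions, max_support=2)
sri = wri_1 & wri_2      # strong rare itemsets: agreed on by all base learners
```

Intersecting the learners' outputs is one simple aggregation rule; ERIM's contribution is precisely in how the WRIs are jointly weighed to yield more reliable SRIs.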
- Published
- 2024
- Full Text
- View/download PDF
28. Development of PROMETHEE-Entropy data mining model for groundwater potentiality modeling: a case study of multifaceted geologic settings in south-western Nigeria.
- Author
-
Mogaji, Kehinde Anthony and Atenidegbe, Olanrewaju Fred
- Subjects
- *
DATA mining , *GROUNDWATER , *DECISION support systems , *GEOLOGICAL modeling , *RECEIVER operating characteristic curves , *GROUNDWATER management - Abstract
This work looks at developing an object-driven decision support system (DSS) model with the goal of improving the prediction accuracy of the present expert-driven DSS model in assessing groundwater potentiality. The database of remote sensing, geological, and geophysical information was constructed using the technological efficiency of GIS, data mining, and programming tools. Groundwater potential conditioning factors (GPCF) extracted from the datasets include lithology (Li), hydraulic conductivity (K), lineament density (Ld), transmissivity (T), and transverse resistance (TR) for groundwater potentiality mapping in a typical hard rock multifaceted geologic setting in south-western Nigeria. A Python-based entropy approach was used to objectively weight these factors. The weighting results assigned the highest value, 0.6, to Ld and the lowest, 0.03, to K. The Python-based PROMETHEE-Entropy model algorithm was produced by combining these weights with the Python-based PROMETHEE-II method. The groundwater potentiality model (GPM) map of the area was created using the model algorithm's outputs on the gridded raster of GPCF themes. Validation of the created GPM maps using the Receiver Operating Characteristic (ROC) curve technique yielded an accuracy of 86%. An object-driven DSS model was thus created from the applied approaches. The created object-driven model is a viable alternative to existing approaches in groundwater hydrology and aids in the automation of groundwater resource management in the research region. [ABSTRACT FROM AUTHOR]
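The entropy weighting step can be sketched as follows: criteria whose values vary more across alternatives receive higher objective weights. The decision matrix below is invented; in the paper the columns would correspond to GPCFs such as Ld, K, T, TR, and Li.

```python
import math

def entropy_weights(matrix):
    """Columns = criteria, rows = alternatives; returns one weight per column."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    divergences = []
    for j in range(n_cols):
        col = [row[j] for row in matrix]
        total = sum(col)
        probs = [v / total for v in col]
        # Shannon entropy normalised to [0, 1] by log(number of alternatives).
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n_rows)
        divergences.append(1 - entropy)      # divergence from uniformity
    s = sum(divergences)
    return [d / s for d in divergences]

# Column 0 varies strongly across alternatives, column 1 barely at all,
# so column 0 should receive the larger weight.
w = entropy_weights([[0.9, 0.50], [0.1, 0.52], [0.5, 0.51]])
```

These weights would then feed the PROMETHEE-II outranking step, which ranks alternatives by weighted pairwise preference flows.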
- Published
- 2024
- Full Text
- View/download PDF
29. Joint modeling of the longitudinal student mark and the competing events of degree completion and academic dropout.
- Author
-
Kemda, Lionel Establet and Murray, Michael
- Subjects
- *
ACADEMIC degrees , *DATA mining , *FINANCIAL aid , *COMPETING risks , *ACADEMIC achievement - Abstract
Within educational data mining, it is common to model students' academic performance using a linear regression model or the time to degree completion or dropout using the cause-specific hazard model. Yet, to our knowledge, no studies have simultaneously modeled longitudinal performance and the time-to-event hazards. We propose a joint modeling approach in which we estimate the effect of a student's semester longitudinal weighted mark obtained after attempting t credit points, on the hazard of degree completion and academic dropout. Evidence suggests that the joint modeling approach is substantially more efficient compared to the separate modeling of the longitudinal and the time-to-event outcomes. We observe similarities in the parameter estimates of the longitudinal submodels, but smaller standard errors of the estimates in the joint model. However, the parameter estimates of the competing risk models from both analysis methods are different. A unit increase in the average log weighted mark results in an 18.5-fold increase in the hazard associated with degree completion, but reduces the risk of dropout by 49 per cent. Being in a university-type residence, not having financial aid, and having a higher number of high school matriculation points all increase the hazard rate of degree completion. [ABSTRACT FROM AUTHOR]
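The reported effect sizes follow from the multiplicative structure of proportional hazard models: a coefficient beta scales the hazard by exp(beta), so an "18.5-fold increase" corresponds to beta = ln(18.5) and a "49 per cent reduction" to exp(beta) = 0.51. The short check below is illustrative arithmetic, not the study's fitted model.

```python
import math

def hazard_ratio(beta, delta=1.0):
    """Hazard multiplier implied by a `delta`-unit increase in a covariate."""
    return math.exp(beta * delta)

beta_completion = math.log(18.5)        # coefficient implied by an 18.5-fold HR
beta_dropout = math.log(1 - 0.49)       # coefficient implied by a 49% reduction

hr_completion = hazard_ratio(beta_completion)   # hazard of degree completion
hr_dropout = hazard_ratio(beta_dropout)         # hazard of academic dropout
```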
- Published
- 2024
- Full Text
- View/download PDF
30. Volumetric brain MRI signatures of heart failure with preserved ejection fraction in the setting of dementia.
- Author
-
Bermudez, Camilo, Kerley, Cailey I., Ramadass, Karthik, Farber-Eger, Eric H., Lin, Ya-Chen, Kang, Hakmook, Taylor, Warren D., Wells, Quinn S., and Landman, Bennett A.
- Subjects
- *
VENTRICULAR ejection fraction , *HEART failure , *ELECTRONIC health records , *PARIETAL lobe , *CEREBRAL atrophy , *AMYGDALOID body , *NUCLEUS accumbens - Abstract
Heart failure with preserved ejection fraction (HFpEF) is an important, emerging risk factor for dementia, but it is not clear whether HFpEF contributes to a specific pattern of neuroanatomical changes in dementia. A major challenge to studying this is the relative paucity of datasets of patients with dementia, with/without HFpEF, and relevant neuroimaging. We sought to demonstrate the feasibility of using modern data mining tools to create and analyze clinical imaging datasets and identify the neuroanatomical signature of HFpEF-associated dementia. We leveraged the bioinformatics tools at Vanderbilt University Medical Center to identify patients with a diagnosis of dementia with and without comorbid HFpEF using the electronic health record. We identified high-resolution, clinically acquired neuroimaging data on 30 dementia patients with HFpEF (age 76.9 ± 8.12 years, 61% female) as well as 301 age- and sex-matched patients with dementia but without HFpEF to serve as comparators (age 76.2 ± 8.52 years, 60% female). We used automated image processing pipelines to parcellate the brain into 132 structures and quantify their volume. We found six regions with significant atrophy associated with HFpEF: accumbens area, amygdala, posterior insula, anterior orbital gyrus, angular gyrus, and cerebellar white matter. There were no regions with atrophy inversely associated with HFpEF. Patients with dementia and HFpEF have a distinct neuroimaging signature compared to patients with dementia only. Five of the six regions identified are in the temporo-parietal region of the brain. Future studies should investigate mechanisms of injury associated with cerebrovascular disease leading to subsequent brain atrophy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
31. Predicting game-induced emotions using EEG, data mining and machine learning.
- Author
-
Lim, Min Xuan and Teo, Jason
- Abstract
Background: Emotion is a complex phenomenon that greatly affects human behavior and thinking in daily life. Electroencephalography (EEG), one of the human physiological signals, has been emphasized by most researchers in emotion recognition as its specific properties are closely associated with human emotion. However, the number of human emotion recognition studies using computer games as stimuli is still insufficient, as no relevant publicly available datasets were provided in the past decades. Most of the recent studies using the Gameemo public dataset have not clarified the relationship between the EEG signal's changes and the emotion elicited using computer games. Thus, this paper introduces the use of data mining techniques to investigate the relationships between the frequency changes of EEG signals and the human emotion elicited when playing different kinds of computer games. The data acquisition, data pre-processing, data annotation and feature extraction stages were designed and conducted in this paper to obtain and extract the EEG features from the Gameemo dataset. Cross-subject and subject-based experiments were conducted to evaluate the classifiers' performance. The top 10 association rules generated by the RCAR classifier were examined to determine the possible relationship between the EEG signal's frequency changes and game-induced emotions. Results: The RCAR classifier constructed for the cross-subject experiment achieved the highest accuracy, precision, recall and F1-score, all evaluated at over 90%, in classifying the HAPV, HANV and LANV game-induced emotions. The results of the 20 subject-based experiment cases supported that the SVM classifier could accurately classify the 4 emotion states with a kappa value over 0.62, demonstrating the SVM-based algorithm's capabilities in precisely determining the emotion label for each participant's EEG feature instances.
Conclusion: The findings in this study fill the existing gap of game-induced emotion recognition field by providing an in-depth evaluation on the ruleset algorithm's performance and feasibility of applying the generated rules on the game-induced EEG data for justifying the emotional state prediction result. [ABSTRACT FROM AUTHOR]
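The kappa values used above to judge the subject-based SVM classifiers measure agreement corrected for chance. A minimal sketch of Cohen's kappa follows; the labels are invented purely to illustrate the calculation.

```python
from collections import Counter

def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance)."""
    n = len(y_true)
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement from the marginal label distributions.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n**2
    return (p_observed - p_expected) / (1 - p_expected)

# Invented 4-class labels mirroring the HAPV/HANV/LANV/LAPV emotion states.
y_true = ["HAPV", "HANV", "LANV", "LAPV"] * 5
y_pred = y_true[:16] + ["HAPV"] * 4     # 17 of 20 predictions correct
kappa = cohens_kappa(y_true, y_pred)
```

A kappa above 0.6 is conventionally read as substantial agreement, which is why the study's threshold of 0.62 supports the classifier's reliability.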
- Published
- 2024
- Full Text
- View/download PDF
32. Adverse event signal mining and serious adverse event influencing factor analysis of fulvestrant based on FAERS database.
- Author
-
Yin, Guisen, Song, Guiling, Xue, Shuyi, and Liu, Fen
- Abstract
Fulvestrant, as the first selective estrogen receptor degrader, is widely used in the endocrine treatment of breast cancer. However, in the real world, there is a lack of relevant reports on adverse reaction data mining for fulvestrant. To perform data mining on adverse events (AEs) associated with fulvestrant and explore the risk factors contributing to severe AEs, providing a reference for the rational use of fulvestrant in clinical practice. We retrieved adverse event report information associated with fulvestrant from the U.S. Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) database, covering the period from market introduction to September 30, 2023. Suspicious AEs were screened using the reporting odds ratio (ROR) and proportional reporting ratio methods based on disproportionality analysis. Univariate and multivariate logistic regression analyses were conducted on severe AEs to explore the risk factors associated with fulvestrant-induced severe AEs. A total of 6947 reports related to AEs associated with fulvestrant were obtained, including 5924 reports of severe AEs and 1023 reports of non-severe AEs. Using the disproportionality analysis method, a total of 210 valid AEs were identified for fulvestrant, with 45 AEs (21.43%) not listed in the product labeling, involving 11 systems and organs. The AEs associated with fulvestrant were sorted by frequency of occurrence, with neutropenia (325 cases) having the highest number of reports. By signal strength, injection site pruritus showed the strongest signal (ROR = 658.43). The results of the logistic regression analysis showed that concurrent use of medications with extremely high protein binding (≥ 98%) is an independent risk factor for severe AEs associated with fulvestrant. Age served as a protective factor for fulvestrant-related AEs. The co-administration of fulvestrant with CYP3A4 enzyme inhibitors did not show a statistically significant correlation with the occurrence of severe AEs.
Co-administration of drugs with extremely high protein binding (≥ 98%) may increase the risk of severe adverse reactions of fulvestrant. Meanwhile, age (60–74 years) may reduce the risk of severe AEs of fulvestrant. However, further clinical studies are still needed to verify whether there is an interaction between fulvestrant and drugs with high protein binding. [ABSTRACT FROM AUTHOR]
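The reporting odds ratio (ROR) used for signal screening above can be sketched as follows. The 2x2 counts are invented, and the flagging rule shown (confidence-interval lower bound above 1 with at least 3 cases) is a typical pharmacovigilance convention rather than the paper's exact criterion.

```python
import math

def reporting_odds_ratio(a, b, c, d):
    """ROR = (a/b) / (c/d) for the 2x2 table:
    a = target drug & target AE, b = target drug & all other AEs,
    c = other drugs & target AE, d = other drugs & all other AEs."""
    return (a / b) / (c / d)

def ror_ci95(a, b, c, d):
    """95% CI via the standard error of log(ROR)."""
    log_ror = math.log(reporting_odds_ratio(a, b, c, d))
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)
    return math.exp(log_ror - 1.96 * se), math.exp(log_ror + 1.96 * se)

# Invented counts for one drug/AE pair across a FAERS-style report corpus.
a, b, c, d = 325, 6000, 4000, 400000
ror = reporting_odds_ratio(a, b, c, d)
low, high = ror_ci95(a, b, c, d)
signal = a >= 3 and low > 1          # common disproportionality flagging rule
```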
- Published
- 2024
- Full Text
- View/download PDF
33. High Diversity of Long Terminal Repeat Retrotransposons in Compact Vertebrate Genomes: Insights from Genomes of Tetraodontiformes.
- Author
-
Wang, Bingqing, Saleh, Ahmed A., Yang, Naisu, Asare, Emmanuel, Chen, Hong, Wang, Quan, Chen, Cai, Song, Chengyi, and Gao, Bo
- Subjects
- *
RETROTRANSPOSONS , *PUFFERS (Fish) , *VERTEBRATES , *DATA mining , *REVERSE transcriptase - Abstract
Simple Summary: Long terminal repeat retrotransposons (LTR-RTNs) are vital in genome evolution and diversity. The compact genomes of Tetraodontiformes provide an excellent model for studying LTR-RTN dynamics. An analysis of the genomes of ten tetraodontiform species revealed a total of 819 full-length LTR retrotransposon sequences classified into nine families spanning four distinct superfamilies. Among them, the Gypsy superfamily displayed the highest level of diversity. Takifugu stood out for having the highest abundance of LTR families and sequences. Evidence of recent LTR-RTN activity and multiple invasions was observed in specific tetraodontiform genomes. This investigation provides valuable insights into the evolution of LTR retrotransposons and their impact on the structure and evolution of compact tetraodontiform genomes. This study aimed to investigate the evolutionary profile (including diversity, activity, and abundance) of retrotransposons (RTNs) with long terminal repeats (LTRs) in ten species of Tetraodontiformes. These species, Arothron firmamentum, Lagocephalus sceleratus, Pao palembangensis, Takifugu bimaculatus, Takifugu flavidus, Takifugu ocellatus, Takifugu rubripes, Tetraodon nigroviridis, Mola mola, and Thamnaconus septentrionalis, are known for having the smallest genomes among vertebrates. Data mining revealed a high diversity and wide distribution of LTR retrotransposons (LTR-RTNs) in these compact vertebrate genomes, with varying abundances among species. A total of 819 full-length LTR-RTN sequences were identified across these genomes, categorized into nine families belonging to four different superfamilies: ERV (Orthoretrovirinae and Epsilon retrovirus), Copia, BEL-PAO, and Gypsy (Gmr, Mag, V-clade, CsRN1, and Barthez). The Gypsy superfamily exhibited the highest diversity. 
LTR family distribution varied among species, with Takifugu bimaculatus, Takifugu flavidus, Takifugu ocellatus, and Takifugu rubripes having the highest richness of LTR families and sequences. Additionally, evidence of recent invasions was observed in specific tetraodontiform genomes, suggesting potential transposition activity. This study provides insights into the evolution of LTR retrotransposons in Tetraodontiformes, enhancing our understanding of their impact on the structure and evolution of host genomes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
34. Explainable AI: Machine Learning Interpretation in Blackcurrant Powders.
- Author
-
Przybył, Krzysztof
- Subjects
- *
MACHINE learning , *DEEP learning , *ARTIFICIAL intelligence , *RANDOM forest algorithms , *DECISION trees , *DATA mining , *POWDERS - Abstract
Recently, explainability in machine and deep learning has become an important area of research and interest, both due to the increasing use of artificial intelligence (AI) methods and the need to understand the decisions made by models. Interest in explainable artificial intelligence (XAI) stems from growing awareness of, among other things, data mining, error elimination, and the learning performance of various AI algorithms. Moreover, XAI makes the decisions made by models more transparent as well as more effective. In this study, models from the 'glass box' group, including Decision Tree, and the 'black box' group, including Random Forest, were proposed to understand the identification of selected types of currant powders. The learning process of these models was carried out to determine accuracy indicators such as accuracy, precision, recall, and F1-score. It was visualized using Local Interpretable Model-Agnostic Explanations (LIME) to predict the effectiveness of identifying specific types of blackcurrant powders based on texture descriptors such as entropy, contrast, correlation, dissimilarity, and homogeneity. Bagging (Bagging_100), Decision Tree (DT0), and Random Forest (RF7_gini) proved to be the most effective models in the framework of currant powder interpretability. For Bagging_100, the classifier performance measures of accuracy, precision, recall, and F1-score all reached values of approximately 0.979. In comparison, DT0 reached values of 0.968, 0.972, 0.968, and 0.969, and RF7_gini reached values of 0.963, 0.964, 0.963, and 0.963. These models achieved classifier performance measures of greater than 96%. In the future, XAI using agnostic models can be an additional important tool to help analyze data, including food products, even online. [ABSTRACT FROM AUTHOR]
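The accuracy, precision, recall, and F1-score figures quoted above can all be derived from a confusion matrix. A minimal sketch follows; the tiny label set is invented, and macro averaging is an assumption since the averaging scheme is not stated here.

```python
def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1-score."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    precision = sum(precisions) / len(classes)
    recall = sum(recalls) / len(classes)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy, precision, recall, f1

# Invented two-class labels standing in for powder-type predictions.
y_true = ["A", "A", "B", "B"]
y_pred = ["A", "A", "B", "A"]
acc, prec, rec, f1 = macro_metrics(y_true, y_pred)
```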
- Published
- 2024
- Full Text
- View/download PDF
35. DMAF-NET: Deep Multi-Scale Attention Fusion Network for Hyperspectral Image Classification with Limited Samples.
- Author
-
Guo, Hufeng and Liu, Wenyi
- Subjects
- *
IMAGE recognition (Computer vision) , *PYRAMIDS , *DEEP learning , *CONVOLUTIONAL neural networks , *FEATURE extraction , *DATA mining - Abstract
In recent years, deep learning methods have achieved remarkable success in hyperspectral image classification (HSIC), and the utilization of convolutional neural networks (CNNs) has proven to be highly effective. However, there are still several critical issues that need to be addressed in the HSIC task, such as the lack of labeled training samples, which constrains the classification accuracy and generalization ability of CNNs. To address this problem, a deep multi-scale attention fusion network (DMAF-NET) is proposed in this paper. This network is based on multi-scale features and fully exploits the deep features of samples from multiple levels and different perspectives, with the aim of enhancing HSIC results using limited samples. The innovation of this article is mainly reflected in three aspects: Firstly, a novel baseline network for multi-scale feature extraction is designed with a pyramid structure and a densely connected 3D octave convolutional network, enabling the extraction of deep-level information from features at different granularities. Secondly, a multi-scale spatial–spectral attention module and a pyramidal multi-scale channel attention module are designed. This allows modeling of the comprehensive dependencies of coordinates and directions, local and global, in four dimensions. Finally, a multi-attention fusion module is designed to effectively combine feature mappings extracted from multiple branches. Extensive experiments on four popular datasets demonstrate that the proposed method can achieve high classification accuracy even with fewer labeled samples. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. RoUIE: A Method for Constructing Knowledge Graph of Power Equipment Based on Improved Universal Information Extraction.
- Author
-
Ye, Zhenhao, Qi, Donglian, Liu, Hanlin, Yan, Yunfeng, Chen, Qihao, and Liu, Xiayu
- Subjects
- *
KNOWLEDGE graphs , *DATA mining , *LANGUAGE models , *INTELLIGENCE levels , *INFORMATION storage & retrieval systems - Abstract
The current state evaluation of power equipment often focuses solely on changes in electrical quantities while neglecting basic equipment information as well as textual information such as system alerts, operation records, and defect records. Constructing a device-centric knowledge graph by extracting information from multiple sources related to power equipment is a valuable approach to enhance the intelligence level of asset management. Through the collection of pertinent authentic datasets, we have established a dataset for the state evaluation of power equipment, encompassing 35 types of relationships. To better suit the characteristics of concentrated relationship representations and varying lengths in textual descriptions, we propose a generative model called RoUIE, which is a method for constructing a knowledge graph of power equipment based on improved Universal Information Extraction (UIE). This model first utilizes a pre-trained language model based on rotational position encoding as the text encoder in the fine-tuning stage. Subsequently, we innovatively leverage the Distribution Focal Loss (DFL) to replace Binary Cross-Entropy Loss (BCE) as the loss function, further enhancing the model's extraction performance. The experimental results demonstrate that compared to the UIE model and mainstream joint extraction benchmark models, RoUIE exhibits superior performance on the dataset we constructed. On a general Chinese dataset, the proposed model also outperforms baseline models, showcasing the model's universal applicability. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
37. Study on Early Identification of Rainfall-Induced Accumulation Landslide Hazards in the Three Gorges Reservoir Area.
- Author
-
Wu, Zhen, Ye, Runqing, Yang, Shishi, Wen, Tianlong, Huang, Jue, and Chen, Yao
- Subjects
- *
LANDSLIDES , *LANDSLIDE hazard analysis , *NATURAL disaster warning systems , *HAZARD mitigation , *EMERGENCY management , *SHEAR strength of soils , *RAINFALL , *GORGES , *DATA mining - Abstract
The early identification of potential hazards is crucial for landslide early warning and prevention and is a key focus and challenging issue in landslide disaster research. Traditional investigation and identification methods struggle to identify potential hazards of landslides triggered by heavy rainfall and to map areas susceptible to landslides based on rainfall conditions. This article focuses on the early identification of rainfall-induced accumulation landslide hazards and proposes an early identification method: "first identify the accumulation that is prone to landslides, then determine the associated rainfall conditions". This method is based on identifying the distribution and thickness of accumulation, analyzing the rainfall conditions that trigger landslides with varying characteristics, and establishing rainfall thresholds for landslides with different accumulation characteristics, ultimately aiming to achieve early identification of accumulation landslide hazards. In this study, we take the Zigui section of the Three Gorges Reservoir as the study area. Eight main factors that influence the distribution and thickness of accumulation are extracted from multi-source data, and a relative-thickness extraction model for accumulation is established using the BP neural network method. The accumulation distribution and relative thickness map of the study area is generated, and the study area is divided into rocky (less than 1 m), thin (1–5 m), medium (5–10 m), and thick (more than 10 m) areas according to accumulation thickness. Rainfall is a significant trigger for landslide hazards. It increases the weight of the sliding mass and decreases the shear strength of soil and rock layers, thus contributing to landslide events.
Data on 101 rainfall-induced accumulation landslides in the Three Gorges Reservoir area and rainfall data for the 10 days prior to each landslide event were collected. The critical rainfall thresholds corresponding to a 90% probability of landslide occurrence with different characteristics were determined using the I-D threshold curve method. Prediction maps of accumulation landslide hazards under various rainfall conditions were generated by analyzing the rainfall threshold for landslides in the Three Gorges Reservoir area, serving as a basis for early identification of rainfall-induced accumulation landslides in the region. The research provides a method for the early identification of landslides caused by heavy rainfall, delineating landslide hazards under different rainfall conditions, and providing a basis for scientific responses, work arrangements, and disaster prevention and mitigation of landslides caused by heavy rainfall. [ABSTRACT FROM AUTHOR]
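The I-D threshold curve method the abstract mentions is conventionally expressed as a power law relating critical mean rainfall intensity to event duration. A minimal sketch of such a check is below; the power-law form I = α·D^β is the standard I-D model, but the `alpha` and `beta` values here are purely illustrative and are not the thresholds fitted in the paper.

```python
# Hypothetical intensity-duration (I-D) rainfall threshold check.
# alpha and beta are illustrative placeholders, not the paper's fitted values.

def threshold_intensity(duration_h: float, alpha: float = 10.0, beta: float = -0.6) -> float:
    """Critical mean rainfall intensity (mm/h) for a given event duration (h)."""
    return alpha * duration_h ** beta

def exceeds_threshold(total_rainfall_mm: float, duration_h: float,
                      alpha: float = 10.0, beta: float = -0.6) -> bool:
    """True if the event's mean intensity exceeds the critical threshold."""
    mean_intensity = total_rainfall_mm / duration_h
    return mean_intensity > threshold_intensity(duration_h, alpha, beta)

# Example: 120 mm over 24 h gives a mean intensity of 5 mm/h, well above
# the ~1.5 mm/h threshold this illustrative curve yields at D = 24 h.
```

In practice the curve would be fitted to the landslide/rainfall event catalog (here, the 101 events) at the chosen exceedance probability before being used for hazard mapping.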
- Published
- 2024
- Full Text
- View/download PDF
38. Boosting HPC data analysis performance with the ParSoDA-Py library.
- Author
-
Belcastro, Loris, Giampà, Salvatore, Marozzo, Fabrizio, Talia, Domenico, Trunfio, Paolo, Badia, Rosa M., Ejarque, Jorge, and Mammadli, Nihad
- Subjects
- *
DATA analysis , *PYTHON programming language , *DATA mining , *DATA libraries , *LIBRARY technical services , *HIGH performance computing , *BIG data - Abstract
Developing and executing large-scale data analysis applications in parallel and distributed environments can be a complex and time-consuming task. Developers often find themselves diverted from their application logic to handle technical details about the underlying runtime and related issues. To simplify this process, ParSoDA, a Java library, has been proposed to facilitate the development of parallel data mining applications executed on HPC systems. It simplifies the process by providing built-in scalability mechanisms relying on the Hadoop and Spark frameworks. This paper presents ParSoDA-Py, the Python version of the ParSoDA library, which extends support to commonly used runtimes and libraries for big data analysis. After a complete library redesign, ParSoDA can now be easily integrated with other Python-based distributed runtimes for HPC systems, such as COMPSs and Apache Spark, and with the large ecosystem of Python-based data processing libraries. The paper discusses the adaptation process, which takes the new technical requirements into consideration, and evaluates both usability and scalability through several case study applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
39. An Ensemble Learning-Enhanced Smart Prediction Model for Financial Credit Risks.
- Author
-
Zhang, Li and Wang, Lin
- Subjects
- *
CREDIT risk , *ARTIFICIAL neural networks , *FINANCIAL risk , *BIG data , *CREDIT analysis , *PREDICTION models - Abstract
Credit risk assessment plays an important role in the daily operations of financial institutions. In the era of big data, however, growing business volume creates an urgent demand for digital methods of credit risk assessment. Currently, machine learning is widely employed to establish data-driven models for this purpose. However, individual machine learning models generally suffer from limited feature-representation ability and robustness, and cannot handle more complex financial security scenarios. To address this issue, this work introduces ensemble learning to construct a stronger credit risk prediction model by integrating several basic machine learning models. Thus, an ensemble learning-enhanced smart prediction model for financial credit risk is proposed in this paper. Three classification-based machine learning models (support vector machine, artificial neural network, and radial basis function network) are selected as the basic classifiers, and a "voting" strategy is utilized to integrate them into a novel strong classifier. A real-world financial credit dataset released by a Chinese commercial bank was selected as the experimental scenario. The obtained results show that the proposal achieves better prediction accuracy than the basic machine learning models without ensemble learning. [ABSTRACT FROM AUTHOR]
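The "voting" integration the abstract describes can be reduced to hard majority voting over the base classifiers' predicted labels. A minimal sketch, with hypothetical per-model predictions standing in for trained SVM, ANN, and RBF-network outputs:

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label predictions by hard majority voting.

    predictions_per_model: list of equal-length prediction lists,
    one per base classifier (e.g. SVM, ANN, RBF-network outputs).
    """
    combined = []
    for sample_preds in zip(*predictions_per_model):
        label, _count = Counter(sample_preds).most_common(1)[0]
        combined.append(label)
    return combined

# Three hypothetical base classifiers voting on four loan applicants
# (1 = high risk, 0 = low risk); labels below are made up for illustration.
svm_preds = [1, 0, 1, 1]
ann_preds = [1, 1, 0, 1]
rbf_preds = [0, 0, 1, 1]
print(majority_vote([svm_preds, ann_preds, rbf_preds]))  # [1, 0, 1, 1]
```

With an odd number of binary base classifiers, ties cannot occur; soft voting (averaging predicted probabilities) is the usual alternative when the base models expose calibrated scores.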
- Published
- 2024
- Full Text
- View/download PDF
40. Dual Cross-Attention Multi-Stage Embedding Network for Low-Light Image Enhancement.
- Author
-
Fan, Junyu, Li, Jinjiang, and Hua, Zhen
- Subjects
- *
IMAGE intensifiers , *COMPUTER vision , *TRANSFORMER models , *DATA mining , *FEATURE extraction , *VIRTUAL networks , *SPEECH synthesis - Abstract
The low-light image enhancement task aims to improve the visibility of information in the dark so that more data can be obtained and utilized, while also improving the visual quality of the image. In this paper, we propose a dual cross-attention multi-stage embedding network (DCMENet) for fast and accurate enhancement of low-light images into high-quality images with high visibility. The tendency of enhanced images to contain more noise, which degrades image quality, is mitigated by introducing an attention mechanism into the encoder–decoder structure. In addition, the encoder–decoder can focus most of its attention on the dark areas of the image and better attend to the detailed features that are obscured by darkness. In particular, the poor performance of the Transformer on small datasets is addressed by fusing CNN-attention and Transformer components in the encoder. Considering the purpose of the low-light image enhancement task, we raise the importance of recovering image detail information to the same level as reconstructing the lighting. For features such as texture details, cascade extraction using spatial attention and pixel attention reduces model complexity while improving performance. Finally, the global features obtained by the encoder–decoder are fused into the shallow feature extraction structure to reconstruct the illumination while guiding the network toward focused extraction of information in the dark. The proposed DCMENet achieves the best results in both objective quality assessment and subjective evaluation, and for computer vision tasks operating in low-light environments, images enhanced with DCMENet likewise show the best performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
41. Deep learning and big data mining for Metal–Organic frameworks with high performance for simultaneous desulfurization and carbon capture.
- Author
-
Guan, Kexin, Xu, Fangyi, Huang, Xiaoshan, Li, Yu, Guo, Shuya, Situ, Yizhen, Chen, You, Hu, Jianming, Liu, Zili, Liang, Hong, Zhu, Xin, Wu, Yufang, and Qiao, Zhiwei
- Subjects
- *
DEEP learning , *METAL-organic frameworks , *CARBON sequestration , *DATA mining , *MACHINE learning , *FLUE gas desulfurization - Abstract
Carbon capture and desulfurization of flue gases are crucial for the achievement of carbon neutrality and sustainable development. In this work, the "one-step" adsorption technology with high-performance metal–organic frameworks (MOFs) was proposed to simultaneously capture SO2 and CO2. Four machine learning algorithms were used to predict the performance indicators (N_CO2+SO2, S_CO2+SO2/N2, and TSN) of MOFs, with Multi-Layer Perceptron Regression (MLPR) showing better performance (R² = 0.93). To address the sparse data of MOF chemical descriptors, we introduced the Deep Factorization Machines (DeepFM) model, outperforming MLPR with a higher R² of 0.95. Then, sensitivity analysis was employed to find that the adsorption heat and porosity were the key factors for the SO2 and CO2 capture performance of MOFs, while the influence of open alkali metal sites also stood out. Furthermore, we established a kinetic model to batch-simulate the breakthrough curves of the TOP 1000 MOFs to investigate their dynamic adsorption separation performance for SO2/CO2/N2. The TOP 20 MOFs screened by dynamic performance highly overlap with those screened by static performance, with 76% containing open alkali metal sites. This integrated approach of computational screening, machine learning, and dynamic analysis significantly advances the development of efficient MOF adsorbents for flue gas treatment. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
42. Development of an early warning system for higher education institutions by predicting first‐year student academic performance.
- Author
-
Çırak, Cem Recai, Akıllı, Hakan, and Ekinci, Yeliz
- Abstract
In this study, an early warning system predicting first‐year undergraduate student academic performance is developed for higher education institutions. The significant factors that affect first‐year student success are derived and discussed such that they can be used for policy developments by related bodies. The dataset used in experimental analyses includes 11,698 freshman students' data. The problem is constructed as classification models predicting whether a student will be successful or unsuccessful at the end of the first year. A total of 69 input variables are utilized in the models. Naive Bayes, decision tree and random forest algorithms are compared over model prediction performances. Random forest models outperformed others and reached 90.2% accuracy. Findings show that the models including the fall semester CGPA variable performed dramatically better. Moreover, the student's programme name and university placement exam score are identified as the other most significant variables. A critical discussion based on the findings is provided. The developed model may be used as an early warning system, such that necessary actions can be taken after the second week of the spring semester for students predicted to be unsuccessful to increase their success and prevent attrition. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
43. Context-aware knowledge selection and reliable model recommendation with ACCORDION.
- Author
-
Ahmed, Yasmine, Telmer, Cheryl A., Gaoxiang Zhou, and Miskov-Zivanov, Natasa
- Subjects
- *
EXPERT systems , *LITERARY sources , *GRAPH algorithms , *BIOLOGICAL systems , *NATURAL language processing - Abstract
New discoveries and knowledge are summarized in thousands of published papers per year per scientific domain, making it impossible for scientists to account for all available knowledge relevant to their studies. In this paper, we present ACCORDION (ACCelerating and Optimizing model RecommenDatIONs), a novel methodology and an expert system that retrieves and selects relevant knowledge from literature and databases to recommend models with correct structure and accurate behavior, enabling mechanistic explanations and predictions, and advancing understanding. ACCORDION introduces an approach that integrates knowledge retrieval, graph algorithms, clustering, simulation, and formal analysis. Here, we focus on biological systems, although the proposed methodology is applicable in other domains. We used ACCORDION in nine benchmark case studies and compared its performance with other previously published tools. We show that ACCORDION is: comprehensive, retrieving relevant knowledge from a range of literature sources through machine reading engines; very effective, reducing the error of the initial baseline model by more than 80%, recommending models that closely recapitulate desired behavior, and outperforming previously published tools; selective, recommending only the most relevant, context-specific, and useful subset (15%-20%) of candidate knowledge in literature; diverse, accounting for several distinct criteria to recommend more than one solution, thus enabling alternative explanations or intervention directions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
44. Telehealth-Based Information Retrieval and Extraction for Analysis of Clinical Characteristics and Symptom Patterns in Mild COVID-19 Patients.
- Author
-
Jahaj, Edison, Gallos, Parisis, Tziomaka, Melina, Kallipolitis, Athanasios, Pasias, Apostolos, Panagopoulos, Christos, Menychtas, Andreas, Dimopoulou, Ioanna, Kotanidou, Anastasia, Maglogiannis, Ilias, and Vassiliou, Alice Georgia
- Subjects
- *
COVID-19 , *MEDICAL care , *FEVER , *DATA mining , *INFORMATION retrieval , *SYMPTOMS - Abstract
Clinical characteristics of COVID-19 patients have been mostly described in hospitalised patients, yet most are managed in an outpatient setting. The COVID-19 pandemic transformed healthcare delivery models and accelerated the implementation and adoption of telemedicine solutions. We employed a modular remote monitoring system with multi-modal data collection, aggregation, and analytics features to monitor mild COVID-19 patients and report their characteristics and symptoms. At enrolment, the patients were equipped with wearables, which were associated with their accounts, provided the respective in-system consents, and, in parallel, reported the demographics and patient characteristics. The patients monitored their vitals and symptoms daily during a 14-day monitoring period. Vital signs were entered either manually or automatically through wearables. We enrolled 162 patients from February to May 2022. The median age was 51 (42–60) years; 44% were male, 22% had at least one comorbidity, and 73.5% were fully vaccinated. The vitals of the patients were within normal range throughout the monitoring period. Thirteen patients were asymptomatic, while the rest had at least one symptom for a median of 11 (7–16) days. Fatigue was the most common symptom, followed by fever and cough. Loss of taste and smell was the longest-lasting symptom. Age positively correlated with the duration of fatigue, anorexia, and low-grade fever. Comorbidities, the number of administered doses, the days since the last dose, and the days since the positive test did not seem to affect the number of sick days or symptomatology. The i-COVID platform allowed us to provide remote monitoring and reporting of COVID-19 outpatients. We were able to report their clinical characteristics while simultaneously helping reduce the spread of the virus through hospitals by minimising hospital visits. 
The monitoring platform also offered advanced knowledge extraction and analytic capabilities to detect health condition deterioration and automatically trigger personalised support workflows. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. A Survey of the Applications of Text Mining for the Food Domain.
- Author
-
Xiong, Shufeng, Tian, Wenjie, Si, Haiping, Zhang, Guipei, and Shi, Lei
- Subjects
- *
TEXT mining , *SENTIMENT analysis , *DATA mining , *MINE safety , *NUTRITIONAL requirements , *FOOD safety , *FOOD recall - Abstract
In the food domain, text mining techniques are extensively employed to derive valuable insights from large volumes of text data, facilitating applications such as aiding food recalls, offering personalized recipes, and reinforcing food safety regulation. To provide researchers and practitioners with a comprehensive understanding of the latest technology and application scenarios of text mining in the food domain, the pertinent literature is reviewed and analyzed. Initially, the fundamental concepts, principles, and primary tasks of text mining, encompassing text categorization, sentiment analysis, and entity recognition, are elucidated. Subsequently, an analysis of diverse types of data sources within the food domain and the characteristics of text data mining is conducted, spanning social media, reviews, recipe websites, and food safety reports. Furthermore, the applications of text mining in the food domain are scrutinized from the perspective of various scenarios, including leveraging consumer food reviews and feedback to enhance product quality, providing personalized recipe recommendations based on user preferences and dietary requirements, and employing text mining for food safety and fraud monitoring. Lastly, the opportunities and challenges associated with the adoption of text mining techniques in the food domain are summarized and evaluated. In conclusion, text mining holds considerable potential for application in the food domain, thereby propelling the advancement of the food industry and upholding food safety standards. [ABSTRACT FROM AUTHOR]
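Sentiment analysis of consumer food reviews, one of the text-mining tasks the survey covers, can be illustrated at its simplest with a lexicon-based scorer. The word lists and reviews below are invented for illustration; real systems use curated sentiment lexicons or trained classifiers.

```python
# Toy lexicon-based sentiment scoring for food reviews.
# The POSITIVE/NEGATIVE word lists are illustrative placeholders only.

POSITIVE = {"fresh", "tasty", "delicious", "crisp"}
NEGATIVE = {"stale", "bland", "spoiled", "soggy"}

def sentiment(review: str) -> str:
    """Label a review by counting positive vs. negative lexicon hits."""
    tokens = review.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("The bread was fresh and delicious"))  # positive
print(sentiment("Stale chips and bland salsa"))        # negative
```

Lexicon methods are transparent and need no training data, which is why they often serve as baselines before moving to the supervised models the survey discusses.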
- Published
- 2024
- Full Text
- View/download PDF
46. A Joint Communication and Computation Design for Probabilistic Semantic Communications.
- Author
-
Zhao, Zhouxiang, Yang, Zhaohui, Chen, Mingzhe, Zhang, Zhaoyang, and Poor, H. Vincent
- Subjects
- *
KNOWLEDGE graphs , *NONSMOOTH optimization , *EXTRACTION techniques , *RESOURCE allocation , *DATA mining , *MULTIPLE access protocols (Computer network protocols) , *MIMO systems , *DATA extraction - Abstract
In this paper, the problem of joint transmission and computation resource allocation for a multi-user probabilistic semantic communication (PSC) network is investigated. In the considered model, users employ semantic information extraction techniques to compress their large-sized data before transmitting them to a multi-antenna base station (BS). Our model represents large-sized data through substantial knowledge graphs, utilizing shared probability graphs between the users and the BS for efficient semantic compression. The resource allocation problem is formulated as an optimization problem with the objective of maximizing the sum of the equivalent rate of all users, considering the total power budget and semantic resource limit constraints. The computation load considered in the PSC network is formulated as a non-smooth piecewise function with respect to the semantic compression ratio. To tackle this non-convex non-smooth optimization challenge, a three-stage algorithm is proposed, where the solutions for the received beamforming matrix of the BS, the transmit power of each user, and the semantic compression ratio of each user are obtained stage by stage. The numerical results validate the effectiveness of our proposed scheme. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
47. A Spatiotemporal Probabilistic Graphical Model Based on Adaptive Expectation-Maximization Attention for Individual Trajectory Reconstruction Considering Incomplete Observations.
- Author
-
Sun, Xuan, Guo, Jianyuan, Qin, Yong, Zheng, Xuanchuan, Xiong, Shifeng, He, Jie, Sun, Qi, and Jia, Limin
- Subjects
- *
EXPECTATION-maximization algorithms , *MAXIMUM likelihood statistics , *GRAPHICAL modeling (Statistics) , *MISSING data (Statistics) , *DATA mining - Abstract
Spatiotemporal information on individual trajectories in urban rail transit is important for operational strategy adjustment, personalized recommendation, and emergency command decision-making. However, due to the lack of journey observations, it is difficult to accurately infer unknown information from trajectories based only on AFC and AVL data. To address the problem, this paper proposes a spatiotemporal probabilistic graphical model based on adaptive expectation maximization attention (STPGM-AEMA) to achieve the reconstruction of individual trajectories. The approach consists of three steps: first, the potential train alternative set and the egress time alternative set of individuals are obtained through data mining and combinatorial enumeration. Then, global and local potential variables are introduced to construct a spatiotemporal probabilistic graphical model, provide the inference process for unknown events, and state information about individual trajectories. Further, considering the effect of missing data, an attention mechanism-enhanced expectation-maximization algorithm is proposed to achieve maximum likelihood estimation of individual trajectories. Finally, typical datasets of origin-destination pairs and actual individual trajectory tracking data are used to validate the effectiveness of the proposed method. The results show that the STPGM-AEMA method is more than 95% accurate in recovering missing information in the observed data, which is at least 15% more accurate than the traditional methods (i.e., PTAM-MLE and MPTAM-EM). [ABSTRACT FROM AUTHOR]
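The attention-enhanced expectation-maximization estimator in the abstract builds on the classic EM loop of alternating responsibility computation (E-step) and parameter re-estimation (M-step). As a minimal, generic sketch of that loop, here is plain EM for a two-component 1-D Gaussian mixture; it does not reproduce the paper's STPGM-AEMA model, only the underlying EM mechanics.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_gaussian_mixture(data, iters=50):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximization."""
    mu1, mu2 = min(data), max(data)              # crude initialization
    s1 = s2 = (max(data) - min(data)) / 4 or 1.0
    w = 0.5                                      # mixing weight of component 1
    for _ in range(iters):
        # E-step: responsibility of component 1 for each point
        r = []
        for x in data:
            p1 = w * normal_pdf(x, mu1, s1)
            p2 = (1 - w) * normal_pdf(x, mu2, s2)
            r.append(p1 / (p1 + p2))
        # M-step: re-estimate parameters from responsibilities
        n1 = sum(r)
        n2 = len(data) - n1
        mu1 = sum(ri * x for ri, x in zip(r, data)) / n1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, data)) / n2
        s1 = math.sqrt(sum(ri * (x - mu1) ** 2 for ri, x in zip(r, data)) / n1) or 1e-6
        s2 = math.sqrt(sum((1 - ri) * (x - mu2) ** 2 for ri, x in zip(r, data)) / n2) or 1e-6
        w = n1 / len(data)
    return (mu1, s1), (mu2, s2), w
```

The paper's contribution lies in weighting the E-step with an attention mechanism to cope with missing observations; the alternating structure above is what that enhancement modifies.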
- Published
- 2024
- Full Text
- View/download PDF
48. Damage Severity Assessment of Multi-Layer Complex Structures Based on a Damage Information Extraction Method with Ladder Feature Mining.
- Author
-
Tu, Jiajie, Yan, Jiajia, Ji, Xiaojin, Liu, Qijian, and Qing, Xinlin
- Subjects
- *
DATA mining , *DEBONDING , *LAMB waves , *SILICONE rubber , *STRUCTURAL engineering - Abstract
Multi-layer complex structures are widely used in large-scale engineering structures because of their diverse combinations of properties and excellent overall performance. However, multi-layer complex structures are prone to interlaminar debonding damage during use. Therefore, it is necessary to monitor debonding damage in engineering applications to determine structural integrity. In this paper, a damage information extraction method with ladder feature mining for Lamb waves is proposed. The method is able to optimize and screen effective damage information through ladder-type damage extraction. It is suitable for evaluating the severity of debonding damage in aluminum-foamed silicone rubber, a novel multi-layer complex structure. The proposed method contains ladder feature mining stages of damage information selection and damage feature fusion, realizing a multi-level damage information extraction process from coarse to fine. The results show that the accuracy of damage severity assessment by the damage information extraction method with ladder feature mining is improved by more than 5% compared to other methods. The effectiveness and accuracy of the method in assessing the damage severity of multi-layer complex structures are demonstrated, providing a new perspective and solution for damage monitoring of multi-layer complex structures. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
49. Big data research in nursing: A bibliometric exploration of themes and publications.
- Author
-
Li, Bo, Du, Kun, Qu, Guanchen, and Tang, Naifu
- Subjects
- *
SERIAL publications , *DATA science , *DATA mining , *RESEARCH funding , *PREDICTION models , *ARTIFICIAL intelligence , *SOFTWARE analytics , *NATURAL language processing , *DESCRIPTIVE statistics , *THEMATIC analysis , *NURSING research , *BIBLIOMETRICS , *DEEP learning , *MACHINE learning , *DATA analysis software , *CLOUD computing , *INTERNET of things - Abstract
Aims: To comprehend the current research hotspots and emerging trends in big data research within the global nursing domain. Design: Bibliometric analysis. Methods: Quality articles for analysis, indexed in the Web of Science Core Collection, were obtained from the Web of Science database as of February 10, 2023. The descriptive and visual analysis and text mining were realized with CiteSpace and VOSviewer. Results: Research on big data in the nursing field has experienced steady growth over the past decade. A total of 45 core authors and 17 core journals around the world have contributed to this field. The author-keyword analysis revealed five distinct clusters of research focus: machine/deep learning and artificial intelligence, natural language processing, big data analytics and data science, IoT and cloud computing, and the development of prediction models through data mining. Furthermore, a comparative examination was conducted with data spanning from 1980 to 2016, and an extended analysis was performed covering the years from 1980 to 2019. This bibliometric mapping comparison allowed for the identification of prevailing research trends and the pinpointing of potential future research hotspots within the field. Conclusions: The fusion of data mining and nursing research has steadily advanced and become more refined over time. Technologically, it has expanded from initial natural language processing to encompass machine learning, deep learning, artificial intelligence, and data mining approaches that amalgamate multiple technologies. Professionally, it has progressed from addressing patient safety and pressure ulcers to encompassing chronic diseases, critical care, emergency response, community and nursing home settings, and specific diseases (cardiovascular diseases, diabetes, stroke, etc.). 
The convergence of IoT, cloud computing, fog computing, and big data processing has opened new avenues for research in geriatric nursing management and community care. However, a global imbalance exists in the use of big data in nursing research, emphasizing the need to enhance data science literacy among clinical staff worldwide to advance the field. Clinical Relevance: This study focused on the thematic trends and evolution of research on big data in nursing. Moreover, it may help researchers, journals, and countries around the world understand the field and foster collaborations among them to promote the development of big data in nursing science. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
50. Updates to the Alliance of Genome Resources central infrastructure.
- Author
-
Consortium, The Alliance of Genome Resources
- Subjects
- *
BIOLOGICAL models , *DATABASES , *COMPUTER software , *DATA mining , *DATABASE management , *DATA curation , *ARTIFICIAL intelligence , *INFORMATION resources , *FISHES , *PROFESSIONS , *MICE , *RATS , *INFORMATION services , *INFORMATION retrieval , *CAENORHABDITIS elegans , *INSECTS , *ONTOLOGIES (Information retrieval) , *MACHINE learning , *GENOMES , *GENETICS , *YEAST , *ANURA - Abstract
The Alliance of Genome Resources (Alliance) is an extensible coalition of knowledgebases focused on the genetics and genomics of intensively studied model organisms. The Alliance is organized as individual knowledge centers with strong connections to their research communities and a centralized software infrastructure, discussed here. Model organisms currently represented in the Alliance are budding yeast, Caenorhabditis elegans, Drosophila, zebrafish, frog, laboratory mouse, laboratory rat, and the Gene Ontology Consortium. The project is in a rapid development phase to harmonize knowledge, store it, analyze it, and present it to the community through a web portal, direct downloads, and application programming interfaces (APIs). Here, we focus on developments over the last 2 years. Specifically, we added and enhanced tools for browsing the genome (JBrowse), downloading sequences, mining complex data (AllianceMine), visualizing pathways, full-text searching of the literature (Textpresso), and sequence similarity searching (SequenceServer). We enhanced existing interactive data tables and added an interactive table of paralogs to complement our representation of orthology. To support individual model organism communities, we implemented species-specific "landing pages" and will add disease-specific portals soon; in addition, we support a common community forum implemented in Discourse software. We describe our progress toward a central persistent database to support curation, the data modeling that underpins harmonization, and progress toward a state-of-the-art literature curation system with integrated artificial intelligence and machine learning (AI/ML). [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF