594 results
Search Results
2. NEURIPS PAPERS AIM TO IMPROVE UNDERSTANDING AND ROBUSTNESS OF MACHINE LEARNING ALGORITHMS
- Subjects
Data mining, Algorithms, Machine learning, Pellet fusion, Data warehousing/data mining, Algorithm, News, opinion and commentary - Abstract
LIVERMORE, CA -- The following information was released by Lawrence Livermore National Laboratory (LLNL): The 34th Conference on Neural Information Processing Systems (NeurIPS) is featuring two papers advancing the reliability [...]
- Published
- 2020
3. BSI issues position paper on the emergence of artificial intelligence and machine learning algorithms in healthcare
- Subjects
Algorithms, Data mining, Artificial intelligence, Professional associations, Machine learning, Medical equipment, Business, international, Association for the Advancement of Medical Instrumentation - Abstract
London: The British Standards Institution has issued the following news release: BSI, the business standards company, has undertaken research in collaboration with the US standards organization for medical devices, the Association [...]
- Published
- 2019
4. Research on Chinese Medical Entity Recognition Based on Multi-Neural Network Fusion and Improved Tri-Training Algorithm.
- Author
- Qi, Renlong, Lv, Pengtao, Zhang, Qinghui, and Wu, Meng
- Subjects
SUPERVISED learning, CONVOLUTIONAL neural networks, MEDICAL informatics, DATA mining, ALGORITHMS, MACHINE learning, MEDICAL research - Abstract
Chinese medical texts contain a large number of medically named entities. Automatic recognition of these medical entities from medical texts is the key to developing medical informatics. In the field of Chinese medical information extraction, annotated Chinese medical text data are scarce. In the named entity recognition task, this shortage of labeled data leads to low model recognition performance. Therefore, this paper proposes a Chinese medical entity recognition model based on multi-neural network fusion and the improved Tri-Training algorithm. The model performs semi-supervised learning by improving the Tri-Training algorithm. According to the characteristics of the medical entity recognition task and medical data, the method in this paper is improved in terms of the division of the initial sub-training set, the construction of the base classifiers, and the ensemble-learning voting method. In addition, this paper also proposes a multi-neural network fusion entity recognition model for base classifier construction. The model learns feature information jointly by combining an Iterated Dilated Convolutional Neural Network (IDCNN) and BiLSTM. Through experimental verification, the model proposed in this paper outperforms other models and improves the performance of Chinese medical entity recognition by incorporating and improving the semi-supervised learning algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2022
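For readers skimming this entry, the Tri-Training co-labelling step the abstract builds on can be sketched as follows. This is a generic illustration of the classic scheme (an unlabeled sample is pseudo-labelled for one classifier when the other two agree on its label), not the authors' improved variant; the toy parity classifiers and the `predict` callable are hypothetical stand-ins.

```python
def tri_training_round(classifiers, unlabeled, predict):
    """One pseudo-labelling round of classic Tri-Training: an unlabeled
    sample is added to classifier i's training pool only when the other
    two classifiers agree on its label."""
    additions = {0: [], 1: [], 2: []}
    for x in unlabeled:
        votes = [predict(clf, x) for clf in classifiers]
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            if votes[j] == votes[k]:
                additions[i].append((x, votes[j]))
    return additions

# Toy classifiers: parity rules standing in for IDCNN/BiLSTM taggers.
clfs = [lambda x: x % 2, lambda x: x % 2, lambda x: 0]
out = tri_training_round(clfs, [2, 3, 4], lambda c, x: c(x))
```

In the paper's setting the three base classifiers would be the fused IDCNN/BiLSTM taggers rather than these toy functions.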
5. Systematic review of content analysis algorithms based on deep neural networks.
- Author
- Rezaeenour, Jalal, Ahmadi, Mahnaz, Jelodar, Hamed, and Shahrooei, Roshan
- Subjects
ARTIFICIAL neural networks, DEEP learning, MACHINE learning, INFORMATION technology, NATURAL language processing, ALGORITHMS - Abstract
Today, driven by social media, the internet, etc., data is produced rapidly and occupies a large amount of space in systems, resulting in enormous data warehouses; progress in information technology has significantly increased the speed and ease of data flow. Text mining is one of the most important methods for extracting a useful model by extracting and adapting knowledge from data sets, and many studies have applied deep learning to text processing and text mining problems. Text mining seeks to extract useful information from unstructured textual data and is very widely used today. Deep learning and machine learning techniques for classification and text mining, and their types, are discussed in this paper as well. Neural networks of various kinds, namely ANN, RNN, CNN, and LSTM, are studied to select the best technique. In this study, we conducted a Systematic Literature Review to extract and associate the algorithms and features that have been used in this area. Based on our search criteria, we retrieved 130 relevant studies from electronic databases published between 1997 and 2021; we selected 43 studies for further analysis using the inclusion and exclusion criteria in Section 3.2. According to this study, hybrid LSTM is the most widely used deep learning algorithm in these studies, while SVM showed high accuracy among the machine learning methods. [ABSTRACT FROM AUTHOR]
- Published
- 2023
6. High-Dimensional Data Analysis Using Parameter Free Algorithm Data Point Positioning Analysis.
- Author
- Mustapha, S. M. F. D. Syed
- Subjects
PATTERN recognition systems, DATA analysis, K-means clustering, DATA mining, ALGORITHMS, MACHINE learning, HIGH-dimensional model representation - Abstract
Clustering is an effective statistical data analysis technique; it has several applications, including data mining, pattern recognition, image analysis, bioinformatics, and machine learning. Clustering helps to partition data into groups of objects with distinct characteristics. Most clustering methods use manually selected parameters to find the clusters in a dataset. Consequently, it can be very challenging and time-consuming to extract the optimal parameters for clustering a dataset. Moreover, some clustering methods are inadequate for locating clusters in high-dimensional data. To address these concerns systematically, this paper introduces a novel selection-free clustering technique named data point positioning analysis (DPPA). The proposed method is straightforward, since it calculates 1-NN and Max-NN by analyzing the data point placements without requiring an initial manual parameter assignment. The method is validated using two well-known publicly available datasets used by several clustering algorithms. To compare performance, this study also investigated four popular clustering algorithms (DBSCAN, affinity propagation, Mean Shift, and K-means). The experimental findings demonstrated that the proposed DPPA algorithm is less time-consuming than existing traditional methods and achieves higher performance without using any manually selected parameters. [ABSTRACT FROM AUTHOR]
- Published
- 2024
7. A differential privacy protecting K-means clustering algorithm based on contour coefficients.
- Author
- Zhang, Yaling, Liu, Na, and Wang, Shangping
- Subjects
K-means clustering, INFORMATION storage & retrieval systems, MACHINE learning, COMPUTER algorithms, DATA analysis - Abstract
This paper, based on a differential-privacy-protecting K-means clustering algorithm, realizes privacy protection by adding data-disturbing Laplace noise to the cluster center points. To solve the problem that the randomness of the Laplace noise causes the center points to deviate, especially the poor availability of clustering results under small privacy budget parameters, an improved differential-privacy-protecting K-means clustering algorithm is proposed in this paper. The improved algorithm uses contour (silhouette) coefficients to quantitatively evaluate the clustering effect of each iteration and adds different noise to different clusters. To adapt to huge amounts of data, this paper provides an algorithm design in the MapReduce framework. Experimental findings show that the new algorithm improves the availability of the clustering results while ensuring individual privacy, without significantly increasing its running time. [ABSTRACT FROM AUTHOR]
- Published
- 2018
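As context for this entry, the Laplace mechanism underlying the abstract - releasing a cluster centre with noise scaled to sensitivity/epsilon - can be sketched as below. This is a minimal illustration of the standard mechanism only, not the paper's silhouette-weighted noise allocation or its MapReduce design; the function name and default sensitivity are assumptions.

```python
import numpy as np

def dp_centroid(points, epsilon, sensitivity=1.0, seed=0):
    """Release a cluster centroid under epsilon-differential privacy by
    adding Laplace noise with scale sensitivity/epsilon per coordinate."""
    rng = np.random.default_rng(seed)
    centroid = np.asarray(points, dtype=float).mean(axis=0)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon,
                        size=centroid.shape)
    return centroid + noise

cluster = [[1.0, 2.0], [3.0, 2.0], [2.0, 5.0]]
noisy = dp_centroid(cluster, epsilon=0.5)  # smaller epsilon -> more noise
```

The paper's contribution is precisely in how the noise budget is split across clusters and iterations; this sketch applies a single flat budget.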
8. The Potential of MicroRNAs as Non-Invasive Prostate Cancer Biomarkers: A Systematic Literature Review Based on a Machine Learning Approach.
- Author
- Bevacqua, Emilia, Ammirato, Salvatore, Cione, Erika, Curcio, Rosita, Dolce, Vincenza, and Tucci, Paola
- Subjects
PROSTATE tumors treatment, DISEASE progression, SYSTEMATIC reviews, MICRORNA, EARLY detection of cancer, MACHINE learning, TUMOR markers, DATA analytics, PROSTATE tumors, DATA mining, ALGORITHMS - Abstract
Simple Summary: Prostate cancer (PCa) is the most common cancer in men worldwide. Screening and diagnosis are based on prostate-specific antigen (PSA) blood testing and digital rectal examination. Nevertheless, these methods are not specific and have a high risk of mistaken results. This has led to overtreatment and unnecessary radical therapy; thus, better prognostic tools are urgently needed. In this view, microRNAs (miRs) appear as potential non-invasive biomarkers for PCa diagnosis, prognosis, and therapy. As the scientific literature available in this field is huge and very often controversial, we identified and discussed three topics that characterize the investigated research area by combining the big data from the literature together with a novel machine learning approach. By analyzing the papers clustered into these topics we have offered a deeper understanding of the current research, which helps to contribute to the advancement of this research field. Background: Prostate cancer (PCa) is the second leading cause of cancer-related deaths in men. Although the prostate-specific antigen (PSA) test is used in clinical practice for screening and/or early detection of PCa, it is not specific, thus resulting in high false-positive rates. MicroRNAs (miRs) provide an opportunity as biomarkers for diagnosis, prognosis, and recurrence of PCa. Because the literature on this topic is large, growing, and often controversial, this study aims to consolidate the state of the art of relevant published research. Methods: A Systematic Literature Review (SLR) approach was applied to analyze a set of 213 scientific publications through a text mining method that makes use of the Latent Dirichlet Allocation (LDA) algorithm. Results and Conclusions: The result of this activity, performed through the MySLR digital platform, allowed us to identify a set of three relevant topics characterizing the investigated research area. We analyzed and discussed all the papers clustered into them.
We highlighted that several miRs are associated with PCa progression, and that their detection in patients' urine seems to be the more reliable and promising non-invasive tool for PCa diagnosis. Finally, we proposed some future research directions to help future scientists advance the field further. [ABSTRACT FROM AUTHOR]
- Published
- 2022
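The LDA-based topic clustering this abstract describes (run through the MySLR platform) can be approximated in a few lines. The sketch below uses scikit-learn's `LatentDirichletAllocation` as an assumed stand-in for the platform's internals, with four toy documents in place of the 213 publications; each output row is a document's mixture over topics.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Four stand-in "abstracts"; the real study analysed 213 publications.
docs = [
    "mirna biomarker prostate cancer diagnosis urine",
    "psa screening prostate cancer false positive",
    "machine learning text mining literature review",
    "topic model latent dirichlet allocation literature",
]
counts = CountVectorizer().fit_transform(docs)  # bag-of-words matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # one topic-mixture row per document
```

Assigning each paper to its highest-weight topic (`doc_topics.argmax(axis=1)`) yields the kind of paper clusters the study then read and discussed.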
9. Digital‐first assessments: A security framework.
- Author
- LaFlair, Geoffrey T., Langenfeld, Thomas, Baig, Basim, Horie, André Kenji, Attali, Yigal, and von Davier, Alina A.
- Subjects
NATIONAL competency-based educational tests, COMPUTER software, ENGLISH language, RESEARCH evaluation, DIGITAL technology, MACHINE learning, LEARNING, ENGINEERING, PSYCHOMETRICS, DATA security, AUTOMATION, QUALITY assurance, CERTIFICATION, PROFESSIONAL licensure examinations, DATA mining, ALGORITHMS - Abstract
Background: Digital‐first assessments leverage the affordances of technology in all elements of the assessment process: from design and development to score reporting and evaluation to create test taker‐centric assessments. Objectives: The goal of this paper is to describe the engineering, machine learning, and psychometric processes and technologies of a test security framework (part of a larger ecosystem; Burstein et al., 2021) that can be used to create systems that protect the integrity of test scores. Methods: We use the Duolingo English Test to exemplify the processes and technologies that are presented. This includes methods for actively detecting and deterring malicious behaviour (e.g., a custom desktop app). It also includes methods for passively detecting and deterring malicious behaviour (e.g., a large item bank created through automatic generation methods). We describe the certification process that each test administration undergoes, which includes both automated and human review. Additionally, we describe our quality assurance dashboard which leverages psychometric data mining techniques to monitor test quality and inform decisions about item pool maintenance. Results and Conclusions: As assessment developers transition to online delivery and to a design approach that places the test taker at the centre, it becomes increasingly important to take advantage of the tools and methodological advances in different fields (e.g., engineering, machine learning, psychometrics). These tools and methods are essential to maintaining the security of assessments so that the score reliability is sustained and the interpretations and uses of test scores remain valid. 
Lay Description: What is known about this topic?: As more and more testing programmes transition to test taker‐centric administrations, effective measures to prevent cheating and protect content are critical to ensure the validity and integrity of scores.Two of the most common forms of cheating in online testing are (a) having someone other than the person who has registered take the test, and (b) stealing content and providing it to others to assist them in achieving a higher score. What does this paper add?: In designing a test taker‐centric digital‐first assessment, a test security framework must inform decisions from end‐to‐end (i.e., registration, onboarding, communications regarding test taker behaviours, test preparation and practice, test administration, and post‐administrative activities including scoring).Security is safeguarded through active and passive design methods; active methods include having test takers attest that they will follow the rules governing testing and informing them that they will be videoed during testing; passive methods include a computer adaptive design that limits item exposure and test overlap rates, development of a large item pool using automated item generation, and applying artificial intelligence to review test administration videos to flag unauthorized behaviours for human review. Implications: With more educational and assessment programmes transitioning to online digital models, the paper presents a comprehensive review of security issues and identifies an integrated approach for preventing cheating and other unauthorized behaviours. [ABSTRACT FROM AUTHOR]
- Published
- 2022
10. Hybrid Clustering Algorithm Based on Improved Density Peak Clustering.
- Author
- Guo, Limin, Qin, Weijia, Cai, Zhi, and Su, Xing
- Subjects
MACHINE learning, DENSITY, ALGORITHMS, BIG data - Abstract
In the era of big data, unsupervised learning algorithms such as clustering are particularly prominent. In recent years, there have been significant advancements in clustering algorithm research. Clustering by Fast Search and Find of Density Peaks (density peak clustering, DPC), proposed in Science in 2014, automatically finds cluster centers. It is simple, efficient, does not require iterative computation, and is suitable for large-scale and high-dimensional data. However, DPC and most of its refinements have several drawbacks. The method primarily considers the overall structure of the data, often resulting in the oversight of many clusters. The choice of truncation distance affects the calculation of local density values, and varying dataset sizes may necessitate different computational methods, impacting the quality of clustering results. In addition, the initial assignment of labels can cause a 'chain reaction': if one data point is incorrectly labeled, it may lead to more subsequent data points being incorrectly labeled. In this paper, we propose an improved density peak clustering method, DPC-MS, which uses the mean-shift algorithm to find local density extremes, making the accuracy of the algorithm independent of the parameter dc. After finding the local density extreme points, the allocation strategy of the DPC algorithm is employed to assign the remaining points to appropriate local density extreme points, forming the final clusters. The robustness of this method in handling uncertain dataset sizes adds application value, and several experiments were conducted on synthetic and real datasets to evaluate the performance of the proposed method. The results show that the proposed method outperforms some of the more recent methods in most cases. [ABSTRACT FROM AUTHOR]
- Published
- 2024
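For orientation, the two decision-graph quantities of the original 2014 DPC formulation that this abstract discusses - local density rho (which depends on the truncation distance dc) and delta, the distance to the nearest higher-density point - can be sketched as follows. This illustrates the baseline that DPC-MS improves on, not DPC-MS itself; the function name is hypothetical.

```python
import numpy as np

def dpc_decision_graph(X, dc):
    """rho_i = number of neighbours within cutoff dc; delta_i = distance
    to the nearest point of higher density (or the max distance if none).
    In DPC, cluster centres are points where both rho and delta are large."""
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = (dist < dc).sum(axis=1) - 1          # exclude the point itself
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = np.flatnonzero(rho > rho[i])
        delta[i] = dist[i, higher].min() if higher.size else dist[i].max()
    return rho, delta

# A dense 4-point group and a sparse 2-point group.
X = [[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1], [5, 5], [5.1, 5]]
rho, delta = dpc_decision_graph(X, dc=0.5)
```

The abstract's point is that results hinge on dc; DPC-MS removes that dependence by locating density extremes with mean-shift instead.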
11. How Recommendation Algorithms Know What You'll Like.
- Author
- Vitanova, Mirjana Kocaleva, Miteva, Marija, Gelova, Elena Karamazova, and Zlatanovska, Biljana
- Subjects
ALGORITHMS, INTERNET stores, DATA mining, MACHINE learning, ONLINE shopping, ONLINE algorithms - Abstract
One of the most used statistical techniques that draws on machine learning and data mining to predict future outcomes from existing data is known as a predictive algorithm. Predictive models are not stable; they build assumptions based on past and present actions. In this paper we introduce the Amazon online store and how its algorithms know what we like, so that they can recommend products to us on their own. One of the biggest innovations in online shopping - first introduced by Amazon - was automatic recommendation generation. The more accurate prediction algorithms are, the more products online stores will sell. For that reason, prediction algorithms are of great significance for online stores. [ABSTRACT FROM AUTHOR]
- Published
- 2023
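The abstract does not detail Amazon's method, but the item-to-item collaborative filtering Amazon popularised can be sketched roughly as below: score each item the user has not yet rated by its similarity to the items they have rated. All names and the toy ratings matrix are hypothetical illustrations, not the paper's data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two rating vectors (0 if either is empty)."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def recommend(ratings, user_idx):
    """Item-based CF sketch: each unrated item is scored by its similarity
    to the items this user has already rated; return the top item."""
    n_items = len(ratings[0])
    cols = [[row[j] for row in ratings] for j in range(n_items)]
    user = ratings[user_idx]
    scores = {}
    for j in range(n_items):
        if user[j] == 0:  # not yet rated/purchased
            scores[j] = sum(cosine(cols[j], cols[i])
                            for i in range(n_items) if user[i] > 0)
    return max(scores, key=scores.get) if scores else None

# Rows are users, columns are products; 0 means "not rated".
ratings = [[5, 4, 0], [4, 5, 3], [1, 0, 5]]
pick = recommend(ratings, 0)
```

Item-item similarities can be precomputed offline, which is what makes this family of methods practical at store scale.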
12. Standardising Breast Radiotherapy Structure Naming Conventions: A Machine Learning Approach.
- Author
- Haidar, Ali, Field, Matthew, Batumalai, Vikneswary, Cloak, Kirrily, Al Mouiee, Daniel, Chlap, Phillip, Huang, Xiaoshui, Chin, Vicky, Aly, Farhannah, Carolan, Martin, Sykes, Jonathan, Vinod, Shalini K., Delaney, Geoffrey P., and Holloway, Lois
- Subjects
SPECIALTY hospitals, HUMAN body, MACHINE learning, RETROSPECTIVE studies, ARTIFICIAL intelligence, CANCER treatment, TERMS & phrases, RESEARCH funding, RADIOTHERAPY, DATA analysis, ARTIFICIAL neural networks, RECEIVER operating characteristic curves, THREE-dimensional printing, BREAST tumors, ONCOLOGY, ALGORITHMS, LONGITUDINAL method, RADIATION dosimetry, DATA mining - Abstract
Simple Summary: In radiotherapy treatment, organs at risk and target volumes are contoured by the clinicians to prepare a dosimetry plan. In retrospective data, these structures are often not standardised to universal names across patients' plans, which is required to enable data mining and analysis. In this paper, a new method was proposed and evaluated to automatically standardise radiotherapy structure names using machine learning algorithms. The proposed approach was deployed over a dataset with 1613 patients collected from Liverpool & Macarthur Cancer Therapy Centres, New South Wales, Australia. It was concluded that machine learning techniques can standardise the dosimetry plan structures, taking into consideration the integration of multiple modalities representing each structure during the training process. In progressing the use of big data in health systems, standardised nomenclature is required to enable data pooling and analyses. In many radiotherapy planning systems and their data archives, target volume (TV) and organ-at-risk (OAR) structure nomenclature has not been standardised. Machine learning (ML) has been utilised to standardise volume nomenclature in retrospective datasets. However, only subsets of the structures have been targeted. Within this paper, we proposed a new approach for standardising all structure nomenclature by using multi-modal artificial neural networks. A cohort consisting of 1613 breast cancer patients treated with radiotherapy was identified from Liverpool & Macarthur Cancer Therapy Centres, NSW, Australia. Four types of volume characteristics were generated to represent each target and OAR volume: textual features, geometric features, dosimetry features, and imaging data. Five datasets were created from the original cohort; the first four represented different subsets of volumes and the last one represented the whole list of volumes.
For each dataset, 15 sets of combinations of features were generated to investigate the effect of using different characteristics on the standardisation performance. The best model reported 99.416% classification accuracy over the hold-out sample when used to standardise all the nomenclatures in a breast cancer radiotherapy plan into 21 classes. Our results showed that ML based automation methods can be used for standardising naming conventions in a radiotherapy plan taking into consideration the inclusion of multiple modalities to better represent each volume. [ABSTRACT FROM AUTHOR]
- Published
- 2023
13. A STUDY ON OPTIMIZING ERROR DETECTION AND CORRECTION STRATEGIES IN PHYSICAL EDUCATION AND SPORT TEACHING USING DATA MINING ALGORITHMS.
- Author
- Ziyao Gao, Shengfei Hu, Guo Yu, and Yinhui Li
- Subjects
DATA mining, PHYSICAL education, PHYSICAL training & conditioning, MACHINE learning, ALGORITHMS, SPORTS psychology - Abstract
In the fiercely competitive realm of sports and physical education, the application of data mining algorithms has emerged as a vital solution. Machine learning has streamlined processes, offering a seamless means of elevating the quality of education and training provided to students, particularly in the context of sports. This technological support empowers the sports education system to make more informed decisions pertaining to the physical development of aspiring athletes. In this comprehensive study, a blended approach of qualitative methods has been leveraged to gather intricate insights, enriching the overall understanding of the subject. Additionally, an in-depth exploration of articles and journals has been undertaken to scrutinize the practical implementation of data algorithm techniques geared towards enhancing physical training. The resultant findings underscore a substantial and tangible nexus between data algorithms and the domain of sports education. Of paramount significance is the central role played by data mining algorithms in augmenting performance. Notably, the National Sports Board (NSB) has extensively harnessed this technology to meticulously monitor players' on-field performance, ultimately leading to a granular comprehension of each player's capabilities. This paper emphasizes methods for optimizing error detection and the correction strategies that accompany it in operational teaching procedures. [ABSTRACT FROM AUTHOR]
- Published
- 2023
14. Algorithm selection using edge ML and case-based reasoning.
- Author
- Ali, Rahman, Zada, Muhammad Sadiq Hassan, Khatak, Asad Masood, and Hussain, Jamil
- Subjects
CASE-based reasoning, DECISION trees, CLASSIFICATION algorithms, MACHINE learning, ALGORITHMS, DATA mining, EMPIRICAL research, FEATURE extraction - Abstract
In practical data mining, a wide range of classification algorithms is employed for prediction tasks. However, selecting the best algorithm poses a challenging task for machine learning practitioners and experts, primarily due to the inherent variability in the characteristics of classification problems, referred to as datasets, and the unpredictable performance of these algorithms. Dataset characteristics are quantified in terms of meta-features, while classifier performance is evaluated using various performance metrics. The assessment of classifiers through empirical methods across multiple classification datasets, while considering multiple performance metrics, presents a computationally expensive and time-consuming obstacle in the pursuit of selecting the optimal algorithm. Furthermore, the scarcity of sufficient training data, denoted by dimensions representing the number of datasets and the feature space described by meta-feature perspectives, adds further complexity to the process of algorithm selection using classical machine learning methods. This research paper presents an integrated framework called eML-CBR that combines edge-ML and case-based reasoning methodologies to accurately address the algorithm selection problem. It adapts a multi-level, multi-view case-based reasoning methodology, considering data from diverse feature dimensions and the algorithms from multiple performance aspects, that distributes computations to both cloud edges and centralized nodes. On the edge, the first-level reasoning employs machine learning methods to recommend a family of classification algorithms, while at the second level, it recommends a list of the top-k algorithms within that family. This list is further refined by an algorithm conflict resolver module.
The eML-CBR framework offers a suite of contributions, including integrated algorithm selection, multi-view meta-feature extraction, innovative performance criteria, improved algorithm recommendation, data scarcity mitigation through incremental learning, and an open-source CBR module, reshaping research paradigms. The CBR module, trained on 100 datasets and tested with 52 datasets using 9 decision tree algorithms, achieved an accuracy of 94% for correct classifier recommendations within the top k=3 algorithms, making it highly suitable for practical classification applications. [ABSTRACT FROM AUTHOR]
- Published
- 2023
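The meta-feature-plus-retrieval core of case-based algorithm selection this abstract describes can be sketched as follows. This is a deliberately tiny analogue (three meta-features, Euclidean nearest neighbour), not the eML-CBR framework's multi-level, multi-view method; every name and value below is a hypothetical stand-in.

```python
import math

def meta_features(X, y):
    """Toy dataset meta-features -- instance count, feature count, class
    count -- of the kind used to characterise datasets for selection."""
    return [len(X), len(X[0]), len(set(y))]

def retrieve(case_base, query, k=1):
    """Nearest-neighbour case retrieval: recommend the algorithm(s)
    stored with the most similar previously-solved dataset(s)."""
    ranked = sorted(case_base, key=lambda case: math.dist(case[0], query))
    return [algorithm for _, algorithm in ranked[:k]]

# Past cases: (meta-features of a dataset, best algorithm found for it).
case_base = [
    ([100, 4, 2], "decision_tree"),
    ([10000, 300, 2], "linear_svm"),
    ([500, 8, 10], "random_forest"),
]
query = meta_features([[0.0] * 6] * 450, [0, 1, 2, 3, 4] * 90)
best = retrieve(case_base, query)
```

Real systems use far richer meta-features (statistical, information-theoretic, landmarking) and weight the distance, but the retrieve-and-reuse loop is the same.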
15. Improve Quality and Efficiency of Textile Process using Data-driven Machine Learning in Industry 4.0.
- Author
- Chia-Yun Lee, Jia-Ying Lin, and Ray-I Chang
- Subjects
MACHINE learning, BIG data, ARTIFICIAL intelligence, DATA mining, ALGORITHMS - Abstract
The capabilities of self-awareness, self-prediction, and self-maintenance are important for textile factories in Industry 4.0. One of the most important issues is to intellectualize the setting of operation parameters via the cyber-physical system (CPS), instead of using the traditional trial-and-error method. To achieve these goals, this paper focuses on the relationship between key operation parameters and defects for machine learning, in order to design an operation parameters recommender system (OPRS) for the textile industry. From the perspective of data science, this paper integrates historic manufacturing process data, such as machine operation parameters from the warping, sizing, beaming, and weaving processes, with management experience data, such as textile inspection results from the quality control section. Then, regression models are applied to predict the textile operation parameters. This research also uses classification models to predict the quality of the textile. Based on ten-fold cross-validation testing, experimental results show that our model can achieve 90.8% accuracy on quality level prediction, and the best regression model for predicting weaving operation parameters can reduce the mean square error (MSE) to 0.01%. By combining the above two models, the proposed OPRS can provide a complete analysis of operation parameters. It provides good performance when compared with previous stochastic methods. As the proposed OPRS can support technicians in setting operation parameters more precisely, even for a new type of yarn, it can help to close the tech skills gap in the textile manufacturing process. [ABSTRACT FROM AUTHOR]
- Published
- 2018
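The ten-fold cross-validation protocol under which this abstract reports its 90.8% accuracy can be sketched generically as below. The majority-class "model" is a hypothetical placeholder standing in for the paper's regression and classification models; only the train-on-nine-folds, score-on-one protocol is the point.

```python
import numpy as np

def ten_fold_accuracy(X, y, fit, predict, folds=10, seed=0):
    """K-fold cross-validation: shuffle indices, hold out each fold in
    turn, fit on the rest, and average the held-out accuracies."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    scores = []
    for part in np.array_split(idx, folds):
        train = np.setdiff1d(idx, part)
        model = fit(X[train], y[train])
        scores.append(np.mean(predict(model, X[part]) == y[part]))
    return float(np.mean(scores))

# Placeholder "model": always predict the majority class of the train fold.
fit = lambda X, y: np.bincount(y).argmax()
predict = lambda model, X: np.full(len(X), model)
X = np.arange(40).reshape(20, 2)
y = np.array([1] * 20)
acc = ten_fold_accuracy(X, y, fit, predict)
```

Swapping in any real classifier with the same `fit`/`predict` shape reproduces the evaluation scheme the paper used.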
16. A reconstruction method for cross-cut shredded documents based on the extreme learning machine algorithm.
- Author
- Zhang, Zhenghui, Zou, Juan, Yang, Shengxiang, Zheng, Jinhua, Gong, Dunwei, and Pei, Tingrui
- Subjects
MACHINE learning, DATA mining, INFORMATION technology security, DISTRIBUTED algorithms, ALGORITHMS, COMPUTER assisted language instruction - Abstract
Reconstruction of cross-cut shredded text documents (RCCSTD) has important applications for information security and judicial evidence collection. The traditional method of manual construction is a very time-consuming task, so the use of computer-assisted efficient reconstruction is a crucial research topic. Fragment consensus information extraction and fragment pair compatibility measurement are two fundamental processes in RCCSTD. Due to the limitations of the existing classical methods of these two steps, only documents with specific structures or characteristics can be spliced, and pairing error is larger when the cutting is more fine-grained. In order to reconstruct the fragments more effectively, this paper improves the extraction method for consensus information and constructs a new global pairwise compatibility measurement model based on the extreme learning machine algorithm. The purpose of the algorithm's design is to exploit all available information and computationally suggest matches to increase the algorithm's ability to discriminate between data in various complex situations, then find the best neighbor of each fragment for splicing according to pairwise compatibility. The overall performance of our approach in several practical experiments is illustrated. The results indicate that the matching accuracy of the proposed algorithm is better than that of the previously published classical algorithms and still ensures a higher matching accuracy in the noisy datasets, which can provide a feasible method for RCCSTD intelligent systems in real scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2022
17. An Adaptive Bandwidth Management Algorithm for Next-Generation Vehicular Networks.
- Author
- Huang, Chenn-Jung, Hu, Kai-Wen, and Cheng, Hao-Wen
- Subjects
TERAHERTZ technology, BANDWIDTH allocation, IN-vehicle entertainment equipment, OPTICAL communications, BANDWIDTHS, ALGORITHMS, VEHICULAR ad hoc networks, NEXT generation networks - Abstract
The popularity of video services such as video call or video on-demand has made it impossible for people to live without them in their daily lives. It can be anticipated that the explosive growth of vehicular communication owing to the widespread use of in-vehicle video infotainment applications in the future will result in increasing fragmentation and congestion of the wireless transmission spectrum. Accordingly, effective bandwidth management algorithms are demanded to achieve efficient communication and stable scalability in next-generation vehicular networks. To the best of our current knowledge, a noticeable gap remains in the existing literature regarding the application of the latest advancements in network communication technologies. Specifically, this gap is evident in the lack of exploration regarding how cutting-edge technologies can be effectively employed to optimize bandwidth allocation, especially in the realm of video service applications within the forthcoming vehicular networks. In light of this void, this paper presents a seamless integration of cutting-edge 6G communication technologies, such as terahertz (THz) and visible light communication (VLC), with the existing 5G millimeter-wave and sub-6 GHz base stations. This integration facilitates the creation of a network environment characterized by high transmission rates and extensive coverage. Our primary aim is to ensure the uninterrupted playback of real-time video applications for vehicle users. These video applications encompass video conferencing, live video, and on-demand video services. The outcomes of our simulations convincingly indicate that the proposed strategy adeptly addresses the challenge of bandwidth competition among vehicle users. Moreover, it notably boosts the efficient utilization of bandwidth from less crowded base stations, optimizes the fulfillment of bandwidth prerequisites for various video applications, and elevates the overall video quality experienced by users. 
Consequently, our findings serve as a successful validation of the practicality and effectiveness of the proposed methodology. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
18. Identifying the Regions of a Space with the Self-Parameterized Recursively Assessed Decomposition Algorithm (SPRADA).
- Author
-
Molinié, Dylan, Madani, Kurosh, Amarger, Véronique, and Chebira, Abdennasser
- Subjects
ANOMALY detection (Computer security) ,ALGORITHMS ,INDUSTRIALISM ,HUMAN behavior models ,MANUFACTURING processes - Abstract
This paper introduces a non-parametric methodology based on classical unsupervised clustering techniques to automatically identify the main regions of a space, without requiring the target number of clusters, so as to identify the major regular states of unknown industrial systems. Indeed, useful knowledge on real industrial processes entails the identification of their regular states and their historically encountered anomalies. Since both should form compact and salient groups of data, unsupervised clustering generally performs this task fairly accurately; however, it often requires the number of clusters upstream, knowledge which is rarely available. As such, the proposed algorithm performs an initial partitioning of the space, estimates the integrity of the resulting clusters, and repeatedly splits clusters until every cluster reaches an acceptable integrity; finally, a merging step based on the clusters' empirical distributions is performed to refine the partitioning. Applied to real industrial data obtained in the scope of a European project, this methodology proved able to automatically identify the main regular states of the system. Results show the robustness of the proposed approach in the fully automatic and non-parametric identification of the main regions of a space, knowledge which is useful for industrial anomaly detection and behavioral modeling. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
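The split-until-compact loop described in the SPRADA abstract above can be sketched in a few lines. This is a minimal 1-D illustration under stated assumptions: within-cluster variance stands in for the paper's integrity measure, plain 2-means performs each split, and the final distribution-based merging step is omitted.

```python
import random

def kmeans_1d(points, k=2, iters=20, seed=0):
    """Plain Lloyd's algorithm on 1-D data; returns non-empty clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda j: abs(p - centers[j]))].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return [c for c in clusters if c]

def variance(c):
    m = sum(c) / len(c)
    return sum((x - m) ** 2 for x in c) / len(c)

def split_until_compact(points, max_var=1.0):
    """Partition, assess each cluster's integrity (here: variance),
    and keep splitting clusters that fail the check."""
    if len(points) < 4 or variance(points) <= max_var:
        return [points]
    subs = kmeans_1d(points)
    if len(subs) < 2:          # splitting made no progress; stop
        return [points]
    return [c for s in subs for c in split_until_compact(s, max_var)]

data = [0.0, 0.1, 0.2, 0.3, 10.0, 10.1, 10.2, 10.3]
regions = split_until_compact(data)
```

On this toy data the recursion stops as soon as both well-separated groups are isolated, so no objective number of clusters is ever supplied.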
19. Methods of Overcoming Data Waste in Education: A Systematic Review
- Author
-
Amina Saad Al-Dosari and Lina Ahmed Al-Farani
- Subjects
DATA mining ,DATABASES ,RESEARCH personnel ,MACHINE learning ,ARTIFICIAL intelligence - Abstract
Copyright of Scientific Journal of King Faisal University, Humanities & Management Sciences is the property of Association of Arab Universities and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2023
- Full Text
- View/download PDF
20. Influence of emotions on the aggressive driving behavior of online car-hailing drivers based on association rule mining.
- Author
-
Ma, Yongfeng, Xing, Yaqian, Wu, Ying, and Chen, Shuyan
- Subjects
RESEARCH funding ,DATA mining ,AUTOMOBILE driving ,ANGER ,ACCELERATION (Mechanics) ,EMOTIONS ,INTERNET ,AGGRESSION (Psychology) ,HAPPINESS ,MACHINE learning ,AVERSION ,ALGORITHMS ,PHYSIOLOGICAL effects of acceleration - Abstract
Emotion is an important factor that can lead to the occurrence of aggressive driving. This paper proposes an association rule mining-based method for analysing contributing factors associated with aggressive driving behaviour among online car-hailing drivers. We collected drivers' emotion data in real time in a natural driving setting. The findings show that 29 of the top 50 association rules for aggressive driving are related to emotions, revealing a strong relationship between driver emotions and aggressive driving behaviour. The emotions of anger, surprise, happiness and disgust are frequently associated with aggressive driving behaviour. Negative emotions combined with other factors (for example, driving at high speeds and high acceleration rates and with no passengers in the vehicle) are more likely to lead to aggressive driving behaviour than negative emotions alone. The results of this study provide practical implications for the supervision and training of car-hailing drivers. PRACTITIONER SUMMARY: Based on the association rule mining method, we found a close connection between drivers' emotional states and the manifestation of aggressive driving behaviours. The findings indicate that the combination of negative emotions and various contributing factors significantly amplifies the likelihood of aggressive driving. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
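The support/confidence arithmetic behind the rules in the abstract above is compact enough to show directly. The trip records below are hypothetical stand-ins for the paper's data; the point illustrated is the abstract's finding that an emotion combined with a driving factor yields a stronger rule than the emotion alone.

```python
records = [  # hypothetical trips: observed emotion, driving context, outcome
    {"anger", "high_speed", "aggressive"},
    {"anger", "high_speed", "aggressive"},
    {"happy", "low_speed"},
    {"anger", "low_speed"},
    {"surprised", "high_speed", "aggressive"},
    {"happy", "high_speed"},
]

def support(itemset):
    """Fraction of records containing every item in the set."""
    return sum(itemset <= r for r in records) / len(records)

def confidence(antecedent, consequent):
    """Of the records matching the antecedent, the share also matching the consequent."""
    return support(antecedent | consequent) / support(antecedent)

alone = confidence({"anger"}, {"aggressive"})                   # emotion alone
combined = confidence({"anger", "high_speed"}, {"aggressive"})  # emotion + factor
```

Here `combined` reaches 1.0 while `alone` is only 2/3, mirroring the abstract's conclusion in miniature.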
21. On Combining Instance Selection and Discretisation: A Comparative Study of Two Combination Orders.
- Author
-
Sue, Kuen-Liang, Tsai, Chih-Fong, and Yan, Tzu-Ming
- Subjects
DATA mining ,MACHINE learning ,ORDER picking systems ,COMPARATIVE studies ,ALGORITHMS - Abstract
Data discretisation focuses on converting continuous attribute values to discrete ones, which are closer to a knowledge-level representation that is easier to understand, use, and explain than continuous values. On the other hand, instance selection aims at filtering out noisy or unrepresentative data samples from a given training dataset before constructing a learning model. In practice, some domain datasets may require processing with both discretisation and instance selection at the same time. In such cases, the order in which discretisation and instance selection are combined will result in differences in the processed datasets. For example, discretisation can be performed first on the original dataset, after which the instance selection algorithm evaluates the discrete data for selection, whereas the alternative is to perform instance selection first on the continuous data, then use the discretiser to transform the attribute values of the reduced dataset. However, this issue has not been investigated before. The aim of this paper is to compare the performance of a classifier trained and tested over datasets processed by these combination orders. Specifically, the minimum description length principle (MDLP) and ChiMerge are used for discretisation, and IB3, DROP3 and GA for instance selection. The experimental results obtained using ten different domain datasets show that executing instance selection first and discretisation second performs best, which can serve as a guideline for datasets that require both steps. In particular, combining DROP3 and MDLP can provide classification accuracy of 0.85 and AUC of 0.8, which can be regarded as a representative baseline for future related research. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
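The order effect the abstract above investigates can be reproduced on a toy dataset. This sketch swaps in simple stand-ins for the paper's components: equal-width binning instead of MDLP/ChiMerge, and an ENN-style nearest-neighbour filter instead of IB3/DROP3/GA. The two orders produce visibly different processed datasets.

```python
def discretise(data, bins=2):
    """Equal-width discretisation of the single continuous attribute."""
    xs = [x for x, _ in data]
    lo, width = min(xs), (max(xs) - min(xs)) / bins or 1.0
    return [(min(int((x - lo) / width), bins - 1), y) for x, y in data]

def select(data):
    """ENN-style instance selection: drop a point whose nearest
    neighbour carries a different label."""
    kept = []
    for i, (x, y) in enumerate(data):
        j = min((k for k in range(len(data)) if k != i),
                key=lambda k: abs(data[k][0] - x))
        if data[j][1] == y:
            kept.append((x, y))
    return kept

data = [(0.0, "a"), (0.1, "a"), (0.45, "a"), (0.52, "b"), (0.9, "b"), (1.0, "b")]
dis_then_sel = select(discretise(data))   # discretise first
sel_then_dis = discretise(select(data))   # select first
```

Selecting first removes the two borderline points (0.45 and 0.52) using their continuous distances; discretising first collapses them into bins where they look consistent and nothing is removed, so the outputs differ.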
22. Improved SVRG for finite sum structure optimization with application to binary classification.
- Author
-
Shao, Guangmei, Xue, Wei, Yu, Gaohang, and Zheng, Xiao
- Subjects
CONVEX functions ,SMOOTHNESS of functions ,ALGORITHMS ,DATA mining ,MACHINE learning ,BINARY number system - Abstract
This paper looks at a stochastic variance reduced gradient (SVRG) method for minimizing the sum of a finite number of smooth convex functions, a problem that arises widely in machine learning and data mining. Inspired by the excellent performance of the two-point stepsize gradient method in batch learning, in this paper we present an improved SVRG algorithm, named the stochastic two-point stepsize gradient method. Under some mild conditions, the proposed method achieves a linear convergence rate O(ρ^k) for smooth and strongly convex functions, where ρ ∈ (0.68, 1). Simulation experiments on several benchmark data sets are reported to demonstrate the performance of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
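The SVRG baseline this abstract improves upon is short enough to sketch. The following is vanilla SVRG with a fixed step on a toy 1-D least-squares problem; the paper's two-point stepsize rule is not reproduced here, and the problem instance is an assumption for illustration.

```python
import random

random.seed(0)
n = 40
a = [random.uniform(0.5, 1.5) for _ in range(n)]
w_star = 3.0
b = [ai * w_star for ai in a]            # noiseless targets, so w_star is optimal

def grad_i(w, i):                        # gradient of f_i(w) = (a_i*w - b_i)^2 / 2
    return (a[i] * w - b[i]) * a[i]

def full_grad(w):
    return sum(grad_i(w, i) for i in range(n)) / n

def svrg(w=0.0, step=0.1, outer=20, inner=50):
    for _ in range(outer):
        w_snap, mu = w, full_grad(w)     # snapshot and its full gradient
        for _ in range(inner):
            i = random.randrange(n)
            # variance-reduced stochastic gradient: unbiased, and its variance
            # vanishes as both w and the snapshot approach the optimum
            w -= step * (grad_i(w, i) - grad_i(w_snap, i) + mu)
    return w

w_hat = svrg()
```

Because the correction term `grad_i(w_snap, i) - mu` cancels the sampling noise near the snapshot, a constant step suffices for linear convergence, which is the property the abstract's O(ρ^k) rate formalizes.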
23. Role of biological Data Mining and Machine Learning Techniques in Detecting and Diagnosing the Novel Coronavirus (COVID-19): A Systematic Review.
- Author
-
Albahri, A. S., Hamid, Rula A., Alwan, Jwan K., Al-Qays, Z. T., Zaidan, A. A., Zaidan, B. B., Albahri, A. O. S., AlAmoodi, A. H., Khlaf, Jamal Mawlood, Almahdi, E. M., Thabet, Eman, Hadi, Suha M., Mohammed, K. I., Alsalem, M. A., Al-Obaidi, Jameel R., and Madhloom, H. T.
- Subjects
ALGORITHMS ,ARTIFICIAL intelligence ,MACHINE learning ,MEDLINE ,ONLINE information services ,DATA mining ,SYSTEMATIC reviews ,COVID-19 - Abstract
Coronaviruses (CoVs) are a large family of viruses that are common in many animal species, including camels, cattle, cats and bats. Animal CoVs, such as Middle East respiratory syndrome-CoV, severe acute respiratory syndrome (SARS)-CoV, and the new virus named SARS-CoV-2, rarely infect and spread among humans. On January 30, 2020, the International Health Regulations Emergency Committee of the World Health Organisation declared the outbreak of the resulting disease from this new CoV called 'COVID-19', as a 'public health emergency of international concern'. This global pandemic has affected almost the whole planet and caused the death of more than 315,131 patients as of the date of this article. In this context, publishers, journals and researchers are urged to research different domains and stop the spread of this deadly virus. The increasing interest in developing artificial intelligence (AI) applications has addressed several medical problems. However, such applications remain insufficient given the high potential threat posed by this virus to global public health. This systematic review addresses automated AI applications based on data mining and machine learning (ML) algorithms for detecting and diagnosing COVID-19. We aimed to obtain an overview of this critical virus, address the limitations of utilising data mining and ML algorithms, and provide the health sector with the benefits of this technique. We used five databases, namely, IEEE Xplore, Web of Science, PubMed, ScienceDirect and Scopus and performed three sequences of search queries between 2010 and 2020. Accurate exclusion criteria and selection strategy were applied to screen the obtained 1305 articles. Only eight articles were fully evaluated and included in this review, and this number only emphasised the insufficiency of research in this important area. 
After analysing all included studies, the results were distributed following the year of publication and the commonly used data mining and ML algorithms. The results found in all papers were discussed to find the gaps in all reviewed papers. Characteristics, such as motivations, challenges, limitations, recommendations, case studies, and features and classes used, were analysed in detail. This study reviewed the state-of-the-art techniques for CoV prediction algorithms based on data mining and ML assessment. The reliability and acceptability of extracted information and datasets from implemented technologies in the literature were considered. Findings showed that researchers must proceed with insights they gain, focus on identifying solutions for CoV problems, and introduce new improvements. The growing emphasis on data mining and ML techniques in medical fields can provide the right environment for change and improvement. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
24. Enhancing recall in automated record screening: A resampling algorithm.
- Author
-
Zhipeng Hou and Tipton, Elizabeth
- Subjects
AUTOMATIC identification ,ALGORITHMS ,HUMAN error ,PROBABILITY theory ,TRACKING algorithms ,TEXT mining - Abstract
Literature screening is the process of identifying all relevant records from a pool of candidate paper records in systematic review, meta-analysis, and other research synthesis tasks. This process is time-consuming, expensive, and prone to human error. Screening prioritization methods attempt to help reviewers identify the most relevant records while screening only a proportion of high-priority candidate records. In previous studies, screening prioritization is often referred to as automatic literature screening or automatic literature identification. Numerous screening prioritization methods have been proposed in recent years. However, there is a lack of screening prioritization methods with reliable performance. Our objective is to develop a screening prioritization algorithm with reliable performance for practical use, for example, an algorithm that guarantees an 80% chance of identifying at least 80% of the relevant records. Based on a target-based method proposed by Cormack and Grossman, we propose a screening prioritization algorithm using sampling with replacement. The algorithm is a wrapper that can work with any current screening prioritization algorithm to guarantee the performance. We prove, using probability theory, that the algorithm provides this guarantee. We also run numeric experiments to test the performance of our algorithm when applied in practice. The results show that the algorithm achieves reliable performance under different circumstances. The proposed screening prioritization algorithm can be reliably used in real-world research synthesis tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
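The abstract above specifies only that the wrapper resamples with replacement to certify recall, so the following is a hypothetical bootstrap-style reading, not the authors' algorithm: resample the ranked pool, record the screening depth each resample needs to hit the recall target, and return a high quantile of those depths. The pool itself is simulated.

```python
import random

random.seed(7)

# hypothetical screening pool: (ranker score, is_relevant); relevant records
# tend to score higher, mimicking an upstream screening-prioritisation model
pool = []
for _ in range(500):
    relevant = random.random() < 0.1
    pool.append((random.gauss(1.0 if relevant else 0.0, 0.5), relevant))
pool.sort(key=lambda r: -r[0])               # screen highest priority first

def recall_at(records, k):
    """Fraction of all relevant records found among the first k screened."""
    total = sum(rel for _, rel in records)
    return sum(rel for _, rel in records[:k]) / total

def smallest_depth(records, target=0.8):
    """Smallest screening depth reaching the target recall on `records`."""
    return next(k for k in range(1, len(records) + 1)
                if recall_at(records, k) >= target)

def bootstrap_depth(records, target=0.8, assurance=0.8, B=200):
    """Resample with replacement and take a high quantile of the required
    depths, so roughly an `assurance` fraction of resampled worlds would
    reach the target recall at the returned depth."""
    depths = sorted(
        smallest_depth(sorted(random.choices(records, k=len(records)),
                              key=lambda r: -r[0]), target)
        for _ in range(B))
    return depths[int(assurance * B)]

depth = bootstrap_depth(pool)
```

A reviewer would then screen only the first `depth` records instead of all 500, with the resampling step supplying the probabilistic assurance.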
25. A Review on Electronic Health Record Text-Mining for Biomedical Name Entity Recognition in Healthcare Domain.
- Author
-
Ahmad, Pir Noman, Shah, Adnan Muhammad, and Lee, KangYoon
- Subjects
SUBJECT headings ,MEDICAL information storage & retrieval systems ,NATURAL language processing ,MACHINE learning ,ARTIFICIAL intelligence ,BIOINFORMATICS ,TERMS & phrases ,INFORMATION retrieval ,CLINICAL medicine ,ELECTRONIC health records ,DATA mining ,ALGORITHMS - Abstract
Biomedical-named entity recognition (bNER) is critical in biomedical informatics. It identifies biomedical entities with special meanings, such as people, places, and organizations, as predefined semantic types in electronic health records (EHR). bNER is essential for discovering novel knowledge using computational methods and Information Technology. Early bNER systems were configured manually to include domain-specific features and rules. However, these systems were limited in handling the complexity of the biomedical text. Recent advances in deep learning (DL) have led to the development of more powerful bNER systems. DL-based bNER systems can learn the patterns of biomedical text automatically, making them more robust and efficient than traditional rule-based systems. This paper reviews the healthcare domain of bNER, using DL techniques and artificial intelligence in clinical records, for mining treatment prediction. bNER-based tools are categorized systematically and represent the distribution of input, context, and tag (encoder/decoder). Furthermore, to create a labeled dataset for our machine learning sentiment analyzer to analyze the sentiment of a set of tweets, we used a manual coding approach and the multi-task learning method to bias the training signals with domain knowledge inductively. To conclude, we discuss the challenges facing bNER systems and future directions in the healthcare field. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
26. An ensemble approach to outlier detection using some conventional clustering algorithms.
- Author
-
Saha, Akash, Chatterjee, Agneet, Ghosh, Soulib, Kumar, Neeraj, and Sarkar, Ram
- Subjects
OUTLIER detection ,ALGORITHMS ,MACHINE learning ,DATA mining ,KALMAN filtering - Abstract
Outlier detection is an important requirement in data mining and machine learning. When data mining and machine learning algorithms are applied to datasets with outliers, they can lead to erroneous conclusions about the data. Therefore, researchers have been working in this field to remove outliers from datasets so that meaningful information can be retrieved. In this paper, we take a cluster-based ensemble approach to outlier detection, the backbone of which is a set of conventional clustering algorithms. Keeping in mind the drawbacks of supervised and semi-supervised learning, we have relied on unsupervised learning algorithms. For our cluster-based ensemble approach, we use three clustering algorithms, namely K-means, K-means++, and Fuzzy C-means. Our model intelligently combines results from the individual clustering algorithms, assigning probabilities to each data point in order to decide its belongingness to a certain cluster. We have proposed a technique to assign a membership value to a data point in the case of hard clustering algorithms, as we want to keep the flexibility of combining hard and soft clustering algorithms. From the probabilities assigned by the ensemble model, we then identify the outliers in the dataset. After removing these data points from the dataset, we obtain better values of cluster validity indices, thus reaffirming that the removal of outliers has resulted in more coherent clusters of data. We have used five different cluster validity indices in our work to measure the goodness of the clusters formed, considering eight widely used datasets for evaluation of the proposed model, three of which are large datasets. We have noticed a significant improvement in the cluster validity indices after applying our outlier detection algorithm. The experimental results show that the proposed method is empirically sound. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
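The key idea in the abstract above, giving hard clusterings soft membership values so they can be averaged with fuzzy ones, can be sketched as follows. Inverse-distance weighting is a common stand-in here, not necessarily the paper's exact membership formula, and the centroids are hypothetical outputs of two ensemble members.

```python
import math

def memberships(p, centroids, eps=1e-9):
    """Turn a hard clustering into soft memberships via inverse-distance
    weighting (a stand-in for the paper's membership-assignment technique)."""
    inv = [1.0 / (math.dist(p, c) + eps) for c in centroids]
    s = sum(inv)
    return [w / s for w in inv]

points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (10.0, -10.0)]
# centroids as if produced by two different clusterers (e.g. K-means, K-means++)
clusterings = [
    [(0.1, 0.05), (5.05, 4.95)],
    [(0.0, 0.1), (5.1, 5.0)],
]

scores = []
for p in points:
    # average the membership vectors across the ensemble members
    avg = [sum(m) / len(clusterings)
           for m in zip(*(memberships(p, c) for c in clusterings))]
    scores.append(1.0 - max(avg))    # low best-membership => likely outlier

outlier = points[scores.index(max(scores))]
```

A point near a centroid gets a membership close to 1 for that cluster and a score near 0, while a point far from every centroid has no dominant membership, which is exactly what flags it as an outlier.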
27. A novel feature selection approach with Pareto optimality for multi-label data.
- Author
-
Li, Guohe, Li, Yong, Zheng, Yifeng, Li, Ying, Hong, Yunfeng, and Zhou, Xiaoming
- Subjects
FEATURE selection ,DATA mining ,MACHINE learning ,ALGORITHMS - Abstract
Multi-label learning has been widely applied in machine learning and data mining. The purpose of feature selection is to select an approximately optimal feature subset to characterize the original feature space. As with single-label data, feature selection is an important preprocessing step for enhancing the performance of a multi-label classification model. In this paper, we propose a multi-label feature selection approach with Pareto optimality for continuous data, called MLFSPO. It maps multi-label features to a high-dimensional space to evaluate the correlation between features and labels by utilizing the Hilbert-Schmidt Independence Criterion (HSIC). Then, the feature subset is obtained by combining Pareto optimization with feature ordering criteria and label weighting. Finally, extensive experimental results on publicly available data sets show the effectiveness of the proposed algorithm in multi-label tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
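The HSIC dependence measure named in the abstract above has a standard biased estimator, tr(KHLH)/(n-1)^2 with centred Gram matrices, which is easy to compute directly. The data below are synthetic; the kernel bandwidth and the single-feature setting are simplifying assumptions, not the paper's configuration.

```python
import numpy as np

def hsic(x, y, sigma=1.0):
    """Biased HSIC estimate with Gaussian kernels: tr(K H L H) / (n-1)^2."""
    n = len(x)
    def gram(v):
        diff = v[:, None] - v[None, :]
        return np.exp(-diff ** 2 / (2 * sigma ** 2))
    K, L = gram(np.asarray(x, float)), gram(np.asarray(y, float))
    H = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 60).astype(float)       # binary label
relevant = y + 0.1 * rng.normal(size=60)       # feature that tracks the label
noise = rng.normal(size=60)                    # feature independent of the label
h_rel, h_noise = hsic(relevant, y), hsic(noise, y)
```

Ranking features by such scores (here `h_rel` far exceeds `h_noise`) is the correlation-evaluation step that the Pareto optimization in the paper then trades off against other criteria.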
28. Adaptive Hierarchical Density-Based Spatial Clustering Algorithm for Streaming Applications.
- Author
-
Vijayan, Darveen and Aziz, Izzatdin
- Subjects
MACHINE learning ,SPANNING trees ,ALGORITHMS ,DEEP learning ,DATA mining - Abstract
Clustering algorithms are commonly used in the mining of static data. Some examples include data mining for relationships between variables and data segmentation into components. The use of a clustering algorithm for real-time data is much less common. This is due to a variety of factors, including the algorithm's high computation cost. In other words, the algorithm may be impractical for real-time or near-real-time implementation. Furthermore, clustering algorithms necessitate the tuning of hyperparameters in order to fit the dataset. In this paper, we approach clustering moving points using our proposed Adaptive Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) algorithm, which is an implementation of an adaptive approach to building the minimum spanning tree. We switch between the Boruvka and the Prim algorithms as a means to build the minimum spanning tree, which is one of the most expensive components of the HDBSCAN. The Adaptive HDBSCAN yields an improvement in execution time by 5.31% without depreciating the accuracy of the algorithm. The motivation for this research stems from the desire to cluster moving points on video. Cameras are used to monitor crowds and improve public safety. We can identify potential risks due to overcrowding and movements of groups of people by understanding the movements and flow of crowds. Surveillance equipment combined with deep learning algorithms can assist in addressing this issue by detecting people or objects, and the Adaptive HDBSCAN is used to cluster these items in real time to generate information about the clusters. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
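Prim's algorithm, one of the two minimum-spanning-tree builders the abstract above says Adaptive HDBSCAN switches between, can be sketched with a heap. The switching rule itself and the Boruvka counterpart are not shown here; this is only the Prim half, on the complete Euclidean graph.

```python
import heapq
import math

def prim_mst(points):
    """Prim's algorithm on the complete Euclidean graph; returns MST edges."""
    n = len(points)
    in_tree = [False] * n
    edges, heap = [], [(0.0, 0, 0)]            # (weight, from, to)
    while heap and len(edges) < n - 1:
        w, u, v = heapq.heappop(heap)
        if in_tree[v]:
            continue                            # stale entry: v already attached
        in_tree[v] = True
        if u != v:
            edges.append((u, v, w))
        for t in range(n):
            if not in_tree[t]:
                heapq.heappush(heap, (math.dist(points[v], points[t]), v, t))
    return edges

mst = prim_mst([(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)])
```

On the unit square this yields three edges of weight 1; in HDBSCAN the same tree (built on mutual-reachability distances rather than raw Euclidean ones) is what gets cut into the cluster hierarchy.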
29. A clustering algorithm based on density decreased chain for data with arbitrary shapes and densities.
- Author
-
Li, Ruijia and Cai, Zhiling
- Subjects
DENSITY ,ALGORITHMS - Abstract
Density-based clustering has received increasing attention for its ability to handle clusters of arbitrary shapes. However, it still has difficulties in mining clusters of arbitrary densities, especially the clusters of sparse regions in the presence of dense regions. To address this problem, this paper presents a new concept called density decreased chain on the mutual k-NN graph. It starts with the local density center whose density is the highest in the data points connected to this center. Based on the density decreased chain, the concept of the core point is redefined. The density of the core point is close to that of the local density center on the same density decreased chain as the core point. According to its definition, the core point in the data with arbitrary densities can be well identified because the local density centers exist in both sparse and dense regions. Further, intra-cluster density decreased chain is defined to mine subclusters in the core points. After forming the subclusters, the remaining data point is hierarchically assigned to one of these subclusters by the density decreased chains containing this remaining data point. The experiments illustrate the effectiveness of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
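The mutual k-NN graph on which the abstract above defines its density decreased chains can be built in a few lines. This is a minimal brute-force sketch of the graph construction only; the chain-growing and core-point steps are not reproduced.

```python
import math

def mutual_knn(points, k=2):
    """Mutual k-NN graph: i and j are joined only if each is among the
    other's k nearest neighbours, so dense and sparse regions stay separate."""
    n = len(points)
    nbrs = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        nbrs.append(set(order[:k]))
    return {(i, j) for i in range(n) for j in nbrs[i]
            if i < j and i in nbrs[j]}

edges = mutual_knn([(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)], k=1)
```

Because the relation is mutual, the two distant pairs stay disconnected; a density decreased chain would then descend from each region's local density peak along this graph.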
30. The Use of Clustering and Classification Methods in Machine Learning and Comparison of Some Algorithms of the Methods.
- Author
-
Mulla, Guhdar A. A. and Demir, Yıldırım
- Subjects
MACHINE learning ,ALGORITHMS ,DECISION trees ,ARTIFICIAL neural networks ,DATA mining - Abstract
In this article, two machine learning methods, classification and clustering, are applied using decision tree (DT), artificial neural network (ANN), and K-nearest neighbors algorithms. The datasets were used to evaluate the effectiveness of the clustering method and the data mining tool. Weather data were used to compare the algorithms and methods in the study. This study showed that the best model was DT according to accuracy and precision measures, but the best model according to F-measure and receiver operating characteristic curve area was ANN. Waikato Environment for Knowledge Analysis (Weka), a data mining tool, is utilized in this paper to carry out the clustering. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
31. A NEW APPROACH FOR SOLVING THE EUCLIDEAN STEINER TREE PROBLEM BASED ON DATA CLUSTERING.
- Author
-
Turan, Duygu S. and Ordin, Burak
- Subjects
COMBINATORIAL optimization ,MEDICINE ,DATA mining ,ALGORITHMS ,CLUSTER analysis (Statistics) - Abstract
The Steiner tree problem is a combinatorial optimization problem with many important applications in areas such as medicine, communication, and engineering. Several algorithms have been developed for solving this NP-complete problem. In this paper, new solution algorithms based on data clustering are proposed for the approximate solution of the Steiner tree problem. One of the main purposes of this study is to solve a combinatorial optimization problem with approaches from the field of data mining. The proposed methods are applied to 28 data sets from the literature, and the results of preliminary numerical experiments are reported. Because the Steiner tree problem is NP-complete, optimal solutions for some of these datasets, especially large-scale ones, cannot be obtained and are unknown. Therefore, the importance of algorithms that find good approximate solutions for such problems continues to grow. [ABSTRACT FROM AUTHOR]
- Published
- 2023
32. Gene Expression Programming as a data classification tool. A review.
- Author
-
Jędrzejowicz, Joanna and Jędrzejowicz, Piotr
- Subjects
GENE expression profiling ,ALGORITHMS ,MACHINE learning ,SUBMANIFOLDS ,DATA mining - Abstract
The paper reviews classification algorithms based on Gene Expression Programming (GEP) used for mining real-life datasets. Our aim is to show, chronologically, the most important developments as well as the current state of the art in the area of GEP-based classifiers, with a view to attracting further real-life applications. We begin by reviewing approaches to building basic, stand-alone GEP classifiers and, eventually, combining them into classifier ensembles. In the following part of the paper we describe, and illustrate with examples, several hybrid solutions where GEP is integrated with other methods. Next, we review specialized and dedicated methods, including multiple-criteria and incremental GEP-based classification tools. The final part of the paper reviews specialized GEP-based classifiers used to mine real-life datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
33. Meta-Heuristic Feature Optimization for ontology-based data security in a campus workplace with robotic assistance.
- Author
-
Gong, Suning, Dinesh Jackson Samuel, R., Pandian, Sanjeevi, Kumar, Priyan Malarvizhi, Pandey, Hari Mohan, and Srivastava, Gautam
- Subjects
WORK environment ,SEMANTICS ,RESEARCH evaluation ,ARTIFICIAL intelligence ,MACHINE learning ,ROBOTICS ,SOFTWARE architecture ,DATA security ,INTELLECT ,INFORMATION retrieval ,ONTOLOGIES (Information retrieval) ,DATA mining ,ALGORITHMS - Abstract
BACKGROUND: For secure text mining in the campus workplace, robotic assistance with feature optimization is essential. Texts are usually represented with the vector space model. However, this basic approach has two drawbacks: the curse of dimensionality and the lack of semantic knowledge. OBJECTIVES: This paper proposes a new Meta-Heuristic Feature Optimization (MHFO) method for data security in the campus workplace with robotic assistance. Firstly, the terms of the vector space model are mapped to the concepts of a data protection ontology, which statistically calculates concept frequency weights from the terms' weights. Furthermore, concept weights are allocated according to the design of the data protection ontology. The dimensionality of the feature space is reduced significantly by combining standard frequency weights with weights based on the data protection ontology. In addition, semantic knowledge is integrated into this process. RESULTS: The results show that this feature-optimization process significantly improves secure text mining in the campus workplace. CONCLUSION: The experimental results show that developing features from the concept hierarchy significantly enhances the data security of campus workplace text mining with robotic assistance. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
34. A Novel Comparison of Charotar Region Wheat Variety Classification Techniques using Purely Tree-based Data Mining Algorithms.
- Author
-
Raj, M.P. and Saini, Jatinderkumar R.
- Subjects
DATA mining ,WHEAT ,MACHINE learning ,ALGORITHMS ,CLASSIFICATION ,DURUM wheat - Abstract
Techniques for classifying data using data mining are nowadays prevalent in agriculture. The method of classifying seeds involves grouping various seed varieties according to their morphological characteristics. To accomplish categorization of the typical Charotar region (generally comprising the Anand and Kheda districts of the Gujarat State of India) Gujarat Wheat (GW) varieties (TRITICUM AESTIVUM) viz. GW 273, GW 496, GW 322, LOK-1, and GDW 1255 (TRITICUM DURUM), Weka Explorer was used. The features used are area, perimeter, solidity, aspect ratio, major and minor axis of the seed kernel, Hue, Saturation, Value, and SF1 (empirical). Feature reduction was done using Information Gain (IG) and its modified version, Gain Ratio (GR). This paper compares the performance of tree-based data mining algorithms in classifying wheat varieties. For classification we used purely tree-based machine learning algorithms, viz. J48, Random Forest, Hoeffding Tree, Logistic Model Tree (LMT), and REPTree. LMT, a logistic-regression-based tree method, gives the highest accuracy, 96.4%, compared to the other classifiers. Hoeffding Tree classifiers stood second with 96% accuracy. For validation, 10-fold cross-validation was used. Reducing the number of folds in cross-validation decreased the performance of most algorithms, except J48. The percentage of correctly classified instances increased for all algorithms when features were selected by GR, except for J48. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
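The Information Gain and Gain Ratio criteria used for feature reduction in the abstract above are standard entropy computations, shown here on a tiny hypothetical two-variety example (the variety and feature values below are illustrative, not the paper's data).

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    """IG: entropy of the labels minus the weighted entropy after
    splitting on each distinct feature value."""
    n = len(labels)
    split = {}
    for v, y in zip(values, labels):
        split.setdefault(v, []).append(y)
    return entropy(labels) - sum(len(ys) / n * entropy(ys)
                                 for ys in split.values())

def gain_ratio(values, labels):
    """GR: IG normalised by the split information, penalising features
    that fragment the data into many small groups."""
    si = entropy(values)
    return info_gain(values, labels) / si if si else 0.0

labels = ["GW273", "GW273", "LOK1", "LOK1"]
perfect = ["large", "large", "small", "small"]   # separates the two varieties
useless = ["x", "x", "x", "x"]                   # constant feature
```

A feature that perfectly separates the varieties scores the full label entropy (1 bit here), while a constant feature scores 0, which is the basis on which IG/GR rank and prune the morphological features.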
35. RUCIB: a novel rule-based classifier based on BRADO algorithm.
- Author
-
Morovatian, Iman, Basiri, Alireza, and Rezaei, Samira
- Subjects
SUPERVISED learning ,DATABASES ,ALGORITHMS ,MACHINE learning ,RANDOM forest algorithms - Abstract
Classification is a widely used supervised learning technique that enables models to discover the relationship between a set of features and a specified label using available data. Its applications span various fields such as engineering, telecommunication, astronomy, and medicine. In this paper, we propose a novel rule-based classifier called RUCIB (RUle-based Classifier Inspired by BRADO), which draws inspiration from the socio-inspired swarm intelligence algorithm known as BRADO. RUCIB introduces two key aspects: the ability to accommodate multiple values for features within a rule and the capability to explore all data features simultaneously. To evaluate the performance of RUCIB, we conducted experiments using ten databases sourced from the UCI machine learning database repository. In terms of classification accuracy, we compared RUCIB to ten well-known classifiers. Our results demonstrate that, on average, RUCIB outperforms Naive Bayes, SVM, PART, Hoeffding Tree, C4.5, ID3, Random Forest, CORER, CN2, and RACER by 9.32%, 8.97%, 7.58%, 7.4%, 7.34%, 7.34%, 7.22%, 5.06%, 5.01%, and 1.92%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. Advances in Machine Learning for Sensing and Condition Monitoring.
- Author
-
Ao, Sio-Iong, Gelman, Len, Karimi, Hamid Reza, and Tiboni, Monica
- Subjects
DATA mining ,MACHINE learning ,DEEP learning ,ACQUISITION of data ,SENSES ,ALGORITHMS - Abstract
To overcome the complexities of sensing devices in data collection, transmission, storage, and analysis for condition monitoring, estimation, and control purposes, machine learning algorithms have gained popularity for analyzing and interpreting big sensory data in modern industry. This paper puts forward a comprehensive survey of advances in machine learning algorithms and their most recent applications in the sensing and condition monitoring fields. Current case studies of developing tailor-made data mining and deep learning algorithms from practical aspects are carefully selected and discussed. The characteristics and contributions of these algorithms to the sensing and monitoring fields are elaborated. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
37. A Survey of Classification Methods for Multi-Class Imbalanced Data.
- Author
-
Li Ang, Han Meng, Mu Dongliang, Gao Zhihui, and Liu Shujuan
- Subjects
MACHINE learning ,DATA mining ,FEATURE selection ,CLASSIFICATION ,ALGORITHMS - Abstract
Copyright of Application Research of Computers / Jisuanji Yingyong Yanjiu is the property of Application Research of Computers Edition and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2022
- Full Text
- View/download PDF
38. An Integrated Classification and Association Rule Technique for Early-Stage Diabetes Risk Prediction.
- Author
-
Khafaga, Doaa Sami, Alharbi, Amal H., Mohamed, Israa, and Hosny, Khalid M.
- Subjects
DIAGNOSIS of diabetes ,DIABETES prevention ,DIABETES risk factors ,DECISION trees ,MACHINE learning ,RISK assessment ,PREDICTION models ,ARTIFICIAL neural networks ,EARLY diagnosis ,EARLY medical intervention ,ALGORITHMS - Abstract
The number of diabetic patients is increasing yearly worldwide, requiring quick intervention to help these people. Mortality rates are higher for diabetic patients with other serious health complications. Thus, early prediction of such diseases positively impacts healthcare quality and can prevent serious health complications later. This paper constructs an efficient prediction system for predicting diabetes in its early stage. The proposed system starts with a Local Outlier Factor (LOF)-based technique to detect outlier data. A Balanced Bagging Classifier (BBC) technique is used to balance the data distribution. Finally, an integration of association rules and classification algorithms is used to develop a prediction model based on real data. Four classification algorithms were utilized for data classification: Artificial Neural Network (ANN), Decision Tree (DT), Support Vector Machine (SVM), and K-Nearest Neighbor (KNN), together with the Apriori algorithm, which discovered relationships between various factors. Results revealed that KNN provided the highest accuracy, 97.36%, compared to the other applied algorithms. The Apriori algorithm extracted association rules based on the lift metric, producing four rules from the 12 attributes with the highest correlation and information-gain scores relative to the class attribute. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
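The KNN step described in the abstract above (record 38) can be sketched with a minimal majority-vote classifier; the toy data, feature dimensions and k value here are illustrative assumptions, not the authors' actual setup:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Predict labels by majority vote among the k nearest training points
    (Euclidean distance) -- a minimal sketch of a KNN classification step."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)   # distance to every training point
        nearest = y_train[np.argsort(dists)[:k]]      # labels of the k closest
        preds.append(np.bincount(nearest).argmax())   # majority vote
    return np.array(preds)

# Toy, illustrative data: two well-separated clusters standing in for
# "non-diabetic" (0) vs "diabetic" (1) feature vectors.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([[0.1, 0.1], [5.0, 5.1]])))  # → [0 1]
```

In a pipeline like the paper's, LOF-based outlier removal and class balancing would run before this classification step.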
39. Distributed Function Mining for Gene Expression Programming Based on Fast Reduction.
- Author
-
Deng, Song, Yue, Dong, Yang, Le-chan, Fu, Xiong, and Feng, Ya-zhou
- Subjects
GENE expression ,BIOLOGICAL evolution ,GENETICS ,DATA mining ,ALGORITHMS ,COMPARATIVE studies - Abstract
For high-dimensional and massive data sets, traditional centralized gene expression programming (GEP) and its improved variants suffer from increased run-time and decreased prediction accuracy. To solve this problem, this paper proposes a new improved algorithm called distributed function mining for gene expression programming based on fast reduction (DFMGEP-FR). In DFMGEP-FR, fast attribute reduction based on binary search (FAR-BSA) is proposed to quickly find the optimal attribute set, and a function-consistency replacement algorithm is given to integrate the local function models. Thorough comparative experiments for DFMGEP-FR, centralized GEP and the parallel gene expression programming algorithm based on simulated annealing (parallel GEPSA) are included in this paper. For the waveform, mushroom, connect-4 and musk datasets, the comparative results show that the average time consumption of DFMGEP-FR drops by 89.09%, 88.85%, 85.79% and 93.06%, respectively, compared with centralized GEP, and by 12.5%, 8.42%, 9.62% and 13.75%, respectively, compared with parallel GEPSA. Six well-studied UCI test data sets demonstrate the efficiency and capability of our proposed DFMGEP-FR algorithm for distributed function mining. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
40. The Significance of Parameters' Optimization in Fair Benchmarking of Software Defects' Prediction Performances.
- Author
-
Ghunaim, Hussam and Dichter, Julius
- Subjects
SOFTWARE engineering ,COMPUTER performance ,ALGORITHMS ,SUPPORT vector machines ,MULTILAYER perceptrons ,MATHEMATICAL optimization - Abstract
Software engineering research in general, and software defect prediction research in particular, faces serious challenges to its reliability and validity, chiefly because many published research outcomes contradict each other. This phenomenon is mainly caused by the lack of research standards of the kind that exist in many well-established scientific and engineering disciplines. The scope of this paper is fair benchmarking of defect prediction models. By experimenting with three prediction algorithms, we found that the quality of the resulting predictions fluctuates significantly as parameter values change. Therefore, published results that are not based on optimized prediction algorithms can lead to inaccurate and misleading benchmarking and recommendations. We thus propose parameter optimization as an essential research standard for conducting reliable and valid benchmarking; if this standard were adopted by software quality practitioners and research communities, it would play a vital role in mitigating the severity of this phenomenon. The three prediction algorithms used in our analysis were Support Vector Machine (SVM), Multilayer Perceptron (MLP), and Naïve Bayes (NB). We used KNIME as a data mining platform to design and run all optimization loops on the open-source Eclipse 2.0 data set. [ABSTRACT FROM AUTHOR]
- Published
- 2016
41. A personalized channel recommendation and scheduling system considering both section video clips and full video clips.
- Author
-
Lee, SeungGwan and Lee, DaeHo
- Subjects
BROADCASTING industry ,VIDEO excerpts ,TECHNOLOGY convergence ,RECOMMENDER systems ,PREDICTION theory - Abstract
With the convergence of various broadcasting systems, the amount of content available on mobile terminals, including IPTV, has significantly increased. In this paper, we propose a system that enables users to schedule programs considering both section video clips and full video clips, based on detecting users with similar preferences. Since the content can be classified by program, the proposed method can store programs desired by the user and thus create and schedule a kind of individual channel. Experimental results, obtained by comparing existing channel recommendation methods with the program recommendation methods proposed in this paper, show that the proposed method achieves higher prediction accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
42. Mining telemonitored physiological data and patient-reported outcomes of congestive heart failure patients.
- Author
-
Mlakar, Miha, Puddu, Paolo Emilio, Somrak, Maja, Bonfiglio, Silvio, and Luštrek, Mitja
- Subjects
CONGESTIVE heart failure treatment ,TELEMEDICINE ,HEALTH outcome assessment ,WEARABLE technology ,DATA mining - Abstract
This paper addresses patient-reported outcomes (PROs) and telemonitoring in congestive heart failure (CHF), both increasingly important topics. The interest in CHF trials is shifting from hard end-points such as hospitalization and mortality to softer end-points such as health-related quality of life. However, the relation of these softer end-points to objective parameters is not well studied. Telemonitoring is suitable for collecting both patient-reported outcomes and objective parameters. Most telemonitoring studies, however, do not take full advantage of the available sensor technology and intelligent data analysis. The Chiron clinical observational study was performed among 24 CHF patients (17 men and 7 women; age 62.9 ± 9.4 years; 15 NYHA class II and 9 class III; 10 of ischaemic aetiology, 6 dilated, 2 valvular, and 6 of multiple aetiologies or cardiomyopathy) in Italy and the UK. A large number of physiological and ambient parameters were collected by wearable and other devices, together with PROs describing how well the patients felt, over 1,086 days of observation. The resulting data were mined for relations between the objective parameters and the PROs. The objective parameters (humidity, ambient temperature, blood pressure, SpO2, and sweating intensity) could predict the PROs with accuracies up to 86% and AUC up to 0.83, making this the first report providing evidence that ambient and physiological parameters are objectively related to PROs in CHF patients. We also analyzed the relations in the predictive models, gaining some insights into what affects the feeling of health, which was also generally not attempted in previous investigations. The paper strongly points to the possibility of using PROs as primary end-points in future trials. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
43. Sentiment classification in social media data by combining triplet belief functions.
- Subjects
SENTIMENT analysis ,DEEP learning ,CONFIDENCE intervals ,SOCIAL media ,CONSUMER attitudes ,MACHINE learning ,BUSINESS ,GROUP decision making ,STATISTICAL correlation ,DATA mining ,ALGORITHMS - Abstract
Sentiment analysis is an emerging technique that caters for semantic orientation and opinion mining. It is increasingly used to analyze online reviews and posts, identifying people's opinions and attitudes toward products and events in order to improve companies' business performance and to help devise better event-organizing strategies. This paper presents an innovative approach to combining the outputs of sentiment classifiers under the framework of belief functions. It consists of the formulation of sentiment classifier outputs in the triplet evidence structure and the development of general formulas for combining triplet functions derived from sentiment classification results via three evidential combination rules, along with comparative analyses. Empirical studies examined the effectiveness of our method for sentiment classification individually and in combination, and the results demonstrate that the best combined classifiers produced by our method outperform the best individual classifiers over five review datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
44. Research on Disease and Pest Prediction Model Based on Sparse Clustering Algorithm.
- Author
-
Cao, Shan and Li, Xiaodong
- Subjects
PREDICTION models ,ALGORITHMS ,PESTS ,MACHINE learning ,DATA mining - Abstract
Clustering is one of the most common technologies in data mining; the size, dimension and sparsity of data all constrain clustering analysis. Clustering is an unsupervised learning technique: unlike classification, the data objects used in clustering carry no class labels, so structure must be discovered by the clustering algorithm itself. At present, most sparse-data algorithms for high attribute dimensions are oriented to binary data, and there is no established method for evaluating their clustering results, which greatly limits their application. This paper studies the BP neural network, the RBF neural network and the Elman neural network model with a feedback function, which are mature and widely used in pest prediction models. The purpose of this paper is to study diseases and pests based on a sparse clustering algorithm, for which algorithm formulas, models and data graphs are established. The research shows that the damage index of pests and diseases is very high, reaching about 50.54%, and this work lays a foundation for future research on diseases and pests. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
45. Bag Constrained Structure Pattern Mining for Multi-Graph Classification.
- Author
-
Wu, Jia, Zhu, Xingquan, Zhang, Chengqi, and Yu, Philip S.
- Subjects
DATA mining ,MULTIGRAPH ,MACHINE learning ,SUPERVISED learning ,EMAIL systems ,ALGORITHMS - Abstract
This paper formulates a multi-graph learning task. In our problem setting, a bag contains a number of graphs and a class label. A bag is labeled positive if at least one graph in the bag is positive, and negative otherwise. In addition, the genuine label of each graph in a positive bag is unknown, and all graphs in a negative bag are negative. The aim of multi-graph learning is to build a learning model from a number of labeled training bags to predict previously unseen test bags with maximum accuracy. This problem setting is essentially different from existing multi-instance learning (MIL), where instances in MIL share well-defined feature values, but no features are available to represent graphs in a multi-graph bag. To solve the problem, we propose a Multi-Graph Feature based Learning (gMGFL) algorithm that explores and selects a set of discriminative subgraphs as features to transfer each bag into a single instance, with the bag label being propagated to the transferred instance. As a result, the multi-graph bags form a labeled training instance set, so generic learning algorithms, such as decision trees, can be used to derive learning models for multi-graph classification. Experiments and comparisons on real-world multi-graph tasks demonstrate the algorithm performance. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
46. Enhancement of fraternal K-median algorithm with CNN for high dropout probabilities to evolve optimal time-complexity.
- Author
-
Nagaraj, Balakrishnan, Arunkumar, Rajendran, Nisi, K., and Vijayakumar, Ponnusamy
- Subjects
ALGORITHMS ,CONVOLUTIONAL neural networks ,DEEP learning ,MACHINE learning ,PROBABILITY theory - Abstract
The machine learning era has begun to dominate almost all technologies, improving their performance through intelligent computing methodologies. Deep learning algorithms in particular play a vital role in computing human-like decisions and are considered a superior breakthrough technology of the century. Deep learning algorithms generate a massive set of features that are stacked and learned by the many neurons of the network through links, which start at the input and end at the output, connecting many neurons along the way. The significant limitation of such networks is their demand for high computation power. This paper presents a methodology to make the system consume less computation during its training and testing phases. In this process, an effective clustering algorithm (fraternal K-median clustering) is used as the preprocessing strategy, and in the second phase a dropout regularization procedure is implemented in the Convolutional Neural Network (CNN), a type of deep learning algorithm, to eliminate most of the insignificant data. The dropout strategy helps improve accuracy by preventing the CNN from overfitting, obtaining state-of-the-art results. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
47. A differential evolution-based algorithm with maturity extension for feature selection in intrusion detection system.
- Author
-
Faris, Mohammed, Mahmud, Mohd Nazri, Mohd Salleh, Mohd Fadzli, and Alsharaa, Baseem
- Subjects
FEATURE selection ,DATA mining ,WIRELESS sensor networks ,ALGORITHMS ,DIFFERENTIAL evolution ,MACHINE learning ,INTRUSION detection systems (Computer security) - Abstract
Feature Selection (FS) is critically important for optimising the performance of Intrusion Detection Systems (IDSs) used in Wireless Sensor Networks (WSNs). However, selecting an optimal number of relevant features from massive IDS data sets has been an ongoing FS research problem. Many approaches use Machine Learning (ML), such as nature-inspired population-based and Differential Evolution (DE)-based algorithms, for FS. However, the main drawback of DE is its premature convergence issue, which leads to false convergence. In this paper, a novel mutation strategy is proposed to alleviate premature convergence and to develop a modified DE-based FS algorithm called DE with maturity extension (DE-ME). The novel mutation is achieved by changing the mutation factor in DE/rand/1, based on K-Nearest Neighbour (KNN) as the most efficient implementation, without adding further complexity to the design process. The proposed DE-ME algorithm improves the FS operation with significant performance gains in terms of overcoming premature convergence and increasing selection accuracy. The DE-ME algorithm was used to select the most effective features from the 41 features in the Network Security Laboratory Knowledge Discovery in Databases (NSL-KDD) dataset. The accuracy of DE-ME is high at 99.66%, its False Positive Rate (FPR) is low at 0.464%, and the True Positive Rate (TPR) achieved is 99.80%. In addition, DE-ME selects 6 of the 41 features in the dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
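The classic DE/rand/1 mutation that the DE-ME paper above (record 47) modifies can be sketched as follows; the population, mutation factor F and RNG seed are illustrative assumptions, and the paper's KNN-based adaptation of F is not reproduced here:

```python
import numpy as np

def de_rand_1_mutation(population, F=0.5, rng=None):
    """Classic DE/rand/1 mutation: for each target vector i, pick three
    distinct random members r1, r2, r3 (all != i) and form
    v_i = x_r1 + F * (x_r2 - x_r3)."""
    rng = rng or np.random.default_rng(0)
    n = len(population)
    mutants = np.empty_like(population)
    for i in range(n):
        # choose three distinct indices, all different from the target i
        choices = [j for j in range(n) if j != i]
        r1, r2, r3 = rng.choice(choices, size=3, replace=False)
        mutants[i] = population[r1] + F * (population[r2] - population[r3])
    return mutants

pop = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0]])
print(de_rand_1_mutation(pop).shape)  # one mutant per population member
```

In a DE-based feature selector, each vector would encode a candidate feature subset, and crossover plus selection against a fitness function (e.g. classifier accuracy) would follow this mutation step.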
48. Understanding topic duration in Twitter learning communities using data mining.
- Author
-
Arslan, Okan, Xing, Wanli, Inan, Fethi A., and Du, Hanxiang
- Subjects
RELIABILITY (Personality trait) ,PROFESSIONAL employee training ,INTERNET ,NATURAL language processing ,ACQUISITION of data ,MACHINE learning ,RANDOM forest algorithms ,LEARNING strategies ,QUALITATIVE research ,RESEARCH funding ,DATA analysis ,LOGISTIC regression analysis ,PREDICTION models ,DATA mining ,ALGORITHMS - Abstract
Background: There has been increasing interest in online professional learning networks on a variety of social media platforms, especially Twitter. Twitter offers immediacy, personalization, and support of networks to increase professional knowledge and the sense of membership. Knowing the topics discussed on Twitter and the factors that affect the duration of a topic would help to sustain and reconstruct Twitter‐based professional learning activities. Objectives: The purpose of this study is to analyse the topics discussed, and the factors that affect the duration of a specific topic, over 6 years within a virtual professional learning network (VPLN) using #Edchat on Twitter, based on media richness features. Methods: Internet‐mediated research and digital methods are used for data collection and analysis. Various text, natural language processing, and machine learning algorithms were used along with quantitative multilevel models. This study examined 504,998 tweets posted by 72,342 unique users using #Edchat. Results: There were 150 topics discussed over the 6 years, and a multilevel random-intercept regression model revealed that a specific topic in the #Edchat VPLN is discussed longer when it has more tweets, rather than retweets, posted by a high number of different users, along with moderate text length and high or moderate mentions with more hashtags. Takeaways: The study developed an automated social media richness feature extraction framework that can be adapted for other theoretical applications in an educational context. Emergent topics discussed on Twitter among #Edchat VPLN members for professional development were identified. The study extends social media richness theory to the educational context and explores factors that affect an online professional learning activity on Twitter.
Lay Description: What is already known about this topic:
- Professional Learning Communities have started to use both synchronous and asynchronous Web 2.0 and social media platforms where they may exchange information and develop a sense of belonging with others who share common interests.
- Synchronous online activity in social networking sites, such as Twitter, can be an effective, informal and free way to convey information and create and develop personalized networks.
- Twitter offers immediacy, personalization, and support of networks to increase professional knowledge and the sense of membership.
- There have been a great number of hashtags on Twitter for the educational context, and those hashtags are used by teachers to find information and resources and gain new perspectives and ideas from their colleagues or experts.
- The richness of a social media post varies according to how it is structured and constructed.
What this paper adds:
- By using #Edchat on Twitter, educators discussed 150 topics in 6 years.
- The most discussed topics are: creating and changing school culture; classroom management and teaching methods; classroom settings and educational technologies; support and needs; students' subject skills and interest; and school environment.
- A specific topic stays longer when it has more tweets, rather than retweets, posted by a high number of different users.
- Topics that have moderate text and high or moderate mentions with more hashtags are discussed longer on Twitter.
- The duration of a topic can change according to educators' behaviours in a synchronous online chat.
Implications for practice and/or policy:
- The importance of social media for professional learning, and how to sustain a topic for better learning and understanding.
- Developing and applying automated computational discourse analysis for social media learning.
- Applying media richness theory to understand the affordances of social media learning.
- Quantifying the influence of various factors on discourse in social media learning. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
49. A strategy to estimate the optimal low-rank in incremental SVD-based algorithms for recommender systems.
- Author
-
Bahrkazemi, Maryam and Mohammadi, Maryam
- Subjects
RECOMMENDER systems ,SINGULAR value decomposition ,INFORMATION filtering ,ALGORITHMS ,DATA mining ,MACHINE learning - Abstract
Recommender systems apply machine learning and data mining techniques to filter unseen information and can predict whether a user would be interested in a given item. The main types of recommender systems are collaborative filtering (CF) and content-based filtering, which suffer from scalability and data sparsity, resulting in poor-quality recommendations and reduced coverage. There are two highly scalable incremental algorithms based on Singular Value Decomposition (SVD) for recommender systems: the incremental SVD algorithm and the incremental Approximating the Singular Value Decomposition (ApproSVD) algorithm. In both methods, the rank used to approximate the recommender system's data matrix has been chosen experimentally in the related literature. In this paper, we investigate the role of the singular values in estimating a more reliable rank for these dimensionality-reduction techniques, to improve recommender system performance. In other words, we offer a strategy for choosing, with the help of the singular values, the optimal rank that approximates the data matrix more accurately in the incremental algorithms. The numerical results illustrate that the suggested strategy improves the accuracy of the recommendations and the run times of both algorithms when employed on the MovieLens, Netflix, and Jester datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
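One common way to let the spectrum choose a truncation rank, as the abstract above (record 49) discusses, is an energy threshold on the singular values; this is a generic sketch, not the paper's specific rule, and the 0.9 threshold and toy spectrum are illustrative assumptions:

```python
import numpy as np

def optimal_rank(singular_values, energy=0.9):
    """Return the smallest rank whose leading singular values retain the
    given fraction of the total 'energy' (sum of squared singular values)."""
    s = np.asarray(singular_values, dtype=float)
    cum = np.cumsum(s**2) / np.sum(s**2)          # cumulative energy fraction
    return int(np.searchsorted(cum, energy) + 1)  # smallest k reaching it

s = np.array([10.0, 5.0, 1.0, 0.1])   # a rapidly decaying spectrum
print(optimal_rank(s))                # → 2 (first two values carry >90% energy)
```

A rank picked this way replaces the trial-and-error choice the abstract mentions: the truncated SVD keeps only the leading components that dominate the data matrix.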
50. Hybrid Reptile Search Algorithm and Remora Optimization Algorithm for Optimization Tasks and Data Clustering.
- Author
-
Almotairi, Khaled H. and Abualigah, Laith
- Subjects
SEARCH algorithms ,MATHEMATICAL optimization ,REPTILES ,MACHINE learning ,DATA mining - Abstract
Data clustering is a complex data mining problem that groups a massive number of data objects into a predefined number of clusters; in other words, it finds symmetric and asymmetric objects. Various optimization methods have been used to solve different machine learning problems, but they usually suffer from local-optima problems and imbalance between their search mechanisms. This paper proposes a novel hybrid optimization method for solving various optimization problems. The proposed method, called HRSA, combines the original Reptile Search Algorithm (RSA) and the Remora Optimization Algorithm (ROA) and handles their search processes through a novel transition method. HRSA aims to avoid the main weaknesses of the original methods and find better solutions. It is tested on various complicated optimization problems: twenty-three benchmark test functions and eight data clustering problems. The obtained results illustrate that HRSA performs significantly better than the original and comparative state-of-the-art methods: it outperformed all comparative methods on the mathematical problems and obtained promising results on the clustering problems. Thus, HRSA has remarkable efficacy when employed for various clustering problems. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF