Language: english / Publication Year Range: Last 10 years / Topic: clustering and machine learning - Searchworks@Jio Institute Digital Library Search Results

Showing total 1,557 results

1. Contraction Clustering (RASTER) : A Big Data Algorithm for Density-Based Clustering in Constant Memory and Linear Time

Author: Ulm, Gregor, Gustavsson, Emil, Jirstrand, Mats, Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Pandu Rangan, C., Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Nicosia, Giuseppe, editor, Pardalos, Panos, editor, Giuffrida, Giovanni, editor, and Umeton, Renato, editor
Published: 2018
Full Text: View/download PDF

2. Machine learning and engineering feature approaches to detect events perturbing the indoor microclimate in Ringebu and Heddal stave churches (Norway)

Author: Miglioranza, Pietro, Scanu, Andrea, Simionato, Giuseppe, Sinigaglia, Nicholas, and Califano, America
Published: 2024
Full Text: View/download PDF

3. A clustering approach for data quality results of research information systems

Author: Edris Abadi, Reza, Ershadi, Mohammad Javad, and Niaki, Seyed Taghi Akhavan
Published: 2023
Full Text: View/download PDF

4. Quality of hire: expanding the multi-level fit employee selection using machine learning

Author: Shet, Sateesh and Nair, Binesh
Published: 2023
Full Text: View/download PDF

5. Food price dynamics and regional clusters: machine learning analysis of egg prices in China

Author: Liu, Chang, Zhou, Lin, Höschle, Lisa, and Yu, Xiaohua
Published: 2023
Full Text: View/download PDF

6. Privacy concerns in tourism: a systematic literature review using machine learning approach and bibliometric analysis.

Author: Sharma, Hitesh, Srivastava, Praveen Ranjan, Jasimuddin, Sajjad M., Zhang, Zuopeng Justin, and Jebabli, Ikram
Subjects: BIBLIOMETRICS, MACHINE learning, TECHNOLOGICAL innovations, PRIVACY, CONSUMER behavior, SUSTAINABLE tourism, SCIENCE publishing
Published: 2024
Full Text: View/download PDF

7. Clustering the countries for quantifying the status of Covid-19 through time series analysis

Author: Erandathi, Madurapperumage, Chung Wang, William Yu, and Hsieh, Chih-Chia
Published: 2022
Full Text: View/download PDF

8. Archives of Data Science, Series A. Vol. 1,1: Special Issue: Selected Papers of the 3rd German-Polish Symposium on Data Analysis and Applications

Author: Geyer-Schulz, Andreas and Pociecha, Józef
Subjects: Data Analysis, Economics, statistische Simulation, Educational Economics, Bildungsökonomie, Clustering, Machine Learning, Datenanalyse, Conjoint Analyse, ddc:330, Conjoint Analysis, Statistical Simulation, Maschinelles Lernen
Abstract: The first volume of Archives of Data Science, Series A is a special issue comprising a selection of contributions originally presented at the 3rd Bilateral German-Polish Symposium on Data Analysis and Its Applications (GPSDAA 2013). All selected papers fit into the emerging field of data science, consisting of the mathematical sciences (computer science, mathematics, operations research, and statistics) and an application domain (e.g. marketing, biology, economics, engineering).
Published: 2017
Full Text: View/download PDF

9. Mining publication papers via text mining Evaluation and Results.

Author: Ibrahim, Ahmed S., Saad, Sally, and Aref, Mostafa
Subjects: MACHINE learning, TEXT mining, NATURAL language processing, DATA mining, AUTOMATION
Abstract: Data nowadays is the language of technologies, as every process needs data to be processed: the input is data and the output is also data. Analyzing the data is a significant task, especially with the increasing production of data, particularly data as text; it would be difficult to manually analyze the data, extract information, and detect the hidden patterns in unstructured text. Data mining is an automated technique for gathering or deriving new high-quality information and uncovering the relations among the data. Text mining is one of the main branches of data mining; however, data mining is more comprehensive. In this paper, an overview of mining publication papers via text mining techniques, together with their results and evaluation, is presented as follows: the first approach is keyword extraction using a natural language processing (NLP) approach, the second approach is named entity recognition, and the last approach is document clustering, where machine learning techniques are applied to both of them. [ABSTRACT FROM AUTHOR]
Published: 2021

10. EXPLORING AN LSTM-SARIMA ROUTINE FOR CORE INFLATION FORECASTING.

Author: Krukovets, Dmytro
Subjects: INFLATION forecasting, MACHINE learning, RECURRENT neural networks, RANDOM walks, WARPING machines
Abstract: The object of the research is Core Inflation Forecasting. The paper investigates the performance of a novel model routine in the exercise of Core Inflation Forecasting. It aggregates 300+ components into 6 by the similarity of their dynamics, using an updated DTW algorithm fine-tuned for monthly time series and the K-Means algorithm for grouping. Then the SARIMA model extracts linear and seasonal components, which is followed by an LSTM model that captures non-linearities and interdependencies. It solves the problem of high-quality inflation forecasting using a disaggregated dataset. While standard and traditional econometric techniques are focused on limited sets of data that consist of just a couple of variables, the proposed methodology is able to capture a richer part of the volatility, comprising more information. The model is compared with a huge pool of other models, from simple ones like Random Walk and SARIMA to ML models like XGBoost, Random Forest and simple LSTM. While all Data Science models show decent performance, the DTW+K-Means+SARIMA+LSTM routine gives the best RMSE over 1-month ahead and 2-month ahead forecasts, which proves the high quality of the proposed forecasting model and solves the key problem of the paper. It is explained by the model’s capability to capture both linear/seasonal patterns from the data using the SARIMA part and non-linear and interdependent patterns using the LSTM approach. Models are fitted for the case of Ukraine, as they have been estimated on the corresponding data, and may be actively used for further inflation forecasting. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
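
Note on result 10: the abstract groups price components by the similarity of their dynamics using DTW before applying K-Means. As a rough illustration only, the sketch below implements the textbook dynamic time warping distance with an absolute-difference local cost. It is not the "updated DTW algorithm fine-tuned for monthly time series" that the abstract refers to, whose details are not given here; the function name `dtw_distance` and the sample series are hypothetical.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two numeric sequences.

    Assumption: local cost is the absolute difference |a[i] - b[j]|,
    with no warping-window constraint.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical series: the second is the first with one value held for an extra step.
series_a = [1.0, 2.0, 3.0]
series_b = [1.0, 2.0, 2.0, 3.0]
print(dtw_distance(series_a, series_b))
```

Because DTW allows one point to align with several, two series that differ only by a local stretch in time are treated as identical, which is why it suits grouping series with similar but shifted dynamics. A pairwise matrix of such distances could then be passed to a clustering step.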

11. RDF graph mining for cluster-based theme identification

Author: Eddamiri, Siham, Benghabrit, Asmaa, and Zemmouri, Elmoukhtar
Published: 2020
Full Text: View/download PDF

12. Cognitive analytics management of the customer lifetime value: an artificial neural network approach

Author: De Marco, Marco, Fantozzi, Paolo, Fornaro, Claudio, Laura, Luigi, and Miloso, Antonio
Published: 2021
Full Text: View/download PDF

13. Revealing representative day-types in transport networks using traffic data clustering.

Author: Cebecauer, Matej, Jenelius, Erik, Gundlegård, David, and Burghout, Wilco
Subjects: INTELLIGENT transportation systems, TRAFFIC patterns, MACHINE learning, EVALUATION methodology, FORECASTING
Abstract: Recognition of spatio-temporal traffic patterns at the network-wide level plays an important role in data-driven intelligent transport systems (ITS) and is a basis for applications such as short-term prediction and scenario-based traffic management. Common practice in the transport literature is to rely on well-known general unsupervised machine-learning methods (e.g., k-means, hierarchical, spectral, DBSCAN) to select the most representative structure and number of day-types based solely on internal evaluation indices. These are easy to calculate but are limited since they only use information in the clustered dataset itself. In addition, the quality of clustering should ideally be demonstrated by external validation criteria, by expert assessment or the performance in its intended application. The main contribution of this paper is to test and compare the common practice of internal validation with external validation criteria represented by the application to short-term prediction, which also serves as a proxy for more general traffic management applications. When compared to external evaluation using short-term prediction, internal evaluation methods have a tendency to underestimate the number of representative day-types needed for the application. Additionally, the paper investigates the impact of using dimensionality reduction. By using just 0.1% of the original dataset dimensions, very similar clustering and prediction performance can be achieved, with up to 20 times lower computational costs, depending on the clustering method. K-means and agglomerative clustering may be the most scalable methods, using up to 60 times fewer computational resources for very similar prediction performance to the p-median clustering. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

14. Clustering helps to improve price prediction in online booking systems

Author: Trang, Le Hong, Huy, Tran Duong, and Le, Anh Ngoc
Published: 2021
Full Text: View/download PDF

15. A Survey Paper on Data Analysis by using Model KMeans Clustering.

Author: Mondal, Sanchita and Patra, Bichitrananda
Subjects: DATA analysis, K-means clustering, MACHINE learning, ALGORITHMS, SCIENTIFIC community
Abstract: Clustering is an unsupervised machine learning technique that serves a gargantuan task in passing on the data sets into precise clusters depending on various convergence or divergence characteristics. It has a brawny prospective in health-related data analysis for programmed disease prophecy. K-means is a clustering scheme that is extensively used in various areas of machine learning. The objective of our paper is to upgrade an existing clustering algorithm, K-Mean. The model will be trained using Microarray datasets and the testing will be done using WEKA, this is an open source application. Apparently, from innumerable biological experiments and various community researches, there has been upsurge in the amount and complexity of Micro-array datasets. A storehouse that contains Micro-array gene manifestation data is called a Micro-array database. [ABSTRACT FROM AUTHOR]
Published: 2020

16. Using k-means clustering in international location decision

Author: Khalid, Waqas and Herbert-Hansen, Zaza Nadja Lee
Published: 2018
Full Text: View/download PDF

17. What do we want to know about MOOCs? Results from a machine learning approach to a systematic literature mapping review.

Author: Despujol, Ignacio, Castañeda, Linda, Marín, Victoria I., and Turró, Carlos
Subjects: MASSIVE open online courses, MACHINE learning, EDUCATIONAL resources, LARGE-scale brain networks, CONTENT analysis, EDUCATORS
Abstract: By the end of 2020, over 16,300 Massive Open Online Courses (MOOCs) from 950 universities worldwide had enrolled over 180 million students. Interest in MOOCs has been matched by significant research on the topic, including a considerable number of reviews. This study uses Machine Learning techniques and human expert supervision to generate a comprehensive systematic literature mapping review that overcomes some limitations of the traditional ones and provides a broader overview of the content and main topics studied in the specialized literature devoted to MOOCs. The sample consisted of 6320 publications automatically classified within six research topics, denominated by human experts: institutional approach, pedagogical approach, evaluation, analytics, participation, and educational resources. The content analysis of the topics identified was conducted using visual network analysis, which supported the identification of different thematic sub-clusters and endorsed the classification. Results from the review show that the lowest production of MOOC papers is within the topics of the pedagogical approach and educational resources. In contrast, participation and evaluation are the most frequent ones. In addition, the most cited papers are on the topics of analytics and resources, being the pedagogical approach and the institutional approach the less cited. This highlights the need for more MOOC research from a pedagogical perspective and calls upon the presence of educators. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

18. Machine learning in landscape ecological analysis: a review of recent approaches.

Author: Stupariu, Mihai-Sorin, Cushman, Samuel A., Pleşoianu, Alin-Ionuţ, Pătru-Stupariu, Ileana, and Fürst, Christine
Subjects: LANDSCAPE ecology, DEEP learning, ARTIFICIAL intelligence, RANDOM forest algorithms, MACHINE learning, MULTIDIMENSIONAL scaling
Abstract: Context: Artificial Intelligence (AI) has rapidly developed over the past several decades. Several related AI approaches, such as Machine Learning (ML), have been applied to research on landscape patterns and ecological processes. Objectives: Our goal was to review the methods of AI, particularly ML, used in studies related to landscape ecology and the main topics addressed. We aimed to assess the trend in the number of ML papers and the methods used therein, and provide a synopsis and prospectus of current use and future applications of ML in landscape ecology. Methods: We conducted a systematic literature search and selected 125 papers for review. These were examined and scored according to multiple criteria regarding methods and topic. We applied quantitative statistical methods, including cluster analysis based on titles, abstracts, and keywords and a non-metric multidimensional scaling based on attributes assigned during the review. We used Random Forests machine learning to describe the differences between identified clusters in terms of the topics and methods they included. Results: The most frequent method found was Random Forests, but it is noteworthy to mention the increasing popularity of tools related to Deep Learning. The topics cover both ecologically oriented issues and the landscape-human interface. There has been a rapid increase in ML and AI methods in landscape ecology research, with Deep Learning and complex multi-step pipeline AI methods emerging in the last several years. Conclusions: The rapid increase in the number of ML papers in landscape ecology research, and the range of methods employed in them, suggest explosive growth in application of these methods in landscape ecology. 
The increase of Deep Learning approaches in the most recent years suggests a major change in analytical paradigms and methodologies that we feel may transform the field and enable analyses of more complex pattern process relationships across vaster data sets than has been possible previously. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

19. EVOLUTIONARY MACHINE LEARNING DRIVEN BIG DATA ANALYSIS AND PROCESSING FOR INDUSTRIAL INTERNET.

Author: CHEN, WEI, MENG, WEI, and ZHANG, LINGLING
Subjects: MANUFACTURING processes, ELECTRONIC data processing, BIG data, MACHINE learning, PARTICLE swarm optimization, DIFFERENTIAL evolution
Abstract: The Industrial Internet is based on the network, the platform is the core, and the security is the guarantee. The Industrial Internet connects all industry elements and the entire industry chain through the large-scale network infrastructure, collects and analyzes industry data in real time, and forms a new application model for a new generation of information communication. With the rapid development of industrial Internet technology, the scale of industrial Internet data will become larger and larger, and the data dimension will become higher and higher. How to efficiently use cluster analysis for industrial Internet big data mining is an urgent problem that needs to be solved. This paper proposes an improved differential evolution particle swarm algorithm for industrial Internet big data clustering analysis. Differential Evolution (DE) strategy can improve the problem that the particle swarm optimization (PSO) algorithm tends to fall into local optimum in the later stage as the number of iterations increases. Considering the influence of the randomness of the arrangement order of the cluster center vectors among the individuals on the learning and updating among individuals, this paper designs a method of adaptively adjusting the arrangement order of the cluster center vectors to optimize the cluster center vector with maximum similarity among individuals. In order to effectively evaluate our method, both industrial and non-industrial datasets are selected. The experimental results verify the feasibility and effectiveness of the proposed algorithm. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

20. A Clustering Extension of HUEPs for the Analysis of Performance Anomalies in Robots.

Author: Basurto, Nuño, Cambra, Carlos, Herrero, Álvaro, and Urda, Daniel
Abstract: Errors in Cyber-Physical Systems present a major problem given the current state of technological complexity. Self-diagnosis can contribute to address it, being the standpoint of the present paper. Hence, an application of exploratory Machine Learning models to assess the functioning of robot software in order to identify anomalies that lead to low performance is proposed. More precisely, Hybrid Unsupervised Exploratory Plots (HUEPs) are extended through density-based clustering techniques, that are applied together with unsupervised exploratory projection models. As a result, intuitive and informative visualizations of software performance are obtained, supporting the monitoring and anomaly detection tasks. The proposed clustering extension of HUEPs is thoroughly validated on a massive and up-to-date open dataset, obtaining promising results. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

21. From Customer's Voice to Decision-Maker Insights: Textual Analysis Framework for Arabic Reviews of Saudi Arabia's Super App.

Author: Alrayani, Bodoor, Kalkatawi, Manal, Abulkhair, Maysoon, and Abukhodair, Felwa
Subjects: USER experience, SENTIMENT analysis, PRIVATE sector, K-means clustering, DATA mining
Abstract: Recently, business sectors have focused on offering a wide variety of services through utilizing different modern technologies such as super apps in order to fulfill customers' needs and create a satisfactory user experience. Accordingly, studying the user experience has become one of the most popular trends in the research field due to its essential role in business prosperity and continuity. Thus, many researchers have dedicated their efforts to exploring and analyzing the user experience across social media, blogs, and websites, employing a variety of research methods such as machine learning to mine users' reviews. However, there are limited studies concentrated on analyzing super app users' experiences and specifically mining Arabic users' reviews. Therefore, this paper aims to analyze and discover the most important topics that affect the user experience in the super app environment by mining Arabic business sector users' reviews in Saudi Arabia using biterm topic modeling, CAMeL sentiment analyzer, and doc2vec with k-means clustering. We explore users' feelings regarding the extracted topics in order to identify the weak aspects to improve and the strong aspects to enhance, which will promote a satisfactory user experience. Hence, this paper proposes an Arabic text annotation framework to help the business sector in Saudi Arabia to determine the important topics with negative and positive impacts on users' experience. The proposed framework uses two approaches: topic modeling with sentiment analysis and topic modeling with clustering. As a result, the proposed framework reveals four important topics: delivery and payment, customer service and updates, prices, and application. The retrieved topics are thoroughly studied, and the findings show that, in most topics, negative comments outweigh positive comments. These results are provided with general analysis and recommendations to help the business sector to improve its level of services. 
[ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

22. Applying Hybrid Clustering with Evaluation by AUC Classification Metrics.

Author: Dakhil, Ali Fattah, Ali, Waffaa M., and Hasan, Mustafa Asaad
Subjects: MACHINE learning, RESEARCH questions, FALSE alarms, STATISTICAL learning, DISTANCE education
Abstract: Traditional metrics may not adequately assess performance in certain situations, whereas the Area Under Curve (AUC) offers a comprehensive perspective by considering both sensitivity and specificity. This method enhances interpretability, addresses limitations, and promotes the development of robust clustering algorithms. In unsupervised learning, utilizing AUC is a significant method for improving the precision and accuracy of machine learning models. Our work is inspired by several recent related works that implement approaches to manage the challenges of developing new metrics that can effectively assess and evaluate the performance of clustering algorithms. The research question relies on the concept of using an optimal metric for model evaluation of classification and clustering. Therefore, the paper investigates the use of the classification metric AUC for clustering validation purposes. The methodology we adopt is a hybrid clustering model because such a technique offers a robust model by combining the strengths of each model. The linkage approach directly impacts the clustering results, so we give significant attention to this feature in our implementation. Among the various linkage methods, we utilized single and average linkages. The Manhattan and Euclidean metrics are the distance measures used in this work. Thus, our contribution is to explore the benefit of using linkages and distance measurement in clustering with the help of the AUC metric. In addition, the entire proposed work and the contributions of this paper are evaluated and applied to the NSL-KDD dataset. Based on the proposed approach of using AUC with clustering, the Detection Rate (DR), False Alarm Rate (FAR), and other criteria are chosen to examine the model's results and capabilities. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

23. A Review of Predictive Analytics Models in the Oil and Gas Industries.

Author: R Azmi, Putri Azmira, Yusoff, Marina, and Mohd Sallehud-din, Mohamad Taufik
Subjects: PREDICTION models, GAS industry, PETROLEUM industry, MACHINE learning
Abstract: Enhancing the management and monitoring of oil and gas processes demands the development of precise predictive analytic techniques. Over the past two years, oil and its prediction have advanced significantly using conventional and modern machine learning techniques. Several review articles detail the developments in predictive maintenance and the technical and non-technical aspects of influencing the uptake of big data. The absence of references for machine learning techniques impacts the effective optimization of predictive analytics in the oil and gas sectors. This review paper offers readers thorough information on the latest machine learning methods utilized in this industry's predictive analytical modeling. This review covers different forms of machine learning techniques used in predictive analytical modeling from 2021 to 2023 (91 articles). It provides an overview of the details of the papers that were reviewed, describing the model's categories, the data's temporality, field, and name, the dataset's type, predictive analytics (classification, clustering, or prediction), the models' input and output parameters, the performance metrics, the optimal model, and the model's benefits and drawbacks. In addition, suggestions for future research directions to provide insights into the potential applications of the associated knowledge. This review can serve as a guide to enhance the effectiveness of predictive analytics models in the oil and gas industries. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

24. APPLICATION OF FUZZY METRICS IN CLUSTERING PROBLEMS OF AGRICULTURAL CROP VARIETIES.

Author: Stamenković, Andrijana, Milosavljević, Nataša, and Ralević, Nebojša M.
Subjects: CULTIVARS, CROPS, ARTIFICIAL intelligence, METAHEURISTIC algorithms, MACHINE learning
Abstract: The problem of image-based detection of the variety of beans, using artificial intelligence, is currently dealt with by scientists of various profiles. The idea of this paper is to show the possibility of applying different types of distances, primarily those that are fuzzy metrics, in clustering models in order to improve existing models and obtain more accurate results. The paper presents the method of variable neighborhood search, which uses both standard and fuzzy t-metrics and dual fuzzy s-metrics characterized by appropriate parameters. By varying those parameters of the fuzzy metric as well as the parameters of the metaheuristic used, we have shown how it is possible to improve the clustering results. The obtained results were compared with existing ones from the literature. The criterion function used in clustering is a fuzzy metric, which is proven in the paper. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

25. A Deep Diagnostic Framework Using Explainable Artificial Intelligence and Clustering.

Author: Thunold, Håvard Horgen, Riegler, Michael A., Yazidi, Anis, and Hammer, Hugo L.
Subjects: ARTIFICIAL intelligence, FEATURE extraction, MACHINE learning, DEEP learning, IMAGE recognition (Computer vision)
Abstract: An important part of diagnostics is to gain insight into properties that characterize a disease. Machine learning has been used for this purpose, for instance, to identify biomarkers in genomics. However, when patient data are presented as images, identifying properties that characterize a disease becomes far more challenging. A common strategy involves extracting features from the images and analyzing their occurrence in healthy versus pathological images. A limitation of this approach is that the ability to gain new insights into the disease from the data is constrained by the information in the extracted features. Typically, these features are manually extracted by humans, which further limits the potential for new insights. To overcome these limitations, in this paper, we propose a novel framework that provides insights into diseases without relying on handcrafted features or human intervention. Our framework is based on deep learning (DL), explainable artificial intelligence (XAI), and clustering. DL is employed to learn deep patterns, enabling efficient differentiation between healthy and pathological images. Explainable artificial intelligence (XAI) visualizes these patterns, and a novel "explanation-weighted" clustering technique is introduced to gain an overview of these patterns across multiple patients. We applied the method to images from the gastrointestinal tract. In addition to real healthy images and real images of polyps, some of the images had synthetic shapes added to represent other types of pathologies than polyps. The results show that our proposed method was capable of organizing the images based on the reasons they were diagnosed as pathological, achieving high cluster quality and a rand index close to or equal to one. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
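
Note on result 25: the abstract reports cluster quality as a Rand index "close to or equal to one". For readers unfamiliar with the measure, the sketch below computes the plain (unadjusted) Rand index as the fraction of item pairs on which two labelings agree, meaning both place the pair in the same group or both place it in different groups. The function name `rand_index` and the example labels are hypothetical, and the paper may use a different variant.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Unadjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must cover the same number of items")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        return 1.0  # fewer than two items: nothing to disagree on
    agreements = sum(
        # A pair agrees when both labelings say "same group" or both say "different".
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical labels: the same partition with the cluster names swapped.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

The index depends only on which items are grouped together, not on the label names, so a value of one means the clustering reproduces the reference grouping exactly.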

26. Acoustic emission with machine learning in fracture of composites: preliminary study.

Author: Smolnicki, M., Duda, Sz., Stabla, P., Zielonka, P., and Lesiuk, G.
Abstract: In this paper, preliminary studies on the failure analysis of hybrid composite materials utilizing acoustic emission and machine learning are presented. The main purpose of this study was to analyze the possibilities of using machine learning techniques as a way to better cluster the data obtained from acoustic emission. In this paper, we focus on data preparation, feature extraction (Laplacian score), determination of cluster number (Caliński–Harabasz, Silhouette, and Davies–Bouldin), and testing three clustering techniques, namely K-means, fuzzy C-means, and spectral clustering. The dataset was obtained by testing fiber metal laminates—composites consisting of metal and composite layers. Two experimental tests were realized on pre-cracked rectangular specimens—one with loading in mode I and one with loading in mode II (DCB—double cantilever beam and ENF—end-notch flexural test). Elastic waves were recorded during these tests via an acoustic emission system. Preliminary studies show that the proposed method can be used successfully to cluster data obtained in this way. The obtained dataset was split into 3 clusters (for the ENF test) and 5 clusters (DCB test). In the next stages of the research campaign, based on the presented results, we intend to change the approach to semi-supervised by running additional single-cause damage tests to enhance the achieved results and enable easier damage recognition. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

27. Fire detection using deep learning methods.

Author: Bayegizova, Aigulim, Abdikerimova, Gulzira, Kaliyeva, Samal, Shaikhanova, Aigul, Shangytbayeva, Gulmira, Sugurova, Laura, Sugur, Zharkynay, and Saimanova, Zagira
Abstract: Fire detection is an important task in the field of safety and emergency prevention. In recent years, deep learning methods have shown high efficiency in solving various computer vision problems, including detecting objects in images. In this paper, monitoring wildfires was considered, which allows you to quickly respond to them and prevent their spread using deep learning methods. For the experiment, images from the satellite and images from the FireWatch sensor were taken as initial data. In this work, the deep learning algorithms you only look once (YOLO), convolutional neural network (CNN), and fast recurrent neural network (FastRNN) were considered, which makes it possible to determine the accuracy of a natural fire. As a result of the experiments, an automated fire recognition algorithm using YOLOv4 deep learning methods was created. It is expected that the results of the study will show that deep learning methods can be successfully applied to detect fire in images. This may lead to the development of automated monitoring systems capable of quickly and reliably detecting fire situations, which will help improve safety and reduce the risk of fires. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

28. Exploring Price Patterns of Vegetables with Recurrence Quantification Analysis.

Author: Karakasidou, Sofia, Fragkou, Athanasios, Zachilas, Loukas, and Karakasidis, Theodoros
Subjects: VEGETABLES, MACHINE learning, DEEP learning, ARTIFICIAL intelligence
Abstract: This study investigates the time-series behavior of vegetable prices in the Central Market of Thessaloniki, Greece, using Recurrence Plot (RP) analysis and Recurrence Quantification Analysis (RQA), which considers non-linearities and does not necessitate stationarity of time series. The period of study was 1999–2016 for practical and research reasons. In the present work, we focus on vegetables available throughout the year, exploring the dynamics and interrelationships between their prices to avoid missing data. The study applies RP visual inspection classification, a clustering based on RQA parameters, and a classification based on the RQA analysis graphs with epochs for the first time. The aim of the paper was to investigate the grouping of products based on their price dynamical behavior. The results show that the formed groups present similarities related to their use as dishes and their way of cultivation, which apparently affect the price dynamics. The results offer insights into market behaviors, helping to inform better management strategies and policymaking and offer a possibility to predict variability of prices. This information can interest government policies in various directions, such as what products to develop for greater stability, identity for fluctuating prices, etc. In future work, a larger dataset including missing data could be included, as well as a machine-learning algorithm to classify the products based on the RQA with epochs graphs. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

29. Exploring the Effectiveness of Shallow and L2 Learner-Suitable Textual Features for Supervised and Unsupervised Sentence-Based Readability Assessment.

Author: Kostadimas, Dimitris, Kermanidis, Katia Lida, and Andronikos, Theodore
Subjects: LANGUAGE ability testing, NATURAL languages, MACHINE learning, CLASSIFICATION, RESEARCH evaluation
Abstract: Simplicity in information found online is in demand from diverse user groups seeking better text comprehension and consumption of information in an easy and timely manner. Readability assessment, particularly at the sentence level, plays a vital role in aiding specific demographics, such as language learners. In this paper, we research model evaluation metrics, strategies for model creation, and the predictive capacity of features and feature sets in assessing readability based on sentence complexity. Our primary objective is to classify sentences as either simple or complex, shifting the focus from entire paragraphs or texts to individual sentences. We approach this challenge as both a classification and clustering task. Additionally, we emphasize our tests on shallow features that, despite their simplistic nature and ease of use, seem to yield decent results. Leveraging the TextStat Python library and the WEKA toolkit, we employ a wide variety of shallow features and classifiers. By comparing the outcomes across different models, algorithms, and feature sets, we aim to offer valuable insights into optimizing the setup. We draw our data from sentences sourced from Wikipedia's corpus, a widely accessed online encyclopedia catering to a broad audience. We strive to take a deeper look at what leads to greater readability classification in datasets that appeal to audiences such as Wikipedia's, assisting in the development of improved models and new features for future applications with low feature extraction/processing times. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

30. Assessing climate change vulnerability: A village level analysis of the Indian west coast.

Author: Kasthala, Sindhuja, Devanathan, Parthasarathy, Krishnan, Narayanan, Inamdar, Arun B., and Punyamoorty, Vineet
Subjects: CLIMATE change, COASTS, VILLAGES, SOCIOECONOMIC factors, INFRASTRUCTURE (Economics), MACHINE learning
Abstract: The Indian west coast is under constant threat from climate change-induced hazards. Various social, economic, and infrastructural disparities along the coast cause significant variations in climate vulnerability. Current literature assesses vulnerability either over (1) a large area with poor spatial resolution or (2) a local area with better spatial resolution. The former assessments provide more comprehensive and broad insights into large spatial trends of vulnerability, while the latter provide more accurate and specific inputs needed by the local governments for effective intervention. However, there is a lack of studies that assess vulnerability simultaneously at a high-resolution and over a large geographic area, due to inadequacies in existing methodologies and difficulty in data management and analysis. This is a key gap that we address in our paper. We assess climate vulnerability of the entire Indian west coast at the village level, and propose a novel machine-learning based methodology tailored for high-resolution assessment over large geographic areas. This helped us produce the first high-resolution (i.e. village-level) climate vulnerability map of the entire Indian west coast. We found that the state of Maharashtra has the highest number of vulnerable villages and the state of Kerala has the least number of vulnerable villages. We collate and utilize a large dataset of 112 indicators describing socioeconomic characteristics, infrastructure and availability of financial services, among other aspects, to obtain a comprehensive picture of vulnerability. We analyze geospatial trends and attribute high vulnerability to specific indicators, which will help in effective decision-making at the village level. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

31. Multi-Objective Unsupervised Feature Selection and Cluster Based on Symbiotic Organism Search.

Author: AL-Gburi, Abbas Fadhil Jasim, Nazri, Mohd Zakree Ahmad, Yaakub, Mohd Ridzwan Bin, and Alyasseri, Zaid Abdi Alkareem
Subjects: ARTIFICIAL intelligence, FEATURE selection, MACHINE learning, SUPERVISED learning, DATA analytics
Abstract: Unsupervised learning is a type of machine learning that learns from data without human supervision. Unsupervised feature selection (UFS) is crucial in data analytics, which plays a vital role in enhancing the quality of results and reducing computational complexity in huge feature spaces. The UFS problem has been addressed in several research efforts. Recent studies have witnessed a surge in innovative techniques like nature-inspired algorithms for clustering and UFS problems. However, very few studies consider the UFS problem as a multi-objective problem to find the optimal trade-off between the number of selected features and model accuracy. This paper proposes a multi-objective symbiotic organism search algorithm for unsupervised feature selection (SOSUFS) and a symbiotic organism search-based clustering (SOSC) algorithm to generate the optimal feature subset for more accurate clustering. The efficiency and robustness of the proposed algorithm are investigated on benchmark datasets. The SOSUFS method, combined with SOSC, demonstrated the highest f-measure, whereas the KHCluster method resulted in the lowest f-measure. SOSUFS effectively reduced the number of features by more than half. The proposed symbiotic organisms search-based optimal unsupervised feature-selection (SOSUFS) method, along with search-based optimal clustering (SOSC), was identified as the top-performing clustering approach. Following this, the SOSUFS method demonstrated strong performance. In summary, this empirical study indicates that the proposed algorithm significantly surpasses state-of-the-art algorithms in both efficiency and effectiveness. Unsupervised learning in artificial intelligence involves machine-learning techniques that learn from data without human supervision. Unlike supervised learning, unsupervised machine-learning models work with unlabeled data to uncover patterns and insights independently, without explicit guidance or instruction. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

32. An effective deep learning architecture leveraging BIRCH clustering for resource usage prediction of heterogeneous machines in cloud data center.

Author: Garg, Sheetal, Ahuja, Rohit, Singh, Raman, and Perl, Ivan
Subjects: HIERARCHICAL clustering (Cluster analysis), ARTIFICIAL intelligence, TRANSFORMER models, MACHINE learning, TIME series analysis, DEEP learning
Abstract: Given the rise in demand for cloud computing in the modern era, the effectiveness of resource utilization is eminent to decrease energy footprint and achieve economic services. With the emerging machine learning and artificial intelligence techniques to model and predict, it is essential to explore a principal method that provides the best solution for the accurate provisioning of forthcoming requests in a cloud data center. Recent studies used machine learning and other advanced analytics to predict resource usage; however, these do not consider long-range dependencies in the time series, which is essential to capture for better prediction. Further, they show limitations in handling noise, missing values, and outliers in datasets. In this paper, we explored the problem by studying three techniques that enabled us to answer improvements in short-term forecasting of physical machines' resource usage if the above factors are considered. We evaluated the predictions using Transformer and Informer deep learning models that cover the above aspects and compared them with the Long short-term memory (LSTM) model. We used a real-world Google cluster trace usage dataset and employed Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithm to select heterogeneous machines. The evaluation of the three models depicts that the Transformer architecture that considers long-range dependencies in time series and shortcomings with datasets shows improvement in forecasting with 14.2% reduction in RMSE than LSTM. However, LSTM shows better results for some machines than the Transformer, which depicts the importance of input sequence order. The Informer model, which considers both dependencies and is a hybrid of LSTM and Transformer, outperformed both models with 21.7% from LSTM and 20.8% from Transformer reduction in RMSE. The results also depict Informer model consistently performs better than the other models across all subsets of the dataset. Our study proves that considering long-range dependencies and sequence ordering for resource usage time series improves the prediction. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

33. Differential Privacy High-Dimensional Data Publishing Based on Feature Selection and Clustering.

Author: Chu, Zhiguang, He, Jingsha, Zhang, Xiaolei, Zhang, Xing, and Zhu, Nafei
Subjects: DATA privacy, DATABASES, FEATURE selection, MACHINE learning, ELECTRONIC data processing, CLUSTER analysis (Statistics)
Abstract: As a social information product, the privacy and usability of high-dimensional data are the core issues in the field of privacy protection. Feature selection is a commonly used dimensionality reduction processing technique for high-dimensional data. Some feature selection methods only process some of the features selected by the algorithm and do not take into account the information associated with the selected features, resulting in the usability of the final experimental results not being high. This paper proposes a hybrid method based on feature selection and a cluster analysis to solve the data utility and privacy problems of high-dimensional data in the actual publishing process. The proposed method is divided into three stages: (1) screening features; (2) analyzing the clustering of features; and (3) adaptive noise. This paper uses the Wisconsin Breast Cancer Diagnostic (WDBC) database from UCI's Machine Learning Library. Using classification accuracy to evaluate the performance of the proposed method, the experiments show that the original data are processed by the algorithm in this paper while protecting the sensitive data information while retaining the contribution of the data to the diagnostic results. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

34. Distributed Incremental Clustering Algorithms: A Bibliometric and Word-Cloud Review Analysis.

Author: Mulay, Preeti, Joshi, Rahul, and Chaudhari, Archana
Subjects: MACHINE learning, DISTRIBUTED algorithms, ALGORITHMS, AUTHOR-reader relationships, DATA analysis
Abstract: "Incremental Learning (IL)" is a niche area of "Machine Learning." It is essential to keep learning incremental for ever-increasing data from all domains for effectual decisions, predictions and solving problems. This can be achieved effectually by applying "Incremental Clustering" methods on real-time data sources. IL can be achieved by "Incremental Clustering" easily as well as effectively. To achieve worldwide data analysis related to the data and to achieve broader perspectives, it is essential to deploy "Incremental Clustering" algorithms on distributed platforms, which will enable them to accept data from varied sources; analyze it and produce distributed worldwide solutions. This paper hence focuses on understanding the current status of "Distributed Incremental Clustering Algorithms (DICA)," its scope, limitations and other details so as to formulate better than the best algorithm in future. To enhance the analysis further Word-Clouds of impactful papers were explored and added in this paper, along with the details about platforms used to implement DICA by various upcoming researchers, readers and authors. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

35. Machine Learning Empowerment in Industry 4.0 – Case Study for Micro and Small Enterprises in Romania.

Author: Bogoevici, Flavia, Albu, Octavia, Duță, Ruxandra, and Chitca, Camelia
Subjects: MACHINE learning, INDUSTRY 4.0, SMALL business, CORPORATE culture
Abstract: In the world in which technology quickly integrates in our daily lives, businesses that incorporate digital innovation throughout their organizational culture, spanning from top-level executives to low-level employees are prone to emerge as industry frontrunners. Supported by Machine Learning, which stands out as a pivotal revolutionary tool, companies can enhance their productivity and operational efficiencies by incorporating remarkable automation capabilities, error reduction, superior predictive analysis, together with gaining valuable insights into future trends. The paper confers an overview of Machine Learning's capabilities, developed types, provided solutions and built architecture, through a conceptual structure. The paper elaborates these crucial concepts, offering a precise perspective on the topic and adopts a descriptive approach, elucidating the provided terminologies and ideas by referencing the related literature. The paper highlights in the initial part the outcomes resulting from the key advantages of Machine Learning and its impact on organizations, the path towards realizing substantial value through these digital advancements, emphasizing the priority organizations assign to cultivate their digital potential. The research performed in the second part of the paper aims at analyzing the progress of Romanian micro and small enterprises with implemented Machine Learning solutions, with detailed metrics and comprising k-means clustering, having the following objectives: automating repetitive tasks, improving planning and forecasting, increasing net profit, effortlessly discovering new patterns from large, diverse data models. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

36. Using deep learning algorithms to classify crop diseases.

Author: Murzabekova, Gulden, Glazyrina, Natalya, Nekessova, Anargul, Ismailova, Aisulu, Bazarova, Madina, Kashkimbayeva, Nurzhamal, Mukhametzhanova, Bigul, and Aldashova, Madina
Subjects: MACHINE learning, AGRICULTURAL technology, DEEP learning, AGRICULTURE, AERIAL photographs, NOSOLOGY, CLASSIFICATION algorithms
Abstract: The use of deep learning algorithms for the classification of crop diseases is one of the promising areas in agricultural technology. This is due to the need for rapid and accurate detection of plant diseases, which allows timely measures to be taken to treat them and prevent their spread. One of them is to increase productivity and maintain land quality through the timely detection of diseases and pests in agriculture and their elimination. Traditional classification methods in machine learning and algorithms in deep learning were compared to note the high accuracy in detecting pests and crop diseases. The advantages and disadvantages of each model considered during training were taken into account, and the Inception V3 algorithm was incorporated into the application. They can monitor the condition of crops on a daily basis with the help of new technology-applications on gadgets. Aerial photographs used by research institutes and agricultural grain centers do not show the changes that occur in agricultural grains, that is, diseases and pests. Therefore, the method proposed in this paper determines the types of diseases and pests of cereals through a mobile application and suggests ways to deal with them. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

37. Exploring the Advancements and Future Research Directions of Artificial Neural Networks: A Text Mining Approach.

Author: Kariri, Elham, Louati, Hassen, Louati, Ali, and Masmoudi, Fatma
Subjects: DEEP learning, ARTIFICIAL neural networks, TEXT mining, MACHINE learning, FEATURE selection, FEATURE extraction
Abstract: Artificial Neural Networks (ANNs) are machine learning algorithms inspired by the structure and function of the human brain. Their popularity has increased in recent years due to their ability to learn and improve through experience, making them suitable for a wide range of applications. ANNs are often used as part of deep learning, which enables them to learn, transfer knowledge, make predictions, and take action. This paper aims to provide a comprehensive understanding of ANNs and explore potential directions for future research. To achieve this, the paper analyzes 10,661 articles and 35,973 keywords from various journals using a text-mining approach. The results of the analysis show that there is a high level of interest in topics related to machine learning, deep learning, and ANNs and that research in this field is increasingly focusing on areas such as optimization techniques, feature extraction and selection, and clustering. The study presented in this paper is motivated by the need for a framework to guide the continued study and development of ANNs. By providing insights into the current state of research on ANNs, this paper aims to promote a deeper understanding of ANNs and to facilitate the development of new techniques and applications for ANNs in the future. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

38. Clustering Network Traffic Using Semi-Supervised Learning.

Author: Krajewska, Antonina and Niewiadomska-Szynkiewicz, Ewa
Subjects: MACHINE learning, MATRIX decomposition, COMPUTER network traffic, NONNEGATIVE matrices, ALGORITHMS
Abstract: Clustering algorithms play a crucial role in early warning cybersecurity systems. They allow for the detection of new attack patterns and anomalies and enhance system performance. This paper discusses the problem of clustering data collected by a distributed system of network honeypots. In the proposed approach, when a network flow matches an attack signature, an appropriate label is assigned to it. This enables the use of semi-supervised learning algorithms and improves the quality of clustering results. The article compares the results of learning algorithms conducted with and without partial supervision, particularly non-negative matrix factorization and semi-supervised non-negative matrix factorization. Our results confirm the positive impact of labeling a portion of flows on the quality of clustering. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

39. Explainable Artificial Intelligence Methods to Enhance Transparency and Trust in Digital Deliberation Settings.

Author: Siachos, Ilias and Karacapilidis, Nikos
Subjects: ARTIFICIAL intelligence, INFORMATION overload, TRUST, MACHINE learning, CITIZENS
Abstract: Digital deliberation has been steadily growing in recent years, enabling citizens from different geographical locations and diverse opinions and expertise to participate in policy-making processes. Software platforms aiming to support digital deliberation usually suffer from information overload, due to the large amount of feedback that is often provided. While Machine Learning and Natural Language Processing techniques can alleviate this drawback, their complex structure discourages users from trusting their results. This paper proposes two Explainable Artificial Intelligence models to enhance transparency and trust in the modus operandi of the above techniques, which concern the processes of clustering and summarization of citizens' feedback that has been uploaded on a digital deliberation platform. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

40. New Strategies in Archaeometric Provenance Analyses of Volcanic Rock Grinding Stones: Examples from Iulia Libica (Spain) and Sidi Zahruni (Tunisia).

Author: Casas, Lluís, Di Febo, Roberta, Anglisano, Anna, Pitarch Martí, África, Queralt, Ignasi, Carreras, Cèsar, and Fouzai, Boutheina
Subjects: PRINCIPAL components analysis, VOLCANIC ash, tuff, etc., STONE, IGNEOUS rocks, ARCHAEOLOGICAL excavations
Abstract: Archaeometry can help archaeologists in many ways, and one of the most common archaeometric objectives is provenance analysis. Volcanic rocks are often found in archaeological sites as materials used to make grinding tools such as millstones and mortars or as building materials. Petrographic characterization is commonly applied to identify their main mineralogical components. However, the provenance study of volcanic stones is usually undertaken by comparing geochemical data from reference outcrops using common descriptive statistical tools such as biplots of chemical elements, and occasionally, unsupervised multivariate data analysis like principal component analysis (PCA) is also used. Recently, the use of supervised classification methods has shown a superior performance in assigning provenance to archaeological samples. However, these methods require the use of reference databases for all the possible provenance classes in order to train the classification models. The existence of comprehensive collections of published geochemical analyses of igneous rocks enables the use of the supervised approach for the provenance determination of volcanic stones. In this paper, the provenance of volcanic grinding tools from two archaeological sites (Iulia Libica, Spain, and Sidi Zahruni, Tunisia) is attempted using data from the GEOROC database through unsupervised and supervised approaches. The materials from Sidi Zahruni have been identified as basalts from Pantelleria (Italy), and the agreement between the different supervised classification models tested is particularly conclusive. In contrast, the provenance of the materials from Iulia Libica remained undetermined. The results illustrate the advantages and limitations of all the examined methods. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

41. A novel fuzzy clustering-based method for human activity recognition in cloud-based industrial IoT environment.

Author: Mittal, Himanshu, Tripathi, Ashish Kumar, Pandey, Avinash Chandra, Venu, P., Menon, Varun G., and Pal, Raju
Subjects: HUMAN activity recognition, INTERNET of things, VIDEO monitors, MACHINE learning, INDUSTRIAL safety, INDUSTRY 4.0
Abstract: With the advancement of technology such as video monitoring, Internet-of-things, cloud, and machine learning, Industry 4.0 is working continuously to ensure the security of workers. The workers are equipped with sensors to analyze their activities. In general, the recognition of human activities in cloud-based industrial scenario is leveraged to monitor the safety of the workers. This paper introduced a new optimal clustering method for the activity recognition of workers in industry using cloud based IoT environment. The proposed method uses the temporal and spatial features of human workers in industry. The proposed method is tested on publicly available dataset of different activities maintained into three groups, namely movement, gestures, and object handling, in the context of the medium and small industrial environment. The experimental findings validate that the proposed method achieves 80.2%, 81.05% and 80.19% of average accuracy for movement, gesture, and object handling activities, which clearly outperformed the fuzzy c-means, particle-swarm optimization, and HMM-based activity recognition methods. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

42. Intelligent Exchange of Sustainable Tourist Habits among the EU Member States.

Author: Leal, Fátima and Pinho, Micaela
Subjects: SUSTAINABILITY, SUSTAINABLE tourism, MACHINE learning, TOURIST attractions, RECOMMENDER systems, TOURISTS
Abstract: Despite much research being conducted within the scope of sustainable tourism, more progress has yet to be made in defining how close or far different countries are from achieving this goal. Consequently, this paper aims to evaluate and compare the commitment of citizens, as tourists, from the 27 member states of the European Union to sustainable tourism. A map of sustainability was developed through the use of machine learning algorithms. A cluster analysis was performed, followed by a sustainable rating. The main findings indicate the existence of three country segments among the European Union member states according to the involvement of its citizens as tourists with sustainable practices: highly committed, moderately committed, and uncommitted. Based on these segments, we proposed a recommendation system that suggests the top-five countries where tourists could exchange sustainable tourism habits based on the idea of contagion or imitation behaviours among individuals across an extensive set of everyday decisions. The results reveal significant variations in sustainable tourism practices across member states, highlighting both challenges and opportunities for harmonisation. By implementing this recommendation system, we facilitate the adoption of sustainable habits among tourists and stakeholders, driving a more unified approach to sustainability in the multiple tourism destinations. This study shows no convergence between the 27 European Union member states regarding sustainable tourism. Therefore, political policies are necessary so that all citizens converge on sustainable tourist habits and the European Union contributes, as a whole, to sustainable tourism. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

43. A COMPREHENSIVE EVALUATION OF ROUGH SETS CLUSTERING IN UNCERTAINTY DRIVEN CONTEXTS.

Author: SZEDERJESI-DRAGOMIR, ARNOLD
Subjects: ROUGH sets, CLUSTER set theory, MACHINE learning, SUPERVISED learning
Abstract: This paper presents a comprehensive evaluation of the Agent BAsed Rough sets Clustering (ABARC) algorithm, an approach using rough sets theory for clustering in environments characterized by uncertainty. Several experiments utilizing standard datasets are performed in order to compare ABARC against a range of supervised and unsupervised learning algorithms. This comparison considers various internal and external performance measures to evaluate the quality of clustering. The results highlight the ABARC algorithm’s capability to effectively manage vague data and outliers, showcasing its advantage in handling uncertainty in data. Furthermore, they also emphasize the importance of choosing appropriate performance metrics, especially when evaluating clustering algorithms in scenarios with unclear or inconsistent data. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

44. High-Dimensional Data Analysis Using Parameter Free Algorithm Data Point Positioning Analysis.

Author: Mustapha, S. M. F. D. Syed
Subjects: PATTERN recognition systems, DATA analysis, K-means clustering, DATA mining, ALGORITHMS, MACHINE learning, HIGH-dimensional model representation
Abstract: Clustering is an effective statistical data analysis technique; it has several applications, including data mining, pattern recognition, image analysis, bioinformatics, and machine learning. Clustering helps to partition data into groups of objects with distinct characteristics. Most of the methods for clustering use manually selected parameters to find the clusters from the dataset. Consequently, it can be very challenging and time-consuming to extract the optimal parameters for clustering a dataset. Moreover, some clustering methods are inadequate for locating clusters in high-dimensional data. To address these concerns systematically, this paper introduces a novel selection-free clustering technique named data point positioning analysis (DPPA). The proposed method is straightforward since it calculates 1-NN and Max-NN by analyzing the data point placements without the requirement of an initial manual parameter assignment. This method is validated using two well-known publicly available datasets used in several clustering algorithms. To compare the performance of the proposed method, this study also investigated four popular clustering algorithms (DBSCAN, affinity propagation, Mean Shift, and K-means), where the proposed method provides higher performance in finding the cluster without using any manually selected parameters. The experimental finding demonstrated that the proposed DPPA algorithm is less time-consuming compared to the existing traditional methods and achieves higher performance without using any manually selected parameters. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

45. Classification of intestinal T cell receptor repertoires using machine learning methods can identify patients with coeliac disease regardless of dietary gluten status

Author: Anna Fowler, Oliver E Welsh, Killian Donovan, Andrew D. Foers, Michael FitzPatrick, M. Saad Shoukat, N. Collins, Paul Klenerman, Elizabeth J. Soilleux, Russell Petry, and Shelley C Evans
Subjects: Adult, Male, TRG, Concordance, T‐cell receptor repertoire, Duodenum, Machine learning, Coeliac disease, Pathology and Forensic Medicine, Diet, Gluten-Free, T‐lymphocyte, Intestine, Small, Humans, Pathological, Original Paper, T-cell receptor, fungi, Receptors, Antigen, T-Cell, gamma-delta, Middle Aged, Gluten, Lymphoma, Celiac Disease, Female, TRD, Artificial intelligence, CD8, clustering
Abstract: In coeliac disease (CeD), immune‐mediated small intestinal damage is precipitated by gluten, leading to variable symptoms and complications, occasionally including aggressive T‐cell lymphoma. Diagnosis, based primarily on histopathological examination of duodenal biopsies, is confounded by poor concordance between pathologists and minimal histological abnormality if insufficient gluten is consumed. CeD pathogenesis involves both CD4+ T‐cell‐mediated gluten recognition and CD8+ and γδ T‐cell‐mediated inflammation, with a previous study demonstrating a permanent change in γδ T‐cell populations in CeD. We leveraged this understanding and explored the diagnostic utility of bulk T‐cell receptor (TCR) sequencing in assessing duodenal biopsies in CeD. Genomic DNA extracted from duodenal biopsies underwent sequencing for TCR‐δ (TRD) (CeD, n = 11; non‐CeD, n = 11) and TCR‐γ (TRG) (CeD, n = 33; non‐CeD, n = 21). We developed a novel machine learning‐based analysis of the TCR repertoire, clustering samples by diagnosis. Leave‐one‐out cross‐validation (LOOCV) was performed to validate the classification algorithm. Using TRD repertoire, 100% (22/22) of duodenal biopsies were correctly classified, with a LOOCV accuracy of 91%. Using TCR‐γ (TRG) repertoire, 94.4% (51/54) of duodenal biopsies were correctly classified, with LOOCV of 87%. Duodenal biopsy TRG repertoire analysis permitted accurate classification of biopsies from patients with CeD following a strict gluten‐free diet for at least 6 months, who would be misclassified by current tests. This result reflects permanent changes to the duodenal γδ TCR repertoire in CeD, even in the absence of gluten consumption. Our method could complement or replace histopathological diagnosis in CeD and might have particular clinical utility in the diagnostic testing of patients unable to tolerate dietary gluten, and for assessing duodenal biopsies with equivocal features. This approach is generalisable to any TCR/BCR locus and any sequencing platform, with potential to predict diagnosis or prognosis in conditions mediated or modulated by the adaptive immune response. © 2020 The Authors. The Journal of Pathology published by John Wiley & Sons, Ltd. on behalf of The Pathological Society of Great Britain and Ireland.
Published: 2020
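
The leave-one-out cross-validation (LOOCV) mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say which classifier the authors used, so a nearest-centroid rule stands in here purely as an assumption, and the function names and feature vectors are hypothetical placeholders rather than TCR repertoire data.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class mean is closest to x (stand-in classifier)."""
    sums, counts = {}, {}
    for vec, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1

    def squared_distance(label):
        # Squared Euclidean distance from x to the mean vector of this class.
        return sum((s / counts[label] - xi) ** 2
                   for s, xi in zip(sums[label], x))

    return min(sums, key=squared_distance)


def loocv_accuracy(xs, ys):
    """Hold out each sample once, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical two-feature samples; not data from the study.
toy_x = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
         [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
toy_y = ["non-CeD"] * 3 + ["CeD"] * 3
print(loocv_accuracy(toy_x, toy_y))
```

Because every sample is scored by a model that never saw it, the LOOCV figure is typically lower than the accuracy measured on the full set, which matches the pattern reported in the abstract (for example 100% versus 91% for TRD).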

46. MAPREDUCE FRAMEWORK BASED BIG DATA SUMMARIZATION USING HIDDEN MARKOV MODEL AND DBSCAN.

Author: Belerao, Krushnadeo and Chaudhari, S. B.
Subjects: BIG data, HIDDEN Markov models, DATA analysis, COMPUTER algorithms, MACHINE learning
Abstract: With the advent of the internet, there has been a vast increase in the storage of information. Almost all information now exists in digital form, which reduces paperwork and eases storage. Searching for relevant information in a collection of documents is a tedious task. Automatic text summarization offers a solution to this problem. In this paper, abstract summary generation of multiple documents for big data is proposed, which takes user input as the topic. The proposed technique is designed using the DBSCAN algorithm, which works with the MapReduce framework for clustering, and a Hidden Markov Model for summarization. The summarization process is performed in three main stages and provides a modular implementation of multiple-document summarization. The proposed method follows a preprocessing step in which documents are scanned for similarity and various machine learning techniques are applied. Applying clustering helps the summarizer system collect exact words rather than copying redundant words. Topic-based abstract summarization from big data is a challenging task, particularly when there are multiple documents with the same or different content. Hadoop with its programming techniques can provide better ways of generating summaries and it also enhances the complexity of the summarization process using distributed computing. [ABSTRACT FROM AUTHOR]
Published: 2018

47. Parkinson's disease prediction and drug personalization using machine learning techniques.

Author: Begum, M. Sharmila, Balajee, A., Kulothungan, S., Santhakumar, D., and Basheer, Shajahan
Subjects: PARKINSON'S disease, ROUGH sets, MACHINE learning, PRINCIPAL components analysis, DRUG utilization, RANDOM forest algorithms
Abstract: Parkinson disease (PD) is a neurodegenerative disease that occurs due to an insufficient level of dopamine in the human brain. This disease occurs predominantly in elderly people. There exists no definite procedure to diagnose PD. It is diagnosed based on symptoms, clinical trials, and a number of laboratory tests. In this research paper, machine learning techniques are used to predict PD and to help the medical practitioner recommend personalized drugs for patients. Appropriate features are selected through rough set theory, and principal component analysis is used for dimensionality reduction. Performance is evaluated using deep neural network, random forest, and SVM classifiers. The efficiency of the proposed approach is measured through the confusion matrix, accuracy, precision, and recall. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

48. AI/ML assisted shale gas production performance evaluation.

Author: Syed, Fahad I., Muther, Temoor, Dahaghi, Amirmasoud K., and Negahban, Shahin
Subjects: SHALE gas, OIL shales, ARTIFICIAL intelligence, SHALE gas reservoirs, ARTIFICIAL neural networks
Abstract: Shale gas reservoirs play a major role in overall hydrocarbon production, especially in the United States, and given the intense development of such reservoirs, it is essential to learn productive methods for modeling production and evaluating performance. Consequently, one of the most widely adopted techniques for production performance analysis today is the use of artificial intelligence (AI) and machine learning (ML). Hydrocarbon exploration and production is a continuous process that generates a lot of data from the subsurface as well as from surface facilities. The availability of such a huge data set, which keeps increasing over time, enhances the computational capabilities and performance accuracy through AI and ML applications using a data-driven approach. The ML approach can be utilized through supervised and unsupervised methods in addition to artificial neural networks (ANN). Other ML approaches include random forest (RF), support vector machine (SVM), boosting techniques, clustering methods, and artificial network-based architectures. In this paper, a systematic literature review is presented, focused on AI and ML applications for shale gas production performance evaluation and modeling. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

49. Behavior of Callers to a Crisis Helpline Before and During the COVID-19 Pandemic: Quantitative Data Analysis

Author: Louise Hamra, Mette Isaksen, Edel Ennis, Siobhan O'Neill, Maurice Mulvenna, Robin Turkington, Elizabeth Scowcroft, Jacqui Morrissey, Raymond Bond, Ciaran Moore, and Courtney Potts
Subjects: 020205 medical informatics, Population, coronavirus, caller behavior, 02 engineering and technology, Disease cluster, 03 medical and health sciences, 0302 clinical medicine, Pandemic, 0202 electrical engineering, electronic engineering, information engineering, medicine, Psychology, education, crisis helplines, Service (business), Government, education.field_of_study, Original Paper, pandemic, COVID-19, Mental illness, medicine.disease, Mental health, BF1-990, 030227 psychiatry, Psychiatry and Mental health, Distress, machine learning, mental health, Demography, clustering
Abstract: Background: The World Health Organization declared the outbreak of COVID-19 to be an international pandemic in March 2020. While numbers of new confirmed cases of the disease and death tolls are rising at an alarming rate on a daily basis, there is concern that the pandemic and the measures taken to counteract it could cause an increase in distress among the public. Hence, there could be an increase in need for emotional support within the population, which is complicated further by the reduction of existing face-to-face mental health services as a result of measures taken to limit the spread of the virus. Objective: The objective of this study was to determine whether the COVID-19 pandemic has had any influence on the calls made to Samaritans Ireland, a national crisis helpline within the Republic of Ireland. Methods: This study presents an analysis of calls made to Samaritans Ireland in a four-week period before the first confirmed case of COVID-19 (calls=41,648, callers=3752) and calls made to the service within a four-week period after a restrictive lockdown was imposed by the government of the Republic of Ireland (calls=46,043, callers=3147). Statistical analysis was conducted to explore any differences between the duration of calls in the two periods at a global level and at an hourly level. We performed k-means clustering to determine the types of callers who used the helpline based on their helpline call usage behavior and to assess the impact of the pandemic on the caller type usage patterns. Results: The analysis revealed that calls were of a longer duration in the postlockdown period in comparison with the pre–COVID-19 period. There were changes in the behavior of individuals in the cluster types defined by caller behavior, where some caller types tended to make longer calls to the service in the postlockdown period. 
There were also changes in caller behavior patterns with regard to the time of day of the call; variations were observed in the duration of calls at particular times of day, where average call durations increased in the early hours of the morning. Conclusions: The results of this study highlight the impact of COVID-19 on a national crisis helpline service. Statistical differences were observed in caller behavior between the prelockdown and active lockdown periods. The findings suggest that service users relied on crisis helpline services more during the lockdown period due to an increased sense of isolation, worsening of underlying mental illness due to the pandemic, and reduction or overall removal of access to other support resources. Practical implications and limitations are discussed.
Published: 2020

50. Unsupervised identification of significant lineages of SARS-CoV-2 through scalable machine learning methods.

Author: Cahuantzi, Roberto, Lythgoe, Katrina A., Hall, Ian, Pellis, Lorenzo, and House, Thomas
Subjects: SARS-CoV-2 Omicron variant, MACHINE learning, SARS-CoV-2, IMMUNE response
Abstract: Since its emergence in late 2019, SARS-CoV-2 has diversified into a large number of lineages and caused multiple waves of infection globally. Novel lineages have the potential to spread rapidly and internationally if they have higher intrinsic transmissibility and/or can evade host immune responses, as has been seen with the Alpha, Delta, and Omicron variants of concern. They can also cause increased mortality and morbidity if they have increased virulence, as was seen for Alpha and Delta. Phylogenetic methods provide the "gold standard" for representing the global diversity of SARS-CoV-2 and to identify newly emerging lineages. However, these methods are computationally expensive, struggle when datasets get too large, and require manual curation to designate new lineages. These challenges provide a motivation to develop complementary methods that can incorporate all of the genetic data available without down-sampling to extract meaningful information rapidly and with minimal curation. In this paper, we demonstrate the utility of using algorithmic approaches based on word-statistics to represent whole sequences, bringing speed, scalability, and interpretability to the construction of genetic topologies. While not serving as a substitute for current phylogenetic analyses, the proposed methods can be used as a complementary, and fully automatable, approach to identify and confirm new emerging variants. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
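
The idea of representing whole sequences by word statistics, as described in the abstract above, can be sketched in a few lines. This is a minimal illustration that assumes overlapping k-mer counts compared by cosine similarity; the abstract does not specify the authors' actual statistics or distance measure, and the function names and example sequences are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k=3):
    """Count every overlapping length-k word in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical short sequences; real genomes are roughly 30,000 bases long.
reference = kmer_counts("ACGTACGTACGT")
variant = kmer_counts("ACGTACGAACGT")
print(cosine_similarity(reference, variant))
```

Counting words needs no alignment and scales linearly with sequence length, which is consistent with the speed and scalability argument the abstract makes against purely phylogenetic pipelines.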
