Descriptor: "unsupervised Machine Learning" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"unsupervised Machine Learning"' showing total 3,717 results

1. The characteristics of policy supply in the construction of smart emergency management in China: Based on text mining method

Author: Wang, Yanqing, Chen, Hong, and Gu, Xiao
Published: 2025
Full Text: View/download PDF

2. Utilizing echocardiography and unsupervised machine learning for heart failure risk identification

Author: Simonsen, Jakob Øystein, Modin, Daniel, Skaarup, Kristoffer, Djernæs, Kasper, Lassen, Mats Christian Højbjerg, Johansen, Niklas Dyrby, Marott, Jacob Louis, Jensen, Magnus Thorsten, Jensen, Gorm B., Schnohr, Peter, Martínez, Sergio Sánchez, Claggett, Brian Lee, Møgelvang, Rasmus, and Biering-Sørensen, Tor
Published: 2025
Full Text: View/download PDF

3. Predictive model for novel subtypes of patients undergoing lower extremity amputation for peripheral artery disease: An unsupervised machine learning study

Author: Ma, Yuanliang, Zhang, Lin, Li, Que, and Qin, Xiao
Published: 2024
Full Text: View/download PDF

4. Unsupervised quality monitoring of metal additive manufacturing using Bayesian adaptive resonance

Author: Shevchik, S., Wrobel, R., Quang T, Le, Pandiyan, V., Hoffmann, P., Leinenbach, C., and Wasmer, K.
Published: 2024
Full Text: View/download PDF

5. Anomaly Detection in Binary Time Series Data: An unsupervised Machine Learning Approach for Condition Monitoring

Author: Princz, Gábor, Shaloo, Masoud, and Erol, Selim
Published: 2024
Full Text: View/download PDF

6. Unsupervised pattern identification in spatial gene expression atlas reveals mouse brain regions beyond established ontology.

Author: Cahill, Robert, Wang, Yu, Xian, R, Lee, Alex, Zeng, Hongkui, Yu, Bin, Tasic, Bosiljka, and Abbasi-Asl, Reza
Subjects: brain ontology, spatial gene expression, unsupervised learning, Animals, Mice, Brain, Gene Expression Profiling, Transcriptome, Algorithms, Unsupervised Machine Learning, Gene Ontology, Atlases as Topic, Gene Regulatory Networks, Principal Component Analysis
Abstract: The rapid growth of large-scale spatial gene expression data demands efficient and reliable computational tools to extract major trends of gene expression in their native spatial context. Here, we used stability-driven unsupervised learning (i.e., staNMF) to identify principal patterns (PPs) of 3D gene expression profiles and understand spatial gene distribution and anatomical localization at the whole mouse brain level. Our subsequent spatial correlation analysis systematically compared the PPs to known anatomical regions and ontology from the Allen Mouse Brain Atlas using spatial neighborhoods. We demonstrate that our stable and spatially coherent PPs, whose linear combinations accurately approximate the spatial gene data, are highly correlated with combinations of expert-annotated brain regions. These PPs yield a brain ontology based purely on spatial gene expression. Our PP identification approach outperforms principal component analysis and typical clustering algorithms on the same task. Moreover, we show that the stable PPs reveal marked regional imbalance of brainwide genetic architecture, leading to region-specific marker genes and gene coexpression networks. Our findings highlight the advantages of stability-driven machine learning for plausible biological discovery from dense spatial gene expression data, streamlining tasks that are infeasible by conventional manual approaches.
Published: 2024

7. Multi-modal contrastive learning of subcellular organization using DICE.

Author: Nasser, Rami, Schaffer, Leah, Ideker, Trey, and Sharan, Roded
Subjects: Humans, HEK293 Cells, Computational Biology, Protein Interaction Mapping, Proteins, Unsupervised Machine Learning
Abstract: The data deluge in biology calls for computational approaches that can integrate multiple datasets of different types to build a holistic view of biological processes or structures of interest. An emerging paradigm in this domain is the unsupervised learning of data embeddings that can be used for downstream clustering and classification tasks. While such approaches for integrating data of similar types are becoming common, there is scarcer work on consolidating different data modalities such as network and image information. Here, we introduce DICE (Data Integration through Contrastive Embedding), a contrastive learning model for multi-modal data integration. We apply this model to study the subcellular organization of proteins by integrating protein-protein interaction data and protein image data measured in HEK293 cells. We demonstrate the advantage of data integration over any single modality and show that our framework outperforms previous integration approaches. Availability: https://github.com/raminass/protein-contrastive Contact: raminass@gmail.com.
Published: 2024

8. Application of Unsupervised Learning in Detecting Behavioral Patterns in E-commerce Customers

Author: Udayan, J. Divya, Moneesh, N., Vemulapalli, Nehith Sai, Pruthvi, Paladugula, and Sakhamuri, Rakshith
Editor: Kumar, Amit, Gunjan, Vinit Kumar, Senatore, Sabrina, and Hu, Yu-Chen
Series Editor: Angrisani, Leopoldo, Arteaga, Marco, Chakraborty, Samarjit, Chen, Shanben, Chen, Tan Kay, Dillmann, Rüdiger, Duan, Haibin, Ferrari, Gianluigi, Ferre, Manuel, Jabbari, Faryar, Jia, Limin, Kacprzyk, Janusz, Khamis, Alaa, Kroeger, Torsten, Li, Yong, Liang, Qilian, Martín, Ferran, Ming, Tan Cher, Minker, Wolfgang, Misra, Pradeep, Mukhopadhyay, Subhas, Ning, Cun-Zheng, Nishida, Toyoaki, Oneto, Luca, Panigrahi, Bijaya Ketan, Pascucci, Federica, Qin, Yong, Seng, Gan Woon, Speidel, Joachim, Veiga, Germano, Wu, Haitao, Zamboni, Walter, and Tan, Kay Chen
Published: 2025
Full Text: View/download PDF

9. Sustainable HRM the next hotspot for management research? A study using topic modelling

Author: Singh, Shefali, Awasthi, Kanchan, Patra, Pradipta, Srivastava, Jaya, and Trivedi, Shrawan Kumar
Published: 2024
Full Text: View/download PDF

10. Explainable unsupervised anomaly detection for healthcare insurance data.

Author: De Meulemeester, Hannes, De Smet, Frank, van Dorst, Johan, Derroitte, Elise, and De Moor, Bart
Subjects: *ANOMALY detection (Computer security), *ARTIFICIAL intelligence, *HEALTH insurance, *FRAUD, *MACHINE learning
Abstract: Background: Waste and fraud are important problems for health insurers to deal with. With the advent of big data, these insurers are looking more and more towards data mining and machine learning methods to help in detecting waste and fraud. However, labeled data is costly and difficult to acquire as it requires expert investigators and known care providers with atypical behavior. Methods: In this work we show how recent advances in machine learning can be used to set up a workflow that can aid investigators in discovering practitioners or groups of practitioners with unusual resource use in order to more efficiently combat waste and fraud. We combine three different techniques, which have not been used in the context of healthcare insurance anomaly detection: categorical embeddings to deal with high-cardinality categorical variables, state-of-the-art unsupervised anomaly detection techniques to detect anomalies and Shapley additive explanations (SHAP) to explain the model output. Results: The method has been evaluated on providers with a known anomalous profile and with the help of experts of the largest health insurance fund in Belgium. The quantitative experiments show that categorical embeddings offer a significant improvement compared to standard methods and that the state-of-the-art unsupervised anomaly detection techniques generally show an improvement over traditional methods. In a practical setting, the proposed workflow with SHAP was able to detect a previously unknown, anomalous trend among general practitioners. Conclusions: The proposed workflow is able to detect known care providers with atypical behaviour and helps expert investigators in making informed decisions concerning possible fraud or overconsumption in the health insurance field. [ABSTRACT FROM AUTHOR]
Published: 2025
Full Text: View/download PDF

11. Changes in DNA methylation are associated with systemic lupus erythematosus flare remission and clinical subtypes.

Author: Horton, Mary K., Nititham, Joanne, Taylor, Kimberly E., Katz, Patricia, Ye, Chun Jimmie, Yazdany, Jinoos, Dall'Era, Maria, Hurabielle, Charlotte, Barcellos, Lisa F., Criswell, Lindsey A., and Lanata, Cristina M.
Subjects: *SYSTEMIC lupus erythematosus, *DNA methylation, *LIFE sciences, *HIERARCHICAL clustering (Cluster analysis), *METHYLATION
Abstract: Background: Systemic lupus erythematosus (SLE) has numerous symptoms across organs and an unpredictable flare-remittance pattern. This has made it challenging to understand drivers of long-term SLE outcomes. Our objective was to identify whether changes in DNA methylation over time, in an actively flaring SLE cohort, were associated with remission and whether these changes meaningfully subtype SLE patients. Methods: Fifty-nine multi-ethnic SLE patients had clinical visits and DNA methylation profiles at a flare and approximately 3 months later. Methylation was measured using the Illumina EPIC array. We identified sites where methylation change between visits was associated with remission at the follow-up visit using limma package and a time x remission interaction term. Models adjusted for batch, age at diagnosis, time between visits, age at flare, sex, medications, and cell-type proportions. Separately, a paired T-test identified Bonferroni significant methylation sites with ≥ 3% change between visits (n = 546). Methylation changes at these sites were used for unsupervised consensus hierarchical clustering. Associations between clusters and patient features were assessed. Results: Nineteen patients fully remitted at the follow-up visit. For 1,953 CpG sites, methylation changed differently for remitters vs. non-remitters (Bonferroni p < 0.05). Nearly half were within genes regulated by interferon. The largest effect was at cg22873177; on average, remitters had 23% decreased methylation between visits while non-remitters had no change. Three SLE patient clusters were identified using methylation differences agnostic of clinical outcomes. All Cluster 1 subjects (n = 12) experienced complete flare remission, despite similar baseline disease activity scores, medications, and demographics as other clusters. 
Methylation changes at six CpG sites, including within immune-related CD45 and IFI genes, were particularly distinct for each cluster, suggesting these may be good candidates for stratifying patients in the future. Conclusions: Changes in DNA methylation during active SLE were associated with remission status and identified subgroups of SLE patients with several distinct clinical and biological characteristics. DNA methylation patterns might help inform SLE subtypes, leading to targeted therapies based on relevant underlying biological pathways. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

12. Radiomic Consensus Clustering in Glioblastoma and Association with Gene Expression Profiles.

Author: Wroblewski, Tadeusz H., Karabacak, Mert, Seah, Carina, Yong, Raymund L., and Margetis, Konstantinos
Subjects: *CONSENSUS (Social sciences), *DISEASE clusters, *GLIOMAS, *RADIOMICS, *KRUSKAL-Wallis Test, *FISHER exact test, *DESCRIPTIVE statistics, *MAGNETIC resonance imaging, *GENE expression profiling, *MACHINE learning, *DATA analysis software
Abstract: Simple Summary: Glioblastoma (GBM) is an aggressive primary central nervous system tumor with poor survival outcomes and limited treatment options. In this study, we investigate the use of radiomic features derived from magnetic resonance imaging (MRI) scans to identify unique gene expression profiles in a cohort of patients with GBM. This study grouped patients based on radiomic features using a consensus clustering approach, which iteratively clusters patients to find robust and stable groups. We identified three clusters which yielded unique gene expression profiles. Significant differentially expressed genes previously associated with GBM prognosis and treatment sensitivity were identified in one cluster. In pathway enrichment analyses, genes upregulated in immune-related and DNA metabolism pathways and downregulated protein and histone deacetylation pathways were identified in the same cluster. Together, these findings suggest that consensus clustering of radiomic features may be a promising avenue for non-invasive characterization of molecular heterogeneity of GBM. Background/Objectives: Glioblastoma (GBM) is the most common malignant primary central nervous system tumor with extremely poor prognosis and survival outcomes. Non-invasive methods like radiomic feature extraction, which assess sub-visual imaging features, provide a potentially powerful tool for distinguishing molecular profiles across groups of patients with GBM. Using consensus clustering of MRI-based radiomic features, this study aims to investigate differential gene expression profiles based on radiomic clusters. Methods: Patients from the TCGA and CPTAC datasets (n = 114) were included in this study. Radiomic features including T1, T1 with contrast, T2, and FLAIR MRI sequences were extracted using PyRadiomics. 
Selected radiomic features were then clustered using ConsensusClusterPlus (k-means base algorithm and Euclidean distance), which iteratively subsamples and clusters 80% of the data to identify stable clusters by calculating the frequency in which each patient is a member of a cluster across iterations. Gene expression data (available for n = 69 patients) was analyzed using differential gene expression (DEG) and gene set enrichment (GSEA) approaches, after batch correction using ComBat-seq. Results: Three distinct clusters were identified based on the relative consensus matrix and cumulative distribution plots (Cluster 1, n = 25; Cluster 2, n = 46; Cluster 3, n = 43). No significant differences in patient demographic characteristics, MGMT methylation status, tumor location, or overall survival were identified across clusters. Differentially expressed genes were identified in Cluster 1, which have been previously associated with GBM prognosis, recurrence, and treatment sensitivity. GSEA of Cluster 1 showed an enrichment of genes upregulated for immune-related and DNA metabolism pathways and genes downregulated in pathways associated with protein and histone deacetylation. Clusters 2 and 3 exhibited fewer DEGs which failed to reach significance after multiple testing corrections. Conclusions: Consensus clustering of radiomic features revealed unique gene expression profiles in the GBM cohort which likely represent subtle differences in tumor biology and radiosensitivity that are not visually discernible, underscoring the potential of radiomics to serve as a non-invasive alternative for identifying GBM molecular heterogeneity. Further investigation is still required to validate these findings and their clinical implications. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
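
Illustration for result 12: the abstract describes consensus clustering as repeatedly subsampling 80% of the data, clustering each subsample with k-means, and recording how often each pair of patients lands in the same cluster. A minimal sketch of that idea in plain Python is given here. It is not the ConsensusClusterPlus implementation; the function names, the toy 2-D points, the number of runs, and the seed are all assumptions made for illustration.

```python
import random

def kmeans(points, k, rng, iters=20):
    """Lloyd's k-means with squared Euclidean distance; returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

def consensus_matrix(points, k, runs=50, frac=0.8, seed=0):
    """Fraction of subsampled runs in which each pair of points shares a cluster."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]  # times i and j were co-clustered
    sampled = [[0] * n for _ in range(n)]   # times i and j were both subsampled
    for _ in range(runs):
        idx = rng.sample(range(n), int(frac * n))
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                together[i][j] += labels[a] == labels[b]
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]

# Toy data (hypothetical): two well-separated groups of five points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0), (10.5, 10.5)]
matrix = consensus_matrix(points, k=2)
```

Entries of `matrix` near 1 mark pairs that are grouped together consistently across subsamples, and entries near 0 mark pairs that are consistently separated; intermediate values indicate unstable assignments.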

13. Unveiling and mapping polymorphs in fluorite Y2TiO5 using 4D‐STEM and unsupervised machine learning.

Author: Hershkovitz, Eitan, Yoo, Timothy, Pu, Xiaofei, Bawane, Kaustubh, Nakayama, Tadachika, Suematsu, Hisayuki, He, Lingfeng, and Kim, Honggyu
Subjects: *SCANNING transmission electron microscopy, *RADIATION tolerance, *PERMITTIVITY, *MACHINE learning, *CERAMIC materials, *PYROCHLORE
Abstract: Y2TiO5 belongs to the Ln2TiO5 (Ln = lanthanide or Y) family of ceramic materials and exhibits a range of desirable material properties such as radiation tolerance, frustrated magnetism, and large dielectric constant. However, understanding the complex crystal structure of Y2TiO5 remains elusive, given that Y2TiO5 can adopt multiple polymorphs such as cubic, orthorhombic, and hexagonal phases within the lattice. In this work, we report a detailed structural analysis of Y2TiO5 using four‐dimensional scanning transmission electron microscopy coupled with unsupervised machine learning. The pyrochlore nanodomains, characterized by the ordered arrangement of yttrium cations on the A site of their A2BO5 structure, are present within the matrix of a predominantly fluorite‐structured Y2TiO5 along with a third polymorph, the hexagonal phase. The pyrochlore phase is found to form 2 nm boundary regions around hexagonal phase stacking faults, highlighting the potential influence of the hexagonal phase on the occurrence and distribution of the pyrochlore phase. Lastly, we identify a unique pyrochlore phase with asymmetric arrangement of cation ordering along a single planar direction. Our findings provide invaluable insights into the possible mechanisms stabilizing pyrochlore nanodomains within the fluorite lattice of Y2TiO5. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

14. Unsupervised machine learning highlights the challenges of subtyping disorders of gut‐brain interaction.

Author: Dowrick, Jarrah M., Roy, Nicole C., Bayer, Simone, Frampton, Chris M. A., Talley, Nicholas J., Gearry, Richard B., and Angeli‐Gordon, Timothy R.
Subjects: *IRRITABLE colon, *MACHINE learning, *CLUSTER analysis (Statistics), *FACTOR analysis, *LEARNING communities
Abstract: Background: Unsupervised machine learning describes a collection of powerful techniques that seek to identify hidden patterns in unlabeled data. These techniques can be broadly categorized into dimension reduction, which transforms and combines the original set of measurements to simplify data, and cluster analysis, which seeks to group subjects based on some measure of similarity. Unsupervised machine learning can be used to explore alternative subtyping of disorders of gut‐brain interaction (DGBI) compared to the existing gastrointestinal symptom‐based definitions of Rome IV. Purpose: This present review aims to familiarize the reader with fundamental concepts of unsupervised machine learning using accessible definitions and provide a critical summary of their application to the evaluation of DGBI subtyping. By considering the overlap between Rome IV clinical definitions and identified clusters, along with clinical and physiological insights, this paper speculates on the possible implications for DGBI. Also considered are algorithmic developments in the unsupervised machine learning community that may help leverage increasingly available omics data to explore biologically informed definitions. Unsupervised machine learning challenges the modern subtyping of DGBI and, with the necessary clinical validation, has the potential to enhance future iterations of the Rome criteria to identify more homogeneous, diagnosable, and treatable patient populations. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

15. An optimized intelligent open-source MLaaS framework for user-friendly clustering and anomaly detection.

Author: ElDahshan, Kamal A., Abutaleb, Gaber E., Elemary, Berihan R., Ebeid, Ebeid A., and AlHabshy, AbdAllah A.
Subjects: *MACHINE learning, *ARTIFICIAL intelligence, *GAUSSIAN mixture models, *OUTLIER detection, *K-nearest neighbor classification
Abstract: As data grow exponentially, the demand for advanced intelligent solutions has become increasingly urgent. Unfortunately, not all businesses have the expertise to utilize machine learning algorithms effectively. To bridge this gap, the present paper introduces a cost-effective, user-friendly, dependable, adaptable, and scalable solution for visualizing, analyzing, processing, and extracting valuable insights from data. The proposed solution is an optimized open-source unsupervised machine learning as a service (MLaaS) framework that caters to both experts and non-experts in machine learning. The framework aims to assist companies and organizations in solving problems related to clustering and anomaly detection, even without prior experience or internal infrastructure. With a focus on several clustering and anomaly detection techniques, the proposed framework automates data processing while allowing user intervention. The proposed framework includes default algorithms for clustering and outlier detection. In the clustering category, it features three algorithms: k-means, hierarchical clustering, and DBScan clustering. For outlier detection, it includes local outlier factor, K-nearest neighbors, and Gaussian mixture model. Furthermore, the proposed solution is expandable; it may include additional algorithms. It is versatile and capable of handling diverse datasets by generating separate rapid artificial intelligence models for each dataset and facilitating their comparison rapidly. The proposed framework provides a solution through a representational state transfer application programming interface, enabling seamless integration with various systems. Real-world testing of the proposed framework on customer segmentation and fraud detection data demonstrates that it is reliable, efficient, cost-effective, and time-saving. With the innovative MLaaS framework, companies may harness the full potential of business analysis. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
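
Illustration for result 15: the abstract lists K-nearest neighbors among the framework's outlier detectors without giving details. One common way to turn nearest neighbours into an anomaly score is to use each point's distance to its k-th nearest neighbour. The sketch below shows that generic idea only; the function name, the value of `k`, and the toy data are assumptions and are not taken from the framework described in the abstract.

```python
import math

def knn_outlier_scores(points, k=3):
    """Score each point by the distance to its k-th nearest neighbour (larger = more isolated)."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Toy data (hypothetical): four nearby points and one isolated point.
data = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (8.0, 8.0)]
scores = knn_outlier_scores(data, k=2)
most_anomalous = max(range(len(data)), key=scores.__getitem__)
```

Here `most_anomalous` is the index of the point with the highest score; in practice a threshold on the scores would be chosen from the data.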

16. Unsupervised Machine Learning for Automatic Image Segmentation of Impact Damage in CFRP Composites.

Author: Zhupanska, Olesya and Krokhmal, Pavlo
Abstract: In this work, a novel unsupervised machine learning (ML) method for automatic image segmentation of low velocity impact damage in carbon fiber reinforced polymer (CFRP) composites has been developed. The method relies on the use of non-parametric statistical models in conjunction with the so-called intensity-based segmentation, enabling one to determine the thresholds of image histograms and isolate the damage. Statistical distance metrics, including the Kullback–Leibler divergence, the Hellinger distance, and the Rényi divergence, are used to formulate and solve optimization problems for finding the thresholds. The developed method enabled rigorous and rapid automatic image segmentation of the grayscale images from the micro computed tomography (micro-CT) scans of the impacted CFRP composites. Sensitivity of the segmentation results with respect to the thresholds obtained using different statistical distances has been investigated. Based on the analysis of the segmentation results, it is concluded that the Kullback–Leibler divergence is the most appropriate statistical measure and should be used for automatic image segmentation of impact damage in CFRP composites. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
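
Illustration for result 16: the abstract names three statistical distances (Kullback–Leibler, Hellinger, Rényi) as the basis of its threshold optimization but does not spell out the objective. The sketch below only shows the standard textbook definitions of the three quantities for discrete distributions such as normalised image histograms. The function names are hypothetical, and the code does not reproduce the authors' segmentation method.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) = sum_i p_i * ln(p_i / q_i); assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def hellinger_distance(p, q):
    """H(P, Q) = sqrt(0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2), which lies in [0, 1]."""
    return math.sqrt(sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q)) / 2)

def renyi_divergence(p, q, alpha=2.0):
    """D_alpha(P || Q) = ln(sum_i p_i^alpha * q_i^(1 - alpha)) / (alpha - 1), alpha > 0, alpha != 1."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    # Assumes q_i > 0 wherever p_i > 0, as for the KL divergence.
    total = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(total) / (alpha - 1)

# Toy two-bin histograms (hypothetical), already normalised to sum to 1.
p = [0.5, 0.5]
q = [0.25, 0.75]
kl = kl_divergence(p, q)
hd = hellinger_distance(p, q)
rd = renyi_divergence(p, q, alpha=2.0)
```

All three quantities are zero when the two histograms coincide and grow as they diverge; only the Hellinger distance is symmetric and bounded.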

17. Value‐centric analysis of user adoption for sustainable urban micro‐mobility transportation through shared e‐scooter services.

Author: Çallı, Levent and Çallı, Büşra Alma
Subjects: CONSUMER behavior, URBAN transportation, SUSTAINABLE urban development, MACHINE learning, CITIES & towns
Abstract: Micro‐mobility services, which are considered a sustainable alternative to traditional transportation modes, have gained substantial popularity due to advancements in mobile technology. As one of those modes of transportation, shared e‐scooter services have encouraged several startups in urban areas, allowing them to reach massive numbers of consumers in a highly competitive environment. This study aims to explore gains and barriers that affect the intention of consumers to use shared e‐scooter services, all within the framework of sustainability‐driven considerations. The Latent Dirichlet Allocation (LDA) Algorithm was used to analyse 24,798 reviews from the Google Play Store, uncovering eight topics. Those topics were used to discover customer value perceptions in the shared e‐scooter context and compare them with the related literature on perceived value. Besides, their impact on user ratings on the mobile application platform was measured using machine learning algorithms. The study's findings are expected to contribute to developing regulations for shared e‐scooter services, which have gained popularity as an eco‐friendly mode of transportation sustainability in urban areas by introducing a novel perspective. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

18. A framework for unsupervised learning and predictive maintenance in Industry 4.0.

Author: Nota, G., Nota, F. D., Toro, A., and Nastasia, M.
Subjects: INDUSTRY 4.0, SUPERVISED learning, MAINTENANCE, INTERNET of things, GAUSSIAN mixture models
Abstract: In recent decades, the economic importance of maintaining machines, equipment, and production facilities has prompted many scholars to examine various aspects of the maintenance of physical assets. However, the industry continues to face the recurring problem of improving product and equipment maintenance processes. New opportunities for improving these processes arise from Industry 4.0 technologies because they make it possible to realize better solutions to the problem of predictive maintenance. Starting from a Big Data and Internet of Things (IoT) architecture as a reference, this paper proposes an abstract framework for predictive maintenance using unsupervised learning models to support decision-making in maintenance programs. From the abstract framework, a predictive maintenance system was developed to enable effective just-in-time maintenance strategies. An unsupervised machine learning algorithm, based on the Gaussian mixture model, allows us to study the influence on a machine's behavior of a single variable, a group of variables of the same type, and combined variables of different types. The algorithm provides experts with information on which part of the machine they need to focus on to find potential causes of future failures. The case study conducted for an Italian automotive company shows preliminary results on the effectiveness of the approach. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

19. Maximum power point tracking using unsupervised learning for photovoltaic power systems.

Author: Guessoum, Djamel, Takruri, Maen, Badawi, Sufian A., Farhat, Maissa, and ElBadawi, Isam
Subjects: PARTICLE swarm optimization, PHOTOVOLTAIC power systems, SOLAR panels, K-means clustering, SOLAR energy, MAXIMUM power point trackers
Abstract: The Maximum Power Point Tracking (MPPT) technique in the solar energy field optimises the performance of solar panels in different atmospheric conditions and variable loads. In this study, we present a new method that uses unsupervised learning (K-means clustering) to identify the atmospheric clusters of solar irradiance and cell temperature (G, T) and delimit homogeneous atmospheric zones or clusters to reduce the search space of the optimal parameters ($V_{mp}$, $I_{mp}$, and $P_{mp}$). The data collected for one year is segregated into 12 clusters; in every cluster, four regions are defined based on the cluster's centroid ($G_c$, $T_c$). A local search of the reference voltage/duty cycle per cluster region is initiated for every sensed (G, T). Variable atmospheric conditions and resistive loads are tested. The results show that the efficiency of the DC/DC converter is 97.5% with a settling time (4.013 ms/5.577 ms) compared to Perturb and Observe (P&O), the conventional tracking method, and Particle Swarm Optimisation (PSO), both applied locally inside a cluster, and a deviation of 2% from the global maximum. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
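
Illustration for result 19: the abstract describes assigning each sensed (G, T) pair to a K-means cluster and then to one of four regions defined around the cluster centroid, so that the search for the operating point stays local. The sketch below shows one possible reading of that lookup step. The centroid values, the axis rescaling, and the quadrant-based definition of the four regions are assumptions made for illustration; the abstract does not state how the regions are drawn.

```python
import math

def nearest_cluster(g, t, centroids, g_scale=1000.0, t_scale=100.0):
    """Index of the centroid closest to (g, t) after a crude rescaling of both axes."""
    return min(
        range(len(centroids)),
        key=lambda i: math.hypot((g - centroids[i][0]) / g_scale,
                                 (t - centroids[i][1]) / t_scale),
    )

def region_within_cluster(g, t, centroid):
    """Quadrant of (g, t) relative to the centroid, numbered 0-3 (an assumed region scheme)."""
    g_c, t_c = centroid
    return (1 if g >= g_c else 0) + (2 if t >= t_c else 0)

# Hypothetical centroids (irradiance in W/m^2, cell temperature in degrees C).
centroids = [(200.0, 20.0), (600.0, 35.0), (950.0, 50.0)]
sensed_g, sensed_t = 640.0, 33.0
cluster = nearest_cluster(sensed_g, sensed_t, centroids)
region = region_within_cluster(sensed_g, sensed_t, centroids[cluster])
```

With a (cluster, region) pair in hand, a tracker could restrict its search for the reference voltage or duty cycle to values previously observed in that region.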

20. Generative Simplex Mapping: Non-Linear Endmember Extraction and Spectral Unmixing for Hyperspectral Imagery.

Author: Waczak, John and Lary, David J.
Subjects: *WATER pollution, *EXPECTATION-maximization algorithms, *MATRIX decomposition, *NONNEGATIVE matrices, *NONLINEAR estimation
Abstract: We introduce a new model for non-linear endmember extraction and spectral unmixing of hyperspectral imagery called Generative Simplex Mapping (GSM). The model represents endmember mixing using a latent space of points sampled within an (n − 1)-simplex corresponding to n unique sources. Barycentric coordinates within this simplex are naturally interpreted as relative endmember abundances satisfying both the abundance sum-to-one and abundance non-negativity constraints. Points in this latent space are mapped to reflectance spectra via a flexible function combining linear and non-linear mixing. Due to the probabilistic formulation of the GSM, spectral variability is also estimated by a precision parameter describing the distribution of observed spectra. Model parameters are determined using a generalized expectation-maximization algorithm, which guarantees non-negativity for extracted endmembers. We first compare the GSM against three varieties of non-negative matrix factorization (NMF) on a synthetic data set of linearly mixed spectra from the USGS spectral database. Here, the GSM performed favorably for both endmember accuracy and abundance estimation with all non-linear contributions driven to zero by the fitting procedure. In a second experiment, we apply the GSM to model non-linear mixing in real hyperspectral imagery captured over a pond in North Texas. The model accurately identified spectral signatures corresponding to near-shore algae, water, and rhodamine tracer dye introduced into the pond to simulate water contamination by a localized source. Abundance maps generated using the GSM accurately track the evolution of the dye plume as it mixes into the surrounding water. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
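
Illustration for result 20: the abstract interprets barycentric coordinates in an (n − 1)-simplex as endmember abundances that are non-negative and sum to one. The sketch below samples such a point and applies the linear part of the mixing only. It does not implement the GSM or its non-linear term; the function names, the toy three-band spectra, and the seed are assumptions made for illustration.

```python
import random

def sample_abundances(n, rng):
    """Draw a point uniformly from the (n - 1)-simplex via normalised exponential variates."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]

def linear_mix(abundances, endmembers):
    """Abundance-weighted sum of endmember spectra, one value per wavelength band."""
    bands = len(endmembers[0])
    return [
        sum(a * spectrum[b] for a, spectrum in zip(abundances, endmembers))
        for b in range(bands)
    ]

rng = random.Random(0)
# Toy endmember spectra (hypothetical): three sources, three bands each.
endmembers = [[0.1, 0.2, 0.3], [0.8, 0.6, 0.4], [0.5, 0.5, 0.5]]
abundances = sample_abundances(3, rng)
mixed = linear_mix(abundances, endmembers)
```

Because the abundances form a convex combination, each band of `mixed` lies between the smallest and largest endmember value for that band, and a vertex of the simplex reproduces a pure endmember.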

21. The application of template matching and novel unsupervised and supervised machine learning methodology to forensic hair analysis.

Author: Airlie, Melissa, Robertson, James, and Brooks, Elizabeth
Subjects: *SUPERVISED learning, *INTELLIGENCE tests, *HAIR analysis, *EXPERTISE, *HAIR
Abstract: Hair pigmentation is a valuable feature in forensic hair analysis and hair comparisons. Microscopic pigmentation features from participants' hair were classified and documented. The discriminating power of template matching, unsupervised and supervised machine learning methods was compared to determine which method best distinguished between participants and accurately assigned hair to an individual, based on pigmentation features. Template matching analysis revealed a higher frequency of inter-person matches compared to intra-person matches, and the unsupervised model’s predicted labels did not align with true labels. These findings suggest the presence of an unknown dimensionality beyond the observable pigmentation features used in forensic hair classification, highlighting the crucial role of forensic expertise. In contrast, supervised machine learning demonstrated superior accuracy and greater interpretability due to its training on labelled data, making it more appropriate for forensic applications. This research underscores the continued importance of human expertise, particularly in the initial classification of training data for supervised machine learning. The potential applications of supervised machine learning in forensic science include training, competency testing and rapid intelligence as a novel and innovative tool while advancing the field towards more objective methodologies. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

22. Clustering Electrophysiological Predisposition to Binge Drinking: An Unsupervised Machine Learning Analysis.

Author: Uceta, Marcos, Cerro‐León, Alberto del, Shpakivska‐Bilán, Danylyna, García‐Moreno, Luis M., Maestú, Fernando, and Antón‐Toro, Luis Fernando
Subjects: *BINGE drinking, *COMPULSIVE behavior, *ALCOHOL drinking, *MACHINE learning, *FUNCTIONAL connectivity
Abstract: Background: The demand for fresh strategies to analyze intricate multidimensional data in neuroscience is increasingly evident. One of the most complex events during our neurodevelopment is adolescence, where our nervous system suffers constant changes, not only in neuroanatomical traits but also in neurophysiological components. One of the most impactful factors we deal with during this time is our environment, especially when encountering external factors such as social behaviors or substance consumption. Binge drinking (BD) has emerged as an extended pattern of alcohol consumption in teenagers, not only affecting their future lifestyle but also changing their neurodevelopment. Recent studies have changed their scope into finding predisposition factors that may lead adolescents into this kind of patterns of consumption. Methods: In this article, using unsupervised machine learning (UML) algorithms, we analyze the relationship between electrophysiological activity of healthy teenagers and the levels of consumption they had 2 years later. We used hierarchical agglomerative UML techniques based on Ward's minimum variance criterion to clusterize relations between power spectrum and functional connectivity and alcohol consumption, based on similarity in their correlations, in frequency bands from theta to gamma. Results: We found that all frequency bands studied had a pattern of clusterization based on anatomical regions of interest related to neurodevelopment and cognitive and behavioral aspects of addiction, highlighting the dorsolateral and medial prefrontal, the sensorimotor, the medial posterior, and the occipital cortices. All these patterns, of great cohesion and coherence, showed an abnormal electrophysiological activity, representing a dysregulation in the development of core resting‐state networks. 
The clusters found maintained not only plausibility in nature but also robustness, making this a great example of the usage of UML in the analysis of electrophysiological activity—a new perspective into analysis that, while contributing to classical statistics, can clarify new characteristics of the variables of interest. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

23. Unsupervised Machine Learning to Identify Risk Factors of Pyeloplasty Failure in Ureteropelvic Junction Obstruction.

Author: Song, Jonathan J., Kielhofner, Jane, Qian, Zhiyu, Gu, Catherine, Boysen, William, Chang, Steven, Dahl, Douglas, Eswara, Jairam, Haleblian, George, Wintner, Anton, and Wollin, Daniel A.
Subjects: *URETERIC obstruction, *DISEASE risk factors, *ABDOMINAL surgery, *MULTIHOSPITAL systems, *SURGICAL complications
Abstract: Introduction: In adult patients with ureteropelvic junction obstruction (UPJO), little data exist on predicting pyeloplasty outcome, and there is no unified definition of pyeloplasty success. As such, defining pyeloplasty success retrospectively is particularly vulnerable to bias, allowing researchers to choose significant outcomes with the benefit of hindsight. To mitigate these biases, we performed an unsupervised machine learning cluster analysis on a dataset of 216 pyeloplasty patients between 2015 and 2023 from a multihospital system to identify the defining risk factors of patients that experience worse outcomes. Methods: A KPrototypes model was fitted with pre- and perioperative data and blinded to postoperative outcomes. T-test and chi-square tests were performed to look at significant differences of characteristics between clusters. SHapley Additive exPlanation values were calculated from a random forest classifier to determine the most predictive features of cluster membership. A logistic regression model identified which of the most predictive variables remained significant after adjusting for confounding effects. Results: Two distinct clusters were identified. One cluster (denoted as "high-risk") contained 111 (51.4%) patients and was identified by having more comorbidities, such as old age (62.7 vs 35.7), high body mass index (BMI) (26.9 vs 23.8), hypertension (66.7% vs 17.1%), and previous abdominal surgery (72.1% vs 37.1%) and was found to have worse outcomes, such as more frequent severe postoperative complications (7.2% vs 1.0%). After adjusting for confounding effects, the most predictive features of high-risk cluster membership were old age, low preoperative estimated glomerular filtration rate (eGFR), hypertension, greater BMI, previous abdominal surgery, and left-sided UPJO. 
Conclusions: Adult UPJO patients with older age, lower eGFR, hypertension, greater BMI, previous abdominal surgery, and left-sided UPJO naturally cluster into a group that more commonly suffers from perioperative complications and worse outcomes. Preoperative counseling and perioperative management for patients with these risk factors may need to be thought of or approached differently. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

24. Delineating three distinct spatiotemporal patterns of brain atrophy in Parkinson's disease.

Author: Sakato, Yusuke, Shima, Atsushi, Terada, Yuta, Takeda, Kiyoaki, Sakamaki-Tsukita, Haruhi, Nishida, Akira, Yoshimura, Kenji, Wada, Ikko, Furukawa, Koji, Kambe, Daisuke, Togo, Hiroki, Mukai, Yohei, Sawamura, Masanori, Nakanishi, Etsuro, Yamakado, Hodaka, Fushimi, Yasutaka, Okada, Tomohisa, Takahashi, Yuji, Nakamoto, Yuji, and Takahashi, Ryosuke
Subjects: *PATHOLOGY, *BRAIN stem, *MACHINE learning, *TEMPORAL lobe, *PARKINSON'S disease
Abstract: The clinical manifestation of Parkinson's disease exhibits significant heterogeneity in the prevalence of non-motor symptoms and the rate of progression of motor symptoms, suggesting that Parkinson's disease can be classified into distinct subtypes. In this study, we aimed to explore this heterogeneity by identifying a set of subtypes with distinct patterns of spatiotemporal trajectories of neurodegeneration. We applied Subtype and Stage Inference (SuStaIn), an unsupervised machine learning algorithm that combined disease progression modelling with clustering methods, to cortical and subcortical neurodegeneration visible on 3 T structural MRI of a large cross-sectional sample of 504 patients and 279 healthy controls. Serial longitudinal data were available for a subset of 178 patients at the 2-year follow-up and for 140 patients at the 4-year follow-up. In a subset of 210 patients, concomitant Alzheimer's disease pathology was assessed by evaluating amyloid-β concentrations in the CSF or via the amyloid-specific radiotracer 18F-flutemetamol with PET. The SuStaIn analysis revealed three distinct subtypes, each characterized by unique patterns of spatiotemporal evolution of brain atrophy: neocortical, limbic and brainstem. In the neocortical subtype, a reduction in brain volume occurred in the frontal and parietal cortices in the earliest disease stage and progressed across the entire neocortex during the early stage, although with relative sparing of the striatum, pallidum, accumbens area and brainstem. The limbic subtype represented comparative regional vulnerability, which was characterized by early volume loss in the amygdala, accumbens area, striatum and temporal cortex, subsequently spreading to the parietal and frontal cortices across disease stage. The brainstem subtype showed gradual rostral progression from the brainstem extending to the amygdala and hippocampus, followed by the temporal and other cortices. 
Longitudinal MRI data confirmed that 77.8% of participants at the 2-year follow-up and 84.0% at the 4-year follow-up were assigned to subtypes consistent with estimates from the cross-sectional data. This three-subtype model aligned with empirically proposed subtypes based on age at onset, because the neocortical subtype demonstrated characteristics similar to those found in the old-onset phenotype, including older onset and cognitive decline symptoms (P < 0.05). Moreover, the subtypes correspond to the three categories of the neuropathological consensus criteria for symptomatic patients with Lewy pathology, proposing neocortex-, limbic- and brainstem-predominant patterns as different subgroups of α-synuclein distributions. Among the subtypes, the prevalence of biomarker evidence of amyloid-β pathology was comparable. Upon validation, the subtype model might be applied to individual cases, potentially serving as a biomarker to track disease progression and predict temporal evolution. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

25. Effective Machine Learning Solution for State Classification and Productivity Identification: Case of Pneumatic Pressing Machine.

Author: Kolokas, Alexandros, Mallioris, Panagiotis, Koutsiantzis, Michalis, Bialas, Christos, Bechtsis, Dimitrios, and Diamantis, Evangelos
Subjects: REMAINING useful life, ARTIFICIAL intelligence, DATA analytics, INDUSTRY 4.0, MACHINE learning
Abstract: The fourth industrial revolution (Industry 4.0) brought significant changes in manufacturing, driven by technologies like artificial intelligence (AI), Internet of Things (IoT), 5G, robotics, and big data analytics. For industries to remain competitive, the primary goals must be the improvement of the efficiency and safety of machinery, the reduction of production costs, and the enhancement of product quality. Predictive maintenance (PdM) utilizes historical data and AI models to diagnose equipment's health and predict the remaining useful life (RUL), providing critical insights for machinery effectiveness and product manufacturing. This prediction is a critical strategy to maximize the useful life of equipment, especially in large-scale and important infrastructures. This study focuses on developing an unsupervised machine state-classification solution utilizing real-world industrial measurements collected from a pneumatic pressing machine. Unsupervised machine learning (ML) models were tested to diagnose and output the working state of the pressing machine at each given point (offline, idle, pressing, defective). Our research contributes to extracting valuable insights regarding real-world industrial settings for PdM and production efficiency using unsupervised ML, promoting operation safety, cost reduction, and productivity enhancement in modern industries. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

26. Unsupervised Machine Learning for Effective Code Smell Detection: A Novel Method

Author: Ruchin Gupta, Narendra Kumar, Sunil Kumar, and Jitendra Kumar Seth
Subjects: code smell, unsupervised machine learning, open-source java projects, Computer software, QA76.75-76.765
Abstract: The quality of source code is negatively impacted by code smells. Since the term "code smell" originated, numerous attempts have been made to comprehend it by identifying it using various techniques, such as metric-based, heuristic-based, optimization-based, machine learning (ML)-based, etc. Among these, supervised machine learning (SML) has shown effectiveness in detecting code smells. However, SML techniques have significant limitations, including the dependency on expensive and high-quality labeled data, the need for representative training datasets, and the risk of introducing biases in labeled examples that lead to skewed predictions. To overcome these challenges, this study introduces a method that leverages unsupervised machine learning (UnML) along with feature engineering. Unlike SML, UnML does not require labeled data and minimizes potential biases. The proposed method was evaluated using four datasets containing different types of code smells and was compared with a previous study that used SML techniques. The results indicate that the UnML-based method is effective, achieving outcomes closely aligned with those from the SML approach. This method is especially beneficial in situations where labeled data is scarce or unavailable and can be used to identify new code smells, generate labeled data for SML and detect multiple code smells simultaneously within a codebase.
Published: 2024
Full Text: View/download PDF

27. A framework for unsupervised learning and predictive maintenance in Industry 4.0

Author: Giancarlo Nota, Francesco David Nota, Alonso Toro, and Michele Nastasia
Subjects: predictive maintenance, industry 4.0, internet of things, unsupervised machine learning, gaussian mixtures, Industrial engineering. Management engineering, T55.4-60.8
Abstract: In recent decades, the economic importance of maintaining machines, equipment, and production facilities has prompted many scholars to examine various aspects of the maintenance of physical assets. However, the industry continues to face the recurring problem of improving product and equipment maintenance processes. New opportunities for improving these processes arise from Industry 4.0 technologies because they make it possible to realize better solutions to the problem of predictive maintenance. Starting from a Big Data and Internet of Things (IoT) architecture as a reference, this paper proposes an abstract framework for predictive maintenance using unsupervised learning models to support decision-making in maintenance programs. From the abstract framework, a predictive maintenance system was developed to enable effective just-in-time maintenance strategies. An unsupervised machine learning algorithm, based on the Gaussian mixture model, allows us to study the influence on a machine's behavior of a single variable, a group of variables of the same type, and combined variables of different types. The algorithm provides experts with information on which part of the machine they need to focus on to find potential causes of future failures. The case study conducted for an Italian automotive company shows preliminary results on the effectiveness of the approach.
Published: 2024
Full Text: View/download PDF

28. Maximum power point tracking using unsupervised learning for photovoltaic power systems

Author: Djamel Guessoum, Maen Takruri, Sufian A. Badawi, Maissa Farhat, and Isam ElBadawi
Subjects: PV panel, clustering, K-means, MPPT, boost converter, unsupervised machine learning, Engineering (General). Civil engineering (General), TA1-2040
Abstract: The Maximum Power Point Tracking (MPPT) technique in the solar energy field optimises the performance of solar panels in different atmospheric conditions and variable loads. In this study, we present a new method that uses unsupervised learning (K-means clustering) to identify the atmospheric clusters of solar irradiance and cell temperature (G, T) and delimit homogeneous atmospheric zones or clusters to reduce the search space of the optimal parameters ([Formula: see text], [Formula: see text], and [Formula: see text]). The data collected for one year is segregated into 12 clusters; in every cluster, 04 regions are defined based on every cluster’s centroid ([Formula: see text], [Formula: see text]). A local search of the reference voltage/Duty cycle per cluster region is initiated for every sensed (G, T). Variable atmospheric conditions and resistive loads are tested. The results show that the efficiency of the DC/DC converter is 97.5% with a settling time (4.013 ms/5.577 ms) compared to the Perturb and Observe (P&O) the conventional tracking method and the Particle Swarm Optimisation (PSO), both applied locally inside a cluster and a deviation of 2% from the global maximum.
Published: 2024
Full Text: View/download PDF

29. Dissonance between posts of health agencies and public comments regarding COVID-19 and vaccination on Facebook in Northern California

Author: Calabrese, Christopher, Xue, Haoning, and Zhang, Jingwen
Subjects: Epidemiology, Health Services and Systems, Public Health, Health Sciences, Health Disparities, Emerging Infectious Diseases, Infectious Diseases, Vaccine Related, Prevention, Machine Learning and Artificial Intelligence, Coronaviruses, Biotechnology, Immunization, Infection, Good Health and Well Being, Humans, COVID-19, California, Social Media, COVID-19 Vaccines, Vaccination Hesitancy, SARS-CoV-2, Vaccination, Machine Learning, Health agencies, Unsupervised machine learning, Emotion, Facebook, Social media, Northern California, Public Health and Health Services
Abstract: Background: Public health crises, such as the COVID-19 pandemic, have prompted a need for health agencies to improve their disease preparedness strategies, informing their communities of new information and promoting preventive behaviors to help curb the spread of the virus. Methods: We ran unsupervised machine learning and emotion analysis, validated with manual coding, on posts of health agencies (N = 1588) and their associated public comments (N = 7813) during a crucial initial period of the COVID-19 pandemic (January 2020 to February 2021) among nine different counties with a higher proportion of vaccine-hesitant communities in Northern California. In addition, we explored differences in concerns and expressed emotions by two key group-level factors, county-level COVID-19 death rate and political party affiliation. Results: We consistently find that while health agencies primarily disseminated information about COVID-19 and the vaccine, they failed to address the concerns of their communities as expressed in public comment sections. Topics among public audiences focused on concerns with the COVID-19 vaccine safety and rollout, state mandates, flu vaccination, and frustration with politicians, and they expressed more positive and more negative emotions than health agencies. Further, there were several differences in primary topics and emotions expressed among public audiences by county-level COVID-19 death rate and political party affiliation. Conclusion: While this research serves as a case study, findings indicate how local health agencies, and their audiences, discuss their perceptions and concerns regarding the COVID-19 pandemic and may inform health communication researchers and practitioners on how to prepare and manage for emerging health crises.
Published: 2024

30. Interpretable unsupervised learning enables accurate clustering with high-throughput imaging flow cytometry.

Author: Tang, Rui, Zhu, Yuxuan, Guo, Han, Qu, Yunjia, Xie, Pengtao, Lian, Ian, Wang, Yingxiao, Zhang, Zunming, Lo, Yu-Hwa, and Chen, Xinyu
Subjects: Humans, Unsupervised Machine Learning, Flow Cytometry, Algorithms, Neural Networks, Computer, Cluster Analysis
Abstract: A primary challenge of high-throughput imaging flow cytometry (IFC) is to analyze the vast amount of imaging data, especially in applications where ground truth labels are unavailable or hard to obtain. We present an unsupervised deep embedding algorithm, the Deep Convolutional Autoencoder-based Clustering (DCAEC) model, to cluster label-free IFC images without any prior knowledge of input labels. The DCAEC model first encodes the input images into the latent representations and then clusters based on the latent representations. Using the DCAEC model, we achieve a balanced accuracy of 91.9% for human white blood cell (WBC) clustering and 97.9% for WBC/leukemia clustering using the 3D IFC images and 3D DCAEC model. Above all, although no human recognizable features can separate the clusters of cells with protein localization, we demonstrate the fused DCAEC model can achieve a cluster balanced accuracy of 85.3% from the label-free 2D transmission and 3D side scattering images. To reveal how the neural network recognizes features beyond human ability, we use the gradient-weighted class activation mapping method to discover the cluster-specific visual patterns automatically. Evaluation results show that the automatically identified salient image regions have strong cluster-specific visual patterns for different clusters, which we believe is a stride for the interpretable neural network for cell analysis with high-throughput IFCs.
Published: 2023

31. Rapid measurement and machine learning classification of colour vision deficiency.

Author: He, Jingyi, Bex, Peter, and Skerswetat, Jan
Subjects: colour detection, colour discrimination, colour vision deficiency, cone-isolating directions, k-means clustering, unsupervised machine learning, vision diagnostics, Humans, Color Vision Defects, Color Vision, Vision Tests, Machine Learning, Cardiovascular Diseases, Color Perception
Abstract: Colour vision deficiencies (CVDs) indicate potential genetic variations and can be important biomarkers of acquired impairment in many neuro-ophthalmic diseases. However, CVDs are typically measured with tests which possess high sensitivity for detecting the presence of a CVD but do not quantify its type or severity. In this study, we introduce Foraging Interactive D-prime (FInD), a novel computer-based, generalisable, rapid, self-administered vision assessment tool and apply it to colour vision testing. This signal detection theory-based adaptive paradigm computed test stimulus intensity from d-prime analysis. Stimuli were chromatic Gaussian blobs in dynamic luminance noise, and participants clicked on cells that contained chromatic blobs (detection) or blob pairs of differing colours (discrimination). Sensitivity and repeatability of FInD colour tasks were compared against the Hardy-Rand-Rittler and the Farnsworth-Munsell 100 hue tests in 19 colour-normal and 18 inherited colour-atypical, age-matched observers. Rayleigh colour match was also completed. Detection and discrimination thresholds were higher for atypical than for typical observers, with selective threshold elevations corresponding to unique CVD types. Classifications of CVD type and severity via unsupervised machine learning confirmed functional subtypes. FInD tasks reliably detect inherited CVDs, and may serve as valuable tools in basic and clinical colour vision science.
Published: 2023

32. iVRIDA-fleet: unsupervised rail vehicle running instability detection algorithm for passenger vehicle fleet.

Author: Kulkarni, Rohan, Berg, Mats, Qazizadeh, Alireza, Söderström, Pör, Wheelwright, Hugh, Vincent, David, and Li, Martin
Subjects: *DETECTION algorithms, *K-means clustering, *PRINCIPAL components analysis, *AUTOENCODER, *MECHANICAL wear
Abstract: Identifying faults contributing to unsafe conditions, such as a high-speed rail vehicle running instability, is crucial to ensuring operational safety. But the occurrence of vehicle running instability during regular operation across the whole vehicle fleet is a rare anomaly. An unsupervised anomaly detection (AD) based iVRIDA-fleet framework is therefore proposed to detect vehicle running instability and identify its root cause. The performance of Principal Component Analysis (PCA-AD, baseline model), Sparse Autoencoder (SAE-AD), and LSTM Encoder Decoder (LSTMEncDec-AD) models are evaluated to detect the occurrence of vehicle running instability. A k-means algorithm is then applied to latent space representations to identify various clusters associated with different root causes of observed vehicle running instability. The effectiveness of the proposed iVRIDA-fleet framework is demonstrated using onboard accelerations measured on a Swedish X2000 vehicle fleet. The probability of vehicle running instability occurrence is observed to be only 0.35% of onboard accelerations corresponding to 827,467 km travel distance. Furthermore, the root causes identified by the iVRIDA-fleet framework are validated by investigating the maintenance records of the vehicles and track. It is identified that heavily worn wheels were the primary root cause of observed vehicle running instability, but the track (actual gauge and rail profiles) was also a contributing factor. The proposed algorithm contributes towards the digitalisation of vehicle and track maintenance by intelligently identifying anomalous events of the vehicle–track dynamic interaction. 
Abbreviations: AD: Anomaly Detection; AL: Alignment Level (Lateral); BDL: TrackSection Id; D1: 3–25 m; wavelength rate of track irregularity; FRA: Federal Railroad Administration (USA); IN2TRACK3: Research into enhanced track and switch and crossing system 3; iVRIDA: Intelligent Vehicle Running Instability Detection Algorithm; iVRIDA-fleet: iVRIDA for fleet; LSTMEncDec: LSTM Encoder Decoder Network; LSTMEncDec-AD: LSTM Encoder Decoder Anomaly Detection; PCA-AD: Principal Component Analysis – Anomaly Detection; SAE-AD: Sparse Autoencoder-Anomaly Detection; TG: Track Gauge; VFI: Vehicle Fault Identification; VFIA: VFI Accuracy; VRID: Vehicle Running Instability Detection; Wz: Wertungszahl Ride Index. [ABSTRACT FROM AUTHOR]
Published: 2025
Full Text: View/download PDF

33. Machine learning-assisted rapid determination for traditional Chinese Medicine Constitution

Author: Wen Sun, Minghua Bai, Ji Wang, Bei Wang, Yixing Liu, Qi Wang, and Dongran Han
Subjects: Automated machine learning (AutoML), Unsupervised machine learning, Constitution in Chinese Medicine Questionnaire (CCMQ), Tree-based Pipeline Optimization Tool (TPOT), Variable clustering (varclus), Other systems of medicine, RZ201-999
Abstract: The aim of this study was to develop a machine learning-assisted rapid determination methodology for traditional Chinese Medicine Constitution. Based on the Constitution in Chinese Medicine Questionnaire (CCMQ), the most applied diagnostic instrument for assessing individuals’ constitutions, we employed automated supervised machine learning algorithms (i.e., Tree-based Pipeline Optimization Tool; TPOT) on all the possible item combinations for each subscale and an unsupervised machine learning algorithm (i.e., variable clustering; varclus) on the whole scale to select items that can best predict body constitution (BC) classifications or BC scores. By utilizing subsets of items selected based on TPOT and corresponding machine learning algorithms, the accuracies of BC classifications prediction ranged from 0.819 to 0.936, with the root mean square errors of BC scores prediction stabilizing between 6.241 and 9.877. Overall, the results suggested that the automated machine learning algorithms performed better than the varclus algorithm for item selection. Additionally, based on an automated machine learning item selection procedure, we provided the top three ranked item combinations with each possible subscale length, along with their corresponding algorithms for predicting BC classification and severity. This approach could accommodate the needs of different practitioners in traditional Chinese medicine for rapid constitution determination.
Published: 2024
Full Text: View/download PDF

34. Dissonance between posts of health agencies and public comments regarding COVID-19 and vaccination on Facebook in Northern California

Author: Christopher Calabrese, Haoning Xue, and Jingwen Zhang
Subjects: COVID-19, Vaccination, Health agencies, Unsupervised machine learning, Emotion, Facebook, Public aspects of medicine, RA1-1270
Abstract: Background: Public health crises, such as the COVID-19 pandemic, have prompted a need for health agencies to improve their disease preparedness strategies, informing their communities of new information and promoting preventive behaviors to help curb the spread of the virus. Methods: We ran unsupervised machine learning and emotion analysis, validated with manual coding, on posts of health agencies (N = 1588) and their associated public comments (N = 7813) during a crucial initial period of the COVID-19 pandemic (January 2020 to February 2021) among nine different counties with a higher proportion of vaccine-hesitant communities in Northern California. In addition, we explored differences in concerns and expressed emotions by two key group-level factors, county-level COVID-19 death rate and political party affiliation. Results: We consistently find that while health agencies primarily disseminated information about COVID-19 and the vaccine, they failed to address the concerns of their communities as expressed in public comment sections. Topics among public audiences focused on concerns with the COVID-19 vaccine safety and rollout, state mandates, flu vaccination, and frustration with politicians, and they expressed more positive and more negative emotions than health agencies. Further, there were several differences in primary topics and emotions expressed among public audiences by county-level COVID-19 death rate and political party affiliation. Conclusion: While this research serves as a case study, findings indicate how local health agencies, and their audiences, discuss their perceptions and concerns regarding the COVID-19 pandemic and may inform health communication researchers and practitioners on how to prepare and manage for emerging health crises.
Published: 2024
Full Text: View/download PDF

35. Unsupervised machine learning for identifying attention-deficit/hyperactivity disorder subtypes based on cognitive function and their implications for brain structure.

Author: Yamashita, Masatoshi, Shou, Qiulu, and Mizuno, Yoshifumi
Subjects: *BRAIN anatomy, *LANGUAGE & languages, *ATTENTION-deficit hyperactivity disorder, *COGNITIVE testing, *CLUSTER analysis (Statistics), *RESEARCH funding, *COGNITIVE processing speed, *EPISODIC memory, *PREFRONTAL cortex, *TEMPORAL lobe, *MACHINE learning, *SHORT-term memory, *SYMPTOMS
Abstract: Background: Structural anomalies in the frontal lobe and basal ganglia have been reported in patients with attention-deficit/hyperactivity disorder (ADHD). However, these findings have not always been consistent because of ADHD diversity. This study aimed to identify ADHD subtypes based on cognitive function and find their distinct brain structural characteristics. Methods: Using the data of 656 children with ADHD from the Adolescent Brain Cognitive Development (ABCD) Study, we applied unsupervised machine learning to identify ADHD subtypes using the National Institutes of Health Toolbox Tasks. Moreover, we compared the regional brain volumes between each ADHD subtype and 6601 children without ADHD (non-ADHD). Results: Hierarchical cluster analysis automatically classified ADHD into three distinct subtypes: ADHD-A (n = 212, characterized by high-order cognitive ability), ADHD-B (n = 190, characterized by low cognitive control, processing speed, and episodic memory), and ADHD-C (n = 254, characterized by strikingly low cognitive control, working memory, episodic memory, and language ability). Structural analyses revealed that the ADHD-C type had significantly smaller volumes of the left inferior temporal gyrus and right lateral orbitofrontal cortex than the non-ADHD group, and the right lateral orbitofrontal cortex volume was positively correlated with language performance in the ADHD-C type. However, the volumes of the ADHD-A and ADHD-B types were not significantly different from those of the non-ADHD group. Conclusions: These results indicate the presence of anomalies in the lateral orbitofrontal cortex associated with language deficits in the ADHD-C type. Subtype specificity may explain previous inconsistencies in brain structural anomalies reported in ADHD. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

36. cellPLATO -- an unsupervised method for identifying cell behaviour in heterogeneous cell trajectory data.

Author: Shannon, Michael J., Eisman, Shira E., Lowe, Alan R., Sloan, Tyler F. W., and Mace, Emily M.
Subjects: *CELL migration, *KILLER cells, *CELL morphology, *SOFTWARE measurement, *CELL analysis, *PYTHON programming language
Abstract: Advances in imaging, segmentation and tracking have led to the routine generation of large and complex microscopy datasets. New tools are required to process this 'phenomics' type data. Here, we present 'Cell PLasticity Analysis Tool' (cellPLATO), a Python-based analysis software designed for measurement and classification of cell behaviours based on clustering features of cell morphology and motility. Used after segmentation and tracking, the tool extracts features from each cell per timepoint, using them to segregate cells into dimensionally reduced behavioural subtypes. Resultant cell tracks describe a 'behavioural ID' at each timepoint, and similarity analysis allows the grouping of behavioural sequences into discrete trajectories with assigned IDs. Here, we use cellPLATO to investigate the role of IL-15 in modulating human natural killer (NK) cell migration on ICAM-1 or VCAM-1. We find eight behavioural subsets of NK cells based on their shape and migration dynamics between single timepoints, and four trajectories based on sequences of these behaviours over time. Therefore, by using cellPLATO, we show that IL-15 increases plasticity between cell migration behaviours and that different integrin ligands induce different forms of NK cell migration. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

37. Illuminating the Hierarchical Segmentation of Faults Through an Unsupervised Learning Approach Applied to Clouds of Earthquake Hypocenters.

Author: Piegari, E., Camanni, G., Mercurio, M., and Marzocchi, W.
Subjects: *EARTHQUAKE hazard analysis, *FOCAL planes, *PRINCIPAL components analysis, *SEISMIC event location, *EARTHQUAKES, *TSUNAMI warning systems
Abstract: We propose a workflow for the recognition of the hierarchical segmentation of faults through earthquake hypocenter clustering without prior information. Our approach combines density‐based clustering algorithms (DBSCAN and OPTICS), and principal component analysis (PCA). Given a spatial distribution of earthquake hypocenters, DBSCAN identifies first‐order clusters, representing regions with the highest density of connected seismic events. Within each first‐order cluster, OPTICS further identifies nested higher‐order clusters, providing information on their number and size. PCA analysis is applied to first‐ and higher‐order clusters to evaluate eigenvalues, allowing discrimination between seismicity associated with planar features and distributed seismicity that remains uncategorized. The identified planes are then geometrically characterized in terms of their location and orientation in the space, length, and height. This automated procedure operates within two spatial scales: the largest scale corresponds to the longest pattern of approximately equally dense earthquake clouds, while the smallest scale relates to earthquake location errors. By applying PCA analysis, a planar feature outputted from a first‐order cluster can be interpreted as a fault surface while planes outputted after OPTICS can be interpreted as fault segments comprised within the fault surface. The evenness between the orientation of illuminated fault surfaces and fault segments, and that of the nodal planes of earthquake focal mechanisms calculated along the same faults, corroborates this interpretation. Our workflow has been successfully applied to earthquake hypocenter distributions from various seismically active areas (Italy, Taiwan, and California) associated with faults exhibiting diverse kinematics. Plain Language Summary: Active faults are associated with ongoing movement and seismic activity. 
Recognizing them within large clouds of earthquake hypocenters is at the same time challenging and crucial for seismic hazard estimates. Here, we present a new procedure that can illuminate fault surfaces and their constituent segments by exclusively using hypocenter locations and their spatial density. We apply our approach to hypocenter distributions from various seismically active areas (Italy, Taiwan, and California). The evenness between the orientation of illuminated fault surfaces and fault segments, and that derived from other data sources, corroborates our workflow. This workflow is shown to be an effective tool to derive unbiased fault geometries. It also offers new perspectives for the study of the relationships between seismic activity patterns and fault segment interactions, as well as seismic forecasting. Key Points: We present a workflow to reveal segmented fault surfaces within hypocenter distributions through unsupervised learning algorithms. Comparison with earthquake focal mechanisms corroborates the procedure. We derive a hierarchical order of planar fault segments associated with different types of faulting. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

38. Artificial intelligence driven definition of food preference endotypes in UK Biobank volunteers is associated with distinctive health outcomes and blood based metabolomic and proteomic profiles.

Author: Navratilova, Hana F., Whetton, Anthony D., and Geifman, Nophar
Subjects: *INSULIN-like growth factor-binding proteins, *DIETARY patterns, *FOOD preferences, *SWEETNESS (Taste), *CHRONIC kidney failure
Abstract: Background: Specific food preferences can determine an individual's dietary patterns and therefore, may be associated with certain health risks and benefits. Methods: Using food preference questionnaire (FPQ) data from a subset comprising over 180,000 UK Biobank participants, we employed a Latent Profile Analysis (LPA) approach to identify the main patterns or profiles among participants. Blood biochemistry across groups/profiles was compared using the non-parametric Kruskal–Wallis test. We applied the Limma algorithm for differential abundance analysis on 168 metabolites and 2923 proteins, and utilized the Database for Annotation, Visualization and Integrated Discovery (DAVID) to identify enriched biological processes and pathways. Relative risks (RR) were calculated for chronic diseases and mental conditions per group, adjusting for sociodemographic factors. Results: Based on their food preferences, three profiles were termed: the putative Health-conscious group (low preference for animal-based or sweet foods, and high preference for vegetables and fruits), the Omnivore group (high preference for all foods), and the putative Sweet-tooth group (high preference for sweet foods and sweetened beverages). The Health-conscious group exhibited lower risk of heart failure (RR = 0.86, 95%CI 0.79–0.93) and chronic kidney disease (RR = 0.69, 95%CI 0.65–0.74) compared to the two other groups. The Sweet-tooth group had greater risk of depression (RR = 1.27, 95%CI 1.21–1.34), diabetes (RR = 1.15, 95%CI 1.01–1.31), and stroke (RR = 1.22, 95%CI 1.15–1.31) compared to the other two groups. Cancer (overall) relative risk showed little difference across the Health-conscious, Omnivore, and Sweet-tooth groups with RR of 0.98 (95%CI 0.96–1.01), 1.00 (95%CI 0.98–1.03), and 1.01 (95%CI 0.98–1.04), respectively.
The Health-conscious group was associated with lower levels of inflammatory biomarkers (e.g., C-reactive Protein) which are also known to be elevated in those with common metabolic diseases (e.g., cardiovascular disease). Other markers modulated in the Health-conscious group, ketone bodies, insulin-like growth factor-binding protein (IGFBP), and Growth Hormone 1 were more abundant, while leptin was less abundant. Further, the IGFBP pathway, which influences IGF1 activity, may be significantly enhanced by dietary choices. Conclusions: These observations align with previous findings from studies focusing on weight loss interventions, which include a reduction in leptin levels. Overall, the Health-conscious group, with preference to healthier food options, has better health outcomes, compared to Sweet-tooth and Omnivore groups. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

39. Ecological momentary assessment (EMA) combined with unsupervised machine learning shows sensitivity to identify individuals in potential need for psychiatric assessment.

Author: Wenzel, Julian, Dreschke, Nils, Hanssen, Esther, Rosen, Marlene, Ilankovic, Andrej, Kambeitz, Joseph, Fett, Anne-Kathrin, and Kambeitz-Ilankovic, Lana
Subjects: *ECOLOGICAL momentary assessments (Clinical psychology), *SYMPTOM burden, *MACHINE learning, *PSYCHOSES, *CLUSTER analysis (Statistics)
Abstract: Ecological momentary assessment (EMA), a structured diary assessment technique, has shown feasibility to capture psychotic(-like) symptoms across different study groups. We investigated whether EMA combined with unsupervised machine learning can distinguish groups on the continuum of genetic risk toward psychotic illness and identify individuals with need for extended healthcare. Individuals with psychotic disorder (PD, N = 55), healthy individuals (HC, N = 25) and HC with first-degree relatives with psychosis (RE, N = 20) were assessed at two sites over 7 days using EMA. Cluster analysis determined subgroups based on similarities in longitudinal trajectories of psychotic symptom ratings in EMA, agnostic of study group assignment. Psychotic symptom ratings were calculated as average of items related to hallucinations and paranoid ideas. Prior to EMA we assessed symptoms using the Positive and Negative Syndrome Scale (PANSS) and the Community Assessment of Psychic Experience (CAPE) to characterize the EMA subgroups. We identified two clusters with distinct longitudinal EMA characteristics. Cluster 1 (N_PD = 12, N_RE = 1, N_HC = 2) showed higher mean EMA symptom ratings as compared to cluster 2 (N_PD = 43, N_RE = 19, N_HC = 23) (p < 0.001). Cluster 1 showed a higher burden on negative (p < 0.05) and positive (p < 0.05) psychotic symptoms in cross-sectional PANSS and CAPE ratings than cluster 2. Findings indicate a separation of PD with high symptom burden (cluster 1) from PD with healthy-like rating patterns grouping together with HC and RE (cluster 2). Individuals in cluster 1 might particularly profit from exchange with a clinician underlining the idea of EMA as clinical monitoring tool. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

40. Designing Porous Structure with Optimized Topology using Machine Learning.

Author: Ghansiyal, Shradha, Yi, Li, Klar, Matthias, and Aurich, Jan C.
Abstract: Biomedical engineering relies on topology optimization to refine material distribution, crucial for lightweight, high-performance prostheses and orthoses. Advanced manufacturing techniques like additive manufacturing can then be used to create these intricate designs layer by layer, ensuring precision and customization. However, conventional numerical simulation-based topology optimization methods can be time-consuming and resource-intensive, especially as the design domain expands. To overcome this issue, machine learning models are investigated for their ability to perform topology optimization. The results indicate a significant decrease in computation time, along with comparable optimization performance to conventional methods. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

41. CLUSTERDC: A New Density-Based Clustering Algorithm and its Application in a Geological Material Characterization Workflow.

Author: Meyrieux, Maximilien, Hmoud, Samer, van Geffen, Pim, and Kaeter, David
Subjects: PROBABILITY density function, ORE deposits, WASTE products, HIERARCHICAL clustering (Cluster analysis), MINING corporations
Abstract: The ore and waste materials extracted from a mineral deposit during the mining process can have significant variations in their physical and chemical characteristics. The current approaches to geological material characterization are often subjective and usually involve a significant human workload, as there is no optimized, well-defined, and robust methodology to perform this task. This paper proposes a robust, data-driven workflow for geological material characterization. The methodology involves selecting relevant features as a starting point to discriminate between material types. The workflow then employs a robust, state-of-the-art nonlinear dimension reduction (DR) algorithm when the dataset is multidimensional to obtain a two-dimensional embedding. From this two-dimensional embedding, a kernel density estimation (KDE) function is derived. Subsequently, a new clustering algorithm, named ClusterDC, is employed to generate clusters from the KDE function, accurately reflecting geological material types while achieving scalable clustering performance on large drillhole datasets. ClusterDC is a density-based clustering algorithm capable of delineating and ranking high-density zones corresponding to clusters of data samples from a two-dimensional KDE function. The algorithm reduces subjectivity by automatically determining optimal cluster numbers and minimizing reliance on hyperparameters. It also offers hierarchical and flexible clustering, allowing users to group or split clusters, optimally reassign data samples, and identify cluster core points as well as potential outliers. Two case studies were carried out to test the algorithm and demonstrate its application to geochemical drill-core assay data. 
The results of these case studies demonstrate that the application of ClusterDC in the presented workflow supports the characterization of geological material types based on multi-element geochemistry and thus has the potential to help mining companies optimize downstream processes and mitigate technical risks by improving their understanding of their orebodies. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

42. Derivation and validation of generalized sepsis-induced acute respiratory failure phenotypes among critically ill patients: a retrospective study.

Author: Choudhary, Tilendra, Upadhyaya, Pulakesh, Davis, Carolyn M., Yang, Philip, Tallowin, Simon, Lisboa, Felipe A., Schobel, Seth A., Coopersmith, Craig M., Elster, Eric A., Buchman, Timothy G., Dente, Christopher J., and Kamaleswaran, Rishikesan
Abstract: Background: Septic patients who develop acute respiratory failure (ARF) requiring mechanical ventilation represent a heterogenous subgroup of critically ill patients with widely variable clinical characteristics. Identifying distinct phenotypes of these patients may reveal insights about the broader heterogeneity in the clinical course of sepsis, considering multi-organ dynamics. We aimed to derive novel phenotypes of sepsis-induced ARF using observational clinical data and investigate the generalizability of the derived phenotypes. Methods: We performed a multi-center retrospective study of ICU patients with sepsis who required mechanical ventilation for ≥ 24 h. Data from two different high-volume academic hospital centers were used, where all phenotypes were derived in MICU of Hospital-I (N = 3225). The derived phenotypes were validated in MICU of Hospital-II (N = 848), SICU of Hospital-I (N = 1112), and SICU of Hospital-II (N = 465). Clinical data from 24 h preceding intubation was used to derive distinct phenotypes using an explainable machine learning-based clustering model interpreted by clinical experts. Results: Four distinct ARF phenotypes were identified: A (severe multi-organ dysfunction (MOD) with a high likelihood of kidney injury and heart failure), B (severe hypoxemic respiratory failure [median P/F = 123]), C (mild hypoxia [median P/F = 240]), and D (severe MOD with a high likelihood of hepatic injury, coagulopathy, and lactic acidosis). Patients in each phenotype showed differences in clinical course and mortality rates despite similarities in demographics and admission co-morbidities. The phenotypes were reproduced in external validation utilizing the MICU of Hospital-II and SICUs from Hospital-I and -II. Kaplan–Meier analysis showed significant difference in 28-day mortality across the phenotypes (p < 0.01) and consistent across MICU and SICU of both Hospital-I and -II. 
The phenotypes demonstrated differences in treatment effects associated with high positive end-expiratory pressure (PEEP) strategy. Conclusion: The phenotypes demonstrated unique patterns of organ injury and differences in clinical outcomes, which may help inform future research and clinical trial design for tailored management strategies. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

43. 'Your Strength Is Inspirational': How Naomi Osaka's Twitter Announcement Destigmatizes Mental Health Disclosures.

Author: Kumble, Sushma, Diddi, Pratiti, and Bien-Aimé, Steve
Subjects: COMMUNICATION in sports, ATHLETES' health, MEDICAL disclosure, MACHINE learning, THEMATIC analysis, SOCIAL stigma
Abstract: On May 31, 2021, Naomi Osaka, one of the top-ranked female tennis players, and one of the highest-paid female athletes in the world, announced her withdrawal from the French Open on her social media (Twitter) account, citing mental health issues. There exists a stigma around mental health; and people suffering from mental health conditions often experience "discrimination and stigma" (World Health Organization, 2019). Such disclosures by a noted sportsperson provide an opportunity to help combat the stigma. The present study uses unsupervised machine learning and qualitative thematic analysis to analyze 11,800 English language responses to her tweet. Results indicate that Osaka's tweet mostly garnered a lot of support and encouragement. However, there also existed some negative comments. Additionally, 40% of the negative comments were disseminated by bot-like automated accounts. Practical implications for sports communication are also discussed. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

44. Portuguese Textiles and Apparel Industry: Assessing the Effect of International Trade on Employment and Green Employment.

Author: Ribeiro, Vitor Miguel
Subjects: EMPLOYMENT in foreign countries, ECONOMIC models, ELASTICITY (Economics), CLOTHING industry, INTERNATIONAL trade, IMPORT substitution
Abstract: This study examines the impact of international trade activities on employment in the Portuguese textiles and apparel industry from 2010 to 2017. It finds evidence that imports and exports have a persistent, negative, and significant effect on overall job creation, with this impact intensifying over the long-run. Additionally, the increasing elasticity of substitution between imports and exports indicates that private companies of this industry have benefited from a win–win situation characterised by higher production volumes and lower marginal costs. By applying an unsupervised machine-learning method, followed by a discrete choice analysis to infer the firm-level propensity to possess green capital, we identify a phenomenon termed the green international trade paradox. This study also reveals that international trade activities positively influence green job creation in firms lacking green capital if and only if these players are engaged in international markets while negatively affecting firms already endowed with green technologies. As such, empirical results suggest that the export-oriented economic model followed over the last decade by the Portuguese textiles and apparel industry has not necessarily generated new domestic employment opportunities but has significantly altered the magnitude and profile of skill requirements that employers seek to identify in new workforce hires. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

45. CoMadOut—a robust outlier detection algorithm based on CoMAD.

Author: Lohrer, Andreas, Kazempour, Daniyal, Hünemörder, Maximilian, and Kröger, Peer
Subjects: MACHINE learning, RECEIVER operating characteristic curves, ROBUST statistics, ALGORITHMS
Abstract: Unsupervised learning methods are well established in the area of anomaly detection and achieve state of the art performances on outlier datasets. Outliers play a significant role, since they bear the potential to distort the predictions of a machine learning algorithm on a given dataset. Especially among PCA-based methods, outliers have an additional destructive potential regarding the result: they may not only distort the orientation and translation of the principal components, they also make it more complicated to detect outliers. To address this problem, we propose the robust outlier detection algorithm CoMadOut, which satisfies two required properties: (1) being robust towards outliers and (2) detecting them. Our CoMadOut outlier detection variants using comedian PCA define, dependent on its variant, an inlier region with a robust noise margin by measures of in-distribution (variant CMO) and optimized scores by measures of out-of-distribution (variants CMO*), e.g. kurtosis-weighting by CMO+k. These measures allow distribution based outlier scoring for each principal component, and thus, an appropriate alignment of the degree of outlierness between normal and abnormal instances. Experiments comparing CoMadOut with traditional, deep and other comparable robust outlier detection methods showed that the performance of the introduced CoMadOut approach is competitive to well established methods related to average precision (AP), area under the precision recall curve (AUPRC) and area under the receiver operating characteristic (AUROC) curve. In summary our approach can be seen as a robust alternative for outlier detection tasks. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

46. Correlation Aware Relevance-Based Semantic Index for Clinical Big Data Repository.

Author: Deshpande, Priya and Rasin, Alexander
Subjects: MEDICAL information storage & retrieval systems, STATISTICAL correlation, DATABASES, DATABASE management, CLUSTER analysis (Statistics), DATA analytics, ELECTRONIC health records, INFORMATION retrieval, SEMANTICS, MACHINE learning, ABSTRACTING & indexing services
Abstract: In this paper, we focus on indexing mechanisms for unstructured clinical big integrated data repository systems. Clinical data is unstructured and heterogeneous, which comes in different files and formats. Accessing data efficiently and effectively are critical challenges. Traditional indexing mechanisms are difficult to apply on unstructured data, especially by identifying correlation information between clinical data elements. In this research work, we developed a correlation-aware relevance-based index that retrieves clinical data by fetching most relevant cases efficiently. In our previous work, we designed a methodology that categorizes medical data based on the semantics of data elements and merges them into an integrated repository. We developed a data integration system for medical data sources that combines heterogeneous medical data and provides access to knowledge-based database repositories to different users. In this research work, we designed an indexing system using semantic tags extracted from clinical data sources and medical ontologies that retrieves relevant data from database repositories and speeds up the process of data retrieval. Our objective is to provide an integrated biomedical database repository that can be used by radiologists as a reference, or for patient care, or by researchers. In this paper, we focus on designing a technique that performs data processing for data integration, learn the semantic properties of data elements, and develop a correlation-aware topic index that facilitates efficient data retrieval. We generated semantic tags by identifying key elements from integrated clinical cases using topic modeling techniques. We investigated a technique that identifies tags for merged categories and provides an index to fetch data from an integrated database repository. We developed a topic coherence matrix that shows how well a topic is supported by a corpus from clinical cases and medical ontologies. 
We were able to find more relevant results using an annotation index from an integrated database repository, and there was a 61% increase in recall. We evaluated results with the help of experts and compared them with a naive index (an index with all terms from the corpus). Our approach improved data retrieval quality by providing the most relevant results and reduced data retrieval time as we applied a correlation-aware index on an integrated data repository. The topic indexing approach proposed in this research work identifies tags based on a correlation between different data elements, improves data retrieval time, and provides the most relevant cases as an outcome of this system. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

47. SMS Scam Detection Application Based on Optical Character Recognition for Image Data Using Unsupervised and Deep Semi-Supervised Learning.

Author: Shinde, Anjali, Shahra, Essa Q., Basurra, Shadi, Saeed, Faisal, AlSewari, Abdulrahman A., and Jabbar, Waheb A.
Subjects: *MACHINE learning, *SUPERVISED learning, *OPTICAL character recognition, *GAUSSIAN mixture models, *FEATURE extraction, *DEEP learning
Abstract: The growing problem of unsolicited text messages (smishing) and data irregularities necessitates stronger spam detection solutions. This paper explores the development of a sophisticated model designed to identify smishing messages by understanding the complex relationships among words, images, and context-specific factors, areas that remain underexplored in existing research. To address this, we merge a UCI spam dataset of regular text messages with real-world spam data, leveraging OCR technology for comprehensive analysis. The study employs a combination of traditional machine learning models, including K-means, Non-Negative Matrix Factorization, and Gaussian Mixture Models, along with feature extraction techniques such as TF-IDF and PCA. Additionally, deep learning models like RNN-Flatten, LSTM, and Bi-LSTM are utilized. The selection of these models is driven by their complementary strengths in capturing both the linear and non-linear relationships inherent in smishing messages. Machine learning models are chosen for their efficiency in handling structured text data, while deep learning models are selected for their superior ability to capture sequential dependencies and contextual nuances. The performance of these models is rigorously evaluated using metrics like accuracy, precision, recall, and F1 score, enabling a comparative analysis between the machine learning and deep learning approaches. Notably, the K-means feature extraction with vectorizer achieved 91.01% accuracy, and the KNN-Flatten model reached 94.13% accuracy, emerging as the top performer. The rationale behind highlighting these models is their potential to significantly improve smishing detection rates. For instance, the high accuracy of the KNN-Flatten model suggests its applicability in real-time spam detection systems, but its computational complexity might limit scalability in large-scale deployments. 
Similarly, while K-means with vectorizer excels in accuracy, it may struggle with the dynamic and evolving nature of smishing attacks, necessitating continual retraining. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

48. Machine learning-assisted rapid determination for traditional Chinese Medicine Constitution.

Author: Sun, Wen, Bai, Minghua, Wang, Ji, Wang, Bei, Liu, Yixing, Wang, Qi, and Han, Dongran
Subjects: CHINESE medicine, CLUSTER analysis (Statistics), RESEARCH funding, QUESTIONNAIRES, HUMAN constitution, MACHINE learning, AUTOMATION, ALGORITHMS
Abstract: The aim of this study was to develop a machine learning-assisted rapid determination methodology for traditional Chinese Medicine Constitution. Based on the Constitution in Chinese Medicine Questionnaire (CCMQ), the most applied diagnostic instrument for assessing individuals' constitutions, we employed automated supervised machine learning algorithms (i.e., Tree-based Pipeline Optimization Tool; TPOT) on all the possible item combinations for each subscale and an unsupervised machine learning algorithm (i.e., variable clustering; varclus) on the whole scale to select items that can best predict body constitution (BC) classifications or BC scores. By utilizing subsets of items selected based on TPOT and corresponding machine learning algorithms, the accuracies of BC classifications prediction ranged from 0.819 to 0.936, with the root mean square errors of BC scores prediction stabilizing between 6.241 and 9.877. Overall, the results suggested that the automated machine learning algorithms performed better than the varclus algorithm for item selection. Additionally, based on an automated machine learning item selection procedure, we provided the top three ranked item combinations with each possible subscale length, along with their corresponding algorithms for predicting BC classification and severity. This approach could accommodate the needs of different practitioners in traditional Chinese medicine for rapid constitution determination. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

49. Rapid brain lymphoma diagnostics through nanopore sequencing of cytology-negative cerebrospinal fluid.

Author: Hench, J., Hultschig, C., Bratic Hench, I., Sadasivan, H., Yaldizli, Ö, Hutter, G., Dirnhofer, S., Tzankov, A., and Frank, S.
Subjects: *DIFFUSE large B-cell lymphomas, *CELL-free DNA, *SOLID state drives, *DNA methylation, *DNA copy number variations, *CEREBROSPINAL fluid examination, *RHINORRHEA
Abstract: This article discusses a new method for diagnosing brain lymphoma using nanopore sequencing of cerebrospinal fluid (CSF). The study demonstrates the clinical application of this method in two cases of CNS lymphoma. The researchers adapted their fast-track unsupervised machine learning approach for cases with a differential diagnosis of lymphoma and other malignant brain tumors. The results showed that nanopore sequencing-derived methylation patterns can be used for next-day diagnosis and treatment initiation. This point-of-care protocol has the potential to significantly reduce neurological impairment through timely and non-invasive testing. [Extracted from the article]
Published: 2024
Full Text: View/download PDF

50. Regional Spatial Mean of Ionospheric Irregularities Based on K-Means Clustering of ROTI Maps.

Author: Migoya-Orué, Yenca, Abe, Oladipo E., and Radicella, Sandro
Subjects: *IONOSPHERIC plasma, *K-means clustering, *MACHINE learning, *MATHEMATICAL optimization, *LONGITUDE
Abstract: In this paper, we investigate and propose the application of an unsupervised machine learning clustering method to characterize the spatial and temporal distribution of ionospheric plasma irregularities over the Western African equatorial region. The ordinary Kriging algorithm was used to interpolate the rate of change of the total electron content (TEC) index (ROTI) over gridded 0.5° by 0.5° latitude and longitude regional maps in order to simulate the level of ionospheric plasma irregularities in a quasi-real-time scenario. K-means was used to obtain a spatial mean index through an optimal stratification of regional post-processed ROTI maps. The results obtained could be adapted by appropriate K-means algorithms to a real-time scenario, as has been performed for other applications. This method could allow us to monitor plasma irregularities in real time over the African region and, therefore, lead to the possibility of mitigating their effects on satellite-based location systems in the said region. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
