Soubeiga, Armel; Antoine, Violaine; Corteval, Alice; Kerckhove, Nicolas; Moreno, Sylvain; Falih, Issam; and Phalip, Jules
The best-known unsupervised classification algorithms produce hard or probabilistic partitions. For complex datasets such as those found in healthcare, these partitions have limitations, particularly given the uncertainty inherent in medical data: such algorithms do not model outliers or imprecise and uncertain observations. In this study, we analyzed time series of barometer attributes related to pain intensity in individuals with chronic pain. Our goal was to identify typologies of care trajectories, highlighting distinct chronic pain trajectories through clustering. We propose a soft clustering approach based on feature extraction and selection from sequential data. Time series feature extraction and selection handle complex, noisy data while preserving interpretability and improving clustering performance, and evidential soft clustering models the uncertainty and imprecision of the data, giving a better representation of the nuances and variations in complex raw data.

The approach has two steps. First, we extract features from the available time series and select the most important attributes, using a time series feature extractor (Tsfresh) and unsupervised feature selection methods: unsupervised Random Forest, Laplacian Score, and unsupervised Spectral Feature Selection. Second, we apply the evidential c-means (ECM) clustering algorithm to the extracted attributes. Based on belief functions, ECM generates a credal partition capable of modeling various forms of uncertainty. This partition can then be transformed into a hard partition in order to study individuals according to this uncertainty criterion.
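To illustrate one of the unsupervised feature selection criteria named above, the sketch below computes the Laplacian Score with NumPy. It is a minimal sketch, not the authors' implementation: it assumes a symmetric k-nearest-neighbour graph with heat-kernel weights, and the parameter names `k` and `t`, the default kernel width (mean squared pairwise distance), and the helper `select_features` are hypothetical choices. Each feature is scored by how smoothly it varies over the graph relative to its variance; lower scores indicate features that better preserve the local structure of the data.

```python
import numpy as np


def laplacian_score(X, k=5, t=None):
    """Laplacian Score of each column of X (n_samples x n_features); lower is better."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape

    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    # Heat-kernel width: the mean squared distance is an assumed default.
    if t is None:
        t = dist2.mean() if dist2.mean() > 0 else 1.0

    # Symmetric k-nearest-neighbour graph, excluding self-loops.
    no_self = dist2.copy()
    np.fill_diagonal(no_self, np.inf)
    neighbours = np.argsort(no_self, axis=1)[:, :k]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k), neighbours.ravel()] = True
    mask |= mask.T

    S = np.where(mask, np.exp(-dist2 / t), 0.0)  # heat-kernel edge weights
    d = S.sum(axis=1)                             # degrees (diagonal of D)
    L = np.diag(d) - S                            # graph Laplacian L = D - S

    scores = np.empty(p)
    for r in range(p):
        f = X[:, r]
        f_tilde = f - (f @ d) / d.sum()           # remove the D-weighted mean
        denom = f_tilde @ (d * f_tilde)
        # A (near-)constant feature carries no information: give it the worst score.
        scores[r] = (f_tilde @ L @ f_tilde) / denom if denom > 1e-12 else np.inf
    return scores


def select_features(X, m, k=5):
    """Hypothetical helper: indices of the m features with the lowest Laplacian Score."""
    return np.argsort(laplacian_score(X, k=k))[:m]
```

On a matrix of extracted features (rows are patients, columns are features), `select_features(X, m)` returns the column indices of the m retained attributes. Because the graph is built from Euclidean distances, features on very different scales would typically be standardized first.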
The results reveal two clusters of chronic pain, related to discomfort and well-being, with excellent separability and compactness. In addition, an uncertain cluster emerges that groups patients with intermediate characteristics. Descriptive analysis, statistical tests, and repeated-measures multinomial regression on clinical and demographic data made the partitions interpretable and allowed us to determine the profile of patients in each identified trajectory.
• Clustering care pathways for patients with chronic pain.
• Patient clusters: pain related to fatigue and stress vs. pain related to sleep, physical comfort and mood.
• Evidential c-means clustering models uncertainty in chronic pain time series.
• Feature extraction from time series helps manage data complexity, imprecision and noise.
• Feature selection maintains interpretability and improves clustering.
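In a credal partition, each patient receives a mass of belief on every subset of clusters, plus a mass on the empty set that flags outliers. An uncertain cluster such as the one reported above can therefore arise from patients whose largest mass falls on a set of several clusters. The sketch below illustrates this with the ECM mass update for fixed prototypes, followed by two ways of reading the masses: a hard credal partition (the focal set with maximal mass) and pignistic probabilities over single clusters. It is a hypothetical, simplified sketch rather than the authors' code: full ECM alternates this update with a prototype update, which is omitted here, and the function names and the parameters `alpha`, `beta` and `delta` (with their defaults) are assumptions.

```python
import itertools

import numpy as np


def focal_sets(c):
    """All non-empty subsets of the clusters {0, ..., c-1}, by increasing size."""
    return [frozenset(s) for size in range(1, c + 1)
            for s in itertools.combinations(range(c), size)]


def ecm_masses(X, V, alpha=1.0, beta=2.0, delta=10.0):
    """ECM mass update for fixed prototypes V (c x p); the prototype update is omitted.

    Returns masses on the non-empty focal sets, mass on the empty set, and the sets.
    """
    X = np.asarray(X, dtype=float)
    V = np.asarray(V, dtype=float)
    sets = focal_sets(V.shape[0])
    # Each focal set A_j is represented by the barycenter of its prototypes.
    bary = np.array([V[sorted(A)].mean(axis=0) for A in sets])
    card = np.array([len(A) for A in sets], dtype=float)
    # Squared distances to barycenters, floored to avoid division by zero.
    d2 = np.maximum(((X[:, None, :] - bary[None, :, :]) ** 2).sum(axis=2), 1e-12)
    e = 1.0 / (beta - 1.0)
    # |A_j|^(-alpha/(beta-1)) * d_ij^(-2/(beta-1)), written with d2 = d_ij^2.
    w = card[None, :] ** (-alpha * e) * d2 ** (-e)
    w_empty = (delta ** 2) ** (-e)  # weight of the empty set (outliers)
    denom = w.sum(axis=1) + w_empty
    return w / denom[:, None], w_empty / denom, sets


def hard_credal_partition(m, m_empty, sets):
    """Focal set with the largest mass per object; None stands for the empty set."""
    best = np.column_stack([m_empty, m]).argmax(axis=1)
    return [None if b == 0 else sets[b - 1] for b in best]


def pignistic(m, m_empty, sets, c):
    """Pignistic probabilities: split each set's mass equally among its clusters."""
    bet = np.zeros((m.shape[0], c))
    for j, A in enumerate(sets):
        for k in A:
            bet[:, k] += m[:, j] / len(A)
    return bet / (1.0 - m_empty)[:, None]
```

For example, with two prototypes at (0, 0) and (4, 0), a point at (2, 0) puts almost all of its mass on the set {0, 1}, so the hard credal partition places it in the uncertain meta-cluster, while its pignistic probabilities are split evenly; a point far from both prototypes is assigned to the empty set.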