1. kmlShape: An Efficient Method to Cluster Longitudinal Data (Time-Series) According to Their Shapes
- Author
Tarak Driss, Christophe Genolini, Mamoun Benghezal, Sandrine Andrieu, Fabien Subtil, and René Ecochard
Affiliations: Biostatistiques santé, Département biostatistiques et modélisation pour la santé et l'environnement [LBBE], Laboratoire de Biométrie et Biologie Evolutive - UMR 5558 (LBBE), Université Claude Bernard Lyon 1 (UCBL), Université de Lyon, Institut National de Recherche en Informatique et en Automatique (Inria), VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS), Centre National de la Recherche Scientifique (CNRS); Centre de Recherche sur le Sport et le Mouvement (CeRSM), Université Paris Nanterre (UPN); Epidémiologie et analyses en santé publique : risques, maladies chroniques et handicaps (LEASP), Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées, Université de Toulouse (UT), Institut National de la Santé et de la Recherche Médicale (INSERM)
- Subjects
Leaves; [SDV.IB.IMA]Life Sciences [q-bio]/Bioengineering/Imaging; Computer science; Physiology; lcsh:Medicine; Social Sciences; Plant Science; Bioinformatics; Elections; 01 natural sciences; 010104 statistics & probability; Database and Informatics Methods; 0302 clinical medicine; Endocrinology; Reproductive Physiology; Medicine and Health Sciences; Cluster Analysis; lcsh:Science; Mammals; Multidisciplinary; Group (mathematics); Applied Mathematics; Simulation and Modeling; Plant Anatomy; Neurodegenerative Diseases; Variable (computer science); Neurology; Data Interpretation, Statistical; Physical Sciences; Vertebrates; Information Technology; Algorithm; Algorithms; Research Article; Ovulation; Computer and Information Sciences; Current (mathematics); Political Science; Analyse du Mouvement en Biomécanique Physiologie et Imagerie; Research and Analysis Methods; 03 medical and health sciences; Dogs; Alzheimer Disease; Mental Health and Psychiatry; [SDV.MHEP.PHY]Life Sciences [q-bio]/Human health and pathology/Tissues and Organs [q-bio.TO]; Cluster (physics); Animals; Humans; [PHYS.MECA.BIOM]Physics [physics]/Mechanics [physics]/Biomechanics [physics.med-ph]; 0101 mathematics; Cluster analysis; Menstrual Cycle; Series (mathematics); Endocrine Physiology; lcsh:R; Organisms; Biology and Life Sciences; Moment (mathematics); Data Reduction; Amniotes; lcsh:Q; Dementia; 030217 neurology & neurosurgery; Mathematics
- Abstract
Background: Longitudinal data are data in which each variable is measured repeatedly over time. One way to analyze such data is to cluster them. Most clustering methods group together individuals whose trajectories are close at given time points. These methods group trajectories that are locally close, but not necessarily those that have similar shapes. However, in several circumstances, the course of a phenomenon may be more important than the moment at which it occurs. One would thus like to obtain a partition in which each group gathers individuals whose trajectories have similar shapes, whatever the time lag between them.
Method: In this article, we present a longitudinal data partitioning algorithm based on the shapes of the trajectories rather than on classical distances. Because this algorithm is time-consuming, we also propose two data simplification procedures that make it applicable to high-dimensional datasets.
Results: In an application to Alzheimer's disease, the algorithm revealed a “rapid decline” patient group that was not found by classical methods. In another application, to the menstrual cycle, the algorithm showed, contrary to the current literature, that luteinizing hormone presents two peaks in a substantial proportion of women (22%).
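The contrast drawn in the abstract between trajectories that are locally close and trajectories that have similar shapes can be illustrated with a small sketch. The abstract does not name the distance the algorithm uses; the discrete Fréchet distance below is only one illustrative shape-aware choice, and the `time_scale` parameter, the function names, and the toy trajectories are all hypothetical.

```python
import math

def pointwise_distance(a, b):
    """Euclidean distance comparing two equal-length series time point by time point."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def discrete_frechet(p, q, time_scale=0.1):
    """Discrete Fréchet distance between two curves given as lists of (t, y).

    time_scale (hypothetical parameter) weights the time axis: a small value
    makes a time lag between similar shapes cost little.
    """
    def d(u, v):
        return math.hypot(time_scale * (u[0] - v[0]), u[1] - v[1])

    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]  # ca[i][j]: best coupling cost up to (i, j)
    for i in range(n):
        for j in range(m):
            cost = d(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = cost
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], cost)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], cost)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, cost)
    return ca[-1][-1]

# Toy trajectories (made up for illustration): same peak, different timing.
times = list(range(8))
early = [0, 0, 5, 0, 0, 0, 0, 0]   # peak at t = 2
late = [0, 0, 0, 0, 0, 5, 0, 0]    # same peak at t = 5
flat = [0] * 8                     # no peak at all

def as_curve(values):
    return list(zip(times, values))

pw_early_late = pointwise_distance(early, late)
pw_early_flat = pointwise_distance(early, flat)
fr_early_late = discrete_frechet(as_curve(early), as_curve(late))
fr_early_flat = discrete_frechet(as_curve(early), as_curve(flat))

print("pointwise:", pw_early_late, pw_early_flat)
print("frechet:  ", fr_early_late, fr_early_flat)
```

In this toy setting the pointwise distance places the early-peak trajectory closer to the flat one than to the late-peak one, whereas the shape-aware distance does the opposite, which is the behavior the abstract asks of a shape-based partition.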
- Published
2016