79 results on '"curve clustering"'
Search Results
2. A Novel Curve Clustering Method for Functional Data: Applications to COVID-19 and Financial Data
- Author
-
Ting Wei and Bo Wang
- Subjects
curve clustering ,functional data ,proximity measure ,time-shift clustering ,COVID-19 ,NASDAQ ,Electronic computers. Computer science ,QA75.5-76.95 ,Probabilities. Mathematical statistics ,QA273-280 - Abstract
Functional data analysis has significantly enriched the landscape of existing data analysis methodologies, providing a new framework for comprehending data structures and extracting valuable insights. This paper is dedicated to addressing functional data clustering—a pivotal challenge within functional data analysis. Our contribution to this field manifests through the introduction of innovative clustering methodologies tailored specifically to functional curves. Initially, we present a proximity measure algorithm designed for functional curve clustering. This innovative clustering approach offers the flexibility to redefine measurement points on continuous functions, adapting to either equidistant or nonuniform arrangements, as dictated by the demands of the proximity measure. Central to this method is the “proximity threshold”, a critical parameter that governs the cluster count, and its selection is thoroughly explored. Subsequently, we propose a time-shift clustering algorithm designed for time-series data. This approach identifies historical data segments that share patterns similar to those observed in the present. To evaluate the effectiveness of our methodologies, we conduct comparisons with the classic K-means clustering method and apply them to simulated data, yielding encouraging simulation results. Moving beyond simulation, we apply the proposed proximity measure algorithm to COVID-19 data, yielding notable clustering accuracy. Additionally, the time-shift clustering algorithm is employed to analyse NASDAQ Composite data, successfully revealing underlying economic cycles.
- Published
- 2023
- Full Text
- View/download PDF
3. A Novel Curve Clustering Method for Functional Data: Applications to COVID-19 and Financial Data.
- Author
-
Wei, Ting and Wang, Bo
- Subjects
COVID-19 ,FINANCIAL databases ,NASDAQ composite index ,BUSINESS cycles ,K-means clustering - Abstract
Functional data analysis has significantly enriched the landscape of existing data analysis methodologies, providing a new framework for comprehending data structures and extracting valuable insights. This paper is dedicated to addressing functional data clustering—a pivotal challenge within functional data analysis. Our contribution to this field manifests through the introduction of innovative clustering methodologies tailored specifically to functional curves. Initially, we present a proximity measure algorithm designed for functional curve clustering. This innovative clustering approach offers the flexibility to redefine measurement points on continuous functions, adapting to either equidistant or nonuniform arrangements, as dictated by the demands of the proximity measure. Central to this method is the "proximity threshold", a critical parameter that governs the cluster count, and its selection is thoroughly explored. Subsequently, we propose a time-shift clustering algorithm designed for time-series data. This approach identifies historical data segments that share patterns similar to those observed in the present. To evaluate the effectiveness of our methodologies, we conduct comparisons with the classic K-means clustering method and apply them to simulated data, yielding encouraging simulation results. Moving beyond simulation, we apply the proposed proximity measure algorithm to COVID-19 data, yielding notable clustering accuracy. Additionally, the time-shift clustering algorithm is employed to analyse NASDAQ Composite data, successfully revealing underlying economic cycles. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
4. Portraying the life cycle of ideas in social psychology through functional (textual) data analysis: a toolkit for digital history.
- Author
-
Rizzoli, Valentina, Trevisani, Matilde, and Tuzzi, Arjuna
- Abstract
This paper presents a method for the digital history of a discipline (social psychology in this application) through the analysis of scientific publications. The titles of a comprehensive set of papers published in the Journal of Personality and Social Psychology (1965–2021) were collected, yielding a total of 10,222 items. The corpus thus constructed underwent several stages of preprocessing until the final conversion into a terms x time-points matrix, where terms are stemmed words and multi-words. After normalizing frequencies via a chi square-like transformation, clusters of words portraying similar temporal patterns were identified by functional (textual) data analysis and distance-based curve clustering. Among the best candidates in terms of the number of clusters, the solutions with six, nine and thirteen clusters (from lower to higher resolution) have been chosen and the nesting relationship demonstrated. They reveal—at different levels of granularity—increasing, decreasing, and stable keywords trends, highlighting methods, theories, and application domains that have become more popular in recent years, lost popularity, or have remained in common use. Moreover, this method allows to highlight historical issues (such as crises in the discipline or debates over the use of terms). The results highlight the core topics of social psychology in the past and today, underlying the crucial contribution of this method for the digital history of a discipline. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
5. Cluster-Specific Predictions with Multi-Task Gaussian Processes.
- Author
-
Leroy, Arthur, Latouche, Pierre, Guedj, Benjamin, and Gey, Servane
- Subjects
*GAUSSIAN processes , *EXPECTATION-maximization algorithms , *FORECASTING , *LATENT variables - Abstract
A model involving Gaussian processes (GPs) is introduced to simultaneously handle multitask learning, clustering, and prediction for multiple functional data. This procedure acts as a model-based clustering method for functional data as well as a learning step for subsequent predictions for new tasks. The model is instantiated as a mixture of multi-task GPs with common mean processes. A variational EM algorithm is derived for dealing with the optimisation of the hyper-parameters along with the hyper-posteriors' estimation of latent variables and processes. We establish explicit formulas for integrating the mean processes and the latent clustering variables within a predictive distribution, accounting for uncertainty in both aspects. This distribution is defined as a mixture of cluster-specific GP predictions, which enhances the performance when dealing with group-structured data. The model handles irregular grids of observations and offers different hypotheses on the covariance structure for sharing additional information across tasks. The performances on both clustering and prediction tasks are assessed through various simulated scenarios and real data sets. The overall algorithm, called MAGMACLUST, is publicly available as an R package. [ABSTRACT FROM AUTHOR]
- Published
- 2023
6. Unsupervised Driving Style Analysis Based on Driving Maneuver Intensity
- Author
-
Xian-Sheng Li, Xiao-Tong Cui, Yuan-Yuan Ren, and Xue-Lian Zheng
- Subjects
Driving behavior ,unsupervised driving style analysis ,dynamic decision-making process ,driving maneuver ,curve clustering ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
This paper proposes a novel unsupervised clustering framework to identify driving style not in terms of the discrete features of driving behavior data, but rather the time-varying patterns of driving maneuver intensity. This framework can describe the dynamic decision-making process of driving behavior and the continuity of driving data, and driving maneuver intensity is the basis of this paper. Therefore, detection, feature analysis, and clustering of driving maneuver intensity are carried out using a threshold-based approach, hierarchical feature extraction, and k-means clustering. Then, to analyze fine-grained driving style, dynamic time windows are determined according to road alignment. In dynamic time windows, this paper constructs time-varying patterns based on driving maneuver intensity, which consider the intensity and frequency of driving behavior and preserve the time-varying characteristics of time-series data. However, not all dynamic time windows are equal in maneuvers’ duration and number, which means the time-varying patterns of driving maneuver intensity are curves with various lengths. Therefore, to cluster these time-varying patterns, this paper proposes a novel curve clustering algorithm named Similarity-Based Clustering with Dynamic Time Warping (SBC-DTW) that can cluster curves with various lengths. The empirical results based on real driving data demonstrate that the proposed framework can classify driving style more accurately than the classical method. Moreover, according to this framework, we can have an in-depth understanding of dynamic driving behavior and the composition of drivers’ long-term driving styles.
- Published
- 2022
- Full Text
- View/download PDF
7. CurveCluster+: Curve Clustering for Hard Landing Pattern Recognition and Risk Evaluation Based on Flight Data.
- Author
-
Li, Xu, Shang, Jiaxing, Zheng, Linjiang, Wang, Qixing, Sun, Hong, and Qi, Lin
- Abstract
Hard landing is a typical flight safety incident, and interpretability plays an important role in flight safety research. However, existing studies failed to provide good interpretability of the reasons for hard landing incidents and suffer from low prediction accuracy. To address the above problems, in this paper we propose CurveCluster+, a curve clustering method based on quick access recorder (QAR) data for hard landing risk evaluation. Specifically, we first conduct an in-depth analysis on hard landing flights by comparing key QAR parameter curves with the group behavior, based on which we establish a two-level hierarchical classification of hard landing incidents according to the hard landing patterns. Then we extract curve-level features from key QAR parameters through interpolation and resampling. After that we turn the classic K-means clustering into a semi-supervised algorithm by incorporating some expert experience and apply it on the curve-level features to automatically recognize the hard landing patterns. Finally, we propose a risk evaluation model based on the clustering results to discover high-risk flights from normal ones. We evaluate our method on a QAR dataset of 37,943 Airbus 320 aircraft flights. The results show that compared with other state-of-the-art data-driven methods, CurveCluster+ provides strong interpretability of hard landing incidents and exhibits good performance in recognizing hard landing patterns (the overall accuracy of our method reaches up to 92.99%). Moreover, it only requires a handful of hard landing samples to discover high-risk flights from tremendous normal landing flights, which is critical for flight safety warnings. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
8. Curve clustering analysis and intelligent recognition of grid nanoindentation data for cementitious material.
- Author
-
Chen, Xiaowen, Sun, Tianci, Sun, Tianshi, Yin, Huazhe, and Hou, Dongwei
- Subjects
*CLUSTER analysis (Statistics) , *NANOINDENTATION tests , *GAUSSIAN mixture models , *GENOME editing , *BACK propagation , *IRIS recognition , *MULTISCALE modeling - Abstract
Grid nanoindentation method is commonly adopted to probe micro-mechanical properties of composite materials, providing the basis of multi-scale design of materials and furtherly supporting the research of Materials Genome Engineering. To overcome the disadvantages of traditional nanoindentation data processing methods, a new analysis framework based on intelligence recognition technology was proposed in present work. Curve clustering analysis was established based on principal component analysis to identify the minerals of cementitious materials represented by the indentation load-depth curves. This method avoids the Gaussian distribution assumption and matches every mineral with the test data at effective indentation depth. Compared with deconvolution analysis, Gaussian Mixture Model and point clustering analysis, curve clustering analysis offers more reasonable results. Furthermore, two supervised classification methods, Support Vector Machine classifier and Back Propagation neural network classifier, are introduced to recognize and group nanoindentation data. The training and testing results indicate that they are feasible and highly accurate. • A new analysis framework based on intelligence recognition method is developed to deal with grid nanoindentation test data. • The curve clustering method can distinguish major mineral phases in higher accuracy compared with deconvolution analysis. • The effective indentation depth range of each phase can be determined by the derivative curve of average E - h curve. • Supervised classification methods perform well in the classification of grid nanoindentation test data with high accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Functional Data Analysis and Knowledge-Based Systems
- Author
-
Trevisani, Matilde, DeFanti, Thomas, Series Editor, Grafton, Anthony, Series Editor, Levy, Thomas E., Series Editor, Manovich, Lev, Series Editor, Rockwood, Alyn, Series Editor, and Tuzzi, Arjuna, editor
- Published
- 2018
- Full Text
- View/download PDF
10. The Recent History of Statistics: Comparing Temporal Patterns of Word Clusters
- Author
-
Trevisani, Matilde, Tuzzi, Arjuna, DeFanti, Thomas, Series Editor, Grafton, Anthony, Series Editor, Levy, Thomas E., Series Editor, Manovich, Lev, Series Editor, Rockwood, Alyn, Series Editor, and Tuzzi, Arjuna, editor
- Published
- 2018
- Full Text
- View/download PDF
11. Curve Clustering for Brain Functional Activity and Synchronization
- Author
-
Bertarelli, Gaia, Corbella, Alice, Di Iorio, Jacopo, Gorshechnikova, Anastasia, Scott, Marian, Canale, Antonio, editor, Durante, Daniele, editor, Paci, Lucia, editor, and Scarpa, Bruno, editor
- Published
- 2018
- Full Text
- View/download PDF
12. On the importance of similarity characteristics of curve clustering and its applications.
- Author
-
Cheam, Amay S.M. and Fredette, Marc
- Subjects
*CURVES , *OPEN-ended questions - Abstract
• Literature of curve clustering is classified by similarity characteristics. • A new nomenclature is suggested in hope of easing the understanding of practitioners. • Good perspective of curve clustering problem by focusing on similarity characteristic. • Better insight on which method to use depending on the similarity characteristics. This paper presents an overview of curve clustering from the similarity characteristics perspective, with a goal of providing useful advice and references regarding fundamental concepts that are accessible to the broad community of curve clustering practitioners. We introduce a new taxonomy of curve clustering by proposing four major similarity characteristics. We reviewed some contributions to curve clustering with respect to their similarity characteristics along with their applications. Lastly, we give an in-depth discussion of the overall challenges in this field with respect to similarity characteristics, highlight open research questions and discuss guidelines for further progress. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
13. Simultaneous Registration and Clustering for Multidimensional Functional Data.
- Author
-
Zeng, Pengcheng, Qing Shi, Jian, and Kim, Won-Seok
- Subjects
*KRIGING , *REGRESSION analysis - Abstract
The clustering for functional data with misaligned problems has drawn much attention in the last decade. Most methods do the clustering after those functional data being registered and there has been little research using both functional and scalar variables. In this article, we propose a simultaneous registration and clustering model via two-level models, allowing the use of both types of variables and also allowing simultaneous registration and clustering. For the data collected from subjects in different groups, a Gaussian process functional regression model with time warping is used as the first level model; an allocation model depending on scalar variables is used as the second level model providing further information over the groups. The former carries out registration and modeling for the multidimensional functional data (two-dimensional curves) at the same time. This methodology is implemented using an EM algorithm, and is examined on both simulated data and real data. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
14. Semantic Concept Extraction for Eyebrow Shapes via AFS Clustering.
- Author
-
Du, Tao, Li, Danyang, Ren, Yan, Lu, Chong, and Liu, Wanquan
- Subjects
EYEBROWS ,GEOMETRIC shapes ,CONCEPTS ,INTERPOLATION ,SPLINES - Abstract
In this paper, a revised directional triangle-area curve representation method (DTAR) is proposed to address the problem of eyebrow semantic shape characterization via curve representation. First, 11 or 12 DTAR values are selected to describe eyebrows via considering the eyebrow corner information roughly, and then the corresponding DTAR curves are acquired via the cubic spline interpolation based on these selected points. Second, a descriptor of the landmarks is developed to represent selected reference eyebrows, and the corresponding DTAR curves are obtained for the selected reference eyebrows. Lastly, a similarity notion based on AFS is introduced via measuring the membership degrees of each eyebrow shape similar to the given reference shapes, and then one can describe each eyebrow shape by using two given reference eyebrow shapes via computing the membership degrees representing the relative similarities. To illustrate the effectiveness of the proposed approach, we use the AR and BJUT databases for experiments to demonstrate the consistency comparison with human perceptions. The experimental results show that the extracted semantic notions of eyebrow shapes obtained by the proposed approach are much better than those by only utilizing 11 DTAR values or 12 DTAR values directly in terms of the consistency with human perceptions. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
15. A DAEM Algorithm for Mixtures of Gaussian Process Functional Regressions
- Author
-
Wu, Di, Ma, Jinwen, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Huang, De-Shuang, editor, Han, Kyungsook, editor, and Hussain, Abir, editor
- Published
- 2016
- Full Text
- View/download PDF
16. An efficient EM algorithm for two-layer mixture model of gaussian process functional regressions.
- Author
-
Wu, Di, Xie, Yurong, and Qiang, Zhe
- Subjects
*EXPECTATION-maximization algorithms , *KRIGING , *OPTIMIZATION algorithms , *GAUSSIAN mixture models , *SIMULATED annealing - Abstract
• The computational efficiency of the optimization algorithm is improved for the MGPFR model. • The local maximum problem of the optimization algorithm is overcome for the mix-GPFR model. • The computational efficiency of the optimization algorithm is improved for the TMGPFR model. • The local maximum problem of the optimization algorithm is overcome for TMGPFR. The mixture of Gaussian processes is effective for regression, but it cannot handle the non-stationary curve clustering problem well. The two-layer mixture of Gaussian process functional regressions (TMGPFR) model was established to deal with this problem. In this paper, we first propose the classification EM (CEM) algorithm to solve that the optimization algorithm is inefficient for TMGPFRs, and then propose the deterministic annealing CEM algorithm for TMGPFRs to overcome the local maximum problem of the CEM algorithm. Lastly, experiments are conducted on synthetic and real-world data sets, and the results show that our proposed algorithms are more effective than the compared algorithms on curve clustering and regression. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
17. Portraying the life cycle of ideas in social psychology through functional (textual) data analysis: a toolkit for digital history
- Author
-
Valentina Rizzoli, Matilde Trevisani, and Arjuna Tuzzi
- Subjects
History of social psychology ,Curve clustering ,Functional data analysis ,General Social Sciences ,Diachronic corpora ,Library and Information Sciences ,Digital history ,Computer Science Applications - Abstract
This paper presents a method for the digital history of a discipline (social psychology in this application) through the analysis of scientific publications. The titles of a comprehensive set of papers published in the Journal of Personality and Social Psychology (1965–2021) were collected, yielding a total of 10,222 items. The corpus thus constructed underwent several stages of preprocessing until the final conversion into a terms x time-points matrix, where terms are stemmed words and multi-words. After normalizing frequencies via a chi square-like transformation, clusters of words portraying similar temporal patterns were identified by functional (textual) data analysis and distance-based curve clustering. Among the best candidates in terms of the number of clusters, the solutions with six, nine and thirteen clusters (from lower to higher resolution) have been chosen and the nesting relationship demonstrated. They reveal—at different levels of granularity—increasing, decreasing, and stable keywords trends, highlighting methods, theories, and application domains that have become more popular in recent years, lost popularity, or have remained in common use. Moreover, this method allows to highlight historical issues (such as crises in the discipline or debates over the use of terms). The results highlight the core topics of social psychology in the past and today, underlying the crucial contribution of this method for the digital history of a discipline.
- Published
- 2023
18. Semantics characterization for eye shapes based on directional triangle-area curve clustering.
- Author
-
Ren, Yan, Li, Qilin, Liu, Wanquan, Li, Ling, and Guan, Wei
- Subjects
SEMANTICS ,CURVES ,EYE ,GEOMETRIC shapes ,DATABASE design - Abstract
In this paper, we present a novel approach to address the problem of eye shape characterization via curve representation. Firstly, a directional triangle-area curve representation method (DTAR) is presented for this aim. Equipped with DTAR, the shape similarities between two eyes can be measured by the similarities between two corresponding DTAR curves. Secondly, in order to exploit the underlying information of eye shapes, a curve clustering algorithm is utilized to automatically discover a set of eye shape prototypes. Consequently, a semantics extraction method for eye shapes is proposed in terms of seven reference eye shapes. Finally, in order to validate the consistency of the clustering results and the extracted semantics, extensive experiments on AR and BU4DFE databases are designed and conducted, and all the results demonstrate the effectiveness of the proposed DTAR curve representation and the semantics extraction method. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
19. Functional Data Analysis in Sport Science: Example of Swimmers' Progression Curves Clustering.
- Author
-
Leroy, Arthur, Marc, Andy, Dupas, Olivier, Rey, Jean Lionel, and Gey, Servane
- Subjects
SWIMMING ,DATA analysis ,MULTIPLE correspondence analysis (Statistics) - Abstract
Many data collected in sport science come from time-dependent phenomena. This article focuses on Functional Data Analysis (FDA), which studies longitudinal data by modelling them as continuous functions. After a brief review of several FDA methods, some useful practical tools such as Functional Principal Component Analysis (FPCA) or functional clustering algorithms are presented and compared on simulated data. Finally, the problem of the detection of promising young swimmers is addressed through a curve clustering procedure on a real data set of performance progression curves. This study reveals that the fastest improvement of young swimmers generally appears before 16 years old. Moreover, several patterns of improvement are identified and the functional clustering procedure provides a useful detection tool. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
20. Split-and-merge model selection of mixtures of Gaussian processes with RJMCMC.
- Author
-
Qiang, Zhe, Ma, Jinwen, and Wu, Di
- Subjects
*MARKOV chain Monte Carlo , *GAUSSIAN mixture models , *GAUSSIAN processes , *EXPECTATION-maximization algorithms , *SIMULATED annealing - Abstract
The mixture of Gaussian processes is a powerful statistical learning model that can be effectively applied to curve clustering and prediction. However, the corresponding model selection problem, that is, selecting an appropriate number of components in the mixture, is rather difficult to solve. In our previous work, we established the split-and-merge automatic model selection algorithm for mixtures of Gaussian processes along the output space under the framework of Reversible Jump Markov Chain Monte Carlo (RJMCMC), which can not only determine the number of actual Gaussian processes but also dynamically adjust the Gaussian process components to avoid dependence on parameter initialization and initial partitioning of the dataset during the parameter learning on a given dataset. In this study, we propose two algorithms: Penalized Likelihood RJMCMC and Penalized Prior RJMCMC. The former integrates a penalized term into the likelihood, while the latter incorporates a penalized term into the prior and operates within the full Bayesian inference framework, both aiming to focus more sharply on determining the number of components in the convergence process. Furthermore, we prove the geometric ergodicity of the RJMCMC algorithm for the mixture of Gaussian processes model, ensuring convergence of the posterior distribution with sufficient iterations. The experimental results further demonstrate the robustness of our PP-RJMCMC algorithm in model selection, showing superior performance compared to traditional approaches in curve classification and clustering. Additionally, the prediction performance is comparable to the EM algorithm. Although not directly explored in this study, the RJMCMC results can be used to initialize the EM algorithm, which could potentially improve prediction accuracy and accelerate computation. • Penalized likelihood and simulated annealing are used in RJMCMC for mix-GP to focus on a component. 
• Penalty term is integrated into the prior for full Bayesian inference in RJMCMC. • NUTS sampler replaces HMC for adaptive tuning, boosting RJMCMC efficiency. • Ergodicity of RJMCMC for mix-GP has been proved, implying convergence. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
21. Spectral methods for growth curve clustering.
- Author
-
Majstorović, Snježana, Sabo, Kristian, Jung, Johannes, and Klarić, Matija
- Subjects
GRAPH theory ,GROWTH curves (Statistics) ,CLUSTER analysis (Statistics) ,MATHEMATICAL decomposition ,LAPLACIAN matrices - Abstract
The growth curve clustering problem is analyzed and its connection with the spectral relaxation method is described. For a given set of growth curves and similarity function, a similarity matrix is defined, from which the corresponding similarity graph is constructed. It is shown that a nearly optimal growth curve partition can be obtained from the eigendecomposition of a specific matrix associated with a similarity graph. The results are illustrated and analyzed on the set of synthetically generated growth curves. One real-world problem is also given. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
22. A Fused Load Curve Clustering Algorithm Based on Wavelet Transform.
- Author
-
Jiang, Zigui, Lin, Rongheng, Yang, Fangchun, and Wu, Budan
- Abstract
The electricity load data recorded by smart meters contain plenty of knowledge that contributes to obtaining load patterns and consumer categories. Generally, the daily load curves are clustered first in order to obtain load patterns of each consumer. However, due to the volume and high dimensions of load curves, existing clustering algorithms are not appropriate in this situation. Thus, a fused load curve clustering algorithm based on wavelet transform (FCCWT) is proposed to solve this problem. The algorithm includes two main phases. First, FCCWT applies multilevel discrete wavelet transform (DWT) to convert the daily load curves for dimensionality reduction. Second, it detects clusters at two outputs of the first phase, and then fuses two groups of clusters with a subalgorithm named cluster fusion to achieve the optimized clusters. FCCWT is implemented on datasets of both China and United States. Their clustering performances are evaluated by diverse validity indices comparing with four typical clustering methods. The experimental results show that FCCWT outperforms other comparison methods. Additionally, case analysis of two datasets are also provided to discuss the significance of load patterns. [ABSTRACT FROM PUBLISHER]
- Published
- 2018
- Full Text
- View/download PDF
23. Learning the evolution of disciplines from scientific literature: A functional clustering approach to normalized keyword count trajectories.
- Author
-
Trevisani, Matilde and Tuzzi, Arjuna
- Subjects
*MACHINE learning , *BAG-of-words model (Computer science) , *HISTORICAL linguistics , *CORPORA , *SCIENTIFIC literature , *CLUSTER analysis (Statistics) - Abstract
The growing availability of large diachronic corpora of scientific literature offers the opportunity of reading the temporal evolution of concepts, methods and applications, i.e., the history of disciplines involved in the strand under investigation. After a retrieval process of the most relevant keywords, bag-of-words approaches produce words × time-points contingency tables, i.e. the frequencies of each word in the set of texts grouped by time-points. Through the analysis of word counts over the observed period of time, main purpose of the study is, after reconstructing the “life-cycle” of words, clustering words that have similar life-cycles and, thus, detecting prototypical or exemplary temporal patterns. Unveiling such relevant and (through expert opinion) meaningful inner dynamics enables us to trace a historical narrative of the discipline of interest. However, different history readings are possible depending on the type of data normalization, which is needed to account for the fluctuating size of texts across time and the general problems of data sparsity and strong asymmetry. This study proposes a methodology consisting of (1) a stepwise information retrieval procedure for keywords’ selection and (2) a functional clustering two-stage approach for statistical learning. Moreover, a sample of possible normalizations of word frequencies is considered, showing that the different concept of curve similarity induced in clustering by the type of transformation heavily affects groups’ composition and size. The corpus of titles of scientific papers published by the American Statistical Association journals in the time span 1888–2012 is examined for illustration. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
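The abstract of record 23 notes that the choice of normalization changes which word trajectories count as similar. A minimal sketch of that effect follows. The two normalizations (relative frequency per time-point, and a unit-sum shape profile), the helper names, and the toy counts are all assumptions made for illustration; they are not taken from the paper.

```python
from math import sqrt

def relative_frequency(counts, totals):
    """Divide each raw count by the corpus size at the same time-point."""
    return [c / t for c, t in zip(counts, totals)]

def unit_sum_profile(values):
    """Rescale a trajectory so that its values sum to 1, keeping only its shape."""
    s = sum(values)
    return [v / s for v in values] if s else [0.0] * len(values)

def euclidean(a, b):
    """Euclidean distance between two trajectories on the same time grid."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(target, curves):
    """Name of the curve closest to curves[target], excluding the target itself."""
    return min((w for w in curves if w != target),
               key=lambda w: euclidean(curves[target], curves[w]))

# Hypothetical toy data: three time-points, with a corpus that doubles each time.
totals = [100, 200, 400]
counts = {
    "a": [10, 20, 40],  # constant share of the corpus
    "b": [1, 2, 4],     # same shape as "a", ten times rarer
    "c": [10, 10, 10],  # constant raw count, hence a shrinking share
}

rel = {w: relative_frequency(c, totals) for w, c in counts.items()}
shape = {w: unit_sum_profile(r) for w, r in rel.items()}

print(nearest("a", rel))    # c
print(nearest("a", shape))  # b
```

Under relative frequencies the nearest neighbour of word "a" is "c" (similar level), whereas under the shape profile it is "b" (identical shape), so any distance-based clustering would group the words differently.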
24. Topics and trends in the end-of-year addresses of the Presidents of the Italian Republic (1949-2021)
- Author
-
Trevisani, Matilde and Tuzzi, Arjuna
- Subjects
topic trends ,chronological textual data ,presidential addresses ,curve clustering ,functional data analysis - Published
- 2022
25. Cluster-Specific Predictions with Multi-Task Gaussian Processes
- Author
-
Leroy, Arthur, Latouche, Pierre, Guedj, Benjamin, Gey, Servane, University of Manchester [Manchester], Mathématiques Appliquées Paris 5 (MAP5 - UMR 8145), Institut National des Sciences Mathématiques et de leurs Interactions (INSMI)-Centre National de la Recherche Scientifique (CNRS)-Université Paris Cité (UPCité), University College of London [London] (UCL), Department of Computer science [University College of London] (UCL-CS), Inria-CWI (Inria-CWI), Centrum Wiskunde & Informatica (CWI)-Institut National de Recherche en Informatique et en Automatique (Inria), MOdel for Data Analysis and Learning (MODAL), Laboratoire Paul Painlevé (LPP), Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Université de Lille, Sciences et Technologies-Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Evaluation des technologies de santé et des pratiques médicales - ULR 2694 (METRICS), Université de Lille-Centre Hospitalier Régional Universitaire [Lille] (CHRU Lille)-Université de Lille-Centre Hospitalier Régional Universitaire [Lille] (CHRU Lille)-École polytechnique universitaire de Lille (Polytech Lille), The Inria London Programme (Inria-London), University College of London [London] (UCL)-University College of London [London] (UCL)-Institut National de Recherche en Informatique et en Automatique (Inria), Engineering and Physical Sciences Research Council (EPSRC), grant number EP/R013616/1, ANR-18-CE40-0016,BEAGLE,Apprentissage PAC-bayésien agnostique(2018), ANR-18-CE23-0015,APRIORI,Une Perspective PAC-Bayésienne de l'Apprentissage de Représentations(2018), Computer science department [University College London] (UCL-CS), Laboratoire Paul Painlevé - UMR 8524 (LPP), Institut National des Sciences Mathématiques et de leurs Interactions (INSMI)-Centre National de la 
Recherche Scientifique (CNRS)-Université de Paris (UP), Centre National de la Recherche Scientifique (CNRS)-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Université de Lille-Université de Lille, Sciences et Technologies-Inria Lille - Nord Europe, Centre Hospitalier Régional Universitaire [Lille] (CHRU Lille)-Université de Lille-Centre Hospitalier Régional Universitaire [Lille] (CHRU Lille)-Université de Lille-École polytechnique universitaire de Lille (Polytech Lille), and Institut National des Sciences Mathématiques et de leurs Interactions (INSMI)-Centre National de la Recherche Scientifique (CNRS)-Université Paris Cité (UPC)
- Subjects
FOS: Computer and information sciences ,cluster-specific predictions ,Computer Science - Machine Learning ,Gaussian processes mixture ,multi-task learning ,Machine Learning (stat.ML) ,[STAT.TH]Statistics [stat]/Statistics Theory [stat.TH] ,curve clustering ,Statistics - Computation ,variational EM ,Machine Learning (cs.LG) ,Methodology (stat.ME) ,[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG] ,[STAT.ML]Statistics [stat]/Machine Learning [stat.ML] ,Statistics - Machine Learning ,[STAT.CO]Statistics [stat]/Computation [stat.CO] ,Computation (stat.CO) ,Statistics - Methodology - Abstract
A model involving Gaussian processes (GPs) is introduced to simultaneously handle multi-task learning, clustering, and prediction for multiple functional data. This procedure acts as a model-based clustering method for functional data as well as a learning step for subsequent predictions for new tasks. The model is instantiated as a mixture of multi-task GPs with common mean processes. A variational EM algorithm is derived for dealing with the optimisation of the hyper-parameters along with the hyper-posteriors' estimation of latent variables and processes. We establish explicit formulas for integrating the mean processes and the latent clustering variables within a predictive distribution, accounting for uncertainty on both aspects. This distribution is defined as a mixture of cluster-specific GP predictions, which enhances performance when dealing with group-structured data. The model handles irregular grids of observations and offers different hypotheses on the covariance structure for sharing additional information across tasks. Performance on both clustering and prediction tasks is assessed through various simulated scenarios and real datasets. The overall algorithm, called MagmaClust, is publicly available as an R package. 47 pages.
- Published
- 2022
26. Life cycle of ideas in the Journal of Personality and Social Psychology: A history of US social psychology
- Author
-
Rizzoli, Valentina, Trevisani, Matilde, and Tuzzi, Arjuna
- Subjects
history of social psychology ,science mapping ,diachronic corpora ,functional data analysis ,curve clustering - Published
- 2022
27. Topics and trends in the End-of-Year addresses of the Presidents of the Italian Republic (1949-2021)
- Author
-
Trevisani, Matilde, Tuzzi, Arjuna, Balzanella A., Bini M., Cavicchia C., and Verde R.
- Subjects
topic trends ,chronological textual data ,presidential addresses ,curve clustering ,functional data analysis - Abstract
The aim of this study is to analyse the corpus of end-of-year speeches of the Presidents of the Italian Republic from a diachronic perspective in order to identify groups of words that share the same pattern and depict main topic trends. The procedure adopted for the recognition of the dynamics of word frequencies moves from a statistical learning perspective and envisages decisions that concern the normalization of occurrences, the smoothing of the trajectories, and the curve clustering. In resulting clusters, emerging topics as well as those that have disappeared over years are clearly visible but, above all, the individual trait of the President stands out as the most relevant element that determines the contents of his discourses.
- Published
- 2022
28. Functional Data Analysis in Sport Science: Example of Swimmers’ Progression Curves Clustering
- Author
-
Arthur Leroy, Andy Marc, Olivier Dupas, Jean Lionel Rey, and Servane Gey
- Subjects
curve clustering ,functional data analysis ,swimming ,sport ,detection ,Technology ,Engineering (General). Civil engineering (General) ,TA1-2040 ,Biology (General) ,QH301-705.5 ,Physics ,QC1-999 ,Chemistry ,QD1-999 - Abstract
Many data collected in sport science come from time-dependent phenomena. This article focuses on Functional Data Analysis (FDA), which studies longitudinal data by modelling them as continuous functions. After a brief review of several FDA methods, some useful practical tools such as Functional Principal Component Analysis (FPCA) or functional clustering algorithms are presented and compared on simulated data. Finally, the problem of the detection of promising young swimmers is addressed through a curve clustering procedure on a real data set of performance progression curves. This study reveals that the fastest improvement of young swimmers generally appears before the age of 16. Moreover, several patterns of improvement are identified and the functional clustering procedure provides a useful detection tool.
- Published
- 2018
- Full Text
- View/download PDF
29. Entropy Minimizing Curves with Application to Flight Path Design and Clustering.
- Author
-
Puechmorel, Stéphane and Nicol, Florence
- Subjects
- *
AIR traffic , *AIRWAYS (Aeronautics) , *ENTROPY , *PROBABILITY theory , *ROBOTIC trajectory control , *MANAGEMENT - Abstract
Air traffic management (ATM) aims at providing companies with safe and ideally optimal aircraft trajectory planning. Air traffic controllers act on flight paths in such a way that no pair of aircraft come closer than the regulatory separation norms. With the increase of traffic, it is expected that the system will reach its limits in the near future: a paradigm change in ATM is planned with the introduction of trajectory-based operations. In this context, sets of well-separated flight paths are computed in advance, tremendously reducing the number of unsafe situations that must be dealt with by controllers. Unfortunately, automated tools used to generate such planning generally issue trajectories not complying with operational practices or even flight dynamics. In this paper, a means of producing realistic air routes from the output of an automated trajectory design tool is investigated. For that purpose, the entropy of a system of curves is first defined, and a means of iteratively minimizing it is presented. The resulting curves form a route network that is suitable for use in a semi-automated ATM system with human in the loop. The tool introduced in this work is quite versatile and may also be applied to unsupervised classification of curves: an example is given for French traffic. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
30. Unsupervised learning of regression mixture models with unknown number of components.
- Author
-
Chamroukhi, Faicel
- Subjects
- *
REGRESSION analysis , *COMPUTER programming , *EXPECTATION-maximization algorithms , *SPLINES , *ELECTRONIC file management , *APPLICATION software - Abstract
We propose a new unsupervised learning algorithm to fit regression mixture models with unknown number of components. The developed approach consists in a penalized maximum likelihood estimation carried out by a robust expectation–maximization (EM)-like algorithm. We derive it for polynomial, spline, and B-spline regression mixtures. The proposed learning approach is unsupervised: (i) it simultaneously infers the model parameters and the optimal number of the regression mixture components from the data as the learning proceeds, rather than in a two-fold scheme as in standard model-based clustering using afterward model selection criteria, and (ii) it does not require accurate initialization unlike the standard EM for regression mixtures. The developed approach is applied to curve clustering problems. Numerical experiments on simulated and real data show that the proposed algorithm performs well and provides accurate clustering results, and confirm its benefit for practical applications. [ABSTRACT FROM PUBLISHER]
- Published
- 2016
- Full Text
- View/download PDF
31. Deformation analysis in tunnels through curve clustering.
- Author
-
Ordóñez, Celestino, Argüelles, Ramón, Sanz-Ablanedo, Enoc, and Roca-Pardiñas, Javier
- Subjects
- *
DEFORMATIONS (Mechanics) , *TUNNELS , *CLUSTER analysis (Statistics) , *ALGORITHMS , *MATHEMATICAL models - Abstract
Deformations in tunnels and galleries due to confinement pressures can occur continuously but also at discrete time intervals. We propose a methodology to detect discrete significant deformations in tunnels based on a probabilistic model-based curve clustering. An EM (expectation-maximization) algorithm is used to obtain the parameters of the component density functions that maximize the log-likelihood function. The estimation of the number of clusters was performed by means of the Bayesian Information Criterion (BIC). The proposed methodology was applied to the analysis of the deformations in a tunnel that has been used in the past to transport coal in an underground mine. A set of 40 profiles measured over a period of 20 months were compared. The results obtained show that deformations are not continuous but significantly high deformation episodes occur. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
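Record 31 describes fitting a mixture model by EM and choosing the number of clusters with the Bayesian Information Criterion. The sketch below illustrates that general recipe only. It assumes a one-dimensional Gaussian mixture on scalar values rather than the paper's curve model, and the function names, starting values, and toy data are hypothetical.

```python
import math

def normal_pdf(x, mu, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(data, k, iters=200, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM; return (params, log-likelihood)."""
    xs = sorted(data)
    n = len(xs)
    # Deterministic start: means at evenly spaced quantiles, common variance.
    mus = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    mean = sum(xs) / n
    var0 = max(sum((x - mean) ** 2 for x in xs) / n, min_var)
    variances = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, mus, variances)]
            s = sum(p) or 1e-300
            resp.append([pj / s for pj in p])
        # M-step: weighted updates of mixing weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj, min_var)
    loglik = sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, mus, variances)) or 1e-300)
        for x in xs)
    return (weights, mus, variances), loglik

def bic(loglik, k, n):
    """BIC = -2 log L + p log n, with p = k means + k variances + (k - 1) weights."""
    return -2.0 * loglik + (3 * k - 1) * math.log(n)

# Hypothetical toy data: two well-separated groups of seven values each.
data = [i / 10 for i in range(-3, 4)] + [10 + i / 10 for i in range(-3, 4)]

scores = {}
for k in (1, 2, 3):
    _, ll = fit_gmm_1d(data, k)
    scores[k] = bic(ll, k, len(data))

best_k = min(scores, key=scores.get)  # the candidate with the lowest BIC is retained
print(scores, best_k)
```

Lower BIC is better here; the penalty term grows with the number of free parameters, so an extra component is kept only if it raises the log-likelihood enough to pay for itself.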
32. Sulle tracce dell'espressione dell'interiorità: analisi diacronica di un corpus di narrativa italiana del XIX-XX secolo
- Author
-
Sciandra, Andrea, Trevisani, Matilde, and Tuzzi, Arjuna
- Subjects
chronological textual data ,word embedding ,curve clustering ,diachronic corpora ,functional data analysis - Published
- 2021
33. A semiparametric mixture regression model for longitudinal data
- Author
-
Nummi, Tapio, Salonen, Janne, Koskinen, Lasse, and Pan, Jianxin
- Published
- 2018
- Full Text
- View/download PDF
34. Entropy Minimizing Curves with Application to Flight Path Design and Clustering
- Author
-
Stéphane Puechmorel and Florence Nicol
- Subjects
curve system entropy ,curves manifold ,curve clustering ,probability distribution estimation ,air traffic management ,Science ,Astrophysics ,QB460-466 ,Physics ,QC1-999 - Abstract
Air traffic management (ATM) aims at providing companies with safe and ideally optimal aircraft trajectory planning. Air traffic controllers act on flight paths in such a way that no pair of aircraft come closer than the regulatory separation norms. With the increase of traffic, it is expected that the system will reach its limits in the near future: a paradigm change in ATM is planned with the introduction of trajectory-based operations. In this context, sets of well-separated flight paths are computed in advance, tremendously reducing the number of unsafe situations that must be dealt with by controllers. Unfortunately, automated tools used to generate such planning generally issue trajectories not complying with operational practices or even flight dynamics. In this paper, a means of producing realistic air routes from the output of an automated trajectory design tool is investigated. For that purpose, the entropy of a system of curves is first defined, and a means of iteratively minimizing it is presented. The resulting curves form a route network that is suitable for use in a semi-automated ATM system with human in the loop. The tool introduced in this work is quite versatile and may also be applied to unsupervised classification of curves: an example is given for French traffic.
- Published
- 2016
- Full Text
- View/download PDF
35. A Hybrid Clustering Method for ROI Delineation in Small-Animal Dynamic PET Images: Application to the Automatic Estimation of FDG Input Functions.
- Author
-
Zheng, Xiujuan, Tian, Guangjian, Huang, Sung-Cheng, and Feng, Dagan
- Subjects
POSITRON emission tomography ,AUTOMATION ,RADIOPHARMACEUTICALS ,CAVITY resonators ,IMAGE processing ,COMPUTER simulation ,MYOCARDIUM ,PIXELS - Abstract
Tracer kinetic modeling with dynamic positron emission tomography (PET) requires a plasma time-activity curve (PTAC) as an input function. Several image-derived input function (IDIF) methods that rely on drawing the region of interest (ROI) in large vascular structures have been proposed to overcome the problems caused by the invasive approach for obtaining the PTAC, especially for small-animal studies. However, the manual placement of ROIs for estimating IDIF is subjective and labor-intensive, making it an undesirable and unreliable process. In this paper, we propose a novel hybrid clustering method (HCM) that objectively delineates ROIs in dynamic PET images for the estimation of IDIFs, and demonstrate its application to mouse PET studies acquired with [^18F]Fluoro-2-deoxy-2-D-glucose (FDG). We begin our HCM using k-means clustering for background removal. We then model the time-activity curves using polynomial regression mixture models in curve clustering for heart structure detection. Hierarchical clustering is finally applied for ROI refinements. The HCM achieved accurate ROI delineation in both computer simulations and experimental mouse studies. In the mouse studies, the predicted IDIF had a high correlation with the gold standard, the PTAC derived from the invasive blood samples. The results indicate that the proposed HCM has great potential in ROI delineation for automatic estimation of IDIF in dynamic FDG-PET studies. [ABSTRACT FROM AUTHOR]
- Published
- 2011
- Full Text
- View/download PDF
36. Robust Curve Clustering Based on a Multivariate t-Distribution Model.
- Author
-
Wang, Zhi Min, Song, Qing, Soh, Yeng Chai, and Sim, Kang
- Subjects
- *
ROBUST control , *CLUSTER analysis (Statistics) , *MULTIVARIATE analysis , *DISTRIBUTION (Probability theory) , *GAUSSIAN processes , *OUTLIERS (Statistics) , *RANDOM noise theory , *MATHEMATICAL models , *SPLINE theory - Abstract
This brief presents a curve clustering technique based on a new multivariate model. Instead of the usual Gaussian random effect model, our method uses the multivariate t-distribution model which has better robustness to outliers and noise. In our method, we use the B-spline curve to model curve data and apply the mixed-effects model to capture the randomness and covariance of all curves within the same cluster. After fitting the B-spline-based mixed-effects model to the proposed multivariate t-distribution, we derive an expectation-maximization algorithm for estimating the parameters of the model, and apply the proposed approach to the simulated data and the real dataset. The experimental results show that our model yields better clustering results when compared to the conventional Gaussian random effect model. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
37. A REGRESSION MIXTURE MODEL WITH SPATIAL CONSTRAINTS FOR CLUSTERING SPATIOTEMPORAL DATA.
- Author
-
BLEKAS, K., NIKOU, C., GALATSANOS, N., and TSEKOS, N. V.
- Subjects
- *
REGRESSION analysis , *METHODOLOGY , *GAUSSIAN processes , *MAGNETIC resonance imaging , *ALGORITHMS - Abstract
We present a new approach for curve clustering designed for analysis of spatiotemporal data. Such data contain both spatial and temporal patterns that we desire to capture. The proposed methodology is based on regression and Gaussian mixture modeling. The novelty of this work is the incorporation of spatial smoothness constraints in the form of a prior for the data labels. This makes it possible to take into account the property of spatiotemporal data according to which spatially adjacent data points have a higher probability of belonging to the same cluster. The proposed model can be formulated as a Maximum a Posteriori (MAP) problem, where the Expectation Maximization (EM) algorithm is used to estimate the model parameters. Several numerical experiments with both simulated data and real cardiac perfusion MRI data are used for evaluating the methodology. The results are promising and demonstrate the value of the proposed approach. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
38. Knowledge discovery for dynamic textual data: temporal patterns of topics and word clusters in corpora of scientific literature
- Author
-
Sbalchiero, Stefano, Trevisani, Matilde, and Tuzzi, Arjuna; Arbia Giuseppe, Peluso Stefano, Pini Alessia, Rivellini Giulia (editors)
- Subjects
topic detection ,trend analysis ,curve clustering ,clustering consensus - Abstract
The study aims at comparing two methods for tracing the temporal evolution of topics and keywords in corpora of scientific literature: the well-known Latent Dirichlet Allocation and a new knowledge-based system that has been developed in a functional data analysis unsupervised perspective. The object of the study is a corpus of abstracts of articles published by the American Journal of Sociology over a century (1921-2018). Our study advocates that the two methods should not be seen as alternatives but rather as means that can be integrated to improve the interpretation of findings.
- Published
- 2019
39. k-mean alignment for curve clustering
- Author
-
Sangalli, Laura M., Secchi, Piercesare, Vantini, Simone, and Vitelli, Valeria
- Subjects
- *
FUNCTIONAL analysis , *DATA analysis , *CLUSTER analysis (Statistics) , *INITIAL value problems , *CURVES , *ALGORITHMS , *SIMULATION methods & models - Abstract
The problem of curve clustering when curves are misaligned is considered. A novel algorithm is described, which jointly clusters and aligns curves. The proposed procedure efficiently decouples amplitude and phase variability; in particular, it is able to detect amplitude clusters while simultaneously disclosing clustering structures in the phase, pointing out features that can neither be captured by simple curve clustering nor by simple curve alignment. The procedure is illustrated via simulation studies and applications to real data. [Copyright Elsevier]
- Published
- 2010
- Full Text
- View/download PDF
40. Classification non supervisée de courbes basée sur l'information au second ordre : détection de la dégradation de l'état de pistes d'atterrissage
- Author
-
Puechmorel, Stéphane, Nicol, Florence, Gregorutti, Baptiste, Andrieu, Cindie, Ecole Nationale de l'Aviation Civile (ENAC), Safety Line [Paris], AMIES, PEPS Dofin, PEPS1 AMIES, PEPS DOFIN, and Nicol, Florence
- Subjects
[MATH.MATH-PR]Mathematics [math]/Probability [math.PR] ,ComputerApplications_COMPUTERSINOTHERSYSTEMS ,[MATH]Mathematics [math] ,outlier detection ,shape ,données fonctionnelles ,air traffic management ,[STAT.AP]Statistics [stat]/Applications [stat.AP] ,[MATH.MATH-ST]Mathematics [math]/Statistics [math.ST] ,Similarity measure ,functional data analysis ,[STAT.ME]Statistics [stat]/Methodology [stat.ME] ,manifold ,shape manifold ,curve clustering ,sécurité aéroportuaire ,espace de formes ,[MATH.MATH-DG]Mathematics [math]/Differential Geometry [math.DG] ,similarité entre courbes ,airport safety ,Classification non supervisée clustering - Abstract
In air transportation, especially in airport safety, radar tracks are continuously recorded and may be used for detecting incidents on the airport surface. However, all known statistical algorithms, even those based on functional data, are unable to distinguish between a safety-critical flight and another one departing from standard behavior, but otherwise safe. In this work, we propose a change of paradigm by representing curves as points in a shape manifold. In this framework, it is possible to use Finsler distances between shapes that explicitly take into account the second derivative and are able to correctly distinguish skid situations from deviant trajectories that cannot be considered slipped trajectories. This metric is next used in curve clustering for detecting bad runway conditions. Some results on datasets of synthetic and real trajectories are presented, as well as a comparison with existing metrics.
- Published
- 2018
41. Curve clustering based on second order information: application to bad runway condition detection
- Author
-
Puechmorel, Stéphane, Nicol, Florence, Andrieu, Cindie, Gregorutti, Baptiste, Ecole Nationale de l'Aviation Civile (ENAC), Safety Line [Paris], AMIES, PEPS Dofin, and Nicol, Florence
- Subjects
[STAT.AP]Statistics [stat]/Applications [stat.AP] ,[STAT.ME]Statistics [stat]/Methodology [stat.ME] ,shape manifold ,[MATH]Mathematics [math] ,curve clustering ,données fonctionnelles ,sécurité aéroportuaire ,espace de formes ,[MATH.MATH-ST]Mathematics [math]/Statistics [math.ST] ,[MATH.MATH-DG]Mathematics [math]/Differential Geometry [math.DG] ,Similarity measure ,similarité entre courbes ,airport safety ,Classification non supervisée clustering ,functional data analysis - Abstract
In air transportation, especially in airport safety, radar tracks are continuously recorded and may be used for detecting incidents on the airport surface. However, all known statistical algorithms, even those based on functional data, are unable to distinguish between a safety-critical flight and another one departing from standard behavior, but otherwise safe. In this work, we propose a change of paradigm by representing curves as points in a shape manifold. In this framework, it is possible to use Finsler distances between shapes that explicitly take into account the second derivative and are able to correctly distinguish skid situations from deviant trajectories that cannot be considered slipped trajectories. This metric is next used in curve clustering for detecting bad runway conditions. Some results on datasets of synthetic and real trajectories are presented, as well as a comparison with existing metrics.
- Published
- 2018
42. Spectral methods for growth curve clustering
- Author
-
Johannes Jung, Snježana Majstorović, Matija Klarić, and Kristian Sabo
- Subjects
021103 operations research ,0211 other engineering and technologies ,02 engineering and technology ,Function (mathematics) ,Management Science and Operations Research ,01 natural sciences ,Growth curve (statistics) ,010104 statistics & probability ,Matrix (mathematics) ,Similarity (network science) ,Curve clustering ,Similarity graph ,Laplacian matrix ,Modularity matrix ,Spectral methods ,Graph (abstract data type) ,Applied mathematics ,0101 mathematics ,Cluster analysis ,Eigendecomposition of a matrix ,Mathematics - Abstract
The growth curve clustering problem is analyzed and its connection with the spectral relaxation method is described. For a given set of growth curves and similarity function, a similarity matrix is defined, from which the corresponding similarity graph is constructed. It is shown that a nearly optimal growth curve partition can be obtained from the eigendecomposition of a specific matrix associated with a similarity graph. The results are illustrated and analyzed on the set of synthetically generated growth curves. One real-world problem is also given.
- Published
- 2018
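The spectral relaxation described in record 42 can be sketched for the two-cluster case as follows. This is an illustration under stated assumptions, not the paper's procedure: the Gaussian similarity, the unnormalised graph Laplacian L = D - W, the power-iteration solver, and the toy linear "growth curves" are all choices made here, and the matrix used in the paper may differ.

```python
import math

def similarity_matrix(curves, sigma=3.0):
    """Gaussian similarity between curves sampled on a common grid (assumed choice)."""
    n = len(curves)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(curves[i], curves[j]))
                W[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return W

def fiedler_vector(W, iters=500):
    """Eigenvector of L = D - W for its second-smallest eigenvalue, via power iteration."""
    n = len(W)
    deg = [sum(row) for row in W]
    shift = 2 * max(deg)  # Gershgorin bound: every eigenvalue of L lies in [0, shift]
    # Deterministic start, already orthogonal to the constant vector.
    v = [i - (n - 1) / 2 for i in range(n)]
    for _ in range(iters):
        # Multiply by (shift * I - L), whose dominant eigenvectors are L's smallest ones.
        u = [(shift - deg[i]) * v[i] + sum(W[i][j] * v[j] for j in range(n))
             for i in range(n)]
        mean = sum(u) / n
        u = [x - mean for x in u]  # remove the constant eigenvector (eigenvalue 0 of L)
        norm = math.sqrt(sum(x * x for x in u))
        v = [x / norm for x in u]
    return v

# Hypothetical toy growth curves: straight lines with slopes near 1 or near 3.
slopes = [1.0, 1.1, 0.9, 3.0, 3.1, 2.9]
curves = [[s * t for t in range(5)] for s in slopes]

W = similarity_matrix(curves)
v = fiedler_vector(W)
labels = [0 if x < 0 else 1 for x in v]  # the sign of each entry gives the bipartition
print(labels)
```

The relaxation replaces the discrete cluster indicator with a real-valued vector; thresholding the resulting eigenvector at zero recovers a two-way partition. For more than two clusters, several eigenvectors would typically be combined with a second clustering step.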
43. The Recent History of Statistics: Comparing Temporal Patterns of Word Clusters
- Author
-
Trevisani, Matilde, and Tuzzi, Arjuna
- Subjects
cluster number validation ,Computer science ,media_common.quotation_subject ,computer.software_genre ,history of statistics ,functional data analysis ,normalisation ,splines ,curve clustering ,Reading (process) ,Cluster analysis ,media_common ,Structure (mathematical logic) ,business.industry ,Popularity ,Word lists by frequency ,Artificial intelligence ,business ,computer ,Natural language processing ,Word (computer architecture) - Abstract
The abstracts published by the Journal of the American Statistical Association in the time span 1946–2016 have been examined in order to identify relevant timings in the recent history of statistics and retrieve past and current topics that have drawn the attention of one of the most influential communities of statisticians in the world. The focus is on clusters of words that, over time, share a similar trajectory of occurrences in the issues of the journal and on the effect of different choices in the number of clusters. When arrangements in coarser and finer groupings have been compared and contrasted, an interesting nested structure has emerged. Moreover, results have highlighted the conjoint effect of word cycle synchrony and word popularity, which are two of the most important features to be accounted for by the researcher in reading the output of a curve clustering based on observations of word frequencies from a chronological perspective. The research also shows that a knowledge-based system (a computer-based system that supports human learning, endowed with a knowledge-base, a statistical learning engine and a user interface) is able to achieve an effective representation of abstracts and that many elements of the history of statistics may be gleaned by reading the abstracts of a large number of papers and considering ‘texts as data’.
- Published
- 2018
44. Learning the evolution of disciplines from scientific literature: A functional clustering approach to normalized keyword count trajectories
- Author
-
Trevisani, Matilde, and Tuzzi, Arjuna
- Subjects
Normalization (statistics) ,Information Systems and Management ,Chronological textual data ,Curve clustering ,Diachronic corpora ,Functional data analysis ,Keyword retrieval ,Normalization ,Computer science ,02 engineering and technology ,Scientific literature ,computer.software_genre ,01 natural sciences ,Management Information Systems ,Set (abstract data type) ,010104 statistics & probability ,Artificial Intelligence ,Similarity (psychology) ,0202 electrical engineering, electronic engineering, information engineering ,Selection (linguistics) ,0101 mathematics ,Cluster analysis ,business.industry ,Word lists by frequency ,Software ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Word (computer architecture) ,Natural language processing - Abstract
The growing availability of large diachronic corpora of scientific literature offers the opportunity of reading the temporal evolution of concepts, methods and applications, i.e., the history of disciplines involved in the strand under investigation. After a retrieval process of the most relevant keywords, bag-of-words approaches produce words × time-points contingency tables, i.e., the frequencies of each word in the set of texts grouped by time-points. Through the analysis of word counts over the observed period of time, the main purpose of the study is to reconstruct the “life-cycle” of words, cluster words that have similar life-cycles and, thus, detect prototypical or exemplary temporal patterns. Unveiling such relevant and (through expert opinion) meaningful inner dynamics enables us to trace a historical narrative of the discipline of interest. However, different readings of history are possible depending on the type of data normalization, which is needed to account for the fluctuating size of texts across time and the general problems of data sparsity and strong asymmetry. This study proposes a methodology consisting of (1) a stepwise information retrieval procedure for keyword selection and (2) a two-stage functional clustering approach for statistical learning. Moreover, a sample of possible normalizations of word frequencies is considered, showing that the different concept of curve similarity induced in clustering by the type of transformation heavily affects the composition and size of groups. The corpus of titles of scientific papers published by the American Statistical Association journals in the time span 1888–2012 is examined for illustration.
- Published
- 2018
45. Functional Data Analysis and Knowledge-Based Systems
- Author
-
Matilde Trevisani and Arjuna Tuzzi
- Subjects
Computer science ,business.industry ,media_common.quotation_subject ,distant reading ,trajectory normalisation ,curve clustering ,clustering validation ,clustering agreement ,Scientific literature ,computer.software_genre ,Trace (semiology) ,Knowledge-based systems ,Subject-matter expert ,Reading (process) ,Similarity (psychology) ,Selection (linguistics) ,Artificial intelligence ,Cluster analysis ,business ,computer ,Natural language processing ,media_common - Abstract
In the present study, the challenge is whether a distant reading of the history of a discipline can be achieved by analysing the temporal evolution of keywords retrieved from papers in the discipline’s mainstream journals. This calls for a so-called knowledge-based system (KBS), i.e. a computer-based system that supports human learning not only by acquiring and manipulating large volumes of data and information, but also by integrating knowledge from different sources. In this chapter, we introduce a KBS that, starting from a large database of texts retrieved from scientific articles published over a lengthy period by a selection of the discipline’s premier journals, leads to the construction of a well-founded corpus of scientific literature and from this to a possible outline of the discipline’s history. Our work is based on the idea that the temporal course of a word’s occurrences is a proxy for the word’s life cycle. We then adopt a functional data analysis (FDA) approach under which we first reconstruct words’ life cycles. Second, by clustering words with similar life cycles, we detect any prototypical or exemplary temporal patterns representing the latent dynamics of word micro-histories. The major dynamics uncovered at this stage are then submitted to subject matter experts for interpretation and guidance in decision-making, thus making it possible to trace a history of the discipline. Moreover, we propose several kinds of data normalisation which involve different concepts of life cycle similarity and hence a different reading of the history of the discipline under examination.
- Published
- 2018
46. A semiparametric mixture regression model for longitudinal data
- Author
-
Tapio Nummi, Janne Salonen, Lasse Koskinen, Jianxin Pan, Luonnontieteiden tiedekunta - Faculty of Natural Sciences, and University of Tampere
- Subjects
Statistics and Probability ,Statistics::Theory ,Penalized likelihood ,Longitudinal data ,growth curves ,01 natural sciences ,Set (abstract data type) ,010104 statistics & probability ,0504 sociology ,Expectation–maximization algorithm ,Statistics ,Statistics::Methodology ,finite mixtures ,Health care science ,Semiparametric regression ,0101 mathematics ,EM algorithm ,Mathematics ,05 social sciences ,050401 social sciences methods ,Mixture regression ,curve clustering ,Term (time) ,Semiparametric model - Abstract
A normal semiparametric mixture regression model is proposed for longitudinal data. The proposed model contains one smooth term and a set of possible linear predictors. Model terms are estimated using the penalized likelihood method with the EM algorithm. A computationally feasible alternative method that provides an approximate solution is also introduced. Simulation experiments and a real data example are used to illustrate the methods.
- Published
- 2018
47. Curve Clustering for Brain Functional Activity and Synchronization
- Author
-
Marian Scott, Jacopo Di Iorio, Anastasia Gorshechnikova, Gaia Bertarelli, and Alice Corbella
- Subjects
Quantitative Biology::Neurons and Cognition ,medicine.diagnostic_test ,Resting state fMRI ,Computer science ,Brain activity and meditation ,business.industry ,fMRI ,Functional boxplot ,Pattern recognition ,Brain mapping ,Curve clustering ,Smoothing ,nervous system ,medicine ,Time domain ,Artificial intelligence ,Settore SECS-S/01 - Statistica ,Functional magnetic resonance imaging ,Cluster analysis ,business ,Settore SECS-S/02 - Statistica per La Ricerca Sperimentale e Tecnologica - Abstract
Functional Magnetic Resonance Imaging (fMRI) has become one of the leading methods for brain mapping and an important tool in modern neuroscience investigation. Moreover, recent advances in fMRI analysis are widely used to define the default state of brain activity, functional connectivity and basal activity. Signal processing schemes have been suggested to analyze the resting state Blood-Oxygenation-Level-Dependent (BOLD) signal, ranging from simple correlations to spectral decomposition. Our goal is to determine which brain areas behave similarly in the time domain. To address this question, we apply functional curve clustering methods. We carry out an exploratory study using classical functional clustering of fMRI time series. The analysis confirms the hypothesis of a possible spatial influence on the results and therefore suggests the development of spatial curve clustering methods for brain data.
- Published
- 2018
48. Optimally weighted [formula omitted] distances for spatially dependent functional data.
- Author
-
Romano, Elvira, Diana, Andrea, Miller, Claire, and O'Donnell, Ruth
- Abstract
In recent years, in many application fields, extracting information from data in the form of functions has been of greater interest than investigating traditional multivariate vectors. Often these functions have complex spatial dependences that need to be accounted for using appropriate statistical analysis. Spatial Functional Statistics presents a fruitful analytics framework for this analysis. The definition of a distance measure between spatially dependent functional data is critical for many functional data analysis tasks such as clustering and classification. For this reason, and based on the specific characteristics of functional data, several distance measures have been proposed in the last few years. In this work we develop a weighted L2 distance for spatially dependent functional data, with an optimized weight function. Assuming a penalized basis representation for the functional data, we consider weight functions depending also on the spatial location in two different situations: a classical georeferenced spatial structure and a connected network one. The performance of the proposed distances is compared using standard metrics applied to both real and simulated data analysis.
- Published
- 2020
49. Model-Based Co-Clustering of Multivariate Functional Data
- Author
-
Chamroukhi, Faicel, Biernacki, Christophe, Laboratoire de Mathématiques Nicolas Oresme (LMNO), Université de Caen Normandie (UNICAEN), Normandie Université (NU), Centre National de la Recherche Scientifique (CNRS), MOdel for Data Analysis and Learning (MODAL), Laboratoire Paul Painlevé - UMR 8524 (LPP), Université de Lille, Sciences et Technologies, Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria), Evaluation des technologies de santé et des pratiques médicales - ULR 2694 (METRICS), Centre Hospitalier Régional Universitaire [Lille] (CHRU Lille), and École polytechnique universitaire de Lille (Polytech Lille)
- Subjects
Curve clustering ,Functional data analysis ,Co-clustering ,Mixture modeling ,Variational EM ,Latent block model ,EM algorithms ,[STAT.ME]Statistics [stat]/Methodology [stat.ME] - Abstract
High-dimensional data clustering is an increasingly interesting topic in the statistical analysis of heterogeneous large-scale data. In this paper, we consider the problem of clustering heterogeneous high-dimensional data where the individuals are described by functional variables which exhibit a dynamical longitudinal structure. We address the issue in the framework of model-based co-clustering and propose the functional latent block model (FLBM). The FLBM simultaneously clusters a sample of multivariate functions into a finite set of blocks, each block being an association of a cluster over individuals and a cluster over functional variables. Furthermore, the homogeneous set within each block is modeled with a dedicated latent process functional regression model, which allows its segmentation according to an underlying dynamical structure. The proposed model thus makes it possible to fully exploit the structure of the data, in contrast to classical latent block clustering models for continuous non-functional data, which ignore the functional structure of the observations. The FLBM can therefore serve for simultaneous co-clustering and segmentation of multivariate non-stationary functions. We propose a variational expectation-maximization (EM) algorithm (VEM-FLBM) to monotonically maximize a variational approximation of the observed-data log-likelihood for the unsupervised inference of the FLBM.
- Published
- 2017
50. Statistical Analysis of Aircraft Trajectories: a Functional Data Analysis Approach
- Author
-
Nicol, Florence, and Ecole Nationale de l'Aviation Civile (ENAC)
- Subjects
[STAT.AP]Statistics [stat]/Applications [stat.AP] ,functional statistics ,principal component analysis ,[MATH.MATH-ST]Mathematics [math]/Statistics [math.ST] ,curve clustering ,air traffic management ,[STAT.ME]Statistics [stat]/Methodology [stat.ME] - Abstract
In Functional Data Analysis, the underlying structure of a raw observation is functional and data are assumed to be sample paths from a single stochastic process. When the data considered are functional in nature, and thus infinite-dimensional, like curves or images, multivariate statistical procedures have to be generalized to the infinite-dimensional case. By approximating random functions by a finite number of random score vectors, the Principal Component Analysis approach serves as a dimension reduction technique and offers a visual tool to assess the dominant modes of variation, patterns of interest, clusters in the data and outliers. A functional statistics approach is applied to univariate and multivariate aircraft trajectories.
- Published
- 2017
Discovery Service for Jio Institute Digital Library