1. Multiway-SIR for longitudinal multi-table data integration
- Author
Sautron, Valérie; Chavent, Marie; Viguerie, Nathalie; Villa-Vialaneix, Nathalie. Affiliations: Génétique Physiologie et Systèmes d'Elevage (GenPhySE), École nationale supérieure agronomique de Toulouse [ENSAT]-Institut National de la Recherche Agronomique (INRA)-Ecole Nationale Vétérinaire de Toulouse (ENVT), Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées; Université de Bordeaux (UB); Institut des Maladies Métaboliques et Cardiovasculaires (I2MC), Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées, Institut National de la Santé et de la Recherche Médicale (INSERM); Unité de Mathématiques et Informatique Appliquées de Toulouse (MIAT INRA), Institut National de la Recherche Agronomique (INRA); ANR SusOStress; Quality control and dynamic reliability (CQFD), Institut de Mathématiques de Bordeaux (IMB), Université Bordeaux Segalen - Bordeaux 2, Université Sciences et Technologies - Bordeaux 1, Université de Bordeaux (UB), Institut Polytechnique de Bordeaux (Bordeaux INP), Centre National de la Recherche Scientifique (CNRS), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)
- Subjects
[STAT.AP] Statistics [stat]/Applications [stat.AP], [STAT.CO] Statistics [stat]/Computation [stat.CO]
- Abstract
An extension of DUAL-STATIS to the sliced inverse regression (SIR) framework is proposed to analyze multi-table datasets with respect to a numeric variable of interest. The method is designed for the case where a data set $\mathbf{X}$, which consists of $p$ variables measured $T$ times on the same $n$ subjects, is related to a real target variable $\mathbf{y}$ measured on the same $n$ subjects. The approach is exploratory and aims at understanding how the relation between $\mathbf{X}$ and $\mathbf{y}$ evolves through time. The method proceeds in two steps: 1) an inter-structure analysis studies the resemblance between the different time steps by computing similarities between estimates of the covariance of the mean of $\mathbf{X}_{..t}$ conditional on $\mathbf{y}$. As in SIR, the conditional expectation is estimated by slicing the range of $\mathbf{y}$. The result of this analysis is a compromise covariance matrix $\Gamma^c$, which captures a compromise correlation structure of $\mathbb{E}(\mathbf{X}_{..t}|\mathbf{y})$ over $t$; 2) an intra-structure analysis, which is a generalized PCA of the compromise. This second step produces graphical outputs that can be used to explore the covariance structure between variables and time steps conditional on $\mathbf{y}$. The method is illustrated on a real problem concerning the consequences of a low-calorie diet in obese people, in which the target variable of interest is the weight gain/loss.
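The two steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the array layout `(n, p, T)`, equal-size slices of $\mathbf{y}$, the Frobenius inner product as the similarity between time steps, compromise weights taken from the leading eigenvector of the similarity matrix (as is usual in STATIS-like methods), and an identity metric in the final PCA, where the paper speaks of a generalized PCA. The function names `slice_mean_covariance` and `multiway_sir_sketch` are hypothetical.

```python
import numpy as np


def slice_mean_covariance(X_t, y, n_slices):
    """SIR-style estimate of Cov(E[X_t | y]) for one n x p table."""
    n, p = X_t.shape
    Xc = X_t - X_t.mean(axis=0)          # center the variables
    order = np.argsort(y)                # sort subjects by the target
    gamma = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Xc[idx].mean(axis=0)         # mean of X_t within the slice
        gamma += (len(idx) / n) * np.outer(m, m)
    return gamma


def multiway_sir_sketch(X, y, n_slices=5, n_components=2):
    """X has shape (n, p, T); y has shape (n,)."""
    n, p, T = X.shape
    gammas = [slice_mean_covariance(X[:, :, t], y, n_slices) for t in range(T)]

    # Inter-structure: similarity between time steps
    # (assumption: Frobenius inner product between the estimates).
    S = np.array([[np.sum(gammas[s] * gammas[t]) for t in range(T)]
                  for s in range(T)])
    _, eigvec = np.linalg.eigh(S)        # eigenvalues in ascending order
    w = np.abs(eigvec[:, -1])            # leading eigenvector, sign fixed
    w = w / w.sum()                      # weights sum to one

    # Compromise: weighted sum of the per-time-step estimates.
    gamma_c = sum(w_t * g for w_t, g in zip(w, gammas))

    # Intra-structure (assumption: identity metric, i.e. a plain
    # eigen-decomposition of the compromise).
    vals, vecs = np.linalg.eigh(gamma_c)
    top = np.argsort(vals)[::-1][:n_components]
    return w, gamma_c, vals[top], vecs[:, top]
```

In this sketch, `w` indicates how much each time step contributes to the compromise, and the columns of the returned eigenvector matrix give directions in variable space onto which each time step could be projected for graphical exploration.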
- Published
- 2016