Back to Search Start Over

Interpretation and visualization of non-linear data fusion in kernel space: study on metabolomic characterization of progression of multiple sclerosis.

Authors :
Agnieszka Smolinska
Lionel Blanchet
Leon Coulier
Kirsten A M Ampt
Theo Luider
Rogier Q Hintzen
Sybren S Wijmenga
Lutgarde M C Buydens
Source :
PLoS ONE, Vol 7, Iss 6, p e38163 (2012)
Publication Year :
2012
Publisher :
Public Library of Science (PLoS), 2012.

Abstract

BACKGROUND: In the last decade data fusion has become widespread in the field of metabolomics. Linear data fusion is performed most commonly. However, many data display non-linear parameter dependences. The linear methods are bound to fail in such situations. We used proton Nuclear Magnetic Resonance and Gas Chromatography-Mass Spectrometry, two well established techniques, to generate metabolic profiles of Cerebrospinal fluid of Multiple Sclerosis (MScl) individuals. These datasets represent non-linearly separable groups. Thus, to extract relevant information and to combine them a special framework for data fusion is required. METHODOLOGY: The main aim is to demonstrate a novel approach for data fusion for classification; the approach is applied to metabolomics datasets coming from patients suffering from MScl at a different stage of the disease. The approach involves data fusion in kernel space and consists of four main steps. The first one is to extract the significant information per data source using Support Vector Machine Recursive Feature Elimination. This method allows one to select a set of relevant variables. In the next step the optimized kernel matrices are merged by linear combination. In step 3 the merged datasets are analyzed with a classification technique, namely Kernel Partial Least Square Discriminant Analysis. In the final step, the variables in kernel space are visualized and their significance established. CONCLUSIONS: We find that fusion in kernel space allows for efficient and reliable discrimination of classes (MScl and early stage). This data fusion approach achieves better class prediction accuracy than analysis of individual datasets and the commonly used mid-level fusion. The prediction accuracy on an independent test set (8 samples) reaches 100%. Additionally, the classification model obtained on fused kernels is simpler in terms of complexity, i.e. just one latent variable was sufficient. Finally, visualization of variables importance in kernel space was achieved.

Subjects

Subjects :
Medicine
Science

Details

Language :
English
ISSN :
19326203
Volume :
7
Issue :
6
Database :
Directory of Open Access Journals
Journal :
PLoS ONE
Publication Type :
Academic Journal
Accession number :
edsdoj.505c42347ac4424ba9c1f069ddc3bc2
Document Type :
article
Full Text :
https://doi.org/10.1371/journal.pone.0038163