1. Normalization techniques for PARAFAC modeling of urine metabolomic data
- Author
- Marcela Hrdá, Karel Hron, Radana Karlíková, Alžběta Gardlo, Age K. Smilde, David Friedecký, Tomáš Adam, and Biosystems Data Analysis (SILS, FNWI)
- Subjects
0301 basic medicine ,Normalization (statistics) ,Chromatography ,business.industry ,Chemistry ,Endocrinology, Diabetes and Metabolism ,010401 analytical chemistry ,Clinical Biochemistry ,Pattern recognition ,Urine ,01 natural sciences ,Biochemistry ,Distance measures ,0104 chemical sciences ,Euclidean distance ,03 medical and health sciences ,030104 developmental biology ,Metabolomics ,Data analysis ,Artificial intelligence ,business ,Compositional data ,Interpretability
- Abstract
Introduction: One of the body fluids often used in metabolomics studies is urine. The concentrations of metabolites in urine are affected by the hydration status of an individual, resulting in dilution differences. The data therefore require normalization to correct for such differences. Two normalization techniques are commonly applied to urine samples prior to further statistical analysis. First, AUC normalization aims to normalize a group of signals with peaks by standardizing the area under the curve (AUC) within a sample to the median, mean or any other proper representation of the amount of dilution. The second approach uses specific end-product metabolites such as creatinine, and all intensities within a sample are expressed relative to the creatinine intensity.
Objectives: Another way of looking at urine metabolomics data is to realize that the ratios between peak intensities are the information-carrying features. This opens up the possibility of using another class of data analysis techniques designed to deal with such ratios: compositional data analysis. The aim of this paper is to develop PARAFAC modeling of three-way urine metabolomics data in the context of compositional data analysis and to compare it with standard normalization techniques.
Methods: In the compositional data analysis approach, special coordinate systems are defined to deal with the ratio problem. In essence, it comes down to using distance measures other than the Euclidean distance that is used in the conventional analysis of metabolomic data.
Results: We illustrate this type of approach in combination with three-way methods (i.e. PARAFAC) on a longitudinal urine metabolomics study and two simulations. In both cases, the advantage of the compositional approach is established in terms of improved interpretability of the scores and loadings of the PARAFAC model.
Conclusion: For urine metabolomics studies, we advocate the use of compositional data analysis approaches. They are easy to use, well established and proven to give reliable results.
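The three ways of handling dilution described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which coordinate system the paper uses, so the centered log-ratio (clr) transform is assumed here as one standard choice from compositional data analysis, and the function names, the toy intensities and the position of the creatinine column are all hypothetical.

```python
import numpy as np

def auc_normalize(X, target=None):
    """Scale each sample (row) so that its total intensity equals a common target.
    If no target is given, the median of the sample totals is used."""
    totals = X.sum(axis=1, keepdims=True)
    if target is None:
        target = np.median(totals)
    return X / totals * target

def creatinine_normalize(X, creatinine_col):
    """Express every intensity in a sample relative to its creatinine intensity."""
    return X / X[:, [creatinine_col]]

def clr(X):
    """Centered log-ratio coordinates (assumed choice of coordinate system).
    Requires strictly positive intensities."""
    logX = np.log(X)
    return logX - logX.mean(axis=1, keepdims=True)

def aitchison_distance(x, y):
    """Aitchison distance: the Euclidean distance between clr coordinates."""
    return np.linalg.norm(clr(x[None, :]) - clr(y[None, :]))

# Hypothetical toy data: 2 samples x 4 metabolites, last column = creatinine.
X = np.array([[10.0, 20.0, 5.0, 40.0],
              [ 8.0, 12.0, 6.0, 30.0]])

# Simulate a hydration effect: the second sample is diluted by a factor of 2.
X_diluted = X * np.array([[1.0], [0.5]])

# Ratios between intensities are unchanged by dilution, so clr coordinates
# of the diluted data match those of the undiluted data.
same_clr = np.allclose(clr(X), clr(X_diluted))
```

In this toy setting all three transforms remove a pure per-sample dilution factor. The clr coordinates do so without picking a reference metabolite or a target total, and ordinary Euclidean distances between them correspond to Aitchison distances between the original samples.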
- Published
- 2016