UCL - SSH/IMMAQ/ISBA - Institut de Statistique, Biostatistique et Sciences Actuarielles, UCL - SST/ICTM/INMA - Pôle en ingénierie mathématique, Martin, Manon, Legat, Benoît, Leenders, Justine, Vanwinsberghe, Julien, Rousseau, Réjane, Boulanger, Bruno, Eilers, Paul H.C., De Tullio, Pascal, and Govaerts, Bernadette
In the analysis of biological samples, control over experimental design and data acquisition procedures alone cannot ensure well-conditioned 1H NMR spectra with maximal information recovery for data analysis. A third major element affects the accuracy and robustness of results: data pre-processing (or pre-treatment), to which too little attention is usually devoted, particularly in metabolomic studies. The usual approach is to carry out the entire pre-processing strategy with the proprietary software provided by the analytical instrument manufacturers. This widespread practice has advantages, such as a user-friendly interface with graphical facilities, but it also has non-negligible drawbacks: a lack of methodological information and automation, a dependence on subjective human choices, a restriction to standard processing options and an absence of objective criteria to evaluate pre-processing quality. To address these drawbacks, this paper introduces PepsNMR, an R package dedicated to the whole processing chain prior to multivariate data analysis, including, among other tools, solvent signal suppression, internal calibration, phase, baseline and misalignment corrections, bucketing and normalisation. Methodological aspects are discussed and the package is compared to the gold standard procedure on two metabolomic case studies. On these data, PepsNMR shows better information recovery and predictive power according to objective and quantitative quality criteria. Other key assets of the package are workflow processing speed, reproducibility, reporting and flexibility, graphical outputs and documented routines.
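To give a concrete feel for the last two steps named in the abstract, bucketing and normalisation, the following is a minimal Python sketch. It is not the PepsNMR implementation (the package itself is written in R). It assumes equal-width rectangular buckets and constant-sum normalisation, two common choices, and the function names `bucket_spectrum` and `normalise_constant_sum` are hypothetical.

```python
import numpy as np


def bucket_spectrum(ppm, intensities, n_buckets):
    """Sum intensities over n_buckets equal-width ppm intervals.

    Assumption: simple rectangular bucketing; real pipelines may use
    trapezoidal integration or variable-width buckets instead.
    """
    ppm = np.asarray(ppm, dtype=float)
    intensities = np.asarray(intensities, dtype=float)
    # Bucket edges span the full observed ppm range.
    edges = np.linspace(ppm.min(), ppm.max(), n_buckets + 1)
    # digitize returns 1-based interval indices; clip so that the point
    # sitting exactly on the upper edge falls into the last bucket.
    idx = np.clip(np.digitize(ppm, edges) - 1, 0, n_buckets - 1)
    buckets = np.bincount(idx, weights=intensities, minlength=n_buckets)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, buckets


def normalise_constant_sum(buckets):
    """Scale a bucketed spectrum so that its buckets sum to 1."""
    buckets = np.asarray(buckets, dtype=float)
    total = buckets.sum()
    if total == 0:
        raise ValueError("cannot normalise a spectrum whose total intensity is zero")
    return buckets / total
```

Bucketing reduces the number of variables and dampens the effect of small residual peak shifts, while constant-sum normalisation makes spectra from samples with different overall concentrations comparable before multivariate analysis.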