Cocoa origin classifiability through LC-MS data: A statistical approach for large and long-term datasets
- Source :
- Food Research International. 140:109983
- Publication Year :
- 2021
- Publisher :
- Elsevier BV, 2021.
- Abstract :
- Classification of food samples by country of origin is an important task in the food industry for quality assurance and the development of fine-flavor products. Liquid chromatography-mass spectrometry (LC-MS) provides a fast technique for obtaining in-depth information about the chemical composition of foods. However, in a large dataset gathered over a period of a few years, multiple, incoherent, and hard-to-avoid sources of variation (e.g., experimental conditions, transportation, batch and instrumental effects) pose technical challenges that make origin classification a difficult problem. Here, we use a large dataset gathered over a period of four years, containing 297 LC-MS profiles of cocoa sourced from 10 countries, to demonstrate these challenges using two popular multivariate analysis methods: principal component analysis (PCA) and linear discriminant analysis (LDA). We show that PCA provides limited separation by bean origin, while LDA suffers from a strong non-linear dependence on the set of compounds. Further, we show for LDA that a compound selection criterion based on the Gaussian distribution of intensities across samples dramatically enhances origin clustering of samples, thereby suggesting possibilities for studying marker compounds in such a disparate dataset through this approach. In essence, we develop and demonstrate a new approach that maximizes the utility of multivariate analysis in a highly complex dataset while avoiding overfitting.
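The abstract does not say how the Gaussian-based compound selection is computed, so the following is only a minimal sketch of one plausible reading: keep a compound when the skewness and excess kurtosis of its intensities across samples are both close to zero, as they would be for a Gaussian. The function names, the thresholds (`max_skew`, `max_kurt`), and the synthetic data are illustrative assumptions, not the paper's method or values.

```python
import math
from statistics import NormalDist, fmean, pstdev


def skew_and_excess_kurtosis(values):
    """Return sample skewness and excess kurtosis (both 0 for a Gaussian)."""
    mean = fmean(values)
    sd = pstdev(values)
    if sd == 0:
        # A constant column carries no information; treat it as non-Gaussian.
        return float("inf"), float("inf")
    z = [(v - mean) / sd for v in values]
    skew = fmean([x ** 3 for x in z])
    kurt = fmean([x ** 4 for x in z]) - 3.0
    return skew, kurt


def select_gaussian_compounds(intensities, max_skew=0.5, max_kurt=1.0):
    """Return indices of compounds (columns) whose intensities look Gaussian.

    `intensities` is a list of samples, each a list of compound intensities.
    The thresholds are hypothetical defaults chosen for illustration only.
    """
    kept = []
    for j in range(len(intensities[0])):
        column = [sample[j] for sample in intensities]
        skew, kurt = skew_and_excess_kurtosis(column)
        if abs(skew) <= max_skew and abs(kurt) <= max_kurt:
            kept.append(j)
    return kept


# Synthetic, deterministic demo with 297 samples (the dataset size in the abstract):
# compound 0 follows normal quantiles, compound 1 is log-normal (right-skewed).
n_samples = 297
std_normal = NormalDist(0.0, 1.0)
quantiles = [std_normal.inv_cdf((i + 0.5) / n_samples) for i in range(n_samples)]
demo = [[100.0 + 10.0 * q, math.exp(q)] for q in quantiles]

selected = select_gaussian_compounds(demo)
print(selected)  # [0]
```

Under this reading, only the retained columns would be passed on to LDA. A formal normality test could replace the moment check, depending on what the full paper specifies.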
- Subjects :
- Multivariate analysis
030309 nutrition & dietetics
Computer science
Gaussian
Feature selection
Overfitting
Set (abstract data type)
03 medical and health sciences
0404 agricultural biotechnology
Tandem Mass Spectrometry
Chocolate
Cluster analysis
Cacao
0303 health sciences
Discriminant Analysis
Pattern recognition
04 agricultural and veterinary sciences
Linear discriminant analysis
040401 food science
Principal component analysis
Artificial intelligence
Chromatography, Liquid
Food Science
Details
- ISSN :
- 09639969
- Volume :
- 140
- Database :
- OpenAIRE
- Journal :
- Food Research International
- Accession number :
- edsair.doi.dedup.....f8bc2007c93ec117c57cb1a2bdf72a6d
- Full Text :
- https://doi.org/10.1016/j.foodres.2020.109983