1. A unified classifiability analysis framework based on meta-learner and its application in spectroscopic profiling data
- Author
- Zheng-Yong Zhang, Haiyan Wang, and Yinsheng Zhang
- Subjects
Profiling (computer programming), Computer science, Mutual information, Machine learning, Pipeline (software), Statistical power, Data set, Discriminative model, Metric (mathematics), Bayes error rate, Artificial intelligence
- Abstract
Spectroscopic profiling data (e.g., Raman spectroscopy and mass spectroscopy), combined with machine learning, have provided a data-driven approach for discriminative tasks. In these tasks, researchers often start with simple classification models. If one model doesn't work, they try more sophisticated models. If all models fail, the researchers deem the dataset "inseparable." This "trial-and-error" practice raises a fundamental question: does the dataset possess the necessary statistical power for the current discriminative task? This "classifiability analysis" is an implicit and often neglected step in the data-driven pipeline. This paper aims to design a unified methodological framework for classifiability analysis. In this framework, a meta-learner model combines diversified atom metrics (e.g., Bayes error rate / irreducible error, classification accuracy, information gain / mutual information) into one unified metric (d). We have successfully used the proposed framework to analyze a spectroscopic profiling dataset to discriminate vintage liquors of different ages. A significant difference (d = 1.447, where d > 0.8 indicates a significant difference) was found between 5-year and 16-year liquors.
- Published
- 2021