Machine learning to predict retention time of small molecules in nano-HPLC
- Source :
- Analytical and Bioanalytical Chemistry. 412:7767-7776
- Publication Year :
- 2020
- Publisher :
- Springer Science and Business Media LLC, 2020.
Abstract
- Retention time is an important parameter for identification in untargeted LC-MS screening. Precise retention time prediction facilitates the annotation process and is well established in proteomics. However, a long-standing lack of experimental data has limited the prediction accuracy for small molecules. Recently introduced large databases of small-molecule retention times make reliable machine learning-based predictions possible across the whole diversity of compounds. Applying simple projections may extend these predictions to various LC systems and conditions. In our work, we describe a multi-step approach to predicting retention times for nano-HPLC that includes the sequential deployment of binary and regression gradient boosting models trained on the METLIN small-molecule dataset, followed by a simple projection of the results onto nano-HPLC separations using a small number of easily available compounds. The proposed model outperforms previous attempts to use machine learning for predictions, with a 46-s mean absolute error. After transfer to nano-LC conditions, the overall median absolute error is less than 155 s (median relative error 10.8%). To illustrate the applicability of the described approach, we eliminated on average 25 to 42% of false positives with a filter threshold derived from ROC curves. Thus, the proposed approach should be used in addition to other well-established in silico methods, and their integration may broaden the range of correctly identified molecules.
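The projection and filtering steps mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the form of the projection, so a least-squares straight line is assumed here. All function names, the calibrant values, and the 60-s threshold are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from statistics import median


def fit_linear_projection(predicted, observed):
    """Least-squares fit of observed = slope * predicted + intercept.

    ASSUMPTION: a straight-line mapping; the abstract only says "simple projection".
    """
    n = len(predicted)
    if n < 2 or n != len(observed):
        raise ValueError("need at least two paired calibrant values")
    mean_x = sum(predicted) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    if sxx == 0:
        raise ValueError("predicted values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, observed))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def project(rt, slope, intercept):
    """Map a model-predicted retention time onto the target LC system."""
    return slope * rt + intercept


def median_absolute_error(predicted, observed):
    """Median of |predicted - observed| over paired values, in seconds."""
    return median(abs(p - o) for p, o in zip(predicted, observed))


def filter_candidates(candidates, observed_rt, slope, intercept, threshold_s):
    """Keep candidate names whose projected RT lies within threshold_s of observed_rt."""
    return [
        name
        for name, predicted_rt in candidates
        if abs(project(predicted_rt, slope, intercept) - observed_rt) <= threshold_s
    ]


# Invented calibrant data (seconds): model predictions vs. measurements on the target system.
calibrant_predicted = [100.0, 200.0, 300.0, 400.0]
calibrant_observed = [250.0, 450.0, 650.0, 850.0]
slope, intercept = fit_linear_projection(calibrant_predicted, calibrant_observed)

# Two hypothetical annotation candidates for a feature observed at 660 s.
candidates = [("A", 300.0), ("B", 500.0)]
kept = filter_candidates(candidates, 660.0, slope, intercept, threshold_s=60.0)
```

In this toy setup, a candidate is rejected when its projected retention time differs from the observed one by more than the threshold. The abstract states that the threshold was derived from ROC curves, which this sketch does not reproduce.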
- Subjects :
- Receiver operating characteristic
Computer science
Small number
010401 analytical chemistry
02 engineering and technology
Filter (signal processing)
021001 nanoscience & nanotechnology
Machine learning
01 natural sciences
Biochemistry
Regression
0104 chemical sciences
Analytical Chemistry
Range (mathematics)
Identification (information)
Artificial intelligence
Gradient boosting
0210 nano-technology
Projection (set theory)
Details
- ISSN :
- 1618-2650 and 1618-2642
- Volume :
- 412
- Database :
- OpenAIRE
- Journal :
- Analytical and Bioanalytical Chemistry
- Accession number :
- edsair.doi.dedup.....be2f15c008ee22a8139dc1057c73b8e5