
Machine learning to predict retention time of small molecules in nano-HPLC

Authors :
Inga Bashkirova
Yury Kostyukevich
Sergey Sosnin
Sergey Osipenko
Maxim V. Fedorov
Eugene N. Nikolaev
Oxana Kovaleva
Source :
Analytical and Bioanalytical Chemistry. 412:7767-7776
Publication Year :
2020
Publisher :
Springer Science and Business Media LLC, 2020.

Abstract

Retention time is an important parameter for compound identification in untargeted LC-MS screening. Precise retention time prediction facilitates annotation and is well established in proteomics. For small molecules, however, prediction accuracy was long limited by the lack of available experimental data. Recently introduced large databases of small-molecule retention times make reliable machine learning-based predictions possible for the full diversity of compounds. Simple projections may extend these predictions to various LC systems and conditions. In this work, we describe a combined approach to predicting retention times for nano-HPLC: binary and regression gradient boosting models trained on the METLIN small-molecule dataset are applied in sequence, and the results are then projected onto nano-HPLC separations using a small number of easily available compounds. The proposed model outperforms previous attempts to use machine learning for such predictions, with a mean absolute error of 46 s. After transfer to nano-LC conditions, the median absolute (relative) error is below 155 s (10.8%). To illustrate the applicability of the approach, we eliminated on average 25 to 42% of false positives using a filter threshold derived from ROC curves. The proposed approach should therefore be used in addition to other well-established in silico methods, and their integration may broaden the range of correctly identified molecules.
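
The projection and filtering steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the functional form of the projection, so a least-squares straight line is assumed here, and all names (`fit_linear_projection`, `project`, `filter_candidates`), the calibration values, and the 60 s threshold are hypothetical placeholders. In the paper the threshold is derived from ROC curves; here it is simply passed in as a number.

```python
# Sketch only: assumes a linear mapping between model-predicted retention
# times and retention times measured on the target nano-HPLC system.

def fit_linear_projection(predicted, measured):
    """Fit measured ~ slope * predicted + intercept by ordinary least squares."""
    n = len(predicted)
    if n < 2 or n != len(measured):
        raise ValueError("need two or more paired calibration points")
    mean_x = sum(predicted) / n
    mean_y = sum(measured) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    if sxx == 0:
        raise ValueError("predicted retention times must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, measured))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def project(predicted_rt, slope, intercept):
    """Map a model-predicted retention time onto the target LC system."""
    return slope * predicted_rt + intercept


def filter_candidates(candidates, observed_rt, slope, intercept, threshold_s):
    """Keep candidates whose projected RT lies within threshold_s of the observed RT."""
    return [
        name
        for name, predicted_rt in candidates
        if abs(project(predicted_rt, slope, intercept) - observed_rt) <= threshold_s
    ]


# Hypothetical calibration compounds: model-predicted vs. measured RT (seconds).
calib_predicted = [100.0, 200.0, 300.0, 400.0]
calib_measured = [250.0, 450.0, 650.0, 850.0]
slope, intercept = fit_linear_projection(calib_predicted, calib_measured)

# Hypothetical candidate annotations for one feature observed at 360 s.
candidates = [("cand_A", 150.0), ("cand_B", 350.0)]
kept = filter_candidates(candidates, 360.0, slope, intercept, threshold_s=60.0)
print(slope, intercept, kept)
```

With this kind of filter, a candidate structure is discarded when its projected retention time is too far from the observed one, which is the sense in which false positives are eliminated in the abstract.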

Details

ISSN :
1618-2650 and 1618-2642
Volume :
412
Database :
OpenAIRE
Journal :
Analytical and Bioanalytical Chemistry
Accession number :
edsair.doi.dedup.....be2f15c008ee22a8139dc1057c73b8e5