Harnessing Data Augmentation and Normalization Preprocessing to Improve the Performance of Chemical Reaction Predictions of Data-Driven Model

Authors :
Boyu Zhang
Jiaping Lin
Lei Du
Liangshun Zhang
Source :
Polymers, Vol 15, Iss 9, p 2224 (2023)
Publication Year :
2023
Publisher :
MDPI AG, 2023.

Abstract

As a template-free, data-driven methodology, the molecular transformer model offers an alternative way to predict the outcomes of chemical reactions and to design retrosynthetic routes in organic synthesis and polymer chemistry. However, because datasets of chemical reactions are small, the data-driven model suffers from low accuracy in reaction prediction tasks. In this contribution, we integrate the molecular transformer model with data augmentation and normalization preprocessing to address three tasks: forward reaction prediction, and single-step retrosynthetic prediction with and without reaction classes. The proposed strategies significantly raise the prediction accuracy of the molecular transformer model on all three tasks. Notably, after introducing 40-level data augmentation and normalization preprocessing, the top-1 accuracy of forward prediction increases from 71.6% to 84.2%, and the top-1 accuracy of single-step retrosynthetic prediction with the additional reaction class increases from 53.2% to 63.4%. Furthermore, the superior performance of the data-driven model is found to originate from the correction of grammatical errors in the SMILES strings, especially for reaction classes with small datasets.
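
The abstract reports results as top-1 accuracy. As a minimal sketch of how such a metric is commonly computed, the Python snippet below counts a reaction as correct when the reference SMILES string appears among the first k ranked candidates. The function name `top_k_accuracy`, the exact-string-match criterion, and the toy data are illustrative assumptions, not taken from the paper; in practice, strings would typically be canonicalized before comparison.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    references: Sequence[str],
    k: int = 1,
) -> float:
    """Fraction of examples whose reference appears in the first k candidates.

    Assumption: predictions and references are already canonicalized, so
    plain string equality is a meaningful match criterion.
    """
    if len(ranked_predictions) != len(references):
        raise ValueError("need one candidate list per reference")
    if not references:
        raise ValueError("empty evaluation set")
    hits = sum(
        1
        for candidates, target in zip(ranked_predictions, references)
        if target in candidates[:k]
    )
    return hits / len(references)


# Toy data, made up for illustration only (not from the paper).
predictions = [
    ["CCO", "CCN"],           # correct at rank 1
    ["CC(=O)O", "CC=O"],      # correct at rank 2
    ["c1ccccc1", "C1CCCCC1"], # reference not among candidates
]
targets = ["CCO", "CC=O", "CCC"]

top1 = top_k_accuracy(predictions, targets, k=1)
top2 = top_k_accuracy(predictions, targets, k=2)
print(f"top-1: {top1:.3f}, top-2: {top2:.3f}")
```

On this toy set, one of three references is matched at rank 1 and two of three within the first two ranks, so top-k accuracy can only stay the same or grow as k increases.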

Details

Language :
English
ISSN :
15092224 and 20734360
Volume :
15
Issue :
9
Database :
Directory of Open Access Journals
Journal :
Polymers
Publication Type :
Academic Journal
Accession number :
edsdoj.1115c58deaa0427baada9ff972a1620b
Document Type :
article
Full Text :
https://doi.org/10.3390/polym15092224