TSIS with A Comparative Study on Linear Molecular Representation

Authors :
Wu, Juan-Ni
Wang, Tong
Tang, Li-Juan
Wu, Hai-Long
Yu, Ru-Qin
Publication Year :
2024

Abstract

Encoding is the carrier of information. AI models possess basic capabilities in syntax, semantics, and reasoning, but these capabilities are sensitive to specific inputs. In this study, we introduce an encoding algorithm, TSIS (Simplified TSID), to the t-SMILES family as a fragment-based linear molecular representation. Previous work has shown that TSID significantly outperforms classical SMILES, DeepSMILES, and SELFIES. A further comparative analysis in this study reveals that the tree structure used by TSID is more easily learned than anticipated, regardless of whether Transformer or LSTM models are used. Furthermore, TSIS performs comparably to TSID and significantly outperforms SMILES, SELFIES, and SAFE. SELFIES and SAFE, by contrast, present significant challenges in semantic and syntactic analysis, respectively, owing to their inherent complexity.

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2402.02164
Document Type :
Working Paper