1. Discovering trends in big data: general discussion.
- Author
- Albornoz, Ricardo Valencia; Antypov, Dmytro; Blanke, Gerd; Borges Jr., Itamar; Bran, Andres Marulanda; Cheung, Joshua; Collins, Christopher M.; David, Nicholas; Day, Graeme M.; Deringer, Volker L.; Draxl, Claudia; Eardley-Brunt, Annabel; Evans, Matthew L.; Fairlamb, Ian; Fieseler, Kate; Franklin, Barnabas A.; George, Janine; Grundy, Joanna; Johal, Jay; and Kalikadien, Adarsh V.
- Abstract
The article "Discovering trends in big data: general discussion" from Faraday Discussions examines the interpretability of machine learning models in chemistry, specifically in reaction prediction and retrosynthesis. It evaluates the performance and interpretability of models such as FlanT5 and ByT5 and highlights the challenges of training models for multi-task chemistry applications. The discussion emphasizes the importance of data size and quality in training machine learning models for chemistry, and it also covers overprediction in crystal structure prediction and the importance of crediting data generators in large databases. It further addresses the sustainable development of larger code projects, advocating open-source repositories, transparent code guidance, and automated testing. Finally, it touches on the increasing adoption of FAIR tools, particularly among younger students, for data reuse and sharing, with a focus on standards and interoperability in the industry. [Extracted from the article]
- Published
- 2025