Katherine E. Zink, Anam F. Shaikh, Jocelyn Macho, Aswad S Khadilkar, Timothy J. O’Donnell, Jasmine S. Paula, Laura M. Sanchez, Fausto Carnevale Neto, Duy A. Vo, Marnix H. Medema, Tuan Anh Tran, Gregoire Jacob, Darryl Wilson, Dennis Y. Liu, Alison H. Hughes, Barbara R. Terlouw, F. P. Jake Haeckl, Sylvia Soldatou, Catherine McCaughey, Pieter C. Dorrestein, Jung-Ho Lee, Joseph M. Egan, Katherine R. Duncan, Trevor N. Clark, Justin J. J. van der Hooft, Marcy J. Balunas, Laia Castano-Espriu, Alex Hua, Amrit Leen Singh, Derek Bunsko, Mingxun Wang, David A. Delgadillo, Ram P. Neupane, Mercia C. Valentine, Jeffrey A. van Santen, Roger G. Linington, Chen Chang, Jessica C. Little, Melissa M. Galey, Sanghoon Lee, Dasha Iskakova, Nicole LeGrow, and Victor Aniebok
Despite rapid evolution in the area of microbial natural products chemistry, there is currently no open access database containing all microbially produced natural product structures. Lack of availability of these data is preventing the implementation of new technologies in natural products science. Specifically, development of new computational strategies for compound characterization and identification is being hampered by the lack of a comprehensive database of known compounds against which to compare experimental data. The creation of an open access, community-maintained database of microbial natural product structures would enable the development of new technologies in natural products discovery and improve the interoperability of existing natural products data resources. However, these data are spread unevenly throughout the historical scientific literature, including both journal articles and international patents. These documents have no standard format, are often not digitized as machine-readable text, and are not publicly available. Further, none of these documents have associated structure files (e.g., MOL, InChI, or SMILES), instead containing images of structures. This makes extraction and formatting of relevant natural products data a formidable challenge. Using a combination of manual curation and automated data mining approaches, we have created a database of microbial natural products (The Natural Products Atlas, www.npatlas.org) that includes 24 594 compounds and contains referenced data for structure, compound names, source organisms, isolation references, total syntheses, and instances of structural reassignment. This database is accompanied by an interactive web portal that permits searching by structure, substructure, and physical properties. The Web site also provides mechanisms for visualizing natural products chemical space and dashboards for displaying author and discovery timeline data.
These interactive tools offer a powerful knowledge base for natural products discovery with a central interface for structure and property-based searching and present new viewpoints on structural diversity in natural products. The Natural Products Atlas has been developed under FAIR principles (Findable, Accessible, Interoperable, and Reusable) and is integrated with other emerging natural product databases, including the Minimum Information About a Biosynthetic Gene Cluster (MIBiG) repository and the Global Natural Products Social Molecular Networking (GNPS) platform. It is designed as a community-supported resource to provide a central repository for known natural product structures from microorganisms and is the first comprehensive, open access resource of this type. It is expected that the Natural Products Atlas will enable the development of new natural products discovery modalities and accelerate the process of structural characterization for complex natural products libraries. The Natural Products Atlas is a new online database of microbially derived natural product structures, designed as a comprehensive open access repository for the scientific community.