Pablo Pérez-García, Runar Stokke, Christoph Gertler, Boguslaw Nocek, Michail M. Yakimov, Mónica Martínez-Martínez, Olga V. Golyshina, Wolfgang R. Streit, Ana Popovic, Stephan Thies, Peter N. Golyshin, Victoria Mesa, Manuel Ferrer, Greg Brown, Frank Oliver Glöckner, Mercedes V. del Pozo, Jürgen Pleiss, Antonio García-Moyano, Celia Méndez-García, Victor Guallar, José Navarro-Fernández, Xiaohui Xu, Alexander Bollinger, Marco A. Distaso, Hong Cui, María Alcaide, Peter J. Stogios, Jesús Sánchez, Cristina Coscolín, Rafael Bargiela, Ida Helene Steen, Antonio Fernandez-Guerra, Tran Hai, Alexander F. Yakunin, Ana Isabel Pelaez, Alexei Savchenko, Patrick C. F. Buchholz, Karl-Erich Jaeger, Tatyana N. Chernikova, Gro Elin Kjæreng Bjerga, Jennifer Chow, Gerard Santiago, Ministerio de Economía y Competitividad (España), European Commission, Ministerio de Economía, Industria y Competitividad (España), Biotechnology and Biological Sciences Research Council (UK), German Research Foundation, Natural Sciences and Engineering Research Council of Canada, Principado de Asturias, and CSIC - Unidad de Recursos de Información Científica para la Investigación (URICI)
Esterases receive special attention because of their wide distribution in biological systems and environments and their importance for physiology and chemical synthesis. The prediction of esterases’ substrate promiscuity level from sequence data and the molecular reasons why some of these enzymes are more promiscuous than others remain to be elucidated. This limits the surveillance of the sequence space for esterases potentially leading to new versatile biocatalysts and new insights into their role in cellular function. Here, we performed an extensive analysis of the substrate spectra of 145 phylogenetically and environmentally diverse microbial esterases, tested with 96 diverse esters. We determined the primary factors shaping their substrate range by analyzing substrate range patterns in combination with structural analysis and protein–ligand simulations. We found a structural parameter that helps rank (classify) the promiscuity level of esterases from sequence data at 94% accuracy. This parameter, the active site effective volume, exemplifies the topology of the catalytic environment by measuring the active site cavity volume corrected by the relative solvent accessible surface area (SASA) of the catalytic triad. Sequences encoding esterases with active site effective volumes (cavity volume/SASA) above a threshold show broader substrate spectra, which can be further extended in combination with phylogenetic data. This measure also provides a valuable tool for interrogating substrates capable of being converted. This measure, found to be transferable to phosphatases of the haloalkanoic acid dehalogenase superfamily and possibly other enzymatic systems, represents a powerful tool for low-cost bioprospecting for esterases with broad substrate ranges in large-scale sequence data sets.
C.C. thanks the Spanish Ministry of Economy, Industry and Competitiveness for a Ph.D. fellowship (Grant BES-2015-073829).
This project received funding from the European Union’s Horizon 2020 research and innovation program [Blue Growth: Unlocking the potential of Seas and Oceans] under grant agreement no. 634486 (project acronym INMARE). This research was also supported by the European Community Projects MAGICPAH (FP7-KBBE-2009-245226), ULIXES (FP7-KBBE-2010-266473), and KILLSPILL (FP7-KBBE-2012-312139) and grants BIO2011-25012, PCIN-2014-107, BIO2014-54494-R, and CTQ2016-79138-R from the Spanish Ministry of Economy, Industry and Competitiveness. The present investigation was also funded by the Spanish Ministry of Economy, Industry and Competitiveness within the ERA NET IB2, grant no. ERA-IB-14-030 (MetaCat), the UK Biotechnology and Biological Sciences Research Council (BBSRC), grant no. BB/M029085/1, and the German Research Foundation (FOR1296). R.B. and P.N.G. acknowledge the support of the Supercomputing Wales project, which is part-funded by the European Regional Development Fund (ERDF) via the Welsh Government. O.V.G. and P.N.G. acknowledge the support of the Centre of Environmental Biotechnology Project funded by the European Regional Development Fund (ERDF) through the Welsh Government. A.Y. and A.S. gratefully acknowledge funding from Genome Canada (2009-OGI-ABC-1405) and the NSERC Strategic Network grant IBN. A.I.P. was supported by the Counseling of Economy and Employment of the Principality of Asturias, Spain (Grant FC-15-GRUPIN14-107). V.G. acknowledges the joint BSC-CRG-IRB Research Program in Computational Biology. The authors gratefully acknowledge financial support provided by the European Regional Development Fund (ERDF). We acknowledge support by the CSIC Open Access Publication Initiative through its Unit of Information Resources for Research (URICI).
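
As an illustration of the active site effective volume described in the abstract (cavity volume/SASA), the ratio and the threshold test could be sketched as follows. This is a minimal sketch, not the authors' pipeline: the names `ActiveSite`, `effective_volume`, `rank_by_effective_volume`, and `exceeds_threshold` are hypothetical, the units of both inputs are assumptions, and the threshold is left as a caller-supplied parameter because the abstract does not state its value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActiveSite:
    """Hypothetical container for the two quantities named in the abstract."""
    cavity_volume: float        # active site cavity volume (units assumed, e.g. cubic angstroms)
    triad_relative_sasa: float  # relative SASA of the catalytic triad (scale assumed)


def effective_volume(site: ActiveSite) -> float:
    """Cavity volume corrected by the relative SASA of the catalytic triad."""
    if site.triad_relative_sasa <= 0:
        raise ValueError("relative SASA must be positive to form the ratio")
    return site.cavity_volume / site.triad_relative_sasa


def rank_by_effective_volume(sites: dict[str, ActiveSite]) -> list[tuple[str, float]]:
    """Return (name, effective volume) pairs, largest effective volume first."""
    scored = [(name, effective_volume(site)) for name, site in sites.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def exceeds_threshold(site: ActiveSite, threshold: float) -> bool:
    """True if the effective volume is above a caller-supplied threshold."""
    return effective_volume(site) > threshold
```

A call such as `rank_by_effective_volume({"estA": ActiveSite(200.0, 4.0), "estB": ActiveSite(90.0, 30.0)})` would order candidate sequences for follow-up; the example names and numbers here are made up for illustration only.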