Engineering indel and substitution variants of diverse and ancient enzymes using Graphical Representation of Ancestral Sequence Predictions (GRASP)
- Author
Scott Bottoms, Marnie L. Lamprecht, Luke W. Guddat, Jörg Carsten, Volker Sieber, Ariane Mora, Alexandra Essebier, Bostjan Kobe, Brad Balderson, Connie M. Ross, Raine E. S. Thomson, Elizabeth M. J. Gillam, Leander Sützl, Rhys Newell, Burkhard Rost, G. Foley, Julian Zaugg, Ross Barnard, Dietmar Haltrich, Mikael Bodén, Gerhard Schenk, and Yosephine Gumulya
- Subjects
Protein family, Computer science, Molecular evolution, GRASP, Protein engineering, Computational biology, Homology (biology)
- Abstract
Ancestral sequence reconstruction is a technique that is gaining widespread use in molecular evolution studies and protein engineering. Accurate reconstruction requires the ability to handle large numbers of sequences as well as insertion and deletion (“indel”) events, but available approaches have limitations. To address these limitations, we developed Graphical Representation of Ancestral Sequence Predictions (GRASP), which efficiently implements maximum likelihood methods to infer ancestors of families with more than 10,000 members. GRASP uses partial order graphs (POGs) to represent and infer insertion and deletion events across ancestors, enabling the identification of building blocks for protein engineering. To validate the capacity to engineer novel proteins from realistic data, we predicted ancestor sequences across three distinct enzyme families: glucose-methanol-choline (GMC) oxidoreductases, cytochromes P450, and dihydroxy/sugar acid dehydratases (DHAD). All tested ancestors demonstrated enzymatic activity. Our study demonstrates the ability of GRASP (1) to support data sets of more than 10,000 sequences and (2) to use insertions and deletions to identify building blocks for engineering biologically active ancestors by exploring variation over evolutionary time.
Author summary
Massive sequencing projects expose the extent of natural genetic diversity. Here, we describe a method capable of performing ancestral sequence reconstruction on data sets of more than 10,000 sequences, poised to recover ancestral diversity, including the evolutionary events that determine present-day biological function and structure. We introduce a novel strategy for suggesting “indel variants” that are distinct from, but can be explored alongside, substitution variants for creating ancestral libraries.
We demonstrate how indels can be used as building blocks to form “hybrid ancestors”; based on this strategy, we synthesise ancestor variants with varying enzymatic activities for wide-ranging applications in the biotechnology sector.
- Published
- 2019