PARROT is a flexible recurrent neural network framework for analysis of large protein datasets
- Authors: Daniel Griffith and Alex S Holehouse
- Subjects: Phosphorylation sites; QH301-705.5; Computer science; Science; Systems biology; Activation function; Machine learning; General Biochemistry, Genetics and Molecular Biology; High-throughput methods; Deep learning; Proteomics; Sequence Analysis, Protein; Humans; Biology (General); Phosphorylation; Databases, Protein; General Immunology and Microbiology; General Neuroscience; Computational Biology; High-Throughput Nucleotide Sequencing; Proteins; Bioinformatics; General Medicine; Functional annotation; Tools and Resources; Recurrent neural network; Medicine; Neural Networks, Computer; Artificial intelligence; Software; Computational and Systems Biology
- Abstract
The rise of high-throughput experiments has transformed how scientists approach biological questions. The ubiquity of large-scale assays that can test thousands of samples in a day has necessitated the development of new computational approaches to interpret this data. Among these tools, machine learning approaches are increasingly being utilized due to their ability to infer complex nonlinear patterns from high-dimensional data. Despite their effectiveness, machine learning (and in particular deep learning) approaches are not always accessible or easy to implement for those with limited computational expertise. Here we present PARROT, a general framework for training and applying deep learning-based predictors on large protein datasets. Using an internal recurrent neural network architecture, PARROT is capable of tackling both classification and regression tasks while only requiring raw protein sequences as input. We showcase the potential uses of PARROT on three diverse machine learning tasks: predicting phosphorylation sites, predicting transcriptional activation function of peptides generated by high-throughput reporter assays, and predicting the fibrillization propensity of amyloid beta with data generated by deep mutational scanning. Through these examples, we demonstrate that PARROT is easy to use, performs comparably to state-of-the-art computational tools, and is applicable for a wide array of biological problems.
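The abstract says PARROT needs only raw protein sequences as input and handles both classification and regression. The minimal Python sketch below shows one way a recurrent network can consume such input. Each residue is one-hot encoded over the 20 canonical amino acids and fed, one position at a time, into a recurrent hidden state. The per-position outputs suit a residue-level task such as scoring candidate phosphorylation sites. The final output suits a sequence-level task such as regressing activation strength. Everything here is an assumption for illustration: the alphabet, the hypothetical names `one_hot`, `TinyRNN` and `sigmoid`, the layer sizes, and the untrained random weights. It uses a plain unidirectional Elman RNN for brevity and is not PARROT's actual architecture, encoding or API.

```python
import math
import random

# Assumed alphabet: the 20 canonical amino acids (PARROT's own encoding may differ).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence):
    """Hypothetical helper: encode a raw protein sequence as 20-dim one-hot vectors."""
    if not sequence:
        raise ValueError("empty sequence")
    vectors = []
    for aa in sequence.upper():
        if aa not in AA_INDEX:
            raise ValueError(f"unsupported residue: {aa!r}")
        vec = [0.0] * len(AMINO_ACIDS)
        vec[AA_INDEX[aa]] = 1.0
        vectors.append(vec)
    return vectors


class TinyRNN:
    """Hypothetical, untrained Elman RNN with a tanh hidden state and a linear read-out."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)  # seeded so the random weights are reproducible

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        self.w_xh = init(hidden_size, input_size)   # input -> hidden weights
        self.w_hh = init(hidden_size, hidden_size)  # hidden -> hidden (recurrent) weights
        self.b_h = [0.0] * hidden_size
        self.w_out = init(1, hidden_size)[0]        # hidden -> single output value
        self.b_out = 0.0

    def step(self, x, h):
        """One recurrence: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
        new_h = []
        for i in range(self.hidden_size):
            total = self.b_h[i]
            total += sum(w * xi for w, xi in zip(self.w_xh[i], x))
            total += sum(w * hj for w, hj in zip(self.w_hh[i], h))
            new_h.append(math.tanh(total))
        return new_h

    def readout(self, h):
        """Linear read-out of one scalar from a hidden state."""
        return self.b_out + sum(w * hj for w, hj in zip(self.w_out, h))

    def forward(self, sequence):
        """Return (one output per residue, one output for the whole sequence)."""
        h = [0.0] * self.hidden_size
        per_residue = []
        for x in one_hot(sequence):
            h = self.step(x, h)
            per_residue.append(self.readout(h))
        # Sequence-level output taken from the final hidden state.
        return per_residue, per_residue[-1]


def sigmoid(z):
    """Squash a raw score into (0, 1), e.g. for a binary residue-level label."""
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    model = TinyRNN(input_size=len(AMINO_ACIDS), hidden_size=8)
    residue_scores, sequence_value = model.forward("MSTPSRK")
    site_probs = [sigmoid(s) for s in residue_scores]  # residue-level classification
    print(len(site_probs), sequence_value)  # 7 probabilities and one scalar
```

With trained weights, the same forward pass could serve both task types: a classification head would pass the raw outputs through a sigmoid (or a softmax for several classes), while a regression head would use the raw value directly.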
- Published: 2021