1. Using recurrent neural networks to predict aspects of 3D structure of folded copolymer sequences
- Author
Reilly, R. G., Kechadi, M. T., Dawson, K. A., Kuznetsov, Y. A., and Timoshenko, E. G.
- Abstract
Neural networks have been applied with limited success to the problem of predicting the secondary [1,2] and tertiary [3,4] structure of proteins from their amino acid residue sequence. The number of sequences with a known 3-D structure is relatively small: 3-D structures are being solved at a rate at least one order of magnitude lower than the rate at which new protein sequences are being determined [3]. In addition, the neural network approaches taken to date cannot deal with very long sequences or with possible dependencies between different regions of a sequence [8]. The work described here is an attempt to address these limitations. To obtain a large set of sequences with known 3-D structures for training the neural network, we use the approach described in [5] to generate a set of artificial copolymers consisting of hydrophobic and hydrophilic units with a known 3-D structure when folded. By employing recurrent neural networks and building on the approach described in [3,4], we describe a way to augment a neural network with both a facility to deal with sequences of realistic length and a mechanism for handling possible long-distance interactions between regions of the sequence. The artificial sequences are very approximate models of real proteins, given that we encode only the hydrophobicity of the amino acid side chains and make no attempt to model secondary or super-secondary structure. Nonetheless, the neural network techniques developed using artificial sequences are readily applicable to real proteins.
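To make the idea of feeding a hydrophobic/hydrophilic sequence through a recurrent network concrete, here is a minimal sketch in plain Python. It is not the architecture or training procedure of the paper: the Elman-style recurrence, the hidden size, the random untrained weights, and the names `encode_hp`, `init_rnn`, and `forward` are all assumptions made for illustration, and the per-unit output stands in for some hypothetical structural label.

```python
import math
import random


def encode_hp(sequence):
    """Map a string of 'H' (hydrophobic) and 'P' (hydrophilic) units to 1.0/0.0."""
    mapping = {"H": 1.0, "P": 0.0}
    return [mapping[unit] for unit in sequence.upper()]


def init_rnn(hidden_size, seed=0):
    """Create small random weights for an Elman-style network (untrained)."""
    rng = random.Random(seed)

    def w():
        return rng.uniform(-0.5, 0.5)

    return {
        "w_in": [w() for _ in range(hidden_size)],
        "w_rec": [[w() for _ in range(hidden_size)] for _ in range(hidden_size)],
        "b_h": [0.0] * hidden_size,
        "w_out": [w() for _ in range(hidden_size)],
        "b_out": 0.0,
    }


def forward(params, inputs):
    """Run the network over a sequence of any length, one output per unit."""
    hidden_size = len(params["b_h"])
    h = [0.0] * hidden_size  # hidden state carried from unit to unit
    outputs = []
    for x in inputs:
        new_h = []
        for j in range(hidden_size):
            pre = params["w_in"][j] * x + params["b_h"][j]
            pre += sum(params["w_rec"][j][k] * h[k] for k in range(hidden_size))
            new_h.append(math.tanh(pre))
        h = new_h
        z = params["b_out"] + sum(params["w_out"][j] * h[j] for j in range(hidden_size))
        outputs.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid, value in (0, 1)
    return outputs


params = init_rnn(hidden_size=4)
inputs = encode_hp("HPPHHPHP")
predictions = forward(params, inputs)
```

Because the hidden state is updated once per unit, the same weights handle a sequence of 8 units or of several hundred, which is the property that makes a recurrent formulation attractive for sequences of realistic length. A plain recurrence like this one is not, on its own, a mechanism for long-distance interactions.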