Testing machine learning techniques for general application by using protein secondary structure prediction. A brief survey with studies of pitfalls and benefits using a simple progressive learning approach
- Source :
- Computers in biology and medicine. 138
- Publication Year :
- 2021
Abstract
- Many researchers have recently used the prediction of protein secondary structure (the local conformational states of amino acid residues) to test advances in predictive and machine learning technology such as neural-net deep learning. Protein secondary structure prediction continues to be a helpful tool in biomedical and life-science research, but it is also extremely enticing as a test bed for predictive methods, such as neural nets, that are intended for different or more general purposes. This survey highlights a complication for researchers who test their methods with other applications in mind. Modern protein databases inevitably contain important clues to the answer, so-called "strong buried clues", which are often obscure and hard to avoid. This is because most proteins, or parts of proteins, in a modern protein database are related to others by biological evolution. For researchers developing machine learning and predictive methods, these clues can overstate the true quality of a predictive method and so confuse understanding of it. For researchers using the algorithms as tools, however, understanding strong buried clues is of great value, because they need to make maximum use of all the information available. A simple method related to the GOR methods, but sharing with neural nets the progressive learning of large numbers of weights, is used to explore this. It can acquire tens of millions of weights, occupying gigabytes, yet they are learned stably by exhaustive sampling. The significance of the findings is discussed in the light of promising recent results from AlphaFold, developed by Google's DeepMind.
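The abstract names the GOR methods as the starting point but gives no formulas. For readers unfamiliar with that family, here is a minimal sketch of a generic GOR-I-style predictor written in its naive-Bayes form: each residue in a window around a position contributes an information term log(P(S | R_j) / P(S)) toward each state S. This is not the author's progressive-learning program. The three-state H/E/C alphabet, the ±8 window, the pseudocount and the names `train_gor`, `predict_gor` and `accuracy` are illustrative assumptions. The toy usage also shows the pitfall the abstract describes: scoring on a sequence that is also in the training data yields a perfect score that reveals nothing about unrelated proteins.

```python
import math
from collections import Counter

STATES = "HEC"      # assumed 3-state reduction: helix, strand, coil
HALF_WINDOW = 8     # assumed GOR-style window of +/-8 residues
PSEUDO = 1.0        # assumed Laplace pseudocount so unseen pairs never give log(0)


def train_gor(pairs, half_window=HALF_WINDOW):
    """Count state/residue co-occurrences at each window offset (hypothetical helper)."""
    pair_counts = Counter()   # (state, offset, residue) -> count
    res_counts = Counter()    # (offset, residue) -> count
    state_counts = Counter()  # state -> count
    for seq, ss in pairs:
        if len(seq) != len(ss) or any(s not in STATES for s in ss):
            raise ValueError("structure must align with sequence and use only H/E/C")
        for i, s in enumerate(ss):
            state_counts[s] += 1
            for j in range(-half_window, half_window + 1):
                k = i + j
                if 0 <= k < len(seq):
                    pair_counts[(s, j, seq[k])] += 1
                    res_counts[(j, seq[k])] += 1
    return {"pairs": pair_counts, "res": res_counts, "states": state_counts,
            "total": sum(state_counts.values()), "half_window": half_window}


def predict_gor(model, seq):
    """Pick, per residue, the state with the largest log prior plus summed window information."""
    n = len(STATES)
    hw = model["half_window"]
    out = []
    for i in range(len(seq)):
        scores = {}
        for s in STATES:
            prior = (model["states"][s] + PSEUDO) / (model["total"] + PSEUDO * n)
            score = math.log(prior)
            for j in range(-hw, hw + 1):
                k = i + j
                if 0 <= k < len(seq):
                    cond = ((model["pairs"][(s, j, seq[k])] + PSEUDO)
                            / (model["res"][(j, seq[k])] + PSEUDO * n))
                    score += math.log(cond / prior)  # information I(S; R_j) in nats
            scores[s] = score
        out.append(max(STATES, key=scores.__getitem__))
    return "".join(out)


def accuracy(pred, ss):
    """Fraction of residues whose predicted state matches the reference (Q3)."""
    return sum(p == t for p, t in zip(pred, ss)) / len(ss)


if __name__ == "__main__":
    # Toy data only; real work would use DSSP-assigned structures.
    train = [("AAAA", "HHHH")] * 50 + [("VVVV", "EEEE")] * 50
    model = train_gor(train)
    print(predict_gor(model, "AAAA"))  # HHHH
    # "Testing" on a training sequence: the score is perfect but uninformative.
    print(accuracy(predict_gor(model, "AAAA"), "HHHH"))  # 1.0
```

With real data, the same effect arises less visibly whenever test proteins are evolutionary relatives of training proteins. A common safeguard is to keep sequence identity between training and test sets below a chosen threshold before reporting accuracy.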
- Subjects :
- Artificial neural network
Gigabyte
business.industry
Computer science
Deep learning
media_common.quotation_subject
Proteins
Health Informatics
Biological evolution
Protein secondary structure prediction
Machine learning
computer.software_genre
Protein Structure, Secondary
Computer Science Applications
Machine Learning
Quality (business)
Artificial intelligence
business
Databases, Protein
computer
Biomedicine
Algorithms
media_common
Simple (philosophy)
Details
- ISSN :
- 1879-0534
- Volume :
- 138
- Database :
- OpenAIRE
- Journal :
- Computers in biology and medicine
- Accession number :
- edsair.doi.dedup.....c03287f970f7267b18cc4fbefc851542