Ten quick tips for sequence-based prediction of protein properties using machine learning

Authors :: Katharina Waury
Anton Feenstra
Qingzhen Hou
Dea Gogishvili
Affiliations :: Computer Science
Bio Informatics (IBIVU)
AIMMS
Bioinformatics
Integrative Bioinformatics
Source :: Hou, Q, Waury, K, Gogishvili, D & Feenstra, K A 2022, 'Ten quick tips for sequence-based prediction of protein properties using machine learning', PLoS Computational Biology, vol. 18, no. 12, e1010669, pp. 1-15. https://doi.org/10.1371/journal.pcbi.1010669
Publication Year :: 2022
Publisher :: Public Library of Science, 2022.
Abstract :: The ubiquitous availability of genome sequencing data explains the popularity of machine learning-based methods for the prediction of protein properties from their amino acid sequences. Over the years, while revising our own work and reading submitted manuscripts as well as published papers, we have noticed several recurring issues that make some reported findings hard to understand and replicate. We suspect this may be due to biologists being unfamiliar with machine learning methodology or, conversely, to machine learning experts missing some of the knowledge needed to correctly apply their methods to proteins. Here, we aim to bridge this gap for developers of such methods. The most striking issues are linked to a lack of clarity: how were annotations of interest obtained; which benchmark metrics were used; how are positives and negatives defined. Others relate to a lack of rigor: if you sneak in structural information, your method is not sequence-based; if you compare your own model to “state-of-the-art,” take the best methods; if you want to conclude that some method is better than another, obtain a significance estimate to support this claim. We cover these and other issues in detail. These points may have seemed obvious to the authors during writing; however, they are not always clear-cut to the readers. We also expect many of these tips to hold for other machine learning-based applications in biology. Therefore, many computational biologists who develop methods in this particular area will benefit from a concise overview of what to avoid and what to do instead.

Subjects :: Ecology
Chromosome Mapping
Machine Learning
Benchmarking
Cellular and Molecular Neuroscience
Knowledge
Computational Theory and Mathematics
Modeling and Simulation
Genetics
Amino Acid Sequence
Molecular Biology
SDG 4 - Quality Education
Ecology, Evolution, Behavior and Systematics

Details

Language :: English
ISSN :: 1553-7358 and 1553-734X
Volume :: 18
Issue :: 12
Database :: OpenAIRE
Journal :: PLoS Computational Biology
Accession number :: edsair.doi.dedup.....aa4794c05d1a210f7cb458d01ec53110
