Homology-Based Annotation of Large Protein Datasets

Authors :
Marco Punta
Jaina Mistry
Laboratory of Computational and Quantitative Biology (LCQB)
Université Pierre et Marie Curie - Paris 6 (UPMC)-Institut de Biologie Paris Seine (IBPS)
Université Pierre et Marie Curie - Paris 6 (UPMC)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Centre National de la Recherche Scientifique (CNRS)
European Bioinformatics Institute [Hinxton] (EMBL-EBI)
EMBL Heidelberg
Carugo, O and Eisenhaber, F. (volume editors)
Source :
DATA MINING TECHNIQUES FOR THE LIFE SCIENCES, Carugo, O and Eisenhaber, F. (eds.), Methods in Molecular Biology, vol. 1415, HUMANA PRESS INC, pp. 153-176, 2016. ISBN: 978-1-4939-3572-7; 978-1-4939-3570-3. doi:10.1007/978-1-4939-3572-7_8
Publication Year :
2016
Publisher :
HAL CCSD, 2016.

Abstract

Advances in DNA sequencing technologies have led to an increasing amount of protein sequence data being generated. Only a small fraction of this protein sequence data will have associated experimental annotation. Here, we describe a protocol for in silico homology-based annotation of large protein datasets that makes extensive use of manually curated collections of protein families. We focus on annotations provided by the Pfam database and suggest ways to identify family outliers and family variations. This protocol may be useful to people who are new to protein data analysis or who are unfamiliar with the computational tools currently available.

Details

Language :
English
ISBN :
978-1-4939-3572-7
978-1-4939-3570-3
Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....96dac742fc4f7d7382d06d36c74a4dbd
Full Text :
https://doi.org/10.1007/978-1-4939-3572-7_8