CATH: increased structural coverage of functional space
- Source :
- Nucleic Acids Research
- Publication Year :
- 2020
- Publisher :
- Oxford University Press (OUP), 2020.
- Abstract :
- CATH (https://www.cathdb.info) identifies domains in protein structures from wwPDB and classifies these into evolutionary superfamilies, thereby providing structural and functional annotations. There are two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, with additional derived data, such as predicted sequence domains and functionally coherent sequence subsets (Functional Families or FunFams). The latest CATH+ release, version 4.3, significantly increases coverage of structural and sequence data, with an addition of 65,351 fully classified domain structures (+15%), providing 500,238 structural domains, and 151 million predicted sequence domains (+59%) assigned to 5481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. Three times more sequences are captured in FunFams, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web pages displaying variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies: (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links to and from CATH, including SCOP, InterPro, Aquaria and 2DProt.
- Subjects :
- InterPro
AcademicSubjects/SCI00010
Sequence analysis
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)
Protein domain
Computational biology
Biology
Viral Proteins
03 medical and health sciences
0302 clinical medicine
Protein structure
Protein Domains
Sequence Analysis, Protein
Protein methods
Genetics
Database Issue
Humans
Amino Acid Sequence
Databases, Protein
Epidemics
Peptide sequence
030304 developmental biology
Internet
0303 health sciences
Sequence Homology, Amino Acid
SARS-CoV-2
COVID-19
Computational Biology
Proteins
Molecular Sequence Annotation
030217 neurology & neurosurgery
Details
- Language :
- English
- ISSN :
- 1362-4962 and 0305-1048
- Database :
- OpenAIRE
- Journal :
- Nucleic Acids Research
- Accession number :
- edsair.doi.dedup.....4d7d1a3a6d725998295e053a25af5873
- Full Text :
- https://doi.org/10.1093/nar/gkaa1079