Combined alignments of sequences and domains characterize unknown proteins with remotely related protein search PSISearch2D
- Source :
- Database: The Journal of Biological Databases and Curation
- Publication Year :
- 2019
- Publisher :
- Oxford University Press, 2019.
Abstract
- Iterative homology search is widely used to identify remotely related proteins. Our previous study found that query-seeded iterative sequence search reduces homologous over-extension errors and greatly improves selectivity. However, iterative homology search remains challenging for protein functional prediction: more sensitive scoring models are needed to improve the predictive performance of alignment methods, and alignment annotation with better visualization is needed for interpreting results. Here we report PSISearch2D, an open-source application that runs query-seeded iterative sequence searches to detect remotely related proteins. PSISearch2D retrieves domain annotations from Pfam, UniProtKB, CDD and PROSITE for the resulting hits and displays combined domain and sequence alignments in novel visualizations. A newly defined scoring model, the C-value, re-orders hits by considering both sequence and domain alignments. Benchmarking with the C-value indicates that PSISearch2D outperforms the original PSISearch2 tool in both accuracy and specificity. PSISearch2D thereby improves the characterization of unknown proteins in remote protein detection. In our evaluation tests, PSISearch2D provided annotation for 77 695 of 139 503 unknown bacterial proteins and 140 751 of 352 757 unknown virus proteins in UniProtKB, about 2.3-fold and 1.8-fold more than the original PSISearch2, respectively. Together with an auto-iteration mode for large-scale data and optional programs for global and local sequence alignment, PSISearch2D enhances remotely related protein search.
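The annotation counts reported in the abstract can be restated as coverage fractions. The following is a minimal sketch of that arithmetic only; the function name `coverage` and the dictionary layout are illustrative assumptions and are not part of PSISearch2D.

```python
# Illustrative arithmetic only: the counts come from the abstract,
# while the names below are hypothetical and not part of PSISearch2D.

def coverage(annotated: int, total: int) -> float:
    """Return the fraction of proteins that received an annotation."""
    return annotated / total

# (annotated, total) pairs as reported for UniProtKB unknown proteins.
counts = {
    "bacteria": (77_695, 139_503),
    "virus": (140_751, 352_757),
}

for group, (annotated, total) in counts.items():
    print(f"{group}: {coverage(annotated, total):.1%} annotated")
```

Under these figures, roughly 56% of the unknown bacterial proteins and roughly 40% of the unknown virus proteins received an annotation.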
- Subjects :
- Models, Molecular
0303 health sciences
Computer science
Sequence analysis
030302 biochemistry & molecular biology
Proteins
Sequence alignment
Computational biology
PROSITE
Functional prediction
General Biochemistry, Genetics and Molecular Biology
Homology (biology)
Visualization
03 medical and health sciences
Annotation
Sequence Analysis, Protein
Original Article
UniProt
General Agricultural and Biological Sciences
Databases, Protein
Sequence Alignment
Algorithms
030304 developmental biology
Information Systems
Details
- Language :
- English
- ISSN :
- 17580463
- Volume :
- 2019
- Database :
- OpenAIRE
- Journal :
- Database: The Journal of Biological Databases and Curation
- Accession number :
- edsair.doi.dedup.....2a3b5261e074b9db39c3966207ae4cf0