Tally: a scoring tool for boundary determination between repetitive and non-repetitive protein sequences

Authors :
Francois Richard
Ronnie Alves
Andrey V. Kajava
Centre de recherche en Biologie Cellulaire (CRBM)
Université Montpellier 2 - Sciences et Techniques (UM2)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Université Montpellier 1 (UM1)
Institut de Biologie Computationnelle (IBC)
Institut National de la Recherche Agronomique (INRA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)
Programa de Pós-Graduação em Ciências Contábeis [Belém, Brazil]
Universidade Federal do Pará - UFPA [Belém, Brazil]-Instituto Tecnológico Vale [Belém, Brazil]
National Research University of Information Technologies, Mechanics and Optics [St. Petersburg] (ITMO)
Université Montpellier 1 (UM1)-Université Montpellier 2 - Sciences et Techniques (UM2)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)
Université de Montpellier (UM)-Institut National de la Recherche Agronomique (INRA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)
Centre de recherche en Biologie cellulaire de Montpellier (CRBM)
Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)
Federal University of Para - Universidade Federal do Pará - UFPA [Belém, Brazil] (UFPA)-Instituto Tecnológico Vale [Belém, Brazil] (ITV)
Source :
Bioinformatics, Oxford University Press (OUP), 2016, 32 (13), pp. 1952–1958. ⟨10.1093/bioinformatics/btw118⟩
Publication Year :
2016
Publisher :
Oxford University Press (OUP), 2016.

Abstract

Motivation: Tandem Repeats (TRs) are abundant in proteins and have a variety of fundamental functions. In many cases, evolution has blurred their repetitive patterns. This leads to the problem of distinguishing between sequences that contain highly imperfect TRs and sequences without TRs. The 3D structure of proteins can be used as a benchmarking criterion for TR detection in sequences, because the vast majority of proteins having TRs in sequences are built of repetitive 3D structural blocks. According to our benchmark, none of the existing scoring methods is able to clearly distinguish, based on sequence analysis, between structures with and without 3D TRs.

Results: We developed a scoring tool called Tally, which is based on a machine learning approach. Tally achieves a better separation between sequences with structural TRs and sequences of aperiodic structures than existing scoring procedures. It performs at a level of 81% sensitivity, while achieving a high specificity of 74% and an Area Under the Receiver Operating Characteristic Curve of 86%. Tally can be used to select a set of structurally and functionally meaningful TRs from all TRs detected in proteomes. The generated dataset is available for benchmarking purposes.

Availability and implementation: Source code is available upon request. Tool and dataset can be accessed through our website: http://bioinfo.montp.cnrs.fr/?r=Tally.

Contact: andrey.kajava@crbm.cnrs.fr

Supplementary information: Supplementary data are available at Bioinformatics online.
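Note on the reported metrics: the abstract summarizes performance as sensitivity, specificity and area under the ROC curve. The sketch below illustrates how these three quantities are generally computed for any binary scorer. It is not Tally's implementation; the function names, the toy labels and scores, and the 0.5 threshold are all hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when score >= threshold means 'has TRs'."""
    tp = fn = tn = fp = 0
    for label, score in zip(labels, scores):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = sequence with structural TRs, 0 = aperiodic structure.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

sens, spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.5)
auc = roc_auc(toy_labels, toy_scores)
print(sens, spec, auc)  # prints 0.75 0.75 0.8125
```

Sensitivity and specificity depend on the chosen score threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the abstract reports all three.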

Details

ISSN :
1367-4811, 1367-4803, and 1460-2059
Volume :
32
Database :
OpenAIRE
Journal :
Bioinformatics
Accession number :
edsair.doi.dedup.....c9e99cbe125e4875a3f9300d072e3125
Full Text :
https://doi.org/10.1093/bioinformatics/btw118