
A comparison of several similarity indices used in the classification of protein sequences: a multivariate analysis.

Authors :
Landès C
Hénaut A
Risler JL
Source :
Nucleic acids research [Nucleic Acids Res] 1992 Jul 25; Vol. 20 (14), pp. 3631-7.
Publication Year :
1992

Abstract

The present work describes an attempt to identify reliable criteria which could be used as distance indices between protein sequences. Seven different criteria have been tested: i and ii) the scores of the alignments as given by the BESTFIT and the FASTA programs; iii) the ratio parameter, i.e. the BESTFIT score divided by the length of the aligned peptides; iv and v) the statistical significance (Z-scores) of the scores calculated by BESTFIT and FASTA, as obtained by comparison with shuffled sequences; vi) the Z-scores provided by the program RELATE, which performs a segment-by-segment comparison of two sequences; and vii) an original distance index calculated by the program DOCMA from all the pairwise dotplots between the sequences. These seven criteria have been tested against the amino acid sequences of 39 globins and those of the 20 aminoacyl-tRNA synthetases from E. coli. The distances between the sequences were analyzed by multivariate analysis techniques. The results show that the distances calculated from the scores of the pairwise alignments are not adequately sensitive. The Z-score from RELATE is not selective enough and too demanding in computer time. Three criteria gave a classification consistent with the known similarities between the sequences in the sets, namely the Z-scores from BESTFIT and FASTA and the multiple dotplot comparison distance index from DOCMA.
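As an illustration of criteria iv and v, the idea of a Z-score obtained by comparison with shuffled sequences can be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure used by BESTFIT or FASTA: the scoring function `local_alignment_score` is a hypothetical toy local alignment (identity scoring, linear gap penalty) standing in for those programs, and the choices to shuffle only the second sequence, to use 100 shuffles, and to use the population standard deviation are all assumptions not taken from the abstract.

```python
import random
import statistics


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Toy local alignment score (Smith-Waterman style, linear gap penalty).

    Hypothetical stand-in for a real alignment program; the scoring
    values are illustrative assumptions, not BESTFIT or FASTA defaults.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def shuffle_z_score(a, b, n_shuffles=100, seed=0):
    """Z-score of the real score against scores from shuffled copies of b.

    Z = (real score - mean of shuffled scores) / std. dev. of shuffled scores.
    Shuffling preserves the composition of b but destroys residue order.
    """
    rng = random.Random(seed)
    real = local_alignment_score(a, b)
    residues = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        shuffled_scores.append(local_alignment_score(a, "".join(residues)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.pstdev(shuffled_scores)
    if sd == 0:
        # Degenerate case: every shuffled score is identical.
        return float("inf") if real > mean else 0.0
    return (real - mean) / sd


if __name__ == "__main__":
    # Arbitrary example peptides, for illustration only.
    seq_a = "MVLSPADKTNVKAAWGKVGA"
    seq_b = "MVLSGEDKSNIKAAWGKIGG"
    print(shuffle_z_score(seq_a, seq_b))
```

A Z-score of this kind expresses how far the observed alignment score lies above what sequences of the same composition would give by chance, which is why it can be more discriminating than the raw score alone.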

Details

Language :
English
ISSN :
0305-1048
Volume :
20
Issue :
14
Database :
MEDLINE
Journal :
Nucleic acids research
Publication Type :
Academic Journal
Accession number :
1641329
Full Text :
https://doi.org/10.1093/nar/20.14.3631