Computational space reduction and parallelization of a new clustering approach for large groups of sequences.

Authors :: Trelles O
Andrade MA
Valencia A
Zapata EL
Carazo JM
Source :: Bioinformatics (Oxford, England) [Bioinformatics] 1998 Jun; Vol. 14 (5), pp. 439-51.
Publication Year :: 1998
Abstract ::
Motivation: The explosive growth of the biological sequences databases stimulated by genome projects has modified the framework of several applications in the biological sequence analysis area. In most cases, this new scenario is characterized by studies on large sets of sequences, suggesting the need for effective and automatic methods for their clustering. A more effective clustering of the database could be followed by the application of common family analysis schemes to the groups so formed.
Results: In this work, we present a new strategy to reduce the computational cost associated with the clustering of large sets of sequences which are expected to contain several families. The strategy is based on the grouping of the sequences into families by using a dynamic threshold on a pairwise sequence similarity criterion. Routine clustering of large data sets can now be done very efficiently. The method developed here achieves a computational space reduction of about an order of magnitude over more traditional ones of all-versus-all comparisons. The outcome of this approach produces family groupings that reproduce closely already accepted biological results. Our work includes a parallel implementation for distributed memory multiprocessors with a dynamic scheduling strategy for performance optimization.
Availability: By anonymous ftp at ftp.ac.uma.es (/pub/ots/pCluster directory), or from our Web site http://www.cnb.uam.es/www/software/software_index.html
Contact: ots@ac.uma.es
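The abstract does not spell out the algorithm, so the following is not the authors' method. It is a minimal sketch of a different, well-known technique (greedy "leader" clustering with a fixed similarity threshold, rather than the paper's dynamic threshold), included only to illustrate why assigning each sequence by comparing it against existing family representatives, instead of against every other sequence, reduces the number of pairwise comparisons relative to all-versus-all. The names `identity_fraction`, `leader_cluster`, and `all_vs_all_pairs`, the toy similarity measure, and the example sequences are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def identity_fraction(a: str, b: str) -> float:
    """Hypothetical toy similarity: matching positions / longer length.
    Stands in for a real alignment score; not the paper's criterion."""
    if not a and not b:
        return 1.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))


def leader_cluster(
    seqs: Sequence[str],
    threshold: float,
    similarity: Callable[[str, str], float] = identity_fraction,
) -> Tuple[List[List[int]], int]:
    """Greedy leader clustering with a FIXED threshold (an assumption;
    the paper uses a dynamic one). Each sequence is compared only with
    cluster representatives. Returns (clusters, comparisons made)."""
    reps: List[int] = []            # index of each cluster's representative
    clusters: List[List[int]] = []  # member indices per cluster
    comparisons = 0
    for i, s in enumerate(seqs):
        for c, r in enumerate(reps):
            comparisons += 1
            if similarity(s, seqs[r]) >= threshold:
                clusters[c].append(i)  # join the first matching family
                break
        else:
            # No representative was similar enough: start a new family.
            reps.append(i)
            clusters.append([i])
    return clusters, comparisons


def all_vs_all_pairs(n: int) -> int:
    """Number of distinct pairs an all-versus-all scheme must score."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    seqs = ["ACDEFGHIK", "ACDEFGHIR", "ACDEFGHKK",
            "WYWYWYWYW", "WYWYWYWYF", "MNPQRSTVW"]
    clusters, n_cmp = leader_cluster(seqs, threshold=0.7)
    print(clusters)                           # [[0, 1, 2], [3, 4], [5]]
    print(n_cmp, all_vs_all_pairs(len(seqs)))  # 7 15
```

With six toy sequences this makes 7 comparisons instead of 15. In general the cost grows with the number of sequences times the number of families rather than with the square of the number of sequences, which is the kind of saving the abstract describes; the actual reduction reported for the paper's method should be taken from the paper itself.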

Subjects :: Algorithms
Cluster Analysis
Computational Biology
Evaluation Studies as Topic
Genome
Proteins/genetics
Sequence Alignment/statistics & numerical data
Databases, Factual
Sequence Alignment methods

Details

Language :: English
ISSN :: 1367-4803
Volume :: 14
Issue :: 5
Database :: MEDLINE
Journal :: Bioinformatics (Oxford, England)
Publication Type :: Academic Journal
Accession number :: 9682057
Full Text :: https://doi.org/10.1093/bioinformatics/14.5.439
