Locality-sensitive hashing for protein classification

Authors :
Li, X
Liu, L
Ong, K L
Zhao, Y
Buckingham, Lawrence
Hogan, Jim
Geva, Shlomo
Kelly, Wayne
Source :
Data Mining and Analytics 2014: Proceedings of the 12th Australasian Data Mining Conference [Conferences in Research and Practice in Information Technology, Volume 158]
Publication Year :
2014

Abstract

Determination of sequence similarity is a central issue in computational biology, a problem addressed primarily through BLAST, an alignment-based heuristic that has underpinned much of the analysis and annotation of the genomic era. Despite their success, alignment-based approaches scale poorly with increasing data set size, and are not robust under structural sequence rearrangements. Successive waves of innovation in sequencing technologies (so-called Next Generation Sequencing, or NGS, approaches) have led to an explosion in data availability, challenging existing methods and motivating novel approaches to sequence representation and similarity scoring, including adaptation of existing methods from other domains such as information retrieval. In this work, we investigate locality-sensitive hashing of sequences through binary document signatures, applying the method to a bacterial protein classification task. Here, the goal is to predict the gene family to which a given query protein belongs. Experiments carried out on a pair of small but biologically realistic datasets (the full protein repertoires of families of Chlamydia and Staphylococcus aureus genomes, respectively) show that a measure of similarity obtained by locality-sensitive hashing gives highly accurate results while offering a number of avenues which will lead to substantial performance improvements over BLAST.
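To make the idea of "binary signatures plus a similarity measure" concrete, the following is a minimal sketch, not the authors' method. The abstract does not say how signatures are built, so this example assumes a generic SimHash-style random-projection scheme: overlapping k-mers (k = 3) as features, SHA-256 as the per-feature hash, 64-bit signatures, and nearest-neighbour family prediction by Hamming distance. The function names `kmers`, `signature`, `hamming` and `classify` are hypothetical.

```python
import hashlib
from collections import Counter


def kmers(seq: str, k: int = 3) -> Counter:
    """Count overlapping k-mers (assumed feature set) in a protein sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def signature(seq: str, bits: int = 64, k: int = 3) -> int:
    """SimHash-style binary signature (an assumed scheme, not the paper's).

    Each k-mer is hashed to a pseudo-random +/-1 vector of length `bits`,
    weighted by its count; the signature keeps the sign of each summed bit.
    """
    if bits % 8 != 0 or bits > 256:
        raise ValueError("bits must be a multiple of 8 and at most 256")
    acc = [0] * bits
    for kmer, count in kmers(seq, k).items():
        digest = hashlib.sha256(kmer.encode()).digest()[: bits // 8]
        h = int.from_bytes(digest, "big")
        for b in range(bits):
            acc[b] += count if (h >> b) & 1 else -count
    sig = 0
    for b in range(bits):
        if acc[b] > 0:
            sig |= 1 << b
    return sig


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


def classify(query: str, labelled: list[tuple[str, str]], bits: int = 64, k: int = 3) -> str:
    """Predict a family label from the reference with the nearest signature."""
    q = signature(query, bits, k)
    best_seq, best_label = min(
        labelled, key=lambda item: hamming(q, signature(item[0], bits, k))
    )
    return best_label


if __name__ == "__main__":
    # Hypothetical toy references; real data would be whole protein repertoires.
    refs = [("MKTAYIAKQRQISFVKSHFSRQ", "family_A"), ("MSDNELKRLAEEAAKRGVDLS", "family_B")]
    print(classify("MKTAYIAKQRQISFVKSHFSRQ", refs))  # family_A
```

Because similar sequences share most of their k-mers, their signatures tend to differ in few bits, so comparing signatures with a cheap XOR and bit count stands in for a full alignment. In practice one would precompute the reference signatures once rather than recomputing them per query as this sketch does.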

Details

Database :
OAIster
Journal :
Data Mining and Analytics 2014: Proceedings of the 12th Australasian Data Mining Conference [Conferences in Research and Practice in Information Technology, Volume 158]
Notes :
application/pdf
Publication Type :
Electronic Resource
Accession number :
edsoai.on1146605746
Document Type :
Electronic Resource