MetaProFi: A Protein-Based Bloom Filter for Storing and Querying Sequence Data for Accurate Identification of Functionally Relevant Genetic Variants

Authors :
Sebastian Keller
Olga V. Kalinina
Sanjay Kumar Srikakulam
Robert Bals
Fawaz Dabbaghie
Source :
SSRN Electronic Journal.
Publication Year :
2021
Publisher :
Elsevier BV, 2021.

Abstract

Advances in next-generation sequencing create new computational challenges: methods are needed to store and query the resulting data in a time- and memory-efficient way. We present MetaProFi (https://github.com/kalininalab/metaprofi), a Bloom filter-based tool that, in addition to supporting nucleotide sequences, can for the first time directly store and query amino acid sequences and translated nucleotide sequences, thus bringing sequence comparison to a more biologically relevant protein level. Owing to the properties of Bloom filters, it has a zero false-negative rate and allows for exact and inexact searches; it leverages disk storage and Zstandard compression to achieve high time and space efficiency. We demonstrate the utility of MetaProFi by indexing UniProtKB datasets at the organism and sequence levels, as well as the Tara Oceans dataset and a set of 2585 human RNA-seq experiments, showing that MetaProFi consumes far less disk space than state-of-the-art tools while also improving performance.
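
The Bloom filter properties mentioned in the abstract (no false negatives, exact and inexact matching) can be illustrated with a minimal sketch. This is not MetaProFi's implementation: the class name `KmerBloomFilter`, its parameters (`size_bits`, `num_hashes`, `k`), the BLAKE2b-based hashing, and the example protein sequence are all assumptions chosen for illustration, and disk storage and Zstandard compression are omitted.

```python
import hashlib


class KmerBloomFilter:
    """Toy in-memory Bloom filter over sequence k-mers (illustrative only)."""

    def __init__(self, size_bits=1 << 16, num_hashes=3, k=5):
        self.size_bits = size_bits      # number of bits in the filter
        self.num_hashes = num_hashes    # hash functions applied per k-mer
        self.k = k                      # k-mer length (assumed value)
        self.bits = bytearray(size_bits // 8 + 1)

    def _kmers(self, sequence):
        # Set of all overlapping substrings of length k.
        return {sequence[i:i + self.k]
                for i in range(len(sequence) - self.k + 1)}

    def _positions(self, kmer):
        # Derive num_hashes bit positions by salting BLAKE2b with the hash index.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                kmer.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.size_bits

    def add_sequence(self, sequence):
        # Set the bits for every k-mer of the sequence.
        for kmer in self._kmers(sequence):
            for pos in self._positions(kmer):
                self.bits[pos // 8] |= 1 << (pos % 8)

    def contains_kmer(self, kmer):
        # True if all bits are set: may be a false positive, never a false negative.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(kmer))

    def match_fraction(self, query):
        # Fraction of query k-mers reported present; 1.0 corresponds to an
        # exact search, a lower threshold to an inexact one.
        kmers = self._kmers(query)
        if not kmers:
            return 0.0
        return sum(self.contains_kmer(km) for km in kmers) / len(kmers)


bf = KmerBloomFilter()
bf.add_sequence("MKTAYIAKQRQISFVKSHFSRQ")   # made-up amino acid sequence
exact = bf.match_fraction("TAYIAKQRQI")     # substring of the stored sequence
inexact = bf.match_fraction("TAYIAWQRQI")   # same query with one substitution
```

Because bits are only ever set and never cleared, every k-mer of a stored sequence is always found, so `exact` is 1.0. The substituted query shares only some k-mers with the stored sequence, so `inexact` will typically be lower; accepting hits above a chosen fraction is one simple way to realise an inexact search.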

Details

ISSN :
1556-5068
Database :
OpenAIRE
Journal :
SSRN Electronic Journal
Accession number :
edsair.doi.dedup.....6babebcbc25f14f87632f97bca4b707c