MetaProFi: A Protein-Based Bloom Filter for Storing and Querying Sequence Data for Accurate Identification of Functionally Relevant Genetic Variants
- Source :
- SSRN Electronic Journal.
- Publication Year :
- 2021
- Publisher :
- Elsevier BV, 2021.
Abstract
- Advances in next-generation sequencing create new computational challenges: methods are needed to store and query the resulting data in time- and memory-efficient ways. We present MetaProFi (https://github.com/kalininalab/metaprofi), a Bloom filter-based tool that, in addition to supporting nucleotide sequences, can for the first time directly store and query amino acid sequences and translated nucleotide sequences, bringing sequence comparison to the more biologically relevant protein level. Owing to the properties of Bloom filters, it has a zero false-negative rate and allows for exact and inexact searches, and it leverages disk storage and Zstandard compression to achieve high time and space efficiency. We demonstrate the utility of MetaProFi by indexing UniProtKB datasets at the organism and sequence levels, as well as the Tara Oceans dataset and 2585 human RNA-seq experiments, showing that MetaProFi consumes far less disk space than state-of-the-art tools while also improving performance.
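The Bloom filter properties mentioned in the abstract (no false negatives, exact and inexact search) can be illustrated with a minimal Python sketch. This is not MetaProFi's implementation: the class `KmerBloomFilter`, the helpers `kmers` and `match_fraction`, the k-mer size of 5, the filter size, and the use of salted BLAKE2b hashes are all assumptions chosen for illustration. It also leaves out disk storage and Zstandard compression.

```python
import hashlib


class KmerBloomFilter:
    """Hypothetical in-memory Bloom filter over sequence k-mers."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, kmer):
        # Derive num_hashes bit positions from salted BLAKE2b digests
        # (an illustrative choice of hash function).
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                kmer.encode("ascii"),
                digest_size=8,
                salt=seed.to_bytes(8, "little"),
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer):
        # Every bit set by add() stays set, so an inserted k-mer is
        # always reported present (no false negatives). Unrelated k-mers
        # may collide on all positions (false positives are possible).
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(kmer)
        )


def kmers(sequence, k):
    """Return all overlapping k-mers of a sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def match_fraction(bloom, query, k=5):
    """Fraction of the query's k-mers that the filter reports as present."""
    query_kmers = kmers(query, k)
    if not query_kmers:
        return 0.0
    hits = sum(1 for kmer in query_kmers if kmer in bloom)
    return hits / len(query_kmers)


# Index one amino acid sequence (an arbitrary example string).
protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
bloom = KmerBloomFilter()
for kmer in kmers(protein, 5):
    bloom.add(kmer)

# Exact search: require every query k-mer to be present.
exact_hit = match_fraction(bloom, protein[3:20]) == 1.0
# Inexact search: accept a query when enough of its k-mers are present.
inexact_hit = match_fraction(bloom, "MKTAYIAKQRQISFAKSHF") >= 0.5
```

In this sketch an exact search requires a match fraction of 1.0, while an inexact search lowers the threshold so that a query with a few substitutions can still match. The 0.5 threshold is arbitrary.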
- Subjects :
- History
Polymers and Plastics
Computer science
Search engine indexing
Genetic variants
A protein
Bloom filter
Industrial and Manufacturing Engineering
Identification (information)
Data sequences
Disk storage
Data mining
UniProt
Business and International Management
Details
- ISSN :
- 1556-5068
- Database :
- OpenAIRE
- Journal :
- SSRN Electronic Journal
- Accession number :
- edsair.doi.dedup.....6babebcbc25f14f87632f97bca4b707c