The Terabase Search Engine: a large-scale relational database of short-read sequences
- Source: Bioinformatics. 35:665-670
- Publication Year: 2018
- Publisher: Oxford University Press (OUP), 2018.
Abstract
- Motivation: DNA sequencing archives have grown to enormous scales in recent years, and thousands of human genomes have already been sequenced. Because these data sets are so large, searching the raw read data is infeasible without high-performance data-query technology. It is also challenging to search a repository of short-read data using relational logic and to apply that logic across multiple whole-genome sequencing samples.
- Results: We have built a compact, efficiently indexed database that contains the raw read data for over 250 human genomes, encompassing trillions of bases of DNA, and that allows users to search these data in real time. The Terabase Search Engine retrieves from this database all the reads for any genomic location in a matter of seconds. Users can search either by a range of genomic positions or by a specific sequence, which is aligned to the genome on the fly.
- Availability and implementation: Public access to the Terabase Search Engine database is available at http://tse.idies.jhu.edu.
- Supplementary information: Supplementary data are available at Bioinformatics online.
- Subjects: Statistics and Probability; Computer science; Relational database; Biochemistry; Genome; DNA sequencing; 03 medical and health sciences; Search engine; Databases, Genetic; Humans; Molecular Biology; 030304 developmental biology; 0303 health sciences; Information retrieval; Genome, Human; 030302 biochemistry & molecular biology; Genomics; Sequence Analysis, DNA; Short read; Original Papers; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; chemistry; Human genome; Scale (map); Software; DNA; Range (computer programming)
Details
- ISSN: 1367-4811 and 1367-4803
- Volume: 35
- Database: OpenAIRE
- Journal: Bioinformatics
- Accession number: edsair.doi.dedup.....73d48604c7545971d302053a39824b69