A System Architecture for Efficient Transmission of Massive DNA Sequencing Data
- Source :
- Journal of Computational Biology. 24:1081-1088
- Publication Year :
- 2017
- Publisher :
- Mary Ann Liebert Inc, 2017.
Abstract
- DNA sequencing data analysis pipelines require significant computational resources, which makes cloud computing infrastructures a natural choice for this processing. However, the first practical difficulty in using cloud computing services is transmitting the massive DNA sequencing data from where they are produced to where they will be processed. Current practice is to compress the data in FASTQ file format and then send them via fast data transmission protocols. In this study, we address the weaknesses of that practice and present a new system architecture that exploits the computational resources available on the client side while dynamically adapting to the available bandwidth. Our proposal considers real-life scenarios in which the bandwidth of the connection between the parties may fluctuate and the computing power on the client side may range from moderate personal computers to powerful workstations. The proposed architecture aims to use both the communication bandwidth and the computing resources to reach the results as early as possible. We present a prototype implementation of the proposed architecture and analyze several real-life cases, which provide useful insights for sequencing centers, especially on deciding when and under what conditions to use a cloud service.
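The trade-off the abstract describes, between client-side computing power and available bandwidth, can be illustrated with a minimal sketch. This is not the authors' architecture or prototype: the cost model (compression and transmission overlapped in a pipeline, so the slower stage sets the pace), the function names, and every throughput and ratio value below are hypothetical assumptions chosen only for illustration.

```python
from typing import Dict, Optional, Tuple


def time_to_deliver_s(raw_bytes: float, bandwidth: float,
                      compress_rate: Optional[float] = None,
                      ratio: float = 1.0) -> float:
    """Estimate seconds needed to deliver raw_bytes of FASTQ data.

    bandwidth      link speed in bytes/s
    compress_rate  raw bytes/s the client can compress (None = send raw)
    ratio          raw size divided by compressed size
    Assumption: compression and sending overlap, so the slower of the
    two stages limits the rate at which raw data is consumed.
    """
    if compress_rate is None:
        return raw_bytes / bandwidth
    # The link carries bandwidth * ratio raw-equivalent bytes per second.
    effective_rate = min(compress_rate, bandwidth * ratio)
    return raw_bytes / effective_rate


def choose_strategy(raw_bytes: float, bandwidth: float,
                    compressors: Dict[str, Tuple[float, float]]) -> str:
    """Return the option ("raw" or a compressor name) with the lowest estimate."""
    best_name = "raw"
    best_time = time_to_deliver_s(raw_bytes, bandwidth)
    for name, (rate, ratio) in compressors.items():
        t = time_to_deliver_s(raw_bytes, bandwidth, rate, ratio)
        if t < best_time:
            best_name, best_time = name, t
    return best_name


# Hypothetical compressor profiles: (raw bytes/s compressed, compression ratio).
COMPRESSORS = {"fast": (200e6, 3.0), "strong": (20e6, 5.0)}

if __name__ == "__main__":
    for bw in (1e6, 10e6, 1e9):  # assumed link speeds in bytes/s
        print(bw, choose_strategy(100e9, bw, COMPRESSORS))
```

Under these assumed numbers, the sketch picks the "strong" profile on the slowest link, the "fast" profile on the middle one, and raw transfer on the fastest, which mirrors the idea of re-evaluating the choice as the measured bandwidth fluctuates.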
- Subjects :
- 0301 basic medicine
FASTQ format
Workstation
Computer science
Distributed computing
0206 medical engineering
Cloud computing
02 engineering and technology
World Wide Web
03 medical and health sciences
Genetics
Bandwidth (computing)
Humans
Molecular Biology
Systems Biology
Computational Biology
High-Throughput Nucleotide Sequencing
Genomics
Sequence Analysis, DNA
Client-side
File format
Computational Mathematics
030104 developmental biology
Computational Theory and Mathematics
Modeling and Simulation
Systems architecture
Software
020602 bioinformatics
Data transmission
Details
- ISSN :
- 15578666
- Volume :
- 24
- Database :
- OpenAIRE
- Journal :
- Journal of Computational Biology
- Accession number :
- edsair.doi.dedup.....143279364dcdaaa4d6f1545d1660f48c
- Full Text :
- https://doi.org/10.1089/cmb.2017.0016