1. Closha 2.0: a bio-workflow design system for massive genome data analysis on high performance cluster infrastructure
- Author
-
Gunhwan Ko, Pan-Gyu Kim, Byung-Ha Yoon, JaeHee Kim, Wangho Song, IkSu Byeon, JongCheol Yoon, Byungwook Lee, and Young-Kuk Kim
- Subjects
Closha 2.0 ,Next-generation sequencing (NGS) ,Cloud computing ,Bioinformatics workflow ,High-performance computing (HPC) ,Genomic data analysis ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Biology (General) ,QH301-705.5 - Abstract
Abstract Background The explosive growth of next-generation sequencing data has resulted in ultra-large-scale datasets and significant computational challenges. As the cost of next-generation sequencing (NGS) has decreased, the amount of genomic data has surged globally. However, the cost and complexity of the computational resources required continue to be substantial barriers to leveraging big data. A promising solution to these computational challenges is cloud computing, which provides researchers with the necessary CPUs, memory, storage, and software tools. Results Here, we present Closha 2.0, a cloud computing service that offers a user-friendly platform for analyzing massive genomic datasets. Closha 2.0 is designed to provide a cloud-based environment that enables all genomic researchers, including those with limited or no programming experience, to easily analyze their genomic data. The new 2.0 version of Closha has more user-friendly features than the previous 1.0 version. Firstly, the workbench features a script editor that supports Python, R, and shell script programming, enabling users to write scripts and integrate them into their pipelines. This functionality is particularly useful for downstream analysis. Second, Closha 2.0 runs on containers, which execute each tool in an independent environment. This provides a stable environment and prevents dependency issues and version conflicts among tools. Additionally, users can execute each step of a pipeline individually, allowing them to test applications at each stage and adjust parameters to achieve the desired results. We also updated a high-speed data transmission tool called GBox that facilitates the rapid transfer of large datasets. Conclusions The analysis pipelines on Closha 2.0 are reproducible, with all analysis parameters and inputs being permanently recorded. Closha 2.0 simplifies multi-step analysis with drag-and-drop functionality and provides a user-friendly interface for genomic scientists to obtain accurate results from NGS data. Closha 2.0 is freely available at https://www.kobic.re.kr/closha2 .
- Published
- 2024
- Full Text
- View/download PDF