NuChart-II: The road to a fast and scalable tool for Hi-C data analysis
- Source :
- The International Journal of High Performance Computing Applications 31 (2017): 196–211. doi:10.1177/1094342016668567. Authors: Fabio Tordini, Maurizio Drocco, Claudia Misale, Luciano Milanesi, Pietro Lio, Ivan Merelli, Massimo Torquati, Marco Aldinucci.
- Publication Year :
- 2016
- Publisher :
- SAGE Publications, 2016.
Abstract
- Recent advances in molecular biology and bioinformatic techniques have brought about an explosion of information about the spatial organisation of the DNA in the nucleus of a cell. High-throughput molecular biology techniques provide a genome-wide capture of the spatial organisation of chromosomes at unprecedented scales, which permits one to identify physical interactions between genetic elements located throughout a genome. The use of this important information is, however, hampered by the lack of biologist-friendly analysis and visualisation software: these disciplines are caught in a flood of data and now face many of the scale-out issues that high-performance computing has been addressing for years. Data must be managed, analysed and integrated, with substantial requirements of speed (in terms of execution time), application scalability and data representation. In this work, we present NuChart-II, an efficient and highly optimised tool for genomic data analysis that provides a gene-centric, graph-based representation of genomic information and proposes an ex-post normalisation technique for Hi-C data. While designing NuChart-II, we addressed several common issues in the parallelisation of memory-bound algorithms for shared-memory systems.
- Subjects :
- 0301 basic medicine
Spatial organisation
Computer science
02 engineering and technology
bioinformatics
Hi-C data analysis
High-performance computing
memory-bound algorithms
parallel computing
Theoretical Computer Science
Software
Hardware and Architecture
External Data Representation
computer.software_genre
Genome
03 medical and health sciences
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
business.industry
Supercomputer
Data science
Visualization
030104 developmental biology
Scalability
Graph (abstract data type)
Data mining
business
computer
Details
- ISSN :
- 1741-2846 and 1094-3420
- Volume :
- 31
- Database :
- OpenAIRE
- Journal :
- The International Journal of High Performance Computing Applications
- Accession number :
- edsair.doi.dedup.....b987de96d3a302e7a533590a74e93382