A Pipeline for Reconstructing Somatic Copy Number Alternation’s Subclonal Population-Based Next-Generation Sequencing Data
- Source :
- Frontiers in Genetics, Vol 10 (2020), Frontiers in Genetics
- Publication Year :
- 2020
- Publisher :
- Frontiers Media S.A., 2020.
Abstract
- State-of-the-art next-generation sequencing (NGS)-based subclonal reconstruction methods perform poorly on somatic copy number alternations (SCNAs), not only because the subclonal population frequency and the absolute copy number must be estimated simultaneously for each SCNA, but also because the tumor and its paired normal sequencing data contain complex bias and noise. Both existing NGS-based SCNA detection methods and tools for inferring the subclonal population frequency of SCNAs use the read count ratio (RCR) of the tumor to its paired normal as the key feature of tumor sequencing data. However, sequencing error and bias strongly affect the RCR, producing a large number of redundant SCNA segments that make the subsequent steps of subclonal population frequency inference and subclonal reconstruction time-consuming and inaccurate. We perform a mathematical analysis of the number of solutions for an SCNA’s subclonal frequency and, based on it, propose a computational algorithm that reduces the impact of false breakpoints. We construct a new probability model that incorporates an RCR bias correction algorithm and, by chaining it with the false breakpoint filtering algorithm, build a complete pipeline for reconstructing the subclonal populations of SCNAs. Experimental results show that our pipeline outperforms existing subclonal reconstruction programs on both simulated data and TCGA data. Source code is publicly available as a Python package at https://github.com/dustincys/msphy-SCNAClonal.
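The abstract notes that the subclonal frequency and the absolute copy number have to be estimated together from the read count ratio. The sketch below illustrates why this is ambiguous. It is not the paper's probability model: it assumes a simple textbook mixture in which a fraction `phi` of cells carries the SCNA with copy number `c` and the remaining cells are diploid, with no bias or noise. The function names `expected_ratio` and `candidate_solutions` are hypothetical and do not come from the msphy-SCNAClonal package.

```python
def expected_ratio(copy_number, phi, normal_copy=2):
    """Expected tumor/normal read count ratio under the assumed mixture.

    A fraction phi of cells has `copy_number` copies of the segment;
    the remaining cells keep `normal_copy` copies.
    """
    return (phi * copy_number + (1.0 - phi) * normal_copy) / normal_copy


def candidate_solutions(observed_ratio, max_copy=6, normal_copy=2):
    """Enumerate (copy_number, phi) pairs that reproduce observed_ratio.

    For each integer copy number other than the normal one, solve the
    mixture equation for phi and keep it only if 0 < phi <= 1.
    """
    solutions = []
    for c in range(0, max_copy + 1):
        if c == normal_copy:
            continue  # a copy-neutral segment carries no frequency signal
        phi = normal_copy * (observed_ratio - 1.0) / (c - normal_copy)
        if 0.0 < phi <= 1.0:
            solutions.append((c, phi))
    return solutions


if __name__ == "__main__":
    # One observed ratio is compatible with several (copy number, frequency) pairs.
    for c, phi in candidate_solutions(1.25):
        print(f"copy number {c}: subclonal frequency {phi:.3f}")
```

Under these assumptions a single observed ratio admits several valid pairs, which is one way to see why counting the possible solutions matters before attempting reconstruction.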
- Subjects :
- 0301 basic medicine
Source code
lcsh:QH426-470
Computer science
media_common.quotation_subject
Pipeline (computing)
Population
SCNA
computer.software_genre
DNA sequencing
03 medical and health sciences
somatic copy number alternation
0302 clinical medicine
Genetics
education
subclonal frequency
Genetics (clinical)
Original Research
media_common
education.field_of_study
absolute copy number
Breakpoint
bias correction
Hierarchical clustering
lcsh:Genetics
030104 developmental biology
Feature (computer vision)
030220 oncology & carcinogenesis
Molecular Medicine
subclonal reconstruction
Data mining
computer
Details
- Language :
- English
- ISSN :
- 16648021
- Volume :
- 10
- Database :
- OpenAIRE
- Journal :
- Frontiers in Genetics
- Accession number :
- edsair.doi.dedup.....256b136e57bbf435c13ae9e34ec3317e
- Full Text :
- https://doi.org/10.3389/fgene.2019.01374/full