Back to Search Start Over

CNV_IFTV: An Isolation Forest and Total Variation-Based Detection of CNVs from Short-Read Sequencing Data.

Authors :
Yuan X
Yu J
Xi J
Yang L
Shang J
Li Z
Duan J
Source :
IEEE/ACM transactions on computational biology and bioinformatics [IEEE/ACM Trans Comput Biol Bioinform] 2021 Mar-Apr; Vol. 18 (2), pp. 539-549. Date of Electronic Publication: 2021 Apr 08.
Publication Year :
2021

Abstract

Accurate detection of copy number variations (CNVs) from short-read sequencing data is challenging due to the uneven distribution of reads and the unbalanced amplitudes of gains and losses. The direct use of read depths to measure CNVs tends to limit performance. Thus, robust computational approaches equipped with appropriate statistics are required to detect CNV regions and boundaries. This study proposes a new method called CNV_IFTV to address this need. CNV_IFTV assigns an anomaly score to each genome bin through a collection of isolation trees. The trees are trained based on isolation forest algorithm through conducting subsampling from measured read depths. With the anomaly scores, CNV_IFTV uses a total variation model to smooth adjacent bins, leading to a denoised score profile. Finally, a statistical model is established to test the denoised scores for calling CNVs. CNV_IFTV is tested on both simulated and real data in comparison to several peer methods. The results indicate that the proposed method outperforms the peer methods. CNV_IFTV is a reliable tool for detecting CNVs from short-read sequencing data even for low-level coverage and tumor purity. The detection results on tumor samples can aid to evaluate known cancer genes and to predict target drugs for disease diagnosis.

Details

Language :
English
ISSN :
1557-9964
Volume :
18
Issue :
2
Database :
MEDLINE
Journal :
IEEE/ACM transactions on computational biology and bioinformatics
Publication Type :
Academic Journal
Accession number :
31180897
Full Text :
https://doi.org/10.1109/TCBB.2019.2920889