Detecting genomic deletions from high-throughput sequence data with unsupervised learning.

Authors :
Li X
Wu Y
Source :
BMC bioinformatics [BMC Bioinformatics] 2023 Jan 27; Vol. 23 (Suppl 8), pp. 568. Date of Electronic Publication: 2023 Jan 27.
Publication Year :
2023

Abstract

Background: Structural variation (SV), which ranges from 50 bp to [Formula: see text] 3 Mb in size, is an important type of genetic variation. Deletion is a type of SV in which a part of a chromosome or a sequence of DNA is lost during DNA replication. Three types of signals, namely discordant read-pairs, read depth and split reads, are commonly used for SV detection from high-throughput sequence data. Many tools have been developed for detecting SVs by using one or more of these signals.

Results: In this paper, we develop a new method called EigenDel for detecting germline submicroscopic genomic deletions. EigenDel first uses discordant read-pairs and clipped reads to obtain initial deletion candidates, and then clusters similar candidates by using unsupervised learning methods. After that, EigenDel uses a carefully designed approach for calling true deletions from each cluster. We conduct various experiments to evaluate the performance of EigenDel on low coverage sequence data.

Conclusions: Our results show that EigenDel outperforms other major methods at balancing accuracy and sensitivity and at reducing bias. EigenDel can be downloaded from https://github.com/lxwgcool/EigenDel .

(© 2023. The Author(s).)
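The general idea of turning discordant read-pairs into clustered deletion candidates can be illustrated with a minimal sketch. This is not EigenDel's algorithm: the abstract does not specify its clustering method, so a simple greedy position-based grouping stands in for it here. The function names, the `(left_pos, right_pos)` pair representation, and the thresholds `k`, `max_gap` and `min_support` are all hypothetical choices made for illustration.

```python
from statistics import median

def discordant_pairs(pairs, library_mean, library_sd, k=3.0):
    """Keep read-pairs whose mapped span is far larger than the library insert size.

    `pairs` is a list of (left_pos, right_pos) mate positions (hypothetical format).
    """
    cutoff = library_mean + k * library_sd
    return [(left, right) for left, right in pairs if right - left > cutoff]

def cluster_candidates(candidates, max_gap=200):
    """Greedy grouping: a candidate joins the current cluster if its left
    position is within `max_gap` of the previous candidate's left position.
    This is a stand-in for an unsupervised clustering step, not EigenDel's."""
    clusters = []
    for cand in sorted(candidates):
        if clusters and cand[0] - clusters[-1][-1][0] <= max_gap:
            clusters[-1].append(cand)
        else:
            clusters.append([cand])
    return clusters

def call_deletions(clusters, min_support=2):
    """Summarize each sufficiently supported cluster as (start, end, support)."""
    calls = []
    for cluster in clusters:
        if len(cluster) >= min_support:
            start = median(left for left, _ in cluster)
            end = median(right for _, right in cluster)
            calls.append((start, end, len(cluster)))
    return calls

# Toy input: two concordant pairs, three pairs spanning one deletion,
# and one isolated discordant pair with no supporting evidence.
pairs = [(1000, 1400), (5000, 7100), (5050, 7150),
         (5100, 7120), (9000, 9380), (20000, 23000)]
candidates = discordant_pairs(pairs, library_mean=400, library_sd=40)
clusters = cluster_candidates(candidates)
calls = call_deletions(clusters)
print(calls)
```

In this toy input, the three overlapping discordant pairs form one supported cluster, while the isolated pair is discarded for lack of support. A real caller would also use clipped reads and read depth, as the abstract notes.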

Details

Language :
English
ISSN :
1471-2105
Volume :
23
Issue :
Suppl 8
Database :
MEDLINE
Journal :
BMC bioinformatics
Publication Type :
Academic Journal
Accession number :
36707775
Full Text :
https://doi.org/10.1186/s12859-023-05139-w