Detecting genomic deletions from high-throughput sequence data with unsupervised learning.
- Source :
- BMC bioinformatics [BMC Bioinformatics] 2023 Jan 27; Vol. 23 (Suppl 8), pp. 568. Date of Electronic Publication: 2023 Jan 27.
- Publication Year :
- 2023
Abstract
- Background: Structural variation (SV), which ranges from 50 bp to [Formula: see text] 3 Mb in size, is an important type of genetic variation. A deletion is a type of SV in which part of a chromosome or a sequence of DNA is lost during DNA replication. Three types of signals are commonly used for SV detection from high-throughput sequence data: discordant read-pairs, read depth, and split reads. Many tools have been developed for detecting SVs by using one or more of these signals.
- Results: In this paper, we develop a new method called EigenDel for detecting germline submicroscopic genomic deletions. EigenDel first uses discordant read-pairs and clipped reads to obtain initial deletion candidates, and then clusters similar candidates by using unsupervised learning methods. After that, EigenDel uses a carefully designed approach to call true deletions from each cluster. We conduct various experiments to evaluate the performance of EigenDel on low coverage sequence data.
- Conclusions: Our results show that EigenDel outperforms other major methods at balancing accuracy and sensitivity and at reducing bias. EigenDel can be downloaded from https://github.com/lxwgcool/EigenDel .
- (© 2023. The Author(s).)
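As a rough illustration of the discordant read-pair signal mentioned in the abstract, the sketch below flags read pairs whose mapped span is unusually large compared with the rest of the library, which is the classic hint that sequence between the two mates is missing from the sample. This is not EigenDel's implementation: the function name `discordant_pairs`, the `(left_start, right_end)` input format, and the median/MAD cutoff with `k = 3.0` are all assumptions chosen for the example.

```python
from statistics import median


def discordant_pairs(pairs, k=3.0):
    """Return read pairs whose mapped span is much larger than typical.

    pairs: list of (left_start, right_end) reference coordinates for the
           two mates of a pair, both mapped to the same chromosome
           (hypothetical input format, assumed for this sketch).
    k:     number of robust standard deviations above the median span
           at which a pair is called discordant (assumed threshold).
    """
    spans = [end - start for start, end in pairs]
    med = median(spans)
    # Median absolute deviation; fall back to 1.0 if all spans are identical.
    mad = median(abs(s - med) for s in spans) or 1.0
    # 1.4826 scales the MAD to a standard deviation under normality.
    cutoff = med + k * 1.4826 * mad
    return [pair for pair, span in zip(pairs, spans) if span > cutoff]


# Four ordinary pairs (~300 bp spans) and one pair spanning ~2.3 kb.
example = [(100, 400), (200, 510), (300, 590), (400, 705), (500, 2800)]
print(discordant_pairs(example))  # [(500, 2800)]
```

A robust cutoff is used here because the discordant pairs themselves would inflate a plain mean and standard deviation. A real caller would also need to handle mate orientation, mapping quality, and pairs mapped to different chromosomes, none of which this sketch attempts.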
Details
- Language :
- English
- ISSN :
- 1471-2105
- Volume :
- 23
- Issue :
- Suppl 8
- Database :
- MEDLINE
- Journal :
- BMC bioinformatics
- Publication Type :
- Academic Journal
- Accession number :
- 36707775
- Full Text :
- https://doi.org/10.1186/s12859-023-05139-w