Back to Search Start Over

Symbiont-Screener: a reference-free filter to automatically separate host sequences and contaminants for long reads or co-barcoded reads by unsupervised clustering

Authors :
Lidong Guo
Jianwei Chen
Xin Liu
Guangyi Fan
Chengcheng Shi
X. D. Liu
Mengyang Xu
Publication Year :
2020
Publisher :
Cold Spring Harbor Laboratory, 2020.

Abstract

Decontamination is necessary for eliminating the effect of foreign genomes on the symbiont studies and biomedical discoveries. However, direct extraction of host sequencing reads with no references remains challenging. Here, we present a triobased method to classify the host error-prone long reads or sparse co-barcoded reads prior to assembly, free of any alignments against DNA or protein references. This method first identifies high-confident host reads by haplotype-specific k-mers inherited from parents, and then groups remaining host reads by the unsupervised clustering. Experimental results demonstrated that this approach successfully classified up to 97.38% of the host human long reads with the precision rate of 99.9999%, and 79.95% host co-barcoded reads with the precision rate of 98.36% using an artificially mixed data. Moreover, the tool also exhibited a good performance on the decontamination of the real algae data. The purified reads reconstructed two haplotypes and improved the assembly with larger contig NGA50 value and less misassemblies. Symbiont-Screener can be freely downloaded at https://github.com/BGI-Qingdao/Symbiont-Screener.

Details

Database :
OpenAIRE
Accession number :
edsair.doi...........53636c79cccaf04fe131fa13fcb8518e