STRling: a k-mer counting approach that detects short tandem repeat expansions at known and novel loci

Authors :
Phillipa J. Lamont
Laurel Hiatt
Nigel G. Laing
Heather C Mefford
Gianina Ravenscroft
Sarah J. Beecroft
Richard Roxburgh
Joseph Brown
Aaron R. Quinlan
Brent S. Pedersen
Miriam J. Rodrigues
Amy Lacroix
Harriet Dashnow
Mark M. Davis
Source :
Genome biology. 23(1)
Publication Year :
2021

Abstract

Expansions of short tandem repeats (STRs) cause dozens of rare Mendelian diseases. However, STR expansions, especially those arising from repeats not present in the reference genome, are challenging to detect from short-read sequencing data. Such “novel” STRs include new repeat units occurring at known STR loci, or entirely new STR loci where the sequence is absent from the reference genome. A primary cause of difficulty detecting STR expansions is that reads arising from STR expansions are frequently mismapped or unmapped. To address this challenge, we have developed STRling, a new STR detection algorithm that counts k-mers (short DNA sequences of length k) in DNA sequencing reads, to efficiently recover reads that inform the presence and size of STR expansions. As a result, STRling can call expansions at both known and novel STR loci. STRling has a sensitivity of 83% for 14 known STR disease loci, including the novel STRs that cause CANVAS and DBQD2. It is the first method to resolve the position of novel STR expansions to base pair accuracy. Such accuracy is essential to interpreting the consequence of each expansion. STRling has an estimated 0.078 false discovery rate for known pathogenic loci in unaffected individuals and a 0.20 false discovery rate for genome-wide loci in unaffected individuals when using variants called from long-read data as truth. STRling is fast, scalable on cloud computing, open-source, and freely available at https://github.com/quinlan-lab/STRling.
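
The k-mer counting idea described in the abstract can be illustrated with a minimal sketch. This is not STRling's actual implementation: the function names, the range of k values, and the 0.8 threshold are assumptions chosen for illustration only. The sketch counts every k-mer in a read, merges k-mers that are rotations of the same repeat unit (for example, AAGGG and GGAAG), and reports the unit that accounts for the largest fraction of the read. Reverse-complement handling and read-pair logic are deliberately left out.

```python
from collections import Counter


def canonical_rotation(kmer: str) -> str:
    """Return the lexicographically smallest rotation of a k-mer.

    Rotations of one repeat unit (AAGGG, AGGGA, GGGAA, ...) all map to
    the same key, so they are counted together.
    """
    return min(kmer[i:] + kmer[:i] for i in range(len(kmer)))


def dominant_repeat_unit(read: str, k: int):
    """Return (unit, fraction) for the most common canonical k-mer in a read.

    `fraction` is the share of all k-length windows in the read that are
    rotations of `unit`.
    """
    n_windows = len(read) - k + 1
    if n_windows <= 0:
        return None, 0.0
    counts = Counter(
        canonical_rotation(read[i:i + k]) for i in range(n_windows)
    )
    unit, count = counts.most_common(1)[0]
    return unit, count / n_windows


def looks_like_str(read: str, k_values=range(2, 7), min_fraction=0.8):
    """Flag a read as STR-like if one repeat unit dominates it.

    Both `k_values` and `min_fraction` are illustrative assumptions,
    not parameters taken from STRling. Smaller k wins ties, so a
    dinucleotide repeat is reported with k=2 rather than k=4.
    """
    best_unit, best_fraction = None, 0.0
    for k in k_values:
        unit, fraction = dominant_repeat_unit(read, k)
        if fraction > best_fraction:
            best_unit, best_fraction = unit, fraction
    if best_fraction >= min_fraction:
        return best_unit, best_fraction
    return None, best_fraction
```

Under these assumptions, a read made entirely of the pentanucleotide AAGGG is flagged with that unit even if the read never aligned to the reference, which is the property that lets a k-mer approach recover mismapped or unmapped reads.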

Details

ISSN :
1474760X
Volume :
23
Issue :
1
Database :
OpenAIRE
Journal :
Genome biology
Accession number :
edsair.doi.dedup.....f093a700c69e55757c865d25767389a7