1. STRling: a k-mer counting approach that detects short tandem repeat expansions at known and novel loci
- Author
Phillipa J. Lamont, Laurel Hiatt, Nigel G. Laing, Heather C. Mefford, Gianina Ravenscroft, Sarah J. Beecroft, Richard Roxburgh, Joseph Brown, Aaron R. Quinlan, Brent S. Pedersen, Miriam J. Rodrigues, Amy Lacroix, Harriet Dashnow, and Mark M. Davis
- Subjects
False discovery rate, Base pair, High-Throughput Nucleotide Sequencing, Computational biology, Sequence Analysis, DNA, Biology, DNA sequencing, k-mer, Mendelian inheritance, Microsatellite, Reference genome, Sequence (medicine), Microsatellite Repeats
- Abstract
Expansions of short tandem repeats (STRs) cause dozens of rare Mendelian diseases. However, STR expansions, especially those arising from repeats not present in the reference genome, are challenging to detect from short-read sequencing data. Such “novel” STRs include new repeat units occurring at known STR loci, or entirely new STR loci where the sequence is absent from the reference genome. A primary cause of difficulty detecting STR expansions is that reads arising from STR expansions are frequently mismapped or unmapped. To address this challenge, we have developed STRling, a new STR detection algorithm that counts k-mers (short DNA sequences of length k) in DNA sequencing reads, to efficiently recover reads that inform the presence and size of STR expansions. As a result, STRling can call expansions at both known and novel STR loci. STRling has a sensitivity of 83% for 14 known STR disease loci, including the novel STRs that cause CANVAS and DBQD2. It is the first method to resolve the position of novel STR expansions to base pair accuracy. Such accuracy is essential to interpreting the consequence of each expansion. STRling has an estimated 0.078 false discovery rate for known pathogenic loci in unaffected individuals and a 0.20 false discovery rate for genome-wide loci in unaffected individuals when using variants called from long-read data as truth. STRling is fast, scalable on cloud computing, open-source, and freely available at https://github.com/quinlan-lab/STRling.
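The k-mer counting idea in the abstract can be illustrated with a minimal Python sketch. It is not STRling's implementation. The function names, the 1 to 6 bp range of repeat unit lengths, and the 0.8 threshold are assumptions chosen for illustration. For each candidate unit length k, the sketch counts overlapping k-mers in a read, groups rotations of the same unit together (CAG, AGC and GCA describe the same repeat), and reports the unit that accounts for the largest share of the read.

```python
from collections import Counter


def canonical_rotation(unit: str) -> str:
    """Return the lexicographically smallest rotation of a repeat unit."""
    return min(unit[i:] + unit[:i] for i in range(len(unit)))


def best_repeat_unit(read: str, max_k: int = 6) -> tuple[str, float]:
    """Return (unit, fraction) for the repeat unit that dominates the read.

    fraction is the share of overlapping k-mer windows whose canonical
    rotation equals the most common unit. max_k is an assumed upper bound
    on the repeat unit length.
    """
    read = read.upper()
    best_unit, best_fraction = "", 0.0
    for k in range(1, max_k + 1):
        windows = len(read) - k + 1
        if windows < 1:
            break
        # Count every overlapping k-mer, keyed by its canonical rotation.
        counts = Counter(
            canonical_rotation(read[i:i + k]) for i in range(windows)
        )
        unit, n = counts.most_common(1)[0]
        fraction = n / windows
        # Strict comparison keeps the shortest unit when there is a tie,
        # so a CAG repeat is reported as AGC rather than AGCAGC.
        if fraction > best_fraction:
            best_unit, best_fraction = unit, fraction
    return best_unit, best_fraction


def is_str_read(read: str, min_fraction: float = 0.8) -> bool:
    """Flag a read as repeat-rich. The 0.8 threshold is an assumed value."""
    return best_repeat_unit(read)[1] >= min_fraction
```

Because this classification looks only at the read sequence, it does not depend on where, or whether, the read aligned to the reference. That is the property that makes k-mer counting useful for recovering the mismapped and unmapped reads described above.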
- Published
- 2021