Long-read technologies for improved assembly of complex genomes
- Author
Hozjan, Žan and Jakše, Jernej
- Subjects
DNA sequence, gene mapping, sequencing, genome, udc:601.4:575.116.4:577.212.3(043.2)
- Abstract
Long-read technologies are still regarded as slow and error-prone. This view is based on the past. Long-read technologies have since matured from niche tools into an important technology for de novo sequencing of complex genomes, and technological advances have enabled much higher throughput and lower error rates at substantially lower cost. Developments such as circular consensus sequencing (CCS) from Pacific Biosciences, together with new algorithms such as Canu and Flye, allow long-read technologies to reach accuracies comparable to those of short-read sequencing (99.8% with CCS and 99% with Canu). Beyond this comparable accuracy, long reads offer major advantages in genome mapping and annotation, haplotype sequencing, detection of large structural variants, and sequencing of plant genomes with very large numbers of long repetitive regions, which can make up more than 80% of the genome. This is made possible by the exceptional read lengths, which can exceed one million base pairs. The leading technology for producing very long reads was developed by Oxford Nanopore and works by passing a DNA molecule through a biological nanopore. Its current sequencers can produce 6 Tb of data in 48 hours, a throughput competitive with NGS short-read technologies. Synthetic long-read methods, which offer lower costs and accurate raw reads, are an alternative to long reads. The 10X Genomics technology enables genome mapping without long-read sequencing (90% coverage of the human reference genome has been achieved). Optical mapping is another effective method for genome mapping: it has already been used to map highly repetitive polyploid genomes such as the 17 Gb wheat genome.
- Published
- 2019