25 results on '"Haynes Heaton"'
Search Results
2. Chromosomal reference genome sequences for the malaria mosquito, Anopheles coustani, Laveran, 1900 [version 1; peer review: 2 approved]
- Author
Shane A. McCarthy, Damon-Lee B. Pointon, Jonathan M.D. Wood, Ying Sims, James W. Torrance, Mara K N Lawniczak, Marcela Uliano-Silva, Martin G. Wagah, Diego Ayala, Harriet F. Johnson, Ksenia Krasheninnikova, Haynes Heaton, Alan Tracey, Katharina von Wyschetzki, Alex Makunin, Daniel E. Neafsey, Boris K. Makanga, Lemonde B. A. Bouafou, Joanna C. Collins, Sarah E. Pelan, and Nil Rahola
- Subjects
Anopheles coustani; African malaria mosquito; genome sequence; chromosomal; eng; Medicine; Science
- Abstract
We present a genome assembly from an individual female Anopheles coustani (African malaria mosquito; Arthropoda; Insecta; Diptera; Culicidae) from Lopé, Gabon. The genome sequence is 270 megabases in span. Most of the assembly is scaffolded into three chromosomal pseudomolecules with the X sex chromosome assembled for both species. The complete mitochondrial genome was also assembled and is 15.4 kilobases in length.
- Published
- 2024
3. A chromosomal reference genome sequence for the malaria mosquito, Anopheles gambiae, Giles, 1902, Ifakara strain [version 2; peer review: 2 approved]
- Author
Ying Sims, Shane A. McCarthy, Damon-Lee B. Pointon, Jonathan MD Wood, James W. Torrance, Harriet Johnson, Ksenia Krasheninnikova, Haynes Heaton, Joanna Collins, Alan Tracey, Mara Lawniczak, Marcela Uliano Da Silva, Katharina von Wyschetzki, Alex Makunin, Daniel E. Neafsey, Mara K.N. Lawniczak, Tibebu Habtewold, Mgeni Mohamed Tambwe, Martin Wagah, Nikolai Windbichler, Sarah Moore, Sarah E. Pelan, and George Christophides
- Subjects
Anopheles gambiae; African malaria mosquito; genome sequence; chromosomal; eng; Medicine; Science
- Abstract
We present a genome assembly from an individual female Anopheles gambiae (the malaria mosquito; Arthropoda; Insecta; Diptera; Culicidae), Ifakara strain. The genome sequence is 264 megabases in span. Most of the assembly is scaffolded into three chromosomal pseudomolecules with the X sex chromosome assembled. The complete mitochondrial genome was also assembled and is 15.4 kilobases in length.
- Published
- 2024
4. A chromosomal reference genome sequence for the malaria mosquito, Anopheles moucheti, Evans, 1925 [version 1; peer review: 2 approved]
- Author
Shane A. McCarthy, Damon-Lee B. Pointon, Ying Sims, James W. Torrance, Jean-Pierre Agbor, Sandrine N. Nsango, Martin G. Wagah, Diego Ayala, Harriet F. Johnson, Jonathan M. D. Wood, Joanna C. Collins, Ksenia Krasheninnikova, Haynes Heaton, Alan Tracey, Marcela Uliano Da Silva, Katharina von Wyschetzki, Alex Makunin, Daniel E. Neafsey, Mara Lawniczak, and Sarah E. Pelan
- Subjects
Anopheles moucheti; African malaria mosquito; genome sequence; chromosomal; eng; Medicine; Science
- Abstract
We present a genome assembly from an individual male Anopheles moucheti (the malaria mosquito; Arthropoda; Insecta; Diptera; Culicidae), from a wild population in Cameroon. The genome sequence is 271 megabases in span. The majority of the assembly is scaffolded into three chromosomal pseudomolecules with the X sex chromosome assembled. The complete mitochondrial genome was also assembled and is 15.5 kilobases in length.
- Published
- 2023
5. The genome sequence of the malaria mosquito, Anopheles funestus, Giles, 1900 [version 2; peer review: 2 approved]
- Author
James Torrance, Ying Sims, Marcela Uliano-Silva, Sarah Pelan, Ousman Akone-Ella, Diego Ayala, Harriet Johnson, Pierre Kengne, Ksenia Krasheninnikova, Haynes Heaton, Joanna Collins, Alan Tracey, Damon-Lee Pointon, Katharina von Wyschetzki, Alex Makunin, Daniel Neafsey, Mara Lawniczak, Shane McCarthy, and Jonathan Wood
- Subjects
Anopheles funestus; African malaria mosquito; genome sequence; chromosomal inversions; eng; Medicine; Science
- Abstract
We present a genome assembly from an individual female Anopheles funestus (the malaria mosquito; Arthropoda; Insecta; Diptera; Culicidae). The genome sequence is 251 megabases in span. The majority of the assembly is scaffolded into three chromosomal pseudomolecules with the X sex chromosome assembled. The complete mitochondrial genome was also assembled and is 15.4 kilobases in length.
- Published
- 2023
6. A chromosomal reference genome sequence for the malaria mosquito, Anopheles gambiae, Giles, 1902, Ifakara strain [version 1; peer review: 2 approved]
- Author
Ying Sims, Shane A. McCarthy, Damon-Lee B. Pointon, Jonathan MD Wood, James W. Torrance, Harriet Johnson, Ksenia Krasheninnikova, Haynes Heaton, Joanna Collins, Alan Tracey, Marcela Uliano Da Silva, Katharina von Wyschetzki, Alex Makunin, Daniel E. Neafsey, Mara Lawniczak, Tibebu Habtewold, Mgeni Mohamed Tambwe, Martin Wagah, Nikolai Windbichler, Sarah Moore, Sarah E. Pelan, and George Christophides
- Subjects
Anopheles gambiae; African malaria mosquito; genome sequence; chromosomal; eng; Medicine; Science
- Abstract
We present a genome assembly from an individual female Anopheles gambiae (the malaria mosquito; Arthropoda; Insecta; Diptera; Culicidae), Ifakara strain. The genome sequence is 264 megabases in span. Most of the assembly is scaffolded into three chromosomal pseudomolecules with the X sex chromosome assembled. The complete mitochondrial genome was also assembled and is 15.4 kilobases in length.
- Published
- 2023
7. The genome sequence of the malaria mosquito, Anopheles funestus, Giles, 1900 [version 1; peer review: 2 approved]
- Author
James Torrance, Ying Sims, Marcela Uliano-Silva, Sarah Pelan, Ousman Akone-Ella, Diego Ayala, Harriet Johnson, Pierre Kengne, Ksenia Krasheninnikova, Haynes Heaton, Joanna Collins, Alan Tracey, Damon-Lee Pointon, Katharina von Wyschetzki, Alex Makunin, Daniel Neafsey, Mara Lawniczak, Shane McCarthy, and Jonathan Wood
- Subjects
Anopheles funestus; African malaria mosquito; genome sequence; chromosomal inversions; eng; Medicine; Science
- Abstract
We present a genome assembly from an individual female Anopheles funestus (the malaria mosquito; Arthropoda; Insecta; Diptera; Culicidae). The genome sequence is 251 megabases in span. The majority of the assembly is scaffolded into three chromosomal pseudomolecules with the X sex chromosome assembled. The complete mitochondrial genome was also assembled and is 15.4 kilobases in length.
- Published
- 2022
8. Souporcell: robust clustering of single-cell RNA-seq data by genotype without reference genotypes
- Author
Mara K. N. Lawniczak, Martin Hemberg, Maria Imaz, Richard Durbin, Arthur M. Talman, Andrew J Knights, Haynes Heaton, and Daniel J. Gaffney
- Subjects
0303 health sciences; Lysis; Cell; Genetic variants; RNA; RNA-Seq; Cell Biology; Computational biology; Biology; Biochemistry; 03 medical and health sciences; medicine.anatomical_structure; Genotype; medicine; Base sequence; Cluster analysis; Molecular Biology; 030304 developmental biology; Biotechnology
- Abstract
Methods to deconvolve single-cell RNA-sequencing (scRNA-seq) data are necessary for samples containing a mixture of genotypes, whether they are natural or experimentally combined. Multiplexing across donors is a popular experimental design that can avoid batch effects, reduce costs and improve doublet detection. By using variants detected in scRNA-seq reads, it is possible to assign cells to their donor of origin and identify cross-genotype doublets that may have highly similar transcriptional profiles, precluding detection by transcriptional profile. More subtle cross-genotype variant contamination can be used to estimate the amount of ambient RNA. Ambient RNA is caused by cell lysis before droplet partitioning and is an important confounder of scRNA-seq analysis. Here we develop souporcell, a method to cluster cells using the genetic variants detected within the scRNA-seq reads. We show that it achieves high accuracy on genotype clustering, doublet detection and ambient RNA estimation, as demonstrated across a range of challenging scenarios.
- Published
- 2020
9. An open resource for accurately benchmarking small variant and reference calls
- Author
Justin M. Zook, Len Trigg, Chunlin Xiao, Justin Wagner, Sean A. Irvine, Haynes Heaton, Jennifer McDaniel, Hemang Parikh, Francisco M. De La Vega, Cory Y. McLean, Stephen T. Sherry, Nathan D. Olson, Rebecca Truty, and Marc L. Salit
- Subjects
Computer science; Biomedical Engineering; Bioengineering; Genomics; Context (language use); computer.software_genre; Polymorphism, Single Nucleotide; Applied Microbiology and Biotechnology; Genome; Article; 03 medical and health sciences; 0302 clinical medicine; Resource (project management); INDEL Mutation; Humans; 030304 developmental biology; 0303 health sciences; Genome, Human; Computational Biology; Genetic Variation; High-Throughput Nucleotide Sequencing; Benchmarking; Pipeline (software); Personal Genome Project; Benchmark (computing); Molecular Medicine; Data mining; computer; Software; 030217 neurology & neurosurgery; Biotechnology
- Abstract
Benchmark small variant calls are required for developing, optimizing and assessing the performance of sequencing and bioinformatics methods. Here, as part of the Genome in a Bottle (GIAB) Consortium, we apply a reproducible, cloud-based pipeline to integrate multiple short- and linked-read sequencing datasets and provide benchmark calls for human genomes. We generate benchmark calls for one previously analyzed GIAB sample, as well as six genomes from the Personal Genome Project. These new genomes have broad, open consent, making this a 'first of its kind' resource that is available to the community for multiple downstream applications. We produce 17% more benchmark single nucleotide variations, 176% more indels and 12% larger benchmark regions than previously published GIAB benchmarks. We demonstrate that this benchmark reliably identifies errors in existing callsets and highlight challenges in interpreting performance metrics when using benchmarks that are not perfect or comprehensive. Finally, we identify strengths and weaknesses of callsets by stratifying performance according to variant type and genome context.
- Published
- 2019
10. Peer review of 'Genetic demultiplexing of pooled single-cell RNA-sequencing samples in cancer facilitates effective experimental design'
- Author
Haynes Heaton
- Abstract
These are the open peer reviewer's comments and recommendations regarding the submitted GigaScience article and/or dataset.
- Published
- 2021
11. souporcell: Robust clustering of single cell RNAseq by genotype and ambient RNA inference without reference genotypes
- Author
Martin Hemberg, Daniel J. Gaffney, Mara K. N. Lawniczak, Maria Imaz, Arthur M. Talman, Andrew J Knights, Richard Durbin, and Haynes Heaton
- Subjects
0303 health sciences; Computer science; Inference; RNA; Computational biology; 03 medical and health sciences; 0302 clinical medicine; Genotype; Range (statistics); A priori and a posteriori; Multiplex; Cluster analysis; Genotyping; 030217 neurology & neurosurgery; 030304 developmental biology
- Abstract
Methods to deconvolve single-cell RNA sequencing (scRNAseq) data are necessary for samples containing a natural mixture of genotypes and for scRNAseq experiments that multiplex cells from different donors [1]. Multiplexing across donors is a popular experimental design with many benefits including avoiding batch effects [2], reducing costs, and improving doublet detection. Using variants detected in the RNAseq reads, it is possible to assign cells to the individuals from which they arose. These variants can also be used to identify and remove cross-genotype doublet cells that may have highly similar transcriptional profiles precluding detection by transcriptional profile. More subtle cross-genotype variant contamination can be used to estimate the amount of ambient RNA in the system. Ambient RNA is caused by cell lysis prior to droplet partitioning and is an important confounder of scRNAseq analysis [3]. Souporcell is a novel method to cluster cells using only the genetic variants detected within the scRNAseq reads. We show that it achieves high accuracy on genotype clustering, doublet detection, and ambient RNA estimation as demonstrated across a wide range of challenging scenarios.
- Published
- 2019
12. Souporcell: robust clustering of single-cell RNA-seq data by genotype without reference genotypes
- Author
Haynes Heaton, Arthur M Talman, Andrew Knights, Maria Imaz, Daniel J Gaffney, Richard Durbin, Martin Hemberg, and Mara K N Lawniczak
- Subjects
Base Sequence; Genotype; Cluster Analysis; Humans; RNA; RNA-Seq; Single-Cell Analysis; Polymorphism, Single Nucleotide; Sensitivity and Specificity; Algorithms; Software; Cell Line
- Abstract
Methods to deconvolve single-cell RNA-sequencing (scRNA-seq) data are necessary for samples containing a mixture of genotypes, whether they are natural or experimentally combined. Multiplexing across donors is a popular experimental design that can avoid batch effects, reduce costs and improve doublet detection. By using variants detected in scRNA-seq reads, it is possible to assign cells to their donor of origin and identify cross-genotype doublets that may have highly similar transcriptional profiles, precluding detection by transcriptional profile. More subtle cross-genotype variant contamination can be used to estimate the amount of ambient RNA. Ambient RNA is caused by cell lysis before droplet partitioning and is an important confounder of scRNA-seq analysis. Here we develop souporcell, a method to cluster cells using the genetic variants detected within the scRNA-seq reads. We show that it achieves high accuracy on genotype clustering, doublet detection and ambient RNA estimation, as demonstrated across a range of challenging scenarios.
- Published
- 2019
13. Multi-platform discovery of haplotype-resolved structural variation in human genomes
- Author
Paul Flicek, Kai Ye, Diana C.J. Spierings, David U. Gorkin, Susan Fairley, Mark Chaisson, Shantao Li, Xinghua Shi, Ming Xiao, Jee Young Kwon, Danny Antaki, Patrick Marks, Anne Marie E. Welch, Qihui Zhu, Katherine M. Munson, Sau Peng Lee, Deanna M. Church, Pui-Yan Kwok, Han Cao, Goo Jun, Joey Flores, Sascha Meiers, Chong-Lek Koh, Jonathan Sebat, Thomas Anantharaman, Alistair Ward, Ryan L. Collins, Zechen Chong, Aaron M. Wenger, Chong Chen, Ali Bashir, Fabio C. P. Navarro, Wan-Ping Lee, Sergei Yakneen, Amina Noor, Sushant Kumar, Xiangmeng Kong, Chen-Shan Chin, Peter A. Audano, Peter M. Lansdorp, Scott E. Devine, Steven A. McCarroll, Dillon Lee, Gabriel Rosanio, Ernesto Lowy, Jan O. Korbel, Adrian M. Stütz, Ernest T. Lam, Victor Guryev, Madhusudan Gujral, Tobias Marschall, Li Guo, Oscar L. Rodriguez, Fereydoun Hormozdiari, Zev N. Kronenberg, Mallory Ryan, Bradley J. Nelson, Ankit Malhotra, Joyce V. Lee, Xian Fan, Nelson T. Chuang, Eugene J. Gardner, Timur R. Galeev, Robert E. Handsaker, David Porubsky, Jonas Korlach, Conor Nodzak, Laura Clarke, Tobias Rausch, Michael E. Talkowski, Chengsheng Zhang, Ryan E. Mills, Jong Eun Lee, Andy Wing Chun Pang, Andrew Farrell, Li Ding, Mark Gerstein, Yunjiang Qiu, Sofia Kyriazopoulou-Panagiotopoulou, Karine A. Viaud-Martinez, Xiangqun Zheng-Bradley, Stuart Cantsilieris, Bing Ren, Christine C. Lambert, Xintong Chen, Xuefang Zhao, Ken Chen, Ashley D. Sanders, Charles Lee, William Haynes Heaton, Evan E. Eichler, Gabor T. Marth, Jia Wen, Wei Xu, Alex Hastie, Eliza Cerveira, Harrison Brand, Groningen Research Institute for Asthma and COPD (GRIAC), Damage and Repair in Cancer Development and Cancer Treatment (DARE), and Stem Cell Aging Leukemia and Lymphoma (SALL)
- Subjects
0301 basic medicine; Cancer Research; Science; General Physics and Astronomy; Genomics; 02 engineering and technology; Computational biology; Human genetic variation; Biology; Genome; General Biochemistry, Genetics and Molecular Biology; DNA sequencing; Article; Structural variation; 03 medical and health sciences; Databases; Genetic; INDEL Mutation; Databases, Genetic; Genetics; Humans; 2.1 Biological and endogenous factors; 1000 Genomes Project; Aetiology; lcsh:Science; Whole genome sequencing; Multidisciplinary; Whole Genome Sequencing; Genome, Human; Human Genome; Chromosome Mapping; High-Throughput Nucleotide Sequencing; General Chemistry; 021001 nanoscience & nanotechnology; 030104 developmental biology; Haplotypes; Genomic Structural Variation; lcsh:Q; Human genome; Generic health relevance; 0210 nano-technology; human activities; Algorithms; Human; Biotechnology
- Abstract
The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (…
Structural variants (SVs) in human genomes contribute to diversity and disease. Here, the authors use a multi-platform strategy to generate haplotype-resolved SVs for three human parent–child trios.
- Published
- 2019
14. The Malaria Cell Atlas: a comprehensive reference of single parasite transcriptomes across the complete Plasmodium life cycle
- Author
Haynes Heaton, Kedar Nath Natarajan, Adam J. Reid, Virginia M. Howick, Lisa H. Verzier, Arthur M. Talman, Julian C. Rayner, Mara K. N. Lawniczak, Matthew Berriman, Martin Hemberg, Hellen Butungi, Andrew Russell, Oliver Billker, Jeremy K. Herren, Tom Metcalf, and Tallulah S. Andrews
- Subjects
030213 general clinical medicine; 0303 health sciences; Cell; Biology; medicine.disease; biology.organism_classification; Life stage; 3. Good health; Transcriptome; 03 medical and health sciences; 0302 clinical medicine; medicine.anatomical_structure; Evolutionary biology; parasitic diseases; medicine; Parasite hosting; Plasmodium berghei; Gene; Malaria; 030304 developmental biology
- Abstract
Malaria parasites adopt a remarkable variety of morphological life stages as they transition through multiple mammalian host and mosquito vector environments. Here we profile the single-cell transcriptomes of thousands of individual parasites, deriving the first high-resolution transcriptional atlas of the entire Plasmodium berghei life cycle. We then use our atlas to precisely define developmental stages of single cells from three different human malaria parasite species, including parasites isolated directly from infected individuals. The Malaria Cell Atlas provides both a comprehensive view of gene usage in a complex eukaryotic parasite and an open access reference data set for the study of malaria parasites.
- Published
- 2019
15. The Malaria Cell Atlas: Single parasite transcriptomes across the complete Plasmodium life cycle
- Author
Virginia M Howick, Andrew J C Russell, Tallulah Andrews, Haynes Heaton, Adam J Reid, Kedar Natarajan, Hellen Butungi, Tom Metcalf, Lisa H Verzier, Julian C Rayner, Matthew Berriman, Jeremy K Herren, Oliver Billker, Martin Hemberg, Arthur M Talman, and Mara K N Lawniczak
- Subjects
Life Cycle Stages; Atlases as Topic; Plasmodium berghei; Anopheles; Genes, Protozoan; Animals; Humans; Single-Cell Analysis; Transcriptome; HeLa Cells; Malaria
- Abstract
Malaria parasites adopt a remarkable variety of morphological life stages as they transition through multiple mammalian host and mosquito vector environments. We profiled the single-cell transcriptomes of thousands of individual parasites, deriving the first high-resolution transcriptional atlas of the entire Plasmodium berghei life cycle.
- Published
- 2019
16. A High-Quality De novo Genome Assembly from a Single Mosquito Using PacBio Sequencing
- Author
Sarah B. Kingan [0000-0002-4900-0189], Jonas Korlach [0000-0003-3047-4250], Christine C. Lambert, Brendan Galvin, Haynes Heaton [0000-0002-9649-525X], Juliana Cudini, Mara K. N. Lawniczak [0000-0002-3006-2080], Richard Durbin [0000-0002-9130-1006], and Primo Baybayan. Repository: Apollo - University of Cambridge Repository
- Subjects
0106 biological sciences; 0301 basic medicine; lcsh:QH426-470; 0206 medical engineering; Genome, Insect; Sequence assembly; mosquito; 02 engineering and technology; Computational biology; 010603 evolutionary biology; 01 natural sciences; Genome; Article; 03 medical and health sciences; Contig Mapping; Anopheles; low-input DNA; Genetics; Animals; long-read SMRT sequencing; Gene; Genome size; Genetics (clinical); 030304 developmental biology; Comparative genomics; 0303 health sciences; Ploidies; Polymorphism, Genetic; Contig; de novo genome assembly; Sequence Analysis, DNA; genomic DNA; lcsh:Genetics; 030104 developmental biology; 020602 bioinformatics; Reference genome
- Abstract
A high-quality reference genome is a fundamental resource for functional genetics, comparative genomics, and population genomics, and is increasingly important for conservation biology. PacBio Single Molecule, Real-Time (SMRT) sequencing generates long reads with uniform coverage and high consensus accuracy, making it a powerful technology for de novo genome assembly. Improvements in throughput and concomitant reductions in cost have made PacBio an attractive core technology for many large genome initiatives; however, relatively high DNA input requirements (~5 µg for standard library protocol) have placed PacBio out of reach for many projects on small organisms that have lower DNA content, or on projects with limited input DNA for other reasons. Here we present a high-quality de novo genome assembly from a single Anopheles coluzzii mosquito. A modified SMRTbell library construction protocol without DNA shearing and size selection was used to generate a SMRTbell library from just 100 ng of starting genomic DNA. The sample was run on the Sequel System with chemistry 3.0 and software v6.0, generating, on average, 25 Gb of sequence per SMRT Cell with 20 h movies, followed by diploid de novo genome assembly with FALCON-Unzip. The resulting curated assembly had high contiguity (contig N50 3.5 Mb) and completeness (more than 98% of conserved genes were present and full-length). In addition, this single-insect assembly now places 667 (>90%) of formerly unplaced genes into their appropriate chromosomal contexts in the AgamP4 PEST reference. We were also able to resolve maternal and paternal haplotypes for over 1/3 of the genome. By sequencing and assembling material from a single diploid individual, only two haplotypes were present, simplifying the assembly process compared to samples from multiple pooled individuals. The method presented here can be applied to samples with starting DNA amounts as low as 100 ng per 1 Gb genome size. This new low-input approach puts PacBio-based assemblies in reach for small highly heterozygous organisms that comprise much of the diversity of life.
- Published
- 2019
17. Reproducible integration of multiple sequencing datasets to form high-confidence SNP, indel, and reference calls for five human genome reference materials
- Author
Justin M. Zook, Jennifer McDaniel, Hemang Parikh, Haynes Heaton, Sean A. Irvine, Len Trigg, Rebecca Truty, Cory Y. McLean, Francisco M. De La Vega, Chunlin Xiao, Stephen Sherry, and Marc Salit
- Subjects
0303 health sciences; Computer science; Genomics; Single-nucleotide polymorphism; Context (language use); Computational biology; Genome; Personal Genome Project; 03 medical and health sciences; 0302 clinical medicine; 030220 oncology & carcinogenesis; Human genome; International HapMap Project; Indel; 030304 developmental biology; Reference genome
- Abstract
Benchmark small variant calls from the Genome in a Bottle Consortium (GIAB) for the CEPH/HapMap genome NA12878 (HG001) have been used extensively for developing, optimizing, and demonstrating performance of sequencing and bioinformatics methods. Here, we improve and simplify the methods we use to integrate multiple sequencing datasets, with the intention of deploying a reproducible cloud-based pipeline for application to arbitrary human genomes. We use these reproducible methods to form high-confidence calls with respect to GRCh37 and GRCh38 for HG001 and 4 additional broadly-consented genomes from the Personal Genome Project that are available as NIST Reference Materials. Our new methods produce 17% more SNPs and 176% more indels than our previously published calls for HG001. We also phase 99.5% of the variants in HG001 and call about 90% of the reference genome with high-confidence, increased from 78% previously. Our calls only contain 108 differences from the Illumina Platinum Genomes calls in GRCh37, only 14 of which are ambiguous or likely to be errors in our calls. By comparing several callsets to our new calls, our previously published calls, and Illumina Platinum Genomes calls, we highlight challenges in interpreting performance metrics when benchmarking against imperfect high-confidence calls. Our new calls address some of these challenges, but performance metrics should always be interpreted carefully. Benchmarking tools from the Global Alliance for Genomics and Health are useful for stratifying performance metrics by variant type and genome context to elucidate strengths and weaknesses of a method. We also explore differences between comparing to high-confidence calls for the 5 GIAB genomes, and show that performance metrics for one pipeline are largely similar but not identical when comparing to the 5 genomes. 
Finally, to explore applicability of our methods for genomes that have fewer datasets, we form high-confidence calls using only Illumina and 10x Genomics, and find that they have more high-confidence calls but have a higher error rate. These newly characterized genomes have a broad, open consent with few restrictions on availability of samples and data, enabling a uniquely diverse array of applications.
- Published
- 2018
18. Resolving the full spectrum of human genome variation using Linked-Reads
- Author
Francesca Meschi, Indira Wu, David Stafford, Andrew Wei Xu, Heather Ordonez, Jill Herschleb, Esty Holt, Tony Makarewicz, Shazia Mahamdallie, Elise Ruark, Josh Delaney, Adam Lowe, Pranav Patel, Stephen R. Williams, Christopher Hindson, Sarah T. Garcia, Nikka Keivanfar, Alvaro Martinez Barrio, Ian T. Fiddes, Keith Bjornson, Sheila Seal, Preyas Shah, Ariel Royall, Claudia Catalanotti, Patrick Marks, Jamie L. Marshall, Daniel G. MacArthur, Rajiv Bharadwaj, Nazneen Rahman, Bill Kengli Lin, Sofia Kyriazopoulou-Panagiotopoulou, Susanna Jett, Adrian Fehr, Haynes Heaton, Christopher J. O'Keefe, Deanna M. Church, Andrew D. Price, Shamoni Maheshwari, Brendan Galvin, Cassandra B. Jabara, Kamila Belhocine, Monkol Lek, Michael Schnall-Levin, and Jorge Bernate
- Subjects
Population; Method; Computational biology; Biology; Genome; Data type; Cell Line; 03 medical and health sciences; 0302 clinical medicine; Genetics; Humans; education; Gene; Genetics (clinical); 030304 developmental biology; Sequence (medicine); 0303 health sciences; education.field_of_study; Polymorphism, Genetic; Whole Genome Sequencing; Genome, Human; Haplotype; Membrane Proteins; Survival of Motor Neuron 1 Protein; Survival of Motor Neuron 2 Protein; Intercellular Signaling Peptides and Proteins; Human genome; 030217 neurology & neurosurgery; STRC; Genome-Wide Association Study
- Abstract
Large-scale population analyses coupled with advances in technology have demonstrated that the human genome is more diverse than originally thought. To date, this diversity has largely been uncovered using short-read whole-genome sequencing. However, these short-read approaches fail to give a complete picture of a genome. They struggle to identify structural events, cannot access repetitive regions, and fail to resolve the human genome into haplotypes. Here, we describe an approach that retains long range information while maintaining the advantages of short reads. Starting from ∼1 ng of high molecular weight DNA, we produce barcoded short-read libraries. Novel informatic approaches allow for the barcoded short reads to be associated with their original long molecules producing a novel data type known as “Linked-Reads”. This approach allows for simultaneous detection of small and large variants from a single library. In this manuscript, we show the advantages of Linked-Reads over standard short-read approaches for reference-based analysis. Linked-Reads allow mapping to 38 Mb of sequence not accessible to short reads, adding sequence in 423 difficult-to-sequence genes including disease-relevant genes STRC, SMN1, and SMN2. Both Linked-Read whole-genome and whole-exome sequencing identify complex structural variations, including balanced events and single exon deletions and duplications. Further, Linked-Reads extend the region of high-confidence calls by 68.9 Mb. The data presented here show that Linked-Reads provide a scalable approach for comprehensive genome analysis that is not possible using short reads alone.
- Published
- 2018
19. Multi-platform discovery of haplotype-resolved structural variation in human genomes
- Author
Haynes Heaton
- Published
- 2018
20. Resolving the Full Spectrum of Human Genome Variation using Linked-Reads
- Author
Patrick Marks, Sarah Garcia, Alvaro Martinez Barrio, Kamila Belhocine, Jorge Bernate, Rajiv Bharadwaj, Keith Bjornson, Claudia Catalanotti, Josh Delaney, Adrian Fehr, Ian T. Fiddes, Brendan Galvin, Haynes Heaton, Jill Herschleb, Christopher Hindson, Esty Holt, Cassandra B. Jabara, Susanna Jett, Nikka Keivanfar, Sofia Kyriazopoulou-Panagiotopoulou, Monkol Lek, Bill Lin, Adam Lowe, Shazia Mahamdallie, Shamoni Maheshwari, Tony Makarewicz, Jamie Marshall, Francesca Meschi, Chris O’keefe, Heather Ordonez, Pranav Patel, Andrew Price, Ariel Royall, Elise Ruark, Sheila Seal, Michael Schnall-Levin, Preyas Shah, Stephen Williams, Indira Wu, Andrew Wei Xu, Nazneen Rahman, Daniel MacArthur, and Deanna M. Church
- Subjects
0303 health sciences; Computer science; Haplotype; Sequence assembly; Genomics; Computational biology; Genome; 03 medical and health sciences; chemistry.chemical_compound; Exon; 0302 clinical medicine; chemistry; Human genome; Ploidy; Gene; 030217 neurology & neurosurgery; Exome sequencing; DNA; 030304 developmental biology
- Abstract
Large-scale population based analyses coupled with advances in technology have demonstrated that the human genome is more diverse than originally thought. To date, this diversity has largely been uncovered using short read whole genome sequencing. However, standard short-read approaches, used primarily due to accuracy, throughput and costs, fail to give a complete picture of a genome. They struggle to identify large, balanced structural events, cannot access repetitive regions of the genome and fail to resolve the human genome into its two haplotypes. Here we describe an approach that retains long range information while harnessing the advantages of short reads. Starting from only ∼1ng of DNA, we produce barcoded short read libraries. The use of novel informatic approaches allows for the barcoded short reads to be associated with the long molecules of origin producing a novel datatype known as ‘Linked-Reads’. This approach allows for simultaneous detection of small and large variants from a single Linked-Read library. We have previously demonstrated the utility of whole genome Linked-Reads (lrWGS) for performing diploid, de novo assembly of individual genomes (Weisenfeld et al. 2017). In this manuscript, we show the advantages of Linked-Reads over standard short read approaches for reference based analysis. We demonstrate the ability of Linked-Reads to reconstruct megabase scale haplotypes and to recover parts of the genome that are typically inaccessible to short reads, including phenotypically important genes such as STRC, SMN1 and SMN2. We demonstrate the ability of both lrWGS and Linked-Read Whole Exome Sequencing (lrWES) to identify complex structural variations, including balanced events, single exon deletions, and single exon duplications. The data presented here show that Linked-Reads provide a scalable approach for comprehensive genome analysis that is not possible using short reads alone.
- Published
- 2017
21. Multi-platform discovery of haplotype-resolved structural variation in human genomes
- Author
-
Mark J.P. Chaisson, Ashley D. Sanders, Xuefang Zhao, Ankit Malhotra, David Porubsky, Tobias Rausch, Eugene J. Gardner, Oscar Rodriguez, Li Guo, Ryan L. Collins, Xian Fan, Jia Wen, Robert E. Handsaker, Susan Fairley, Zev N. Kronenberg, Xiangmeng Kong, Fereydoun Hormozdiari, Dillon Lee, Aaron M. Wenger, Alex Hastie, Danny Antaki, Peter Audano, Harrison Brand, Stuart Cantsilieris, Han Cao, Eliza Cerveira, Chong Chen, Xintong Chen, Chen-Shan Chin, Zechen Chong, Nelson T. Chuang, Christine C. Lambert, Deanna M. Church, Laura Clarke, Andrew Farrell, Joey Flores, Timur Galeev, David Gorkin, Madhusudan Gujral, Victor Guryev, William Haynes Heaton, Jonas Korlach, Sushant Kumar, Jee Young Kwon, Jong Eun Lee, Joyce Lee, Wan-Ping Lee, Sau Peng Lee, Shantao Li, Patrick Marks, Karine Viaud-Martinez, Sascha Meiers, Katherine M. Munson, Fabio Navarro, Bradley J. Nelson, Conor Nodzak, Amina Noor, Sofia Kyriazopoulou-Panagiotopoulou, Andy Pang, Yunjiang Qiu, Gabriel Rosanio, Mallory Ryan, Adrian Stütz, Diana C.J. Spierings, Alistair Ward, AnneMarie E. Welch, Ming Xiao, Wei Xu, Chengsheng Zhang, Qihui Zhu, Xiangqun Zheng-Bradley, Ernesto Lowy, Sergei Yakneen, Steven McCarroll, Goo Jun, Li Ding, Chong Lek Koh, Bing Ren, Paul Flicek, Ken Chen, Mark B. Gerstein, Pui-Yan Kwok, Peter M. Lansdorp, Gabor Marth, Jonathan Sebat, Xinghua Shi, Ali Bashir, Kai Ye, Scott E. Devine, Michael Talkowski, Ryan E. Mills, Tobias Marschall, Jan O. Korbel, Evan E. Eichler, and Charles Lee
- Subjects
Genetics ,0303 health sciences ,Genomics ,Human genetic variation ,Computational biology ,Biology ,Genome ,DNA sequencing ,Structural variation ,03 medical and health sciences ,0302 clinical medicine ,Human genome ,1000 Genomes Project ,Indel ,030217 neurology & neurosurgery ,030304 developmental biology - Abstract
The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, and strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three human parent–child trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (
- Published
- 2017
22. Haplotyping germline and cancer genomes with high-throughput linked-read sequencing
- Author
-
Rajiv Bharadwaj, Hanlee P. Ji, Serge Saxonov, Alex Kindwall, Melissa Luo, Patrick Marks, Clara Bermejo, Landon Merrill, Francesca Meschi, Jessica M. Terry, Adrian Fehr, John Bell, Gerard M Vurens, Kristina Giorda, Adam Lowe, Heather Ordonez, Michael Schnall-Levin, Jorge Bernate, Josephine Y Lee, Phillip Belgrader, Glenn K. Lockwood, Steven W Short, Sukhvinder Kaur, Shawn Gauby, Lawrence Greenfield, Geoffrey P. McDermott, Stephanie Greer, Pranav Patel, Andrew D. Price, Benjamin J. Hindson, Nikola O Kondov, Grace X.Y. Zheng, Sofia Kyriazopoulou-Panagiotopoulou, David E Birch, Luz Montesclaros, Alexander Wong, Kamila Belhocine, Susan M. Grimes, Ryan Wilson, Donald A. Masquelier, Patrice A Mudivarti, Kevin D. Ness, Mirna Jarosz, Adrian Chan, Indira Wu, Erik S. Hopmans, Paul Wyatt, David Stafford, Paul Hardenbol, Anthony J. Makarewicz, Joshua Delaney, Yuan Li, Zachary Bent, Christopher Hindson, Christina Wood, Keith Bjornson, Billy T. Lau, and William Haynes Heaton
- Subjects
0301 basic medicine ,Cancer genome sequencing ,Oncogene Proteins, Fusion ,Biomedical Engineering ,Bioengineering ,Genomics ,Biology ,Applied Microbiology and Biotechnology ,Polymorphism, Single Nucleotide ,DNA sequencing ,03 medical and health sciences ,Neoplasms ,Humans ,Polymorphism ,Fusion ,Exome sequencing ,Whole genome sequencing ,Genetics ,Oncogene Proteins ,Genome ,Shotgun sequencing ,Genome, Human ,High-Throughput Nucleotide Sequencing ,Sequence Analysis, DNA ,DNA ,Single Nucleotide ,3. Good health ,030104 developmental biology ,Germ Cells ,Haplotypes ,Genomic Structural Variation ,Molecular Medicine ,Nucleic Acid Conformation ,Human genome ,Sequence Analysis ,Biotechnology ,Personal genomics ,Human - Abstract
Haplotyping of human chromosomes is a prerequisite for cataloguing the full repertoire of genetic variation. We present a microfluidics-based, linked-read sequencing technology that can phase and haplotype germline and cancer genomes using nanograms of input DNA. This high-throughput platform prepares barcoded libraries for short-read sequencing and computationally reconstructs long-range haplotype and structural variant information. We generate haplotype blocks in a nuclear trio that are concordant with expected inheritance patterns and phase a set of structural variants. We also resolve the structure of the EML4-ALK gene fusion in the NCI-H2228 cancer cell line using phased exome sequencing. Finally, we assign genetic aberrations to specific megabase-scale haplotypes generated from whole-genome sequencing of a primary colorectal adenocarcinoma. This approach resolves haplotype information using up to 100 times less genomic DNA than some methods and enables the accurate detection of structural variants.
- Published
- 2016
23. Abstract 3602: Linked-Reads enable detailed, phased resolution of structural variation in the cancer genome
- Author
-
Hanlee P. Ji, Cassandra B. Jabara, Patrick Marks, Heather Ordonez, Kristina Giorda, Michael Schnall-Levin, Billy T. Lau, John Bell, Haynes Heaton, and Sofia Kyriazopoulou-Panagiotopoulou
- Subjects
Genetics ,Cancer Research ,Computational biology ,Biology ,Barcode ,law.invention ,Structural variation ,chemistry.chemical_compound ,genomic DNA ,Oncology ,chemistry ,law ,International HapMap Project ,Indel ,Gene ,DNA ,Segmental duplication - Abstract
Studies have shown that somatic structural variation (SV) plays a key role in the oncogenic process. Traditionally SVs in the cancer genome have been detected using low resolution cytogenetic approaches, such as FISH, or microarray-based techniques. More recently, next-generation sequencing (NGS)-based technologies have been employed to detect SVs, including indels and translocations. However, both short- and long-read NGS-based approaches are limited in their ability to accurately identify SV events and delineate their breakpoints due to the limitations inherent in assembly of billions of short-read sequences across a heterogeneous cancer sample, as well as the costly and burdensome laboratory infrastructure associated with long-read sequencers. We utilized a novel technology that combines microfluidics and molecular barcoding to generate libraries that are sequenced with an Illumina system. Open-source bioinformatics software produces linked-reads that maintain long-range information and single molecule sensitivity. Cell lines and cancer samples were obtained from commercial sources, and genomic DNA was extracted. DNA sample indexing and partitioning was performed using the 10X Genomics GemCode instrument. One ng of sample DNA was used as input for each reaction, and DNA molecules were partitioned into droplets to fragment the DNA and introduce molecular barcodes. Following barcoding, droplets were fractured, and library DNA was purified and sequenced on Illumina sequencers. The GemCode Long Ranger software suite was used to map sequencing reads back to original long molecules of DNA, generating reads linked to partition barcodes. Thus we can generate phased sequences covering tens to hundreds of kilobases.
We first benchmarked the ability to call multiple SV types using a well-characterized germline HapMap sample (NA12878) as well as two recently characterized haploid hydatidiform moles (CHM1 and CHM13) that have been studied with multiple orthogonal technologies. Regions with evidence for structural variation were reassembled into distinct haplotypes. The barcode information allowed us to both phase the structural variants we detected and disambiguate calls within highly repetitive regions, such as segmental duplications. We demonstrated high concordance with alternative approaches across all major classes of SVs, including long insertions and deletions as well as copy-neutral events. In cancer cell lines, we detected well-annotated gene fusions, such as the EML4/ALK and ALK/PTPN3 fusions in the lung cancer cell line NCI-H2228, and the SLC26A/PRKAR2A fusion in the triple negative breast cancer cell line HCC38. Citation Format: Sofia Kyriazopoulou-Panagiotopoulou, Patrick Marks, Haynes Heaton, Heather Ordonez, Kristina Giorda, Cassandra Jabara, Billy Lau, John M. Bell, Michael Schnall-Levin, Hanlee P. Ji. Linked-Reads enable detailed, phased resolution of structural variation in the cancer genome. [abstract]. In: Proceedings of the 107th Annual Meeting of the American Association for Cancer Research; 2016 Apr 16-20; New Orleans, LA. Philadelphia (PA): AACR; Cancer Res 2016;76(14 Suppl):Abstract nr 3602.
- Published
- 2016
24. Molecular Cytogenetics Using Linked-Reads
- Author
-
Sofia Kyriazopoulou-Panagiotopoulou, Andrew Wei Xu, Deanna M. Church, Mark Pratt, Kristina Giorda, Ryan L. Collins, Paul Hardenbol, Michael E. Talkowski, Patrick Marks, Cassandra B. Jabara, Adrian Fehr, Heather Ordonez, Michael Schnall-Levin, and Haynes Heaton
- Subjects
Molecular cytogenetics ,Cancer Research ,Genetics ,Computational biology ,Biology ,Molecular Biology
- Published
- 2016
25. Multi-platform discovery of haplotype-resolved structural variation in human genomes
- Author
-
Mark J. P. Chaisson, Ashley D. Sanders, Xuefang Zhao, Ankit Malhotra, David Porubsky, Tobias Rausch, Eugene J. Gardner, Oscar L. Rodriguez, Li Guo, Ryan L. Collins, Xian Fan, Jia Wen, Robert E. Handsaker, Susan Fairley, Zev N. Kronenberg, Xiangmeng Kong, Fereydoun Hormozdiari, Dillon Lee, Aaron M. Wenger, Alex R. Hastie, Danny Antaki, Thomas Anantharaman, Peter A. Audano, Harrison Brand, Stuart Cantsilieris, Han Cao, Eliza Cerveira, Chong Chen, Xintong Chen, Chen-Shan Chin, Zechen Chong, Nelson T. Chuang, Christine C. Lambert, Deanna M. Church, Laura Clarke, Andrew Farrell, Joey Flores, Timur Galeev, David U. Gorkin, Madhusudan Gujral, Victor Guryev, William Haynes Heaton, Jonas Korlach, Sushant Kumar, Jee Young Kwon, Ernest T. Lam, Jong Eun Lee, Joyce Lee, Wan-Ping Lee, Sau Peng Lee, Shantao Li, Patrick Marks, Karine Viaud-Martinez, Sascha Meiers, Katherine M. Munson, Fabio C. P. Navarro, Bradley J. Nelson, Conor Nodzak, Amina Noor, Sofia Kyriazopoulou-Panagiotopoulou, Andy W. C. Pang, Yunjiang Qiu, Gabriel Rosanio, Mallory Ryan, Adrian Stütz, Diana C. J. Spierings, Alistair Ward, AnneMarie E. Welch, Ming Xiao, Wei Xu, Chengsheng Zhang, Qihui Zhu, Xiangqun Zheng-Bradley, Ernesto Lowy, Sergei Yakneen, Steven McCarroll, Goo Jun, Li Ding, Chong Lek Koh, Bing Ren, Paul Flicek, Ken Chen, Mark B. Gerstein, Pui-Yan Kwok, Peter M. Lansdorp, Gabor T. Marth, Jonathan Sebat, Xinghua Shi, Ali Bashir, Kai Ye, Scott E. Devine, Michael E. Talkowski, Ryan E. Mills, Tobias Marschall, Jan O. Korbel, Evan E. Eichler, and Charles Lee
- Subjects
Science - Abstract
Structural variants (SVs) in human genomes contribute diversity and diseases. Here, the authors use a multi-platform strategy to generate haplotype-resolved SVs for three human parent–child trios.
- Published
- 2019