43 results on '"Primo Baybayan"'
Search Results
2. Approaches to long-read sequencing in a clinical setting to improve diagnostic rate
- Author
Erica Sanford Kobayashi, Serge Batalov, Aaron M. Wenger, Christine Lambert, Harsharan Dhillon, Richard J. Hall, Primo Baybayan, Yan Ding, Seema Rego, Kristen Wigby, Jennifer Friedman, Charlotte Hobbs, and Matthew N. Bainbridge
- Subjects
Medicine; Science
- Abstract
Over the past decade, advances in genetic testing, particularly the advent of next-generation sequencing, have led to a paradigm shift in the diagnosis of molecular diseases and disorders. Despite our present collective ability to interrogate more than 90% of the human genome, portions of the genome have eluded us, resulting in stagnating diagnostic yield with existing methodologies. Here we show how the application of a new technology, long-read sequencing, has the potential to improve molecular diagnostic rates. Long-read whole-genome sequencing covered 98% of next-generation sequencing dead zones, which are areas of the genome that cannot be interpreted with conventional industry-standard short-read sequencing. Because long-read sequencing can unambiguously call variants in these regions, we discovered an immunodeficiency due to a variant in IKBKG in a subject who had previously received a negative genome sequencing result. Additionally, we demonstrate that long-read sequencing detects small variants on par with short-read sequencing, performs better at identifying structural variants, and can determine genomic methylation defects in native DNA. Although these technical capabilities have been demonstrated before, here we demonstrate the clinical application of this technology to identify multiple types of variants with a single test.
- Published
- 2022
3. P472: Highly scalable pharmacogenomic panel testing with hybrid capture and long-read sequencing
- Author
Nina Gonzaludo, Sarah Kingan, John Harting, Primo Baybayan, Siyuan Zhang, Tina Han, Leonardo Arbiza, Yao Yang, Nathan Hammond, and Stuart Scott
- Subjects
Genetics; QH426-470; Medicine
- Published
- 2023
4. P592: HiFi reads provide accurate detection of variants and DNA methylation in challenging regions of the genome
- Author
Greg Young, Aaron Wenger, Matthew Boitano, Armin Toepfer, Christine Lambert, Primo Baybayan, and Emilia Mollova
- Subjects
Genetics; QH426-470; Medicine
- Published
- 2023
5. Structural variation and its potential impact on genome instability: Novel discoveries in the EGFR landscape by long-read sequencing.
- Author
George W Cook, Michael G Benton, Wallace Akerley, George F Mayhew, Cynthia Moehlenkamp, Denise Raterman, Daniel L Burgess, William J Rowell, Christine Lambert, Kevin Eng, Jenny Gu, Primo Baybayan, John T Fussell, Heath D Herbold, John M O'Shea, Thomas K Varghese, and Lyska L Emerson
- Subjects
Medicine; Science
- Abstract
Structural variation (SV) is typically defined as variation within the human genome that exceeds 50 base pairs (bp). SV may be copy number neutral or it may involve duplications, deletions, and complex rearrangements. Recent studies have shown SV to be associated with many human diseases. However, studies of SV have been challenging due to technological constraints. With the advent of third generation (long-read) sequencing technology, exploration of longer stretches of DNA not easily examined previously has been made possible. In the present study, we utilized third generation (long-read) sequencing techniques to examine SV in the EGFR landscape of four haplotypes derived from two human samples. We analyzed the EGFR gene and its landscape (+/- 500,000 base pairs) using this approach and were able to identify a region of non-coding DNA with over 90% similarity to the most common activating EGFR mutation in non-small cell lung cancer. Based on previously published Alu-element genome instability algorithms, we propose a molecular mechanism to explain how this non-coding region of DNA may be interacting with and impacting the stability of the EGFR gene and potentially generating this cancer-driver gene. By these techniques, we were also able to identify previously hidden structural variation in the four haplotypes and in the human reference genome (hg38). We applied previously published algorithms to compare the relative stabilities of these five different EGFR gene landscape haplotypes to estimate their relative potentials to generate the EGFR exon 19, 15 bp canonical deletion. To our knowledge, the present study is the first to use the differences in genomic architecture between targeted cancer-linked phased haplotypes to estimate their relative potentials to form a common cancer-linked driver mutation.
- Published
- 2020
6. Comparative Genome Analysis of an Extensively Drug-Resistant Isolate of Avian Sequence Type 167 Escherichia coli Strain Sanji with Novel In Silico Serotype O89b:H9
- Author
Xiancheng Zeng, Xuelin Chi, Brian T. Ho, Damee Moon, Christine Lambert, Richard J. Hall, Primo Baybayan, Shihua Wang, Brenda A. Wilson, and Mengfei Ho
- Subjects
O-antigen; antibiotic resistance; capsular polysaccharide; extensively drug resistant; genome comparison; insertion sequence; Microbiology; QR1-502
- Abstract
Extensive drug resistance (XDR) is an escalating global problem. Escherichia coli strain Sanji was isolated from an outbreak of pheasant colibacillosis in Fujian province, China, in 2011. This strain has XDR properties: it is sensitive to carbapenems but to no other class of known antibiotics. Whole-genome sequencing revealed a total of 32 known antibiotic resistance genes, many associated with insertion sequence 26 (IS26) elements. These were found on the Sanji chromosome and on 2 of its 6 plasmids, pSJ_255 and pSJ_82. The Sanji chromosome also harbors a type 2 secretion system (T2SS), a type 3 secretion system (T3SS), a type 6 secretion system (T6SS), and several putative prophages. Sanji and other ST167 strains have a previously uncharacterized O-antigen (O89b) that is most closely related to serotype O89, as determined by analysis of the wzm-wzt genes and in silico serotyping. This O89b-antigen gene cluster was also found in the genomes of a few other pathogenic sequence type 617 (ST617) and ST10 complex strains. A time-scaled phylogeny inferred from comparative single nucleotide variant analysis indicated that these O89b-containing lineages emerged about 30 years ago. Comparative sequence analysis revealed that the core genome of Sanji is nearly identical to that of several recently sequenced strains of pathogenic XDR E. coli belonging to the ST167 group. Comparison of the mobile elements among the different ST167 genomes revealed that each genome carries a distinct set of multidrug resistance genes on different types of plasmids, indicating that there are multiple paths toward the emergence of XDR in E. coli.
Importance: E. coli strain Sanji is the first sequenced and analyzed genome of the recently emerged pathogenic XDR strains with sequence type ST167 and novel in silico serotype O89b:H9. Comparison of the genomes of Sanji with other ST167 strains revealed distinct sets of plasmids, mobile IS elements, and antibiotic resistance genes in each genome, indicating that there are multiple paths toward achieving XDR. The emergence of these pathogenic ST167 E. coli strains with diverse XDR capabilities highlights the difficulty of preventing or mitigating the development of XDR properties in bacteria and points to the importance of better understanding the shared underlying virulence mechanisms and physiology of pathogenic bacteria.
- Published
- 2019
7. Reference Grade Characterization of Polymorphisms in Full-Length HLA Class I and II Genes With Short-Read Sequencing on the ION PGM System and Long-Reads Generated by Single Molecule, Real-Time Sequencing on the PacBio Platform
- Author
Shingo Suzuki, Swati Ranade, Ken Osaki, Sayaka Ito, Atsuko Shigenari, Yuko Ohnuki, Akira Oka, Anri Masuya, John Harting, Primo Baybayan, Miwako Kitazume, Junichi Sunaga, Satoko Morishima, Yasuo Morishima, Hidetoshi Inoko, Jerzy K. Kulski, and Takashi Shiina
- Subjects
human leukocyte antigen; HLA; next-generation sequencing; NGS; SMRT sequencing; genotyping; Immunologic diseases. Allergy; RC581-607
- Abstract
Although NGS technologies fuel advances in high-throughput HLA genotyping methods for identifying and classifying HLA genes to support precision medicine efforts in disease and transplantation, the efficiency of these methods is impeded by the absence of adequately characterized reference sequence databases of high-frequency HLA alleles for the highly polymorphic HLA gene system. Here, we report a comprehensive collection of full-length HLA allele sequences for eight classical HLA loci found in the Japanese population. We augmented second-generation short-read data generated by the Ion Torrent technology with long, amplicon-spanning consensus reads delivered by the third-generation SMRT sequencing method to create reference-grade, high-quality sequences of HLA class I and II gene alleles resolved at the genomic coding and non-coding level. Forty-six DNA samples were obtained from a reference set previously used to establish HLA allele frequency data in Japanese subjects. The samples included alleles with a collective allele frequency in the Japanese population of more than 99.2%. The HLA loci were independently amplified by long-range PCR using previously designed HLA locus-specific primers and subsequently sequenced using SMRT and Ion PGM sequencers. The mapped long and short reads were used to produce a reference library of consensus HLA allelic sequences with the help of the reference-aware software tool LAA for SMRT Sequencing. A total of 253 distinct alleles were determined for the 46 healthy subjects. Of these, 137 were novel alleles: 101 with SNVs and/or indels and 36 extended at a partial or full-length level. Comparing the HLA sequences from the perspective of nucleotide diversity revealed that HLA-DRB1 was the most divergent of the eight HLA genes, and that the HLA-DPB1 gene sequences diverged into two distinct groups, DP2 and DP5, with evidence of independent polymorphisms generated in exon 2.
We also identified two specific intronic variations in HLA-DRB1 that might be involved in rheumatoid arthritis. In conclusion, full-length HLA allele sequencing by third-generation and second-generation technologies has provided polymorphic gene reference sequences at a genomic allelic resolution including allelic variations assigned up to the field-4 level for a stronger foundation in precision medicine and HLA-related disease and transplantation studies.
- Published
- 2018
8. Long-read HiFi sequencing of NUDT15: Phased full-gene haplotyping and pharmacogenomic allele discovery
- Author
Erick R. Scott, Yao Yang, Mariana R. Botton, Yoshinori Seki, Keito Hoshitsuki, John Harting, Primo Baybayan, Neal Cody, Paola Nicoletti, Takaya Moriyama, Shreyasee Chakraborty, Jun J. Yang, Lisa Edelmann, Eric E. Schadt, Jonas Korlach, and Stuart A. Scott
- Subjects
Genotype; Haplotypes; Pharmacogenetics; Genetics; Humans; Sequence Analysis, DNA; Alleles; Genetics (clinical)
- Abstract
To determine the phase of NUDT15 sequence variants for more comprehensive star (*) allele diplotyping, we developed a novel long-read single-molecule real-time HiFi amplicon sequencing method. A 10.5 kb NUDT15 amplicon assay was validated using reference material positive controls and additional samples for specimen type and blinded accuracy assessment. Triplicate NUDT15 HiFi sequencing of two reference material samples had nonreference genotype concordances of >99.9%, indicating that the assay is robust. Notably, short-read genome sequencing of a subset of samples was unable to determine the phase of star (*) allele-defining NUDT15 variants, resulting in ambiguous diplotype results. In contrast, long-read HiFi sequencing phased all variants across the NUDT15 amplicons, including a *2/*9 diplotype that had previously been characterized as *1/*2 in the 1000 Genomes Project v3 data set. Assay throughput was also tested using 8.5 kb amplicons from 100 Ashkenazi Jewish individuals, which identified a novel NUDT15 *1 suballele (c.-121G>A) and a rare, likely deleterious coding variant (p.Pro129Arg). Both novel alleles were confirmed by Sanger sequencing and assigned as *1.007 and *20, respectively, by the PharmVar Consortium. Taken together, NUDT15 HiFi amplicon sequencing is an innovative method for phased full-gene characterization and novel allele discovery, which could improve NUDT15 pharmacogenomic testing and subsequent phenotype prediction.
- Published
- 2022
9. Genomic answers for children: Dynamic analyses of >1000 pediatric rare disease genomes
- Author
Ana S.A. Cohen, Emily G. Farrow, Ahmed T. Abdelmoity, Joseph T. Alaimo, Shivarajan M. Amudhavalli, John T. Anderson, Lalit Bansal, Lauren Bartik, Primo Baybayan, Bradley Belden, Courtney D. Berrios, Rebecca L. Biswell, Pawel Buczkowicz, Orion Buske, Shreyasee Chakraborty, Warren A. Cheung, Keith A. Coffman, Ashley M. Cooper, Laura A. Cross, Tom Curran, Thuy Tien T. Dang, Mary M. Elfrink, Kendra L. Engleman, Erin D. Fecske, Cynthia Fieser, Keely Fitzgerald, Emily A. Fleming, Randi N. Gadea, Jennifer L. Gannon, Rose N. Gelineau-Morel, Margaret Gibson, Jeffrey Goldstein, Elin Grundberg, Kelsee Halpin, Brian S. Harvey, Bryce A. Heese, Wendy Hein, Suzanne M. Herd, Susan S. Hughes, Mohammed Ilyas, Jill Jacobson, Janda L. Jenkins, Shao Jiang, Jeffrey J. Johnston, Kathryn Keeler, Jonas Korlach, Jennifer Kussmann, Christine Lambert, Caitlin Lawson, Jean-Baptiste Le Pichon, James Steven Leeder, Vicki C. Little, Daniel A. Louiselle, Michael Lypka, Brittany D. McDonald, Neil Miller, Ann Modrcin, Annapoorna Nair, Shelby H. Neal, Christopher M. Oermann, Donna M. Pacicca, Kailash Pawar, Nyshele L. Posey, Nigel Price, Laura M.B. Puckett, Julio F. Quezada, Nikita Raje, William J. Rowell, Eric T. Rush, Venkatesh Sampath, Carol J. Saunders, Caitlin Schwager, Richard M. Schwend, Elizabeth Shaffer, Craig Smail, Sarah Soden, Meghan E. Strenk, Bonnie R. Sullivan, Brooke R. Sweeney, Jade B. Tam-Williams, Adam M. Walter, Holly Welsh, Aaron M. Wenger, Laurel K. Willig, Yun Yan, Scott T. Younger, Dihong Zhou, Tricia N. Zion, Isabelle Thiffault, and Tomi Pastinen
- Subjects
Genome; Rare Diseases; High-Throughput Nucleotide Sequencing; Humans; Genomics; Sequence Analysis, DNA; Child; Genetics (clinical); Pedigree
- Abstract
This study aimed to provide comprehensive diagnostic and candidate analyses in a pediatric rare disease cohort through the Genomic Answers for Kids program. Extensive analyses of 960 families with suspected genetic disorders included short-read exome sequencing and short-read genome sequencing (srGS); PacBio HiFi long-read genome sequencing (HiFi-GS); variant calling for single nucleotide variants (SNVs), structural variants (SVs), and repeat variants; and machine-learning variant prioritization. Structured phenotypes, prioritized variants, and pedigrees were stored in the PhenoTips database, with data shared through controlled access in the database of Genotypes and Phenotypes (dbGaP). Diagnostic rates ranged from 11% in patients with prior negative genetic testing to 34.5% in naive patients. Incorporating SVs from genome sequencing added up to 13% of new diagnoses in previously unsolved cases. HiFi-GS yielded an increased discovery rate, with >4-fold more rare coding SVs than srGS. Variants and genes of unknown significance remain the most common finding (58% of nondiagnostic cases). Computational prioritization is efficient for diagnostic SNVs. Thorough identification of non-SNVs remains challenging and is partly mitigated by HiFi-GS. Importantly, community research is supported by sharing real-time data to accelerate gene validation and by providing HiFi variant (SNV/SV) resources from >1,000 human alleles to facilitate implementation of new sequencing platforms for rare disease diagnoses.
- Published
- 2022
10. Long-read trio sequencing of individuals with unsolved intellectual disability
- Author
Erdi Kucuk, Shreyasee Chakraborty, Marcel R. Nelen, Han G. Brunner, Lisenka E.L.M. Vissers, Primo Baybayan, Michael Kwint, Bart van der Sanden, Alexander Hoischen, Ronny Derks, Marc Pauper, Aaron M. Wenger, Christian Gilissen, MUMC+: DA Klinische Genetica (5), Klinische Genetica, and RS: GROW - R4 - Reproductive and Perinatal Medicine
- Subjects
Proband; Concordance; Infectious Diseases and Global Health Radboud Institute for Molecular Life Sciences [Radboudumc 4]; DE-NOVO; Biology; VARIANTS; Genome; Article; 03 medical and health sciences; 0302 clinical medicine; Genetics; Coding region; HUMAN GENOME; DNA sequencing; Gene; Genetics (clinical); Exome sequencing; 030304 developmental biology; UTILITY; 0303 health sciences; Neurodevelopmental disorders Donders Center for Medical Neuroscience [Radboudumc 7]; Other Research Radboud Institute for Health Sciences [Radboudumc 0]; Metabolic Disorders Radboud Institute for Molecular Life Sciences [Radboudumc 6]; DISCOVERY; Mendelian inheritance; Structural variation; 030217 neurology & neurosurgery; Reference genome
- Abstract
Long-read sequencing (LRS) has the potential to comprehensively identify all medically relevant genome variation, including variation commonly missed by short-read sequencing (SRS) approaches. To determine this potential, we performed LRS at approximately 15× to 40× genome coverage using the Pacific Biosciences Sequel I System for five trios. The respective probands were diagnosed with intellectual disability (ID) whose etiology remained unresolved after SRS exomes and genomes. Systematic assessment of LRS coverage showed that ~35 Mb of the human reference genome was accessible only by LRS and not by SRS. Genome-wide structural variant (SV) calling yielded on average 28,292 SV calls per individual, totaling 12.9 Mb of sequence. Trio-based analyses, which allowed segregation to be studied, showed concordance for up to 95% of these SV calls across the genome, and 80% of the LRS SV calls were not identified by SRS. De novo mutation analysis did not identify any de novo SVs, confirming that these are rare events. Because of the high sequence coverage, we were also able to call single nucleotide substitutions. On average, we identified 3 million substitutions per genome, with a Mendelian inheritance concordance of up to 97%. Of these, ~100,000 were located in the ~35 Mb of the genome that was captured only by LRS. Moreover, these variants affected the coding sequence of 64 genes, including 32 known Mendelian disease genes. Our data show the potential added value of LRS compared with SRS for identifying medically relevant genome variation.
- Published
- 2020
11. Genomic answers for children: Dynamic analyses of >1000 pediatric rare disease genomes
- Author
Jill Jacobson, Keith A Coffman, Susan S Hughes, Caitlin Lawson, Erin D Fecske, Ahmed T Abdelmoity, Thuy Tien T Dang, Annapoorna Nair, Janda L Jenkins, Kendra L Engleman, Daniel A Louiselle, Orion Buske, Nigel Price, Dihong Zhou, Michael Lypka, Courtney D Berrios, Laura Mb Puckett, Kelsee Halpin, Ana Sa Cohen, Nikita Raje, Aaron M Wenger, Emily G Farrow, Keely Fitzgerald, Mohammed Ilyas, Kailash Pawar, Joseph T Alaimo, Jennifer L Gannon, Laurel K Willig, Jean-Baptiste Le Pichon, Shivarajan M Amudhavalli, Christopher M Oermann, Rebecca L Biswell, Shelby H Neal, Lalit Bansal, Elizabeth Shaffer, Brittany D McDonald, Bonnie R Sullivan, Isabelle Thiffault, Christine Lambert, Ashley M Cooper, Suzanne M Herd, Holly Welsh, Julio F Quezada, Carol J Saunders, Caitlin Schwager, Brian S Harvey, Adam M Walter, Donna M Pacicca, Jennifer Kussmann, Rose N Gelineau-Morel, Margaret Gibson, Elin Grundberg, Shao Jiang, Scott T Younger, Steve Leeder, Richard M Schwend, John T Anderson, Venkatesh Sampath, Jonas Korlach, Bryce A Heese, Meghan E Strenk, Neil Miller, Vicki C Little, Ann Modrcin, Brooke R Sweeney, Randi N Gadea, Nyshele L Posey, Emily A Fleming, Wendy Hein, Cynthia Fieser, Eric T Rush, Laura A Cross, Craig Smail, William J Rowell, Kathryn Keeler, Jeffrey Goldstein, Tricia N Zion, Warren A. Cheung, Sarah Soden, Lauren Bartik, Bradley Belden, Thomas Curran, Pawel Buczkowicz, Shreyasee Chakraborty, Yun Yan, Tomi Pastinen, Primo Baybayan, Mary M Elfrink, Jeffrey J Johnston, and Jade B Tam-Williams
- Subjects
Unknown Significance; Pedigree chart; Computational biology; Allele; Biology; Gene; Genome; Exome; DNA sequencing; Rare disease
- Abstract
Purpose: To provide comprehensive diagnostic and candidate analyses in a pediatric rare disease cohort through the Genomic Answers for Kids (GA4K) program.
Methods: Extensive analyses of 960 families with suspected genetic disorders, including short-read exome (ES) and genome sequencing (srGS); PacBio HiFi long-read GS (HiFi-GS); variant calling for single-nucleotide (SNV), structural (SV), and repeat variants; and machine-learning variant prioritization. Structured phenotypes, prioritized variants, and pedigrees are stored in the PhenoTips database, with data sharing through controlled access (dbGaP).
Results: Diagnostic rates ranged from 11% for cases with prior negative genetic tests to 34.5% in naïve patients. Incorporating SVs from GS added up to 13% of new diagnoses in previously unsolved cases. HiFi-GS yielded an increased discovery rate, with >4-fold more rare coding SVs than srGS. Variants and genes of unknown significance (VUS/GUS) remain the most common finding (58% of non-diagnostic cases).
Conclusion: Computational prioritization is efficient for diagnostic SNVs. Thorough identification of non-SNVs remains challenging and is partly mitigated by HiFi-GS. Importantly, community research is supported by sharing real-time data to accelerate gene validation, and by providing HiFi variant (SNV/SV) resources from >1,000 human alleles to facilitate implementation of new sequencing platforms for rare disease diagnoses.
- Published
- 2021
12. Correction: Long-read trio sequencing of individuals with unsolved intellectual disability
- Author
Aaron M. Wenger, Christian Gilissen, Erdi Kucuk, Shreyasee Chakraborty, Bart van der Sanden, Lisenka E.L.M. Vissers, Michael Kwint, Alexander Hoischen, Ronny Derks, Han G. Brunner, Primo Baybayan, Marc Pauper, and Marcel R. Nelen
- Subjects
Polymorphism, Genetic; Published Erratum; MEDLINE; Correction; Sequence Analysis, DNA; Pedigree; Intellectual Disability; Intellectual disability; Mutation; Genetics; Humans; Genetic Testing; Psychology; Psychiatry; Genetics (clinical)
- Abstract
Long-read sequencing (LRS) has the potential to comprehensively identify all medically relevant genome variation, including variation commonly missed by short-read sequencing (SRS) approaches. To determine this potential, we performed LRS at approximately 15× to 40× genome coverage using the Pacific Biosciences Sequel I System for five trios. The respective probands were diagnosed with intellectual disability (ID) whose etiology remained unresolved after SRS exomes and genomes. Systematic assessment of LRS coverage showed that ~35 Mb of the human reference genome was accessible only by LRS and not by SRS. Genome-wide structural variant (SV) calling yielded on average 28,292 SV calls per individual, totaling 12.9 Mb of sequence. Trio-based analyses, which allowed segregation to be studied, showed concordance for up to 95% of these SV calls across the genome, and 80% of the LRS SV calls were not identified by SRS. De novo mutation analysis did not identify any de novo SVs, confirming that these are rare events. Because of the high sequence coverage, we were also able to call single nucleotide substitutions. On average, we identified 3 million substitutions per genome, with a Mendelian inheritance concordance of up to 97%. Of these, ~100,000 were located in the ~35 Mb of the genome that was captured only by LRS. Moreover, these variants affected the coding sequence of 64 genes, including 32 known Mendelian disease genes. Our data show the potential added value of LRS compared with SRS for identifying medically relevant genome variation.
- Published
- 2021
13. Variant phasing and haplotypic expression from long-read sequencing in maize
- Author
Peter Van Buren, Bo Wang, Michael Regulski, Elizabeth Tseng, Liya Wang, Kevin Eng, Andrew Olson, Yinping Jiao, Primo Baybayan, Doreen Ware, and Kapeel Chougule
- Subjects
Sequence analysis; Population; Medicine (miscellaneous); Biology; Genes, Plant; Zea mays; Genetic analysis; Genome; Article; General Biochemistry, Genetics and Molecular Biology; 03 medical and health sciences; 0302 clinical medicine; Inbred strain; Gene Expression Regulation, Plant; RNA, Messenger; lcsh:QH301-705.5; Data mining; Gene; Alleles; Plant Proteins; 030304 developmental biology; Genetics; 0303 health sciences; Sequence Analysis, RNA; Gene Expression Profiling; Haplotype; Plants, Genetically Modified; Endosperm; Natural variation in plants; lcsh:Biology (General); Haplotypes; Mutation; General Agricultural and Biological Sciences; Genomic imprinting; Genome, Plant; 030217 neurology & neurosurgery
- Abstract
Haplotype phasing of maize genetic variants is important for genome interpretation, population genetic analysis, and functional analysis of allelic activity. We performed an isoform-level phasing study using two maize inbred lines and their reciprocal crosses, based on single-molecule, full-length cDNA sequencing. To phase and analyze transcripts between hybrids and parents, we developed IsoPhase. Using this tool, we validated the majority of SNPs called against matching short-read data from embryo, endosperm, and root tissues, and identified allele-specific, gene-level, and isoform-level differential expression between the inbred parental lines and hybrid offspring. After phasing 6907 genes in the reciprocal hybrids, we annotated the SNPs and identified large-effect genes. In addition, we identified parent-of-origin isoforms, distinct novel isoforms in maize parent and hybrid lines, and imprinted genes from different tissues. Finally, we characterized variation in cis- and trans-regulatory effects. Our study provides measures of haplotypic expression that could increase accuracy in studies of allelic expression. Bo Wang et al. report an isoform-level phasing study in maize using long-read cDNA sequencing and a new method, IsoPhase, to annotate allele-specific, gene-level, and isoform-level expression. They identify novel gene isoforms, imprinted genes, and variation in cis- and trans-regulatory effects.
- Published
- 2020
14. Clinical application of long-read sequencing in unsolved rare disease
- Author
Primo Baybayan, William J Rowell, Shreyasee Chakraborty, Andres Larrea, Tomi Pastinen, Aaron M. Wenger, Isabelle Thiffault, Emily G. Farrow, Christine C. Lambert, and Neil A. Miller
- Subjects
Endocrinology; Endocrinology, Diabetes and Metabolism; Genetics; Molecular Biology; Biochemistry; Dermatology; Rare disease
- Published
- 2021
15. Full-length sequencing of CYP2D6 variants with PacBio HiFi reads
- Author
Lei Zhu, Aaron M. Wenger, Josiah Wilcots, and Primo Baybayan
- Subjects
Endocrinology; Endocrinology, Diabetes and Metabolism; Genetics; Computational biology; Biology; Molecular Biology; Biochemistry
- Published
- 2021
16. Sequence and annotation of 42 cannabis genomes reveals extensive copy number variation in cannabinoid synthesis and pathogen resistance genes
- Author
Biao Liu, Stephen H. McLaughlin, Heather Ebling, Alberto Riva, William B. Barbazuk, Kevin McKernan, Timothy T. Harkins, Zachary Eaton, Primo Baybayan, Mark Jordan, Sarah B. Kingan, Liam T. Kane, Lei Zhang, Yvonne Helbert, and Gregory T. Concepcion
- Subjects
Genetics; Genetic diversity; Genome; mRNA Sequencing; Tetrahydrocannabinolic acid; Cannabis; Copy-number variation; Cannabinoid; Gene
- Abstract
Cannabis is a diverse and polymorphic species. To better understand the inheritance of cannabinoid synthesis and its impact on pathogen resistance, we shotgun sequenced and assembled a Cannabis trio (sibling pair and their offspring) using long-read single-molecule sequencing. This resulted in the most contiguous Cannabis sativa assemblies to date. These reference assemblies were further annotated with full-length male and female mRNA sequencing (Iso-Seq) to help inform isoform complexity, gene model predictions, and identification of the Y chromosome. To further annotate the genetic diversity in the species, 40 male, female, and monoecious cannabis and hemp varietals were evaluated for copy number variation (CNV) and RNA expression. This identified multiple CNVs governing cannabinoid expression and 82 genes associated with resistance to Golovinomyces chicoracearum, the causal agent of powdery mildew in cannabis. Results indicated that breeding for plants with low tetrahydrocannabinolic acid (THCA) concentrations may result in deletion of pathogen resistance genes. Low-THCA cultivars also have a polymorphism every 51 bases, whereas dispensary-grade high-THCA cannabis exhibited a variant every 73 bases. A refined genetic map of the variation in cannabis can guide more stable and directed breeding efforts for desired chemotypes and pathogen-resistant cultivars.
- Published
- 2020
17. Variant Phasing and Haplotypic Expression from Single-molecule Long-read Sequencing in Maize
- Author
Kevin Eng, Peter Van Buren, Kapeel Chougule, Michael Regulski, Elizabeth Tseng, Liya Wang, Andrew Olson, Doreen Ware, Bo Wang, Yinping Jiao, and Primo Baybayan
- Subjects
0106 biological sciences; 2. Zero hunger; Genetics; 0303 health sciences; Population; Haplotype; Single-nucleotide polymorphism; Biology; 01 natural sciences; Genetic analysis; Genome; 03 medical and health sciences; Genomic imprinting; Gene; Functional genomics; 030304 developmental biology; 010606 plant biology & botany
- Abstract
Haplotype phasing of genetic variants in maize is important for interpretation of the genome, population genetic analysis, and functional genomic analysis of allelic activity. Accordingly, accurate methods for phasing full-length isoforms are essential for functional genomics studies. We performed an isoform-level phasing study in maize, using two inbred lines and their reciprocal crosses, based on single-molecule full-length cDNA sequencing. To phase and analyze the full-length transcripts between hybrids and parents, we developed a tool called IsoPhase. Using this tool, we validated the majority of SNPs called against matching short-read data and identified cases of allele-specific, gene-level, and isoform-level expression. Our results revealed that maize parental lines and hybrid lines exhibit different splicing activities. After phasing 6,907 genes in two reciprocal hybrids using embryo, endosperm, and root tissues, we annotated the SNPs and identified large-effect genes. In addition, based on single-molecule sequencing, we identified parent-of-origin isoforms in maize hybrids, distinct novel isoforms in maize parent and hybrid lines, and imprinted genes from different tissues. Finally, we characterized variation in cis- and trans-regulatory effects. Our study provides measures of haplotypic expression that could increase accuracy in studies of allelic expression.
- Published
- 2019
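The maize abstract mentions identifying allele-specific expression from phased full-length reads. As a minimal sketch only, and not the IsoPhase method itself, the snippet below assumes each read has already been assigned to one allele at a single heterozygous SNP, then reports the reference-allele fraction and an exact two-sided binomial p-value against a balanced 50:50 expectation. The function names and the example read counts are hypothetical.

```python
from collections import Counter
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: sum the probabilities of every
    outcome that is no more likely than the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


def allelic_imbalance(read_alleles, ref_allele, alt_allele):
    """Return (reference-allele fraction, p-value) at one heterozygous SNP.

    read_alleles: the allele observed in each phased read (hypothetical input).
    Reads carrying any other base are ignored.
    """
    counts = Counter(read_alleles)
    ref, alt = counts[ref_allele], counts[alt_allele]
    n = ref + alt
    if n == 0:
        raise ValueError("no reads support either allele")
    return ref / n, binomial_two_sided_p(ref, n)


# Hypothetical example: 18 reads carry A, 2 reads carry G.
fraction, p_value = allelic_imbalance(["A"] * 18 + ["G"] * 2, "A", "G")
print(f"ref fraction={fraction:.2f}, p={p_value:.2g}")
```

A real analysis would repeat this per SNP or per isoform and correct for multiple testing; the binomial test is used here simply as a familiar stand-in.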
18. A High-Quality Genome Assembly from a Single, Field-collected Spotted Lanternfly (Lycorma delicatula) using the PacBio Sequel II System
- Author
-
Anna K. Childers, Brad S. Coates, Scott M. Geib, Christine C. Lambert, Julie M. Urban, Kevin J. Hackett, Sarah B. Kingan, Primo Baybayan, Brian E. Scheffler, and Jonas Korlach
- Subjects
0106 biological sciences ,Sequence analysis ,Genome, Insect ,Sequence assembly ,Health Informatics ,Computational biology ,Biology ,Data Note ,01 natural sciences ,Genome ,DNA sequencing ,Spotted lanternfly ,03 medical and health sciences ,Animals ,Genomic library ,Gene ,Gene Library ,030304 developmental biology ,0303 health sciences ,Contig ,Diptera ,Haplotype ,Genomics ,Sequence Analysis, DNA ,Computer Science Applications ,010602 entomology ,Female ,Introduced Species ,Reference genome
- Abstract
Background: A high-quality reference genome is an essential tool for applied and basic research on arthropods. Long-read sequencing technologies may be used to generate more complete and contiguous genome assemblies than alternative technologies; however, long-read methods have historically had greater input DNA requirements and higher costs than next-generation sequencing, which are barriers to their use on many samples. Here, we present a 2.3 Gb de novo genome assembly of a field-collected adult female spotted lanternfly (Lycorma delicatula) using a single Pacific Biosciences SMRT Cell. The spotted lanternfly is an invasive species recently discovered in the northeastern United States that threatens to damage economically important crop plants in the region. Results: The DNA from 1 individual was used to make 1 standard, size-selected library with an average DNA fragment size of ∼20 kb. The library was run on 1 Sequel II SMRT Cell 8M, generating a total of 132 Gb of long-read sequences, of which 82 Gb were from unique library molecules, representing ∼36× coverage of the genome. The assembly had high contiguity (contig N50 length = 1.5 Mb), completeness, and sequence-level accuracy as estimated by conserved gene set analysis (96.8% of conserved genes both complete and without frameshift errors). Furthermore, it was possible to segregate more than half of the diploid genome into the 2 separate haplotypes. The assembly also recovered 2 microbial symbiont genomes known to be associated with L. delicatula, each microbial genome being assembled into a single contig. Conclusions: We demonstrate that field-collected arthropods can be used for the rapid generation of high-quality genome assemblies, an attractive approach for projects on emerging invasive species, disease vectors, or conservation efforts of endangered species.
- Published
- 2019
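The ∼36× coverage quoted in the lanternfly abstract follows from dividing the unique-molecule yield by the assembly size, not the total yield. A minimal worked check, using only the numbers given in the abstract (2.3 Gb assembly, 132 Gb total, 82 Gb from unique molecules); treating the 2.3 Gb assembly size as the genome size is an assumption of this sketch.

```python
def fold_coverage(sequenced_gb: float, genome_gb: float) -> float:
    """Average depth = sequenced bases / genome size (same units)."""
    return sequenced_gb / genome_gb


GENOME_GB = 2.3    # assembly size from the abstract, used as the genome size
TOTAL_GB = 132.0   # total long-read yield from one SMRT Cell 8M
UNIQUE_GB = 82.0   # yield from unique library molecules

unique_depth = fold_coverage(UNIQUE_GB, GENOME_GB)
total_depth = fold_coverage(TOTAL_GB, GENOME_GB)
unique_share = UNIQUE_GB / TOTAL_GB

print(f"unique-molecule coverage: {unique_depth:.1f}x")  # about 36x
print(f"total-yield coverage:     {total_depth:.1f}x")
print(f"unique share of yield:    {unique_share:.0%}")
```

Counting the full 132 Gb would suggest roughly 57× depth; the gap presumably reflects repeated passes over the same library molecules, which add bases without adding independent coverage.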
19. Single-Molecule Real-Time (SMRT) Full-Length RNA-Sequencing Reveals Novel and Distinct mRNA Isoforms in Human Bone Marrow Cell Subpopulations
- Author
-
Elizabeth Tseng, Miloslav Sanda, Primo Baybayan, Garrett T. Graham, Anna T. Riegel, Marcel O. Schmidt, Anton Wellstein, Jean-Baptiste Mazarati, Robert Sebra, and Anne Deslattes Mays
- Subjects
0301 basic medicine ,Protein isoform ,Gene isoform ,lcsh:QH426-470 ,Population ,bone marrow cell subpopulations ,Bone Marrow Cells ,Biology ,Article ,Transcriptome ,03 medical and health sciences ,0302 clinical medicine ,Exome Sequencing ,Genetics ,medicine ,Transcriptional regulation ,Humans ,Cell Lineage ,RNA, Messenger ,education ,mRNA isoforms ,Genetics (clinical) ,education.field_of_study ,Messenger RNA ,High-Throughput Nucleotide Sequencing ,RNA ,Genomics ,Molecular biology ,Single Molecule Imaging ,Alternative Splicing ,lcsh:Genetics ,030104 developmental biology ,medicine.anatomical_structure ,full length RNAseq ,protein isoforms ,Bone marrow ,030217 neurology & neurosurgery
- Abstract
Hematopoietic cells are continuously replenished from progenitor cells that reside in the bone marrow. To evaluate molecular changes during this process, we analyzed the transcriptomes of freshly harvested human bone marrow progenitor (lineage-negative) and differentiated (lineage-positive) cells by single-molecule real-time (SMRT) full-length RNA sequencing. This analysis revealed a ~5-fold higher number of transcript isoforms than previously detected and showed a distinct composition of individual transcript isoforms characteristic of bone marrow subpopulations. A detailed analysis of messenger RNA (mRNA) isoforms transcribed from the ANXA1 and EEF1A1 loci confirmed their distinct composition. The expression of proteins predicted from the transcriptome analysis was evaluated by mass spectrometry and validated previously unknown protein isoforms predicted, e.g., for EEF1A1. These protein isoforms distinguished the lineage-negative cell population from the lineage-positive cell population. Finally, transcript isoforms expressed from paralogous gene loci (e.g., CFD, GATA2, HLA-A, B, and C) also distinguished cell subpopulations but were only detectable by full-length RNA sequencing. Thus, qualitatively distinct transcript isoforms from individual genomic loci separate bone marrow cell subpopulations, indicating complex transcriptional regulation and protein isoform generation during hematopoiesis.
- Published
- 2019
20. Single molecule real time (SMRT) full length RNA-sequencing reveals novel and distinct mRNA isoforms in human bone marrow cell subpopulations
- Author
-
Anne Deslattes Mays, Jean-Baptiste Mazarati, Miloslav Sanda, Primo Baybayan, Robert Sebra, Marcel O. Schmidt, Elizabeth Tseng, Garrett T. Graham, Anna T. Riegel, and Anton Wellstein
- Subjects
Gene isoform ,Protein isoform ,0303 health sciences ,education.field_of_study ,Population ,RNA ,Biology ,Molecular biology ,Eukaryotic translation elongation factor 1 alpha 1 ,Transcriptome ,03 medical and health sciences ,0302 clinical medicine ,medicine.anatomical_structure ,Transcriptional regulation ,medicine ,Bone marrow ,education ,030217 neurology & neurosurgery ,030304 developmental biology
- Abstract
Hematopoietic cells are continuously replenished from progenitor cells that reside in the bone marrow. To evaluate molecular changes during this process, we analyzed the transcriptomes of freshly harvested human bone marrow progenitor (lineage-negative) and differentiated (lineage-positive) cells by single-molecule, real-time (SMRT) full-length RNA sequencing. This analysis revealed a ∼5-fold higher number of transcript isoforms than previously detected and showed a distinct composition of individual transcript isoforms characteristic of bone marrow subpopulations. A detailed analysis of mRNA isoforms transcribed from the ANXA1 and EEF1A1 loci confirmed their distinct composition. The expression of proteins predicted from the transcriptome analysis was evaluated by mass spectrometry, which validated previously unknown protein isoforms predicted, e.g., for EEF1A1. These protein isoforms distinguished the lineage-negative cell population from the lineage-positive cell population. Finally, transcript isoforms expressed from paralogous gene loci (e.g., CFD, GATA2, HLA-A, B & C) also distinguished cell subpopulations but were only detectable by full-length RNA sequencing. Thus, qualitatively distinct transcript isoforms from individual genomic loci separate bone marrow cell subpopulations, indicating complex transcriptional regulation and protein isoform generation during hematopoiesis.
- Published
- 2019
21. Comparative Genome Analysis of an Extensively Drug-Resistant Isolate of Avian Sequence Type 167 In Silico Serotype O89b:H9
- Author
-
Richard Hall, Christine C. Lambert, Brian T. Ho, Primo Baybayan, Damee Moon, Xiancheng Zeng, Shihua Wang, Brenda A. Wilson, Mengfei Ho, and Xuelin Chi
- Subjects
insertion sequence ,antibiotic resistance ,prophage ,Physiology ,Sequence analysis ,lcsh:QR1-502 ,Biology ,medicine.disease_cause ,Biochemistry ,Genome ,Microbiology ,plasmid-mediated resistance ,secretion systems ,lcsh:Microbiology ,genome comparison ,Host-Microbe Biology ,03 medical and health sciences ,Plasmid ,Genetics ,medicine ,Insertion sequence ,Molecular Biology ,Escherichia coli ,Ecology, Evolution, Behavior and Systematics ,Prophage ,030304 developmental biology ,2. Zero hunger ,0303 health sciences ,extensively drug resistant ,030306 microbiology ,pathogen evolution ,Plasmid-mediated resistance ,O-antigen ,capsular polysaccharide ,QR1-502 ,3. Good health ,Computer Science Applications ,Modeling and Simulation ,Mobile genetic elements ,Research Article
- Abstract
Extensive drug resistance (XDR) is an escalating global problem. Escherichia coli strain Sanji was isolated from an outbreak of pheasant colibacillosis in Fujian province, China, in 2011. This strain has XDR properties: it remains sensitive to carbapenems but to no other known class of antibiotics. Whole-genome sequencing revealed a total of 32 known antibiotic resistance genes, many associated with insertion sequence 26 (IS26) elements. These were found on the Sanji chromosome and on 2 of its 6 plasmids, pSJ_255 and pSJ_82. The Sanji chromosome also harbors a type 2 secretion system (T2SS), a type 3 secretion system (T3SS), a type 6 secretion system (T6SS), and several putative prophages. Sanji and other ST167 strains have a previously uncharacterized O-antigen (O89b) that is most closely related to serotype O89, as determined by analysis of the wzm-wzt genes and in silico serotyping. This O89b-antigen gene cluster was also found in the genomes of a few other pathogenic sequence type 617 (ST617) and ST10 complex strains. A time-scaled phylogeny inferred from comparative single nucleotide variant analysis indicated that these O89b-containing lineages emerged about 30 years ago. 
Comparative sequence analysis revealed that the core genome of Sanji is nearly identical to that of several recently sequenced strains of pathogenic XDR E. coli belonging to the ST167 group. Comparison of the mobile elements among the different ST167 genomes revealed that each genome carries a distinct set of multidrug resistance genes on different types of plasmids, indicating that there are multiple paths toward the emergence of XDR in E. coli. IMPORTANCE E. coli strain Sanji is the first sequenced and analyzed genome of the recently emerged pathogenic XDR strains with sequence type ST167 and novel in silico serotype O89b:H9. Comparison of the genomes of Sanji with other ST167 strains revealed distinct sets of different plasmids, mobile IS elements, and antibiotic resistance genes in each genome, indicating that there exist multiple paths toward achieving XDR. The emergence of these pathogenic ST167 E. coli strains with diverse XDR capabilities highlights the difficulty of preventing or mitigating the development of XDR properties in bacteria and points to the importance of better understanding of the shared underlying virulence mechanisms and physiology of pathogenic bacteria.
- Published
- 2019
22. A High-Quality De novo Genome Assembly from a Single Mosquito Using PacBio Sequencing
- Author
-
Sarah B. Kingan, Jonas Korlach, Christine C. Lambert, Brendan Galvin, Haynes Heaton, Juliana Cudini, Mara K. N. Lawniczak, Richard Durbin, and Primo Baybayan
- Subjects
0106 biological sciences ,0301 basic medicine ,lcsh:QH426-470 ,0206 medical engineering ,Genome, Insect ,Sequence assembly ,mosquito ,02 engineering and technology ,Computational biology ,010603 evolutionary biology ,01 natural sciences ,Genome ,Article ,03 medical and health sciences ,Contig Mapping ,Anopheles ,low-input DNA ,Genetics ,Animals ,long-read SMRT sequencing ,Gene ,Genome size ,Genetics (clinical) ,030304 developmental biology ,Comparative genomics ,0303 health sciences ,Ploidies ,Polymorphism, Genetic ,Contig ,de novo genome assembly ,Sequence Analysis, DNA ,genomic DNA ,lcsh:Genetics ,030104 developmental biology ,020602 bioinformatics ,Reference genome
- Abstract
A high-quality reference genome is a fundamental resource for functional genetics, comparative genomics, and population genomics, and is increasingly important for conservation biology. PacBio Single Molecule, Real-Time (SMRT) sequencing generates long reads with uniform coverage and high consensus accuracy, making it a powerful technology for de novo genome assembly. Improvements in throughput and concomitant reductions in cost have made PacBio an attractive core technology for many large genome initiatives; however, relatively high DNA input requirements (~5 µg for the standard library protocol) have placed PacBio out of reach for many projects on small organisms that have lower DNA content, or on projects with limited input DNA for other reasons. Here we present a high-quality de novo genome assembly from a single Anopheles coluzzii mosquito. A modified SMRTbell library construction protocol without DNA shearing and size selection was used to generate a SMRTbell library from just 100 ng of starting genomic DNA. The sample was run on the Sequel System with chemistry 3.0 and software v6.0, generating, on average, 25 Gb of sequence per SMRT Cell with 20 h movies, followed by diploid de novo genome assembly with FALCON-Unzip. The resulting curated assembly had high contiguity (contig N50 3.5 Mb) and completeness (more than 98% of conserved genes were present and full-length). In addition, this single-insect assembly now places 667 (>90%) of formerly unplaced genes into their appropriate chromosomal contexts in the AgamP4 PEST reference. We were also able to resolve maternal and paternal haplotypes for over 1/3 of the genome. Because material from a single diploid individual was sequenced and assembled, only two haplotypes were present, which simplified the assembly process compared with samples pooled from multiple individuals. The method presented here can be applied to samples with starting DNA amounts as low as 100 ng per 1 Gb genome size. 
This new low-input approach puts PacBio-based assemblies in reach for small highly heterozygous organisms that comprise much of the diversity of life.
- Published
- 2019
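Both the lanternfly and mosquito abstracts summarize contiguity as contig N50 (1.5 Mb and 3.5 Mb, respectively). As a sketch of the standard definition (the length L such that contigs of length ≥ L together contain at least half of the assembled bases), the function below computes N50 from a list of contig lengths; the example lengths are made up.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Sort contigs longest-first and walk down the list; N50 is the length of
    the contig at which the running total first reaches half the assembly.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe test for running >= total / 2
            return length


# Hypothetical contig lengths in Mb; 10 + 9 + 8 = 27 Mb is half of 54 Mb.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```

Because N50 rewards a few very long contigs, it is usually read alongside completeness measures such as the conserved-gene statistics both abstracts report.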
23. Reference Grade Characterization of Polymorphisms in Full-Length HLA Class I and II Genes With Short-Read Sequencing on the ION PGM System and Long-Reads Generated by Single Molecule, Real-Time Sequencing on the PacBio Platform
- Author
-
Ken Osaki, Yuko Ohnuki, Akira Oka, Yasuo Morishima, Anri Masuya, John Harting, Jerzy K. Kulski, Shingo Suzuki, Takashi Shiina, Satoko Morishima, Atsuko Shigenari, Sayaka Ito, Miwako Kitazume, Primo Baybayan, Hidetoshi Inoko, Swati Ranade, and Junichi Sunaga
- Subjects
0301 basic medicine ,Adult ,Male ,lcsh:Immunologic diseases. Allergy ,PacBio RS II ,Genotype ,Genotyping Techniques ,Immunology ,Genes, MHC Class II ,Genes, MHC Class I ,Human leukocyte antigen ,SMRT sequencing ,Biology ,DNA sequencing ,Arthritis, Rheumatoid ,03 medical and health sciences ,Gene Frequency ,human leukocyte antigen ,Ion PGM ,Immunology and Allergy ,Humans ,Genetic Predisposition to Disease ,Allele frequency ,Genotyping ,Alleles ,Genetic Association Studies ,Phylogeny ,Original Research ,Aged ,Genetics ,Aged, 80 and over ,Polymorphism, Genetic ,Computational Biology ,High-Throughput Nucleotide Sequencing ,Ion semiconductor sequencing ,Genomics ,Sequence Analysis, DNA ,Amplicon ,Middle Aged ,Transplantation ,HLA ,030104 developmental biology ,genotyping ,NGS ,Female ,next-generation sequencing ,lcsh:RC581-607 ,Software ,Reference genome
- Abstract
Although NGS technologies fuel advances in high-throughput HLA genotyping methods for identifying and classifying HLA genes to assist precision medicine efforts in disease and transplantation, the efficiency of these methods is impeded by the absence of adequately characterized reference sequence databases of high-frequency alleles for the highly polymorphic HLA gene system. Here, we report on producing a comprehensive collection of full-length HLA allele sequences for eight classical HLA loci found in the Japanese population. We augmented the second-generation short-read data generated by the Ion Torrent technology with long, amplicon-spanning consensus reads delivered by the third-generation SMRT sequencing method to create reference-grade, high-quality sequences of HLA class I and II gene alleles resolved at the genomic coding and non-coding level. Forty-six DNA samples were obtained from a reference set used previously to establish HLA allele frequency data in Japanese subjects. The samples included alleles with a collective allele frequency in the Japanese population of more than 99.2%. The HLA loci were independently amplified by long-range PCR using previously designed HLA-locus-specific primers and subsequently sequenced using SMRT and Ion PGM sequencers. The mapped long and short reads were used to produce a reference library of consensus HLA allelic sequences with the help of the reference-aware software tool LAA for SMRT Sequencing. A total of 253 distinct alleles were determined for 46 healthy subjects. Of them, 137 were novel alleles: 101 with SNVs and/or indels and 36 extended alleles at a partial or full-length level. Comparing the HLA sequences from the perspective of nucleotide diversity revealed that HLA-DRB1 was the most divergent among the eight HLA genes, and that the HLA-DPB1 gene sequences diverged into two distinct groups, DP2 and DP5, with evidence of independent polymorphisms generated in exon 2. 
We also identified two specific intronic variations in HLA-DRB1 that might be involved in rheumatoid arthritis. In conclusion, full-length HLA allele sequencing by third-generation and second-generation technologies has provided polymorphic gene reference sequences at a genomic allelic resolution including allelic variations assigned up to the field-4 level for a stronger foundation in precision medicine and HLA-related disease and transplantation studies.
- Published
- 2018
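The HLA abstract ranks loci by nucleotide diversity. A minimal sketch of the usual estimator, π = average number of pairwise differences per site, is given here; it assumes the sequences are already aligned, equal in length and gap-free, and it weights every pair of sequences equally. The toy sequences are hypothetical, not HLA data.

```python
from itertools import combinations


def nucleotide_diversity(aligned_seqs):
    """Average pairwise differences per site (pi) for aligned sequences."""
    if len(aligned_seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(aligned_seqs, 2))
    # count mismatching sites for every unordered pair of sequences
    mismatches = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return mismatches / (len(pairs) * length)


# Hypothetical toy alignment: pairwise differences are 1, 2 and 1 over 4 sites.
print(round(nucleotide_diversity(["ACGT", "ACGA", "TCGA"]), 4))
```

Comparing π across loci, as the abstract does, only makes sense when each locus is computed over a comparable set of sequences and aligned sites.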
24. A High-Quality, Long-Read De Novo Genome Assembly to Aid Conservation of Hawaii’s Last Remaining Crow Species
- Author
-
M. Renee Bellinger, Jolene T. Sutton, Primo Baybayan, Oliver A. Ryder, Martin Helmkampf, Richard Hall, Jonas Korlach, Jenny Gu, Cynthia C. Steiner, Sarah B. Kingan, Bryce M. Masuda, and Jill Muehling
- Subjects
0301 basic medicine ,0106 biological sciences ,Hawaiian crow ,lcsh:QH426-470 ,Population ,runs of homozygosity (ROH) ,Endangered species ,Sequence assembly ,Genomics ,SMRT sequencing ,Runs of Homozygosity ,Biology ,010603 evolutionary biology ,01 natural sciences ,Genome ,Article ,03 medical and health sciences ,Captive breeding ,Genetics ,education ,Genetics (clinical) ,Wildlife conservation ,030304 developmental biology ,Comparative genomics ,0303 health sciences ,education.field_of_study ,behavior ,fungi ,food and beverages ,15. Life on land ,biology.organism_classification ,major histocompatibility complex ,lcsh:Genetics ,030104 developmental biology ,toll-like receptors ,Evolutionary biology ,inbreeding depression
- Abstract
Genome-level data can provide researchers with unprecedented precision to examine the causes and genetic consequences of population declines, which can inform conservation management. Here, we present a high-quality, long-read, de novo genome assembly for one of the world’s most endangered bird species, the ʻAlalā (Corvus hawaiiensis, Hawaiian crow). As the only remaining native crow species in Hawaiʻi, the ʻAlalā survived solely in a captive-breeding program from 2002 until 2016, at which point a long-term reintroduction program was initiated. The high-quality genome assembly was generated to lay the foundation for both comparative genomics studies and the development of population-level genomic tools that will aid conservation and recovery efforts. We illustrate how the quality of this assembly places it amongst the very best avian genomes assembled to date, comparable to intensively studied model systems. We describe the genome architecture in terms of repetitive elements and runs of homozygosity, and we show that compared with more outbred species, the ʻAlalā genome is substantially more homozygous. We also provide annotations for a subset of immunity genes that are likely to be important in conservation management, and we discuss how this genome is currently being used as a roadmap for downstream conservation applications.
- Published
- 2018
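The ʻAlalā abstract describes the genome in terms of runs of homozygosity (ROH). A deliberately simplified sketch of ROH detection: scan genotype calls in position order and report stretches of consecutive homozygous calls that meet a minimum SNP count and span. The thresholds and the genotype encoding (alt-allele count 0, 1 or 2) are assumptions for illustration; real ROH callers also tolerate occasional heterozygous or missing calls, which this sketch does not.

```python
def runs_of_homozygosity(calls, min_snps=3, min_span_bp=1000):
    """Return (start, end) positions of homozygous runs.

    calls: (position, genotype) pairs sorted by position, where genotype is
    the alt-allele count; 0 and 2 are homozygous, 1 is heterozygous.
    min_snps and min_span_bp are hypothetical thresholds.
    """
    runs = []
    current = []  # positions in the run being extended

    def close_run():
        # keep the run only if it is long enough in both SNPs and base pairs
        if len(current) >= min_snps and current[-1] - current[0] >= min_span_bp:
            runs.append((current[0], current[-1]))

    for position, genotype in calls:
        if genotype in (0, 2):
            current.append(position)
        else:  # a heterozygous call ends the current run
            close_run()
            current = []
    close_run()  # flush a run that reaches the last call
    return runs


# Hypothetical calls: three homozygous sites, one heterozygous site, two more.
calls = [(100, 0), (600, 2), (1200, 0), (1500, 1), (2000, 2), (2100, 2)]
print(runs_of_homozygosity(calls))
```

Only the first stretch qualifies here; the second has just two homozygous SNPs.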
25. Structural variation and its potential impact on genome instability: Novel discoveries in the EGFR landscape by long-read sequencing
- Author
-
George W. Cook, Denise Raterman, Lyska Emerson, Michael G. Benton, William J Rowell, Primo Baybayan, Daniel Burgess, Jenny Gu, Heath D. Herbold, Thomas K. Varghese, John M. O’Shea, Cynthia Moehlenkamp, George F. Mayhew, Wallace Akerley, Christine C. Lambert, John T. Fussell, and Kevin Eng
- Subjects
Genome instability ,Lung Neoplasms ,Heredity ,Gene Identification and Analysis ,Genetic Networks ,Biochemistry ,Genome ,Database and Informatics Methods ,0302 clinical medicine ,Carcinoma, Non-Small-Cell Lung ,Basic Cancer Research ,Medicine and Health Sciences ,Macromolecular Structure Analysis ,0303 health sciences ,Multidisciplinary ,High-Throughput Nucleotide Sequencing ,Genomics ,Genetic Mapping ,Oncology ,030220 oncology & carcinogenesis ,Medicine ,Sequence Analysis ,Network Analysis ,Research Article ,Computer and Information Sciences ,Protein Structure ,Bioinformatics ,Science ,Alu element ,Computational biology ,Biology ,Research and Analysis Methods ,Genomic Instability ,Human Genomics ,Structural variation ,03 medical and health sciences ,Cancer Genomics ,Genomic Medicine ,Alu Elements ,Sequence Motif Analysis ,Genetics ,Humans ,Computer Simulation ,Repeated Sequences ,Molecular Biology ,Gene ,030304 developmental biology ,Genome, Human ,Genetic Variation ,Biology and Life Sciences ,Proteins ,Genes, erbB-1 ,Sequence Analysis, DNA ,Haplotypes ,Human genome ,Protein Structure Networks ,Sequence Alignment ,Reference genome
- Abstract
Structural variation (SV) is typically defined as variation within the human genome that exceeds 50 base pairs (bp). SV may be copy-number neutral, or it may involve duplications, deletions, and complex rearrangements. Recent studies have shown SV to be associated with many human diseases. However, studies of SV have been challenging due to technological constraints. With the advent of third-generation (long-read) sequencing technology, it has become possible to explore longer stretches of DNA that could not easily be examined before. In the present study, we used third-generation (long-read) sequencing to examine SV in the EGFR landscape of four haplotypes derived from two human samples. We analyzed the EGFR gene and its landscape (±500,000 bp) using this approach and identified a region of non-coding DNA with over 90% similarity to the most common activating EGFR mutation in non-small cell lung cancer. Based on previously published Alu-element genome instability algorithms, we propose a molecular mechanism to explain how this non-coding region of DNA may be interacting with and impacting the stability of the EGFR gene and potentially generating this cancer-driver gene. By these techniques, we were also able to identify previously hidden structural variation in the four haplotypes and in the human reference genome (hg38). We applied previously published algorithms to compare the relative stabilities of these five different EGFR gene landscape haplotypes to estimate their relative potentials to generate the canonical 15 bp deletion in EGFR exon 19. To our knowledge, the present study is the first to use the differences in genomic architecture between targeted cancer-linked phased haplotypes to estimate their relative potentials to form a common cancer-linked driver mutation.
- Published
- 2020
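The EGFR abstract uses the common convention that structural variants (SVs) exceed 50 bp. A minimal sketch of that size rule applied to REF/ALT allele strings follows. The category names and the VCF-style anchor base in the example are assumptions, and because the rule only measures the net length change, it cannot see copy-number-neutral SVs such as inversions, which the abstract also counts as SV.

```python
def classify_variant(ref: str, alt: str, sv_min_bp: int = 50) -> str:
    """Classify a REF/ALT pair by the net length change between alleles."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV" if ref != alt else "reference"
    size = abs(len(alt) - len(ref))
    if size > sv_min_bp:  # "exceeds 50 bp" means strictly greater than 50
        return "SV"
    if size > 0:
        return "indel"
    return "MNV"  # same-length multi-base substitution


# A 15 bp deletion written VCF-style with one shared anchor base;
# the sequence itself is a placeholder, not the EGFR sequence.
print(classify_variant("A" + "C" * 15, "A"))
```

Under this rule the canonical 15 bp EGFR exon 19 deletion discussed in the abstract falls in the small-indel class rather than the SV class.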
26. Real-Time DNA Sequencing from Single Polymerase Molecules
- Author
-
Jeremy Gray, Mark Trulson, Patrick Marks, John Dixon, Ravindra V. Dalal, Fred Christians, Adrian Fehr, Jon M. Sorenson, Stephen Turner, Alfred Gaertner, Sonya Clark, Geoff Otto, Gregory J. Kearns, John Lyle, Alex DeWinter, Brad Bettman, Ronald Kuse, Primo Baybayan, Steven Lin, Denis Zaccarin, John Vieceli, Joy Roy, Cheryl Heiner, David R. Rank, Kevin Travers, Robert Sebra, Mathieu Foquet, Thang Pham, Dawn Wu, Keith Bjornson, Michael Phillips, Arkadiusz Bibillo, Bidhan Chaudhuri, Gene Shen, Alicia Yang, Mark Maxham, Peter Zhao, Khai Luong, Paul Hardenbol, Insil Park, Jonas Korlach, Paul Lundquist, Jeffrey Wegener, Kevin Hester, David P. Holden, Paul Peluso, Congcong Ma, Frank Zhong, Yves Lacroix, Austin B. Tomaney, Devon Murphy, John Eid, Xiangxu Kong, and Ronald L. Cicero
- Subjects
DNA nanoball sequencing ,Deoxyribonucleotides ,DNA, Single-Stranded ,DNA-Directed DNA Polymerase ,Sequencing by hybridization ,Consensus Sequence ,Polymerase ,Fluorescent Dyes ,Multidisciplinary ,DNA clamp ,Base Sequence ,biology ,Oligonucleotide ,Multiple displacement amplification ,DNA ,Sequence Analysis, DNA ,Enzymes, Immobilized ,Molecular biology ,Nanostructures ,Sequencing by ligation ,Kinetics ,Spectrometry, Fluorescence ,biology.protein ,Biophysics ,DNA, Circular ,Single molecule real time sequencing
- Abstract
We present single-molecule, real-time sequencing data obtained from a DNA polymerase performing uninterrupted template-directed synthesis using four distinguishable fluorescently labeled deoxyribonucleoside triphosphates (dNTPs). We detected the temporal order of their enzymatic incorporation into a growing DNA strand with zero-mode waveguide nanostructure arrays, which provide optical observation volume confinement and enable parallel, simultaneous detection of thousands of single-molecule sequencing reactions. Conjugation of fluorophores to the terminal phosphate moiety of the dNTPs allows continuous observation of DNA synthesis over thousands of bases without steric hindrance. The data report directly on polymerase dynamics, revealing distinct polymerization states and pause sites corresponding to DNA secondary structure. Sequence data were aligned with the known reference sequence to assay biophysical parameters of polymerization for each template position. Consensus sequences were generated from the single-molecule reads at 15-fold coverage, showing a median accuracy of 99.3%, with no systematic error beyond fluorophore-dependent error rates.
- Published
- 2009
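The 2009 abstract reports consensus sequences built from single-molecule reads at 15-fold coverage. The sketch here shows only the simplest form of consensus calling, a per-column majority vote. It assumes the reads are already aligned into equal-length rows with '-' for gaps, which sidesteps alignment entirely, so it illustrates the idea rather than reproducing the consensus algorithm used in that study.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Per-column majority vote over pre-aligned reads ('-' marks a gap).

    Gaps take part in the vote; columns won by a gap are dropped, so a base
    inserted by only a minority of reads does not enter the consensus.
    """
    if not aligned_reads:
        raise ValueError("need at least one read")
    width = len(aligned_reads[0])
    if any(len(r) != width for r in aligned_reads):
        raise ValueError("reads must be aligned to equal length")
    consensus = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        if base != "-":
            consensus.append(base)
    return "".join(consensus)


# Three hypothetical reads: one carries a substitution and a spurious insertion.
reads = ["AC-GT", "AC-GT", "AGTGT"]
print(majority_consensus(reads))
```

With more independent reads per position, isolated random errors are increasingly outvoted, which is why consensus accuracy rises with coverage.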
27. Complete genome sequence of Streptomyces sp. strain CFMR 7, a natural rubber degrading actinomycete isolated from Penang, Malaysia
- Author
-
Kim-Hou Chia, Todd D. Taylor, Shinji Kondo, Nazalan Najimudin, Siddharth Singh, Gincy P. Thottathil, Primo Baybayan, Kumar Sudesh, and Jayaram Nanthini
- Subjects
DNA, Bacterial ,Latex ,Bioengineering ,Biology ,complex mixtures ,Applied Microbiology and Biotechnology ,Streptomyces ,Microbiology ,Bacterial protein ,Natural rubber ,Bacterial Proteins ,Botany ,Genome size ,Gene ,Whole genome sequencing ,Strain (chemistry) ,Contig ,technology, industry, and agriculture ,Malaysia ,General Medicine ,Sequence Analysis, DNA ,biology.organism_classification ,body regions ,visual_art ,visual_art.visual_art_medium ,psychological phenomena and processes ,Genome, Bacterial ,Biotechnology
- Abstract
Streptomyces sp. strain CFMR 7, which naturally degrades rubber, was isolated from a rubber plantation. Whole genome sequencing and assembly resulted in 2 contigs with a total genome size of 8.248 Mb. Two latex clearing protein (lcp) genes, which are responsible for rubber-degrading activities, were identified.
- Published
- 2015
28. Complete Genome Sequence of the Hypervirulent Bacterium Clostridium difficile Strain G46, Ribotype 027
- Author
-
Saheer E. Gharbia, Jane F. Turton, Tom Gaulton, Richard Hall, Haroun N. Shah, Raju Misra, Steve Picton, Graham Rose, Primo Baybayan, Jane Freeman, and Jonas Korlach
- Subjects
Whole genome sequencing ,Strain (biology) ,Outbreak ,Clostridium difficile ,Biology ,biology.organism_classification ,Microbiology ,Diarrhea ,Genetics ,medicine ,Prokaryotes ,medicine.symptom ,Molecular Biology ,Bacteria
- Abstract
Clostridium difficile is one of the leading causes of antibiotic-associated diarrhea in health care facilities worldwide. Here, we report the genome sequence of C. difficile strain G46, ribotype 027, isolated from an outbreak in Glamorgan, Wales, in 2006.
- Published
- 2015
29. Abstract 5366: Detection of low-frequency somatic variants using single-molecule, real-time sequencing
- Author
-
Laura K. Nolden and Primo Baybayan
- Subjects
Cancer Research ,Oncology ,Somatic cell ,Computational biology ,Biology ,Variants of PCR ,Exome sequencing ,Single molecule real time sequencing
- Abstract
Detection of somatic mutations, especially in heterogeneous tumor samples where variants may be present at a low level, is challenging. Single Molecule, Real-Time (SMRT®) Sequencing is ideal for minor variant detection because of its ability to sequence single molecules with very high accuracy (>QV40) using the circular consensus sequencing (CCS) approach. Here, we characterize the Sequel System for the detection of low-frequency somatic variants using constructs containing mutations in coding regions in EGFR, NPM1, AKT1 and JAK2 representing deletion, insertion, substitution and homopolymer variants. Wild-type and mutant amplicons, provided by SeraCare, were mixed and serially diluted from 10% down to 0.1% allelic frequency. Independent SMRTbell libraries were constructed for each dilution point, sequenced and analyzed using SMRT Sequencing to identify the variants and determine the observed frequency. The random error profile and high-accuracy CCS reads make it possible to accurately detect low-frequency somatic variants. We will demonstrate the sensitivity of the PacBio Systems for detecting mutations down to 0.1%. Citation Format: Primo Baybayan, Laura Nolden. Detection of low-frequency somatic variants using single-molecule, real-time sequencing [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr 5366. doi:10.1158/1538-7445.AM2017-5366
- Published
- 2017
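The AACR abstract cites CCS read accuracy above QV40 and variant detection down to 0.1% allele frequency. The Phred scale links accuracy and error rate: QV = -10·log10(error rate), so QV40 corresponds to about one error per 10,000 bases (99.99% accuracy). The sketch here converts between the two and computes the expected number of variant-carrying reads at a given allele fraction and depth; the 10,000-read depth is a hypothetical example, not a figure from the abstract.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Phred scale: error probability = 10^(-QV/10)."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Inverse Phred conversion: QV = -10 * log10(error probability)."""
    return -10 * math.log10(error_rate)


def expected_mutant_reads(allele_fraction: float, depth: int) -> float:
    """Mean number of reads carrying the variant at a given depth."""
    return allele_fraction * depth


error = qv_to_error_rate(40)                   # 1e-4, i.e. 99.99% accuracy
depth = 10_000                                 # hypothetical depth at one site
signal = expected_mutant_reads(0.001, depth)   # variant at 0.1% allele frequency
error_reads = error * depth                    # expected reads with an error at this site
print(f"QV40 error rate: {error:g}")
print(f"expected mutant reads: {signal:g}; expected error reads: {error_reads:g}")
```

At this depth the expected variant signal (about 10 reads) exceeds the expected number of erroneous reads at the site (about 1), even before accounting for the fact that only some errors would mimic the specific variant base; actual detection limits also depend on depth and sample preparation.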
30. Burkholderia pseudomallei sequencing identifies genomic clades with distinct recombination, accessory, and epigenetic profiles
- Author
-
John Harting, Mitali Sarkar-Tyson, Patrick Tan, Chua Hui Hoon, Swaine L. Chen, Richard W. Titball, Matthew T. G. Holden, Justin A. Boddey, Ifor R. Beacham, Ian R. Peak, Xavier Didelot, Kurosh S. Mehershahi, Sophie J. Smither, Tannistha Nandi, Susana Wang, Catherine Ong, Bernice Sim, Yan Guo, Michelle Nelson, Stephen L. Michell, Angela E. Essex-Lopresti, David J. Studholme, Julian Parkhill, Lee Chee How, Lay Tin Aw, and Primo Baybayan
- Subjects
DNA, Bacterial ,Burkholderia pseudomallei ,Genomics ,QH426 Genetics ,R Medicine (General) ,Biology ,Genome ,Polymorphism, Single Nucleotide ,Epigenesis, Genetic ,Mice ,SDG 3 - Good Health and Well-being ,Genetics ,Escherichia coli ,Animals ,Humans ,QH426 ,Genetics (clinical) ,Genetic Association Studies ,Phylogeny ,DNA Primers ,Recombination, Genetic ,Genetic diversity ,Mice, Inbred BALB C ,Errata ,Research ,Haplotype ,DAS ,Sequence Analysis, DNA ,R1 ,Haplotypes ,Melioidosis ,Horizontal gene transfer ,DNA methylation ,Multilocus sequence typing ,Female ,Mobile genetic elements ,Transcriptome ,Gene Deletion ,Genome, Bacterial ,Multilocus Sequence Typing - Abstract
This study was supported by a core grant to P.T. from the GIS, an A*STAR research institute. The sequencing of the Burkholderia pseudomallei strains was supported by Wellcome Trust grant 098051 to J.P. Burkholderia pseudomallei (Bp) is the causative agent of the infectious disease melioidosis. To investigate population diversity, recombination, and horizontal gene transfer in closely related Bp isolates, we performed whole-genome sequencing (WGS) on 106 clinical, animal, and environmental strains from a restricted Asian locale. Whole-genome phylogenies resolved multiple genomic clades of Bp, largely congruent with multilocus sequence typing (MLST). We discovered widespread recombination in the Bp core genome, involving hundreds of regions associated with multiple haplotypes. Highly recombinant regions exhibited functional enrichments that may contribute to virulence. We observed clade-specific patterns of recombination and accessory gene exchange, and provide evidence that this is likely due to ongoing recombination between clade members. Reciprocally, interclade exchanges were rarely observed, suggesting mechanisms restricting gene flow between clades. Interrogation of accessory elements revealed that each clade harbored a distinct complement of restriction-modification (RM) systems, predicted to cause clade-specific patterns of DNA methylation. Using methylome sequencing, we confirmed that representative strains from separate clades indeed exhibit distinct methylation profiles. Finally, using an E. coli system, we demonstrate that Bp RM systems can inhibit uptake of non-self DNA. Our data suggest that RM systems borne on mobile elements, besides preventing foreign DNA invasion, may also contribute to limiting exchanges of genetic material between individuals of the same species. Genomic clades may thus represent functional units of genetic isolation in Bp, modulating intraspecies genetic diversity.
- Published
- 2014
31. Correction: Comparing the genomes of Helicobacter pylori clinical strain UM032 and mice-adapted derivatives
- Author
-
Khean-Lee Goh, Mun Fai Loke, Barry J. Marshall, Jamuna Vadivelu, Meredith Ashby, Vellaya Rehvathy, Susana Wang, Sven Pettersson, Shih Wee Seow, Eng Guan Chua, Alfred Tay, Primo Baybayan, Tim Perkins, Arlaine Anne Amoyo, Yalda Khosravi, Siddarth Singh, Junxian Ong, Wei Yee Wee, Siew Woh Choo, and School of Biological Sciences
- Subjects
Genetics ,biology ,business.industry ,Strain (biology) ,Gastroenterology ,Correction ,Helicobacter pylori ,biology.organism_classification ,Microbiology ,Genome ,Science::Biological sciences::Microbiology::Bacteria [DRNTU] ,Infectious Diseases ,Virology ,Medicine ,Parasitology ,business - Abstract
Correction: Comparing the genomes of Helicobacter pylori clinical strain UM032 and mice-adapted derivatives Yalda Khosravi, Vellaya Rehvathy, Wei Yee Wee, Susana Wang, Primo Baybayan, Siddarth Singh, Meredith Ashby, Junxian Ong, Arlaine Anne Amoyo, Shih Wee Seow, Siew Woh Choo, Tim Perkins, Eng Guan Chua, Alfred Tay, Barry James Marshall, Mun Fai Loke, Khean Lee Goh, Sven Pettersson and Jamuna Vadivelu
- Published
- 2014
32. Multiple Genome Sequences of Helicobacter pylori Strains of Diverse Disease and Antibiotic Resistance Backgrounds from Malaysia
- Author
-
Nadeem O. Kaakoush, Laurence J. Croft, S.P. Gunaletchumy, Mun Fai Loke, Khean-Lee Goh, Hazel M. Mitchell, Susana Wang, Meredith Ashby, Siddarth Singh, Jamuna Vadivelu, Mun Hua Tan, Vellaya Rehvathy, Primo Baybayan, and Xinsheng Teh
- Subjects
Chronic gastritis ,Disease ,Biology ,Helicobacter pylori ,medicine.disease ,biology.organism_classification ,Genome ,digestive system diseases ,Lymphoma ,Microbiology ,Gastric adenocarcinoma ,Antibiotic resistance ,Microbial risk ,Genetics ,medicine ,Prokaryotes ,Molecular Biology - Abstract
Helicobacter pylori causes human gastroduodenal diseases, including chronic gastritis and peptic ulcer disease. It is also a major microbial risk factor for the development of gastric adenocarcinoma and mucosa-associated lymphoid tissue (MALT) lymphoma. Twenty-one strains with different ethnicity, disease, and antimicrobial susceptibility backgrounds were sequenced by use of Illumina HiSeq and PacBio RS platforms.
- Published
- 2013
33. X-linked anhidrotic (hypohidrotic) ectodermal dysplasia is caused by mutation in a novel transmembrane protein
- Author
-
Delyth Morgan, Primo Baybayan, Ellson Y. Chen, Jonathan Zonana, David Schlessinger, Betsy Ferguson, Ulpu Saarialho-Kere, Nicholas Stuart Tudor Thomas, Outi Montonen, Sini Ezer, Albert de la Chapelle, Anand Srivastava, Angus John Clarke, Juha Kere, and Felix Munoz
- Subjects
Adult ,Male ,Ectodermal dysplasia ,DNA, Complementary ,X Chromosome ,Positional cloning ,Genetic Linkage ,Molecular Sequence Data ,Gene Expression ,Biology ,Translocation, Genetic ,03 medical and health sciences ,0302 clinical medicine ,Ectodermal Dysplasia ,Skin Physiological Phenomena ,Genetics ,medicine ,Humans ,Edar Receptor ,Ectodysplasin A receptor ,Amino Acid Sequence ,RNA, Messenger ,Hypohidrotic ectodermal dysplasia ,Promoter Regions, Genetic ,Chromosomes, Artificial, Yeast ,Alleles ,In Situ Hybridization ,DNA Primers ,030304 developmental biology ,Hypohidrosis ,0303 health sciences ,EDARADD ,Base Sequence ,Tooth Abnormalities ,Membrane Proteins ,Alopecia ,030206 dentistry ,Ectodysplasins ,medicine.disease ,CpG Islands ,Ectodysplasin A ,Hair - Abstract
Ectodermal dysplasias comprise over 150 syndromes of unknown pathogenesis. X-linked anhidrotic ectodermal dysplasia (EDA) is characterized by abnormal hair, teeth and sweat glands. We now describe the positional cloning of the gene mutated in EDA. Two exons, separated by a 200-kilobase intron, encode a predicted 135-residue transmembrane protein. The gene is disrupted in six patients with X;autosome translocations or submicroscopic deletions; nine patients had point mutations. The gene is expressed in keratinocytes, hair follicles, and sweat glands, and in other adult and fetal tissues. The predicted EDA protein may belong to a novel class with a role in epithelial-mesenchymal signalling.
- Published
- 1996
34. P019 Collection of major HLA allele sequences in Japanese population towards precise NGS based HLA DNA typing at the field 4 level
- Author
-
John Harting, Takashi Shiina, Miwako Kitazume, Ken Osaki, Swati Ranade, Shingo Suzuki, Junichi Sunaga, and Primo Baybayan
- Subjects
Genetics ,education.field_of_study ,Immunology ,Population ,Single-nucleotide polymorphism ,General Medicine ,Human leukocyte antigen ,Biology ,Null allele ,DNA sequencer ,Genotype ,Immunology and Allergy ,education ,Allele frequency ,Genotyping - Abstract
Aim We previously reported on the use of the Ion PGM next-generation sequencing (NGS) platform to genotype HLA class I and class II genes by a super-high-resolution, single-molecule, sequence-based typing (SS-SBT) method. However, HLA alleles could not be assigned at the field 4 level at some HLA loci, such as DQA1, DPA1 and DPB1, because the SNP and indel densities were too low to identify and separate the two phases. We have therefore added the single-molecule, real-time (SMRT®) DNA sequencer PacBio RS II to our analysis to test whether it can determine the HLA allele sequences at the loci with which we previously had difficulties. Here, we report on sequence-based genotyping of the major HLA genes in the Japanese population, from the promoter-enhancer region to the 3′ UTR, using the PacBio RS II and Ion PGM NGS systems.
Method Forty-six DNA samples were obtained from a reference set used previously to establish HLA allele frequency data in the Japanese population. The reference samples represented more than 99.5% of the HLA alleles at each of the nine HLA loci. The genomic DNA samples were amplified by long-range PCR using eleven HLA locus-specific primer pairs (A, B, C, DRB1, DRB3/4/5, DQA1, DQB1, DPA1 and DPB1). After NGS, consensus sequences were obtained via Long Amplicon analysis, which includes overlapping and clustering of filtered subreads, phasing, and removal of PCR artifacts. The HLA allele sequences used to generate a library of consensus sequences were determined by mapping sequence reads obtained from the Ion PGM sequencer.
Results A total of 219 HLA allele sequences (20 A, 40 B, 23 C, 31 DRB1, 36 DQA1, 14 DQB1, 15 DPA1 and 40 DPB1) covering the promoter-enhancer region to the 3′ UTR were determined for the 46 samples. Classification of the newly identified SNPs and/or indels revealed at least three non-synonymous substitutions, in one B and two DPA1 alleles, although most of the substitutions were observed in intron regions.
Conclusion We determined at least 219 Japanese major HLA alleles at the field 4 level by NGS and we expect that this HLA genotyping method of entire HLA gene regions by NGS will help to precisely detect rare, novel and null alleles in population genetic and disease studies.
- Published
- 2016
35. Abstract 3646: Highly sensitive and cost-effective detection of somatic cancer variants using single-molecule, real-time sequencing
- Author
-
Jurgen Del Favero, Lien Heyrman, Anand Sethuraman, Primo Baybayan, Steve Kujawa, and Kevin Eng
- Subjects
Genetics ,Cancer Research ,Oncology ,Somatic cell ,Industry standard ,Cancer gene ,Digital polymerase chain reaction ,Multiplex ,Amplicon ,Biology ,Highly sensitive ,Single molecule real time sequencing - Abstract
Next-Generation Sequencing (NGS) technologies allow for molecular profiling of cancer samples with high sensitivity and speed at reduced cost. For efficient profiling of cancer samples, it is important that the NGS methods used are not only robust but also capable of accurately detecting low-frequency somatic mutations. Single Molecule, Real-Time (SMRT®) Sequencing offers several advantages, including the ability to sequence single molecules with very high accuracy (>QV40) using the circular consensus sequencing (CCS) approach. The availability of genetically defined, human genomic reference standards provides an industry standard for the development and quality control of molecular assays for studying cancer variants. Here we characterize SMRT Sequencing for the detection of low-frequency somatic variants using the Quantitative Multiplex DNA Reference Standards from Horizon Diagnostics, combined with amplification of the variants using the Multiplicom Tumor Hotspot MASTR Plus assay. First, we sequenced a reference standard containing precise allelic frequencies from 1% to 24.5% for major oncology targets verified using digital PCR. This reference material recapitulates the complexity of tumor composition and serves as a well-characterized control. The control sample was amplified using the Multiplicom Tumor Hotspot MASTR Plus assay, which targets 252 amplicons (121-254 bp) from 26 relevant cancer genes, including all 11 variants in the control sample. We also sequenced a second sample containing a series of mixes, each with known mutations, at levels below 10% and down to 0.01%. PCR-amplified targets were sequenced and analyzed using SMRT Sequencing to identify the variants and determine the observed frequency. The random error profile and high-accuracy CCS reads make it possible to accurately detect low-frequency somatic variants. Citation Format: Steve Kujawa, Anand Sethuraman, Kevin Eng, Primo Baybayan, Lien Heyrman, Jurgen Del Favero. Highly sensitive and cost-effective detection of somatic cancer variants using single-molecule, real-time sequencing [abstract]. In: Proceedings of the 107th Annual Meeting of the American Association for Cancer Research; 2016 Apr 16-20; New Orleans, LA. Philadelphia (PA): AACR; Cancer Res 2016;76(14 Suppl):Abstract nr 3646.
- Published
- 2016
36. Abstract 3611: SMRT® sequencing of DNA samples extracted from formalin-fixed and paraffin embedded tissues
- Author
-
Kevin Eng, Steve Kujawa, Primo Baybayan, Guillaume Durin, and Michael Weiand
- Subjects
Cancer Research ,Computational biology ,Biology ,Amplicon ,Genome ,DNA sequencing ,Paraffin embedded ,genomic DNA ,chemistry.chemical_compound ,Oncology ,chemistry ,A-DNA ,DNA ,Single molecule real time sequencing - Abstract
Recent advances in next-generation sequencing have increased the use of formalin-fixed, paraffin-embedded (FFPE) tissue samples in medical and scientific research. Single Molecule, Real-Time (SMRT®) Sequencing offers a unique advantage in that it allows direct analysis of FFPE samples without amplification. However, obtaining ample long-read information from FFPE samples has been challenging because of the limited quality and quantity of the extracted DNA. DNA extracted from FFPE samples often contains damaged sites, including breaks in the backbone and missing or altered nucleotide bases, which directly impact sequencing and amplification. Additionally, the quality and quantity of the recovered DNA vary depending on the extraction method used. We evaluated the Adaptive Focused Acoustics (AFA™) system by Covaris® as a method for obtaining high-molecular-weight DNA suitable for SMRTbell template preparation and subsequent single-molecule sequencing. Using this method, genomic DNA was extracted from normal kidney FFPE scrolls acquired from the Cooperative Human Tissue Network (CHTN), University of Pennsylvania. Damaged sites in the extracted DNA were repaired using a DNA Damage Repair step, and the treated DNA was constructed into SMRTbell libraries for sequencing on the RS II System. Using the same repaired DNA, we also tested PCR efficiency for target gene regions of up to 5 kb. The resulting amplicons were constructed into SMRTbell templates for full-length sequencing on the RS II System. We found the Covaris AFA system to be effective, efficient, and simple to use, and the resulting DNA is compatible with SMRTbell™ library preparation for targeted and whole-genome SMRT Sequencing. The data presented here demonstrate single-molecule sequencing of DNA extracted from FFPE tissues.
Citation Format: Primo Baybayan, Michael Weiand, Kevin Eng, Guillaume Durin, Steve Kujawa. SMRT® sequencing of DNA samples extracted from formalin-fixed and paraffin embedded tissues [abstract]. In: Proceedings of the 107th Annual Meeting of the American Association for Cancer Research; 2016 Apr 16-20; New Orleans, LA. Philadelphia (PA): AACR; Cancer Res 2016;76(14 Suppl):Abstract nr 3611.
- Published
- 2016
37. Comparing the genomes of Helicobacter pylori clinical strain UM032 and Mice-adapted derivatives
- Author
-
Vellaya Rehvathy, Yalda Khosravi, Khean-Lee Goh, Meredith Ashby, Mun Fai Loke, Shih Wee Seow, Siddarth Singh, Eng Guan Chua, Alfred Tay, Jamuna Vadivelu, Primo Baybayan, Barry J. Marshall, Arlaine Anne Amoyo, Tim Perkins, Sven Pettersson, Junxian Ong, Susana Wang, Wei Yee Wee, and Siew Woh Choo
- Subjects
medicine.medical_specialty ,Inflammation ,Bioinformatics ,Microbiology ,Genome ,Pathogenesis ,Medical microbiology ,Virology ,medicine ,Clinical H. pylori ,biology ,Helicobacter pylori ,business.industry ,Strain (biology) ,Gastroenterology ,biology.organism_classification ,Mice-adapted ,PacBio Single Molecule ,Infectious Diseases ,Parasitology ,Real-Time (SMRT) technology ,Genome Announcement ,medicine.symptom ,business ,Bacteria - Abstract
Background Helicobacter pylori is a Gram-negative bacterium that persistently infects the human stomach, inducing chronic inflammation. The exact mechanisms of pathogenesis are still not completely understood. Although mice are not a natural host for H. pylori, mouse infection models play an important role in establishing the immunology and pathogenicity of H. pylori. In this study, the genomes of clinical H. pylori strain UM032 and its mice-adapted derivatives, 298 and 299, were sequenced for the first time using PacBio Single Molecule, Real-Time (SMRT) technology.
Result Here, we describe the single contig obtained for each of UM032 (1,599,441 bp), 298 (1,604,216 bp) and 299 (1,601,149 bp). Preliminary analysis suggested that methylation of the H. pylori genome through its restriction-modification system may be a determinant of its host specificity and adaptation.
Conclusion The availability of these genome sequences will help improve our current understanding of the host specificity of H. pylori.
- Published
- 2013
38. Ordered shotgun sequencing of a 135 kb Xq25 YAC containing ANT2 and four possible genes, including three confirmed by EST matches
- Author
-
Ramaiah Nagaraja, Primo Baybayan, Ellson Y. Chen, Ying Su, Chun-Nan Chen, Richard Mazzarella, David Schlessinger, and Aleli Siruno
- Subjects
X Chromosome ,Sequence analysis ,Molecular Sequence Data ,Biology ,Polymerase Chain Reaction ,Insert (molecular biology) ,Sequence-tagged site ,Complementary DNA ,Genetics ,Humans ,Cloning, Molecular ,Gene ,Chromosomes, Artificial, Yeast ,Sequence (medicine) ,DNA Primers ,Repetitive Sequences, Nucleic Acid ,Sequence Tagged Sites ,Shotgun sequencing ,Lambda phage ,biology.organism_classification ,Bacteriophage lambda ,Female ,Sequence Analysis ,Software ,Research Article - Abstract
Ordered shotgun sequencing (OSS) has been successfully carried out with an Xq25 YAC substrate. yWXD703 DNA was subcloned into lambda phage, and sequences of the insert ends of the lambda subclones were used to generate a map for selecting a minimum tiling path of clones to be completely sequenced. The sequence of 135 038 nt contains the entire ANT2 cDNA as well as four other candidate genes suggested by computer-assisted analyses. One of the putative genes is homologous to a gene implicated in Graves' disease; it, ANT2 and two others are confirmed by EST matches. The results suggest that OSS can be applied to YACs in accord with earlier simulations, and further indicate that the sequence of the YAC accurately reflects the sequence of uncloned human DNA.
- Published
- 1996
39. Mutations in GPC3, a glypican gene, cause the Simpson-Golabi-Behmel overgrowth syndrome
- Author
-
Rhiannon M. Hughes-Benzie, Giuseppe Pilia, Reid Huber, Alex MacKenzie, Giovanni Neri, Primo Baybayan, Antonino Forabosco, Ellson Y. Chen, David Schlessinger, and Antonio Cao
- Subjects
Male ,Glypican ,X Chromosome ,Genetic Linkage ,Molecular Sequence Data ,Chromosome Disorders ,Biology ,Glypican 3 ,Glypican 4 ,Translocation, Genetic ,Cell Line ,Mice ,Gene mapping ,Glypicans ,Insulin-Like Growth Factor II ,Genetics ,medicine ,Tumor Cells, Cultured ,Animals ,Humans ,Abnormalities, Multiple ,Amino Acid Sequence ,Cloning, Molecular ,X chromosome ,Growth Disorders ,DNA Primers ,Chromosome Aberrations ,Autosome ,Base Sequence ,Sequence Homology, Amino Acid ,Chromosome Mapping ,Simpson–Golabi–Behmel syndrome ,Syndrome ,medicine.disease ,Molecular biology ,Pedigree ,Chromosomes, Human, Pair 1 ,Overgrowth syndrome ,Immunologic Techniques ,Female ,Proteoglycans ,Heparitin Sulfate ,Chromosomes, Human, Pair 16 ,Gene Deletion ,Heparan Sulfate Proteoglycans ,HeLa Cells ,Protein Binding - Abstract
Simpson-Golabi-Behmel syndrome (SGBS) is an X-linked condition characterized by pre- and postnatal overgrowth with visceral and skeletal anomalies. To identify the causative gene, breakpoints in two female patients with X;autosome translocations were identified. The breakpoints occur near the 5' and 3' ends of a gene, GPC3, that spans more than 500 kilobases in Xq26; in three families, different microdeletions encompassing exons cosegregate with SGBS. GPC3 encodes a putative extracellular proteoglycan, glypican 3, that is inferred to play an important role in growth control in embryonic mesodermal tissues in which it is selectively expressed. Initial western- and ligand-blotting experiments suggest that glypican 3 forms a complex with insulin-like growth factor 2 (IGF2), and might thereby modulate IGF2 action.
- Published
- 1996
40. Abstract 1159: Single molecule sequencing to detect and characterize somatic mutations in cancer genomes
- Author
-
Primo Baybayan, Benson Chau, Chen-Shan Chin, Rachel Maupin, Kevin Travers, Vince Magrini, Michael D. McLellan, John Eid, Todd Wylie, Jason Londry, and Elaine R. Mardis
- Subjects
Cancer genome sequencing ,Genetics ,Cancer Research ,education.field_of_study ,Population ,Cancer ,Context (language use) ,Biology ,medicine.disease ,Genome ,genomic DNA ,Oncology ,medicine ,education ,Gene ,Single molecule real time sequencing - Abstract
One result of large-scale cancer genome characterization is the identification of sets of genes that are commonly mutated in specific tumor types or subtypes and have clinical relevance, e.g. are prognostic or diagnostic. In this paradigm, we have investigated the use of a novel single molecule, real-time (SMRT) sequencing technology from Pacific Biosciences that enables targeted regions to be sequenced in real time as single molecules in a mixed population. Several experiments were performed to evaluate the performance of this technology in the context of testing for cancer-specific mutations in previously characterized samples. In the first experiment, we assessed whether SMRT sequencing could detect the known mutations in PCR products derived from genomic DNA of tumor cells compared to normal cells. In the second experiment, we investigated the impact of different neoplastic cellularity percentages on the ability to detect known mutations. The final experiment involved producing deep read-count SMRT sequencing data from PCR products containing known variants to ascertain their different levels of prevalence in a discrete tumor cell population, and then comparing these results to deep read counts for the same variants obtained with the Illumina instrument. Our results indicate that the Pacific Biosciences instrument offers exquisite sensitivity and speed in detecting somatic single-base mutations in tumor-derived genomic DNAs. Citation Format: {Authors}. {Abstract title} [abstract]. In: Proceedings of the 101st Annual Meeting of the American Association for Cancer Research; 2010 Apr 17-21; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2010;70(8 Suppl):Abstract nr 1159.
- Published
- 2010
41. Complete telomere-to-telomere de novo assembly of the Plasmodium falciparum genome through long-read (>11 kb), single molecule, real-time sequencing
- Author
-
Michael C. Schatz, Shruthi S. Vembar, Melissa Smith, Artur Scherf, Christine C. Lambert, Primo Baybayan, Matthew Seetin, and Maria Nattestad
- Subjects
0106 biological sciences ,0301 basic medicine ,Cancer genome sequencing ,Plasmodium falciparum ,Sequence assembly ,de novo assembly ,Biology ,01 natural sciences ,Genome ,DNA sequencing ,Structural variation ,Contig Mapping ,03 medical and health sciences ,AT-biased ,Genetics ,Molecular Biology ,Exome sequencing ,Whole genome sequencing ,Polymorphism, Genetic ,structural variation ,Sequence Analysis, DNA ,General Medicine ,Telomere ,Full Papers ,3. Good health ,030104 developmental biology ,long-read sequencing ,Genome, Protozoan ,010606 plant biology & botany ,Reference genome - Abstract
The application of next-generation sequencing to estimate genetic diversity of Plasmodium falciparum, the most lethal malaria parasite, has proved challenging due to the skewed AT-richness [∼80.6% (A + T)] of its genome and the lack of technology to assemble highly polymorphic subtelomeric regions that contain clonally variant, multigene virulence families (e.g., var and rifin). To address this, we performed amplification-free, single molecule, real-time sequencing of P. falciparum genomic DNA and generated reads of average length 12 kb, with 50% of the reads between 15.5 and 50 kb in length. Next, using the Hierarchical Genome Assembly Process, we assembled the P. falciparum genome de novo and successfully compiled all 14 nuclear chromosomes telomere-to-telomere. We also accurately resolved centromeres [∼90–99% (A + T)] and subtelomeric regions and identified large insertions and duplications that add extra var and rifin genes to the genome, along with smaller structural variants such as homopolymer tract expansions. Overall, we show that amplification-free, long-read sequencing combined with de novo assembly overcomes major challenges inherent to studying the P. falciparum genome. Indeed, this technology may not only identify the polymorphic and repetitive subtelomeric sequences of parasite populations from endemic areas but may also evaluate structural variation linked to virulence, drug resistance and disease transmission.
42. Accurate whole human genome sequencing using reversible terminator chemistry
- Author
-
Zoya Kingsbury, Marc Laurent, Jason Bryant, Konstantinos D. Diakoumakos, Klaus Maisinger, Louise Fraser, Jean Ernest Sohna Sohna, Adrian Horgan, Patrick Mccauley, Jane Rogers, David W. Elmore, Mark A. Osborne, Juying Yan, Mark Smith, Milan Fedurco, Gary P. Schroth, Belen Dominguez-Fernandez, Heng Li, Andrea Sabot, Suzanne Wakelin, Cindy Lawley, Carole Anastasi, David Klenerman, David George, Daniel P. Pliskin, Mohammed D. Alam, Svilen S. Tzonev, Mark T. Reed, Xiaohai Liu, Asha Boodhun, Lu Zhang, Aylwyn Scally, T. A. Huw Jones, Ugonna C. Egbujor, Tzvetana H. Kerelska, George Stefan Golda, Shankar Balasubramanian, Lukasz Szajkowski, Mitch Lok, Mitch K. Shiver, Paul McNitt, Simon Chang, Maria Q. Johnson, Gyoung-Dong Kang, Victor J. Quijano, Sarah E. Lee, Mike Zuerlein, Maria Candelaria Rogert Bacigalupo, Alan D. Kersey, Selena G. Barbour, Dirk J. Evers, Andrew C. Pike, Stephen Rawlings, Karin Fuentes Fajardo, Mirian S. Karbelashvili, Matthew E. Hurles, Sonia M. Novo, Xavier Lee, James C. Burrows, John Stephen West, Jingwen Wang, Ify C. Aniebo, Natasha R. Crake, Christian D. Haudenschild, Richard Shaw, Come Raczy, W. Scott Furey, Wu Xiaolin, Lambros L. Paraschos, Josefina M. Seoane, John W. Martin, Katya Hoschler, Raquel Maria Sanches-Kuiper, Nick J. McCooke, Colin Barnes, Johannes P. Sluis, Abass A. Bundu, John Milton, R. Keira Cheetham, Nancy F. Hansen, Clive Gavin Brown, Nigel P. Carter, Richard J. Carter, Chiara Rodighiero, Kim B. Stevens, Shujun Luo, Radhika M. Mammen, Phyllida M. Roe, Melanie Anne Smith, Bojan Obradovic, Johnny T. Ho, Jennifer A. Loch, Terena James, Harold Swerdlow, Dale Buermann, David E. Green, Steve Hurwitz, Joe W. Mullens, Ning Sizto, Frank L. Oaks, Eli Rusman, Natalie J. Rourke, Nikolai Romanov, Anthony J. Smith, Claire Bevis, Selene M. Virk, Ling Yau, Yuli Verhovsky, D. Chris Pinkard, Stephanie Vandevondele, Vincent Peter Smith, Rob C. Brown, Eric J. Spence, Joe Podhasky, Ana Chiva Rodriguez, Michael Lawrence Parkinson, Anthony Romieu, Joe S. Brennan, Rithy K. Roth, David Mark Dunstan Bailey, Roberto Rigatti, Anil Kumar, Phillip J. Black, Primo Baybayan, Saibal Banerjee, Matthew M. Hims, Arnold Liao, R. Neil Cooley, Omead Ostadan, Vincent A. Benoit, Andrew A. Brown, Silke Ruediger, Leslie J. Irving, Parul Mehta, James C. Mullikin, Klaudia Walter, John Rogers, Jonathan Mark Boutell, Alex P. Kindwall, Paula Kokko-Gonzales, Alger C. Pike, Michael J. O'Neill, Eric Vermaas, Subramanian V. Sankar, Sean Humphray, Steven W. Short, Gerardo Turcatti, Helen Bignell, Kimberley J. Gietzen, Peta E. Torrance, Narinder I. Heyer, David James Earnshaw, Kevin Hall, Martin R. Schenker, Richard Durbin, Philip A. Granieri, Tobias William Barr Ost, Iain R. Bancarz, Lea Pickering, David L. Gustafson, Peter Lundberg, Niall Anthony Gormley, John Bridgham, Andrew Osnowski, Scott M. Kirk, Mark R. Ewan, Keith W. Moon, Bee Ling Ng, Graham John Worsley, Anthony J. Cox, Olubunmi O. Dada, Gregory C. Walcott, Sergey Etchin, Irina Khrebtukova, Kevin Benson, Vicki H. Rae, Zemin Ning, Carolyn Tregidgo, Nestor Castillo, Colin P. Goddard, Taksina Newington, Denis V. Ivanov, Anastassia Spiridou, Maria Chiara E. Catenazzi, Neil Sutton, Kevin Harnish, Darren James Ellis, Lisa Murray, Geoffrey Paul Smith, Mark T. Ross, David R. Bentley, M. R. Pratt, Isabelle Rasolonjatovo, and Michael R. Flatbush
- Subjects
Male ,Genotype ,2 base encoding ,Nigeria ,Sequence assembly ,Hybrid genome assembly ,Genomics ,Computational biology ,Biology ,Polymorphism, Single Nucleotide ,Sensitivity and Specificity ,Deep sequencing ,Article ,03 medical and health sciences ,0302 clinical medicine ,Consensus Sequence ,Humans ,Paired-end tag ,030304 developmental biology ,Genetics ,Whole genome sequencing ,Chromosomes, Human, X ,0303 health sciences ,Multidisciplinary ,Genome, Human ,DNA sequencing theory ,Sequence Analysis, DNA ,030220 oncology & carcinogenesis - Abstract
DNA sequence information underpins genetic research, enabling discoveries of important biological or medical benefit. Sequencing projects have traditionally used long (400-800 base pair) reads, but the existence of reference sequences for the human and many other genomes makes it possible to develop new, fast approaches to re-sequencing, whereby shorter reads are compared to a reference to identify intraspecies genetic variation. Here we report an approach that generates several billion bases of accurate nucleotide sequence per experiment at low cost. Single molecules of DNA are attached to a flat surface, amplified in situ and used as templates for synthetic sequencing with fluorescent reversible terminator deoxyribonucleotides. Images of the surface are analysed to generate high-quality sequence. We demonstrate application of this approach to human genome sequencing on flow-sorted X chromosomes and then scale the approach to determine the genome sequence of a male Yoruba from Ibadan, Nigeria. We build an accurate consensus sequence from >30x average depth of paired 35-base reads. We characterize four million single-nucleotide polymorphisms and four hundred thousand structural variants, many of which were previously unknown. Our approach is effective for accurate, rapid and economical whole-genome re-sequencing and many other biomedical applications.
43. The complete methylome of Helicobacter pylori UM032
- Author
-
Chin Yen Tay, Susana Wang, Richard J. Roberts, Brian P. Anton, Woon Ching Lee, Mun Fai Loke, Khean-Lee Goh, Eng Guan Chua, Primo Baybayan, Barry J. Marshall, Jamuna Vadivelu, Meredith Ashby, Siddarth Singh, and Fanny Thirriot
- Subjects
Methyltransferase ,Biology ,Genome ,DNA sequencing ,User-Computer Interface ,03 medical and health sciences ,Bacterial Proteins ,Genetics ,Gene ,030304 developmental biology ,Internet ,0303 health sciences ,Base Sequence ,Helicobacter pylori ,030306 microbiology ,High-Throughput Nucleotide Sequencing ,DNA Restriction Enzymes ,Methyltransferases ,Sequence Analysis, DNA ,Methylation ,DNA Methylation ,DNA methylation ,DNA microarray ,Sequence motif ,Genome, Bacterial ,Research Article ,Biotechnology - Abstract
The genome of the human gastric pathogen Helicobacter pylori encodes a large number of DNA methyltransferases (MTases), some of which are shared among many strains, and others of which are unique to a given strain. The MTases have potential roles in the survival of the bacterium. In this study, we sequenced a Malaysian H. pylori clinical strain, designated UM032, by using a combination of PacBio Single Molecule, Real-Time (SMRT) and Illumina MiSeq next-generation sequencing platforms, and used the SMRT data to characterize the set of methylated bases (the methylome). The N4-methylcytosine and N6-methyladenine modifications detected at single-base resolution using SMRT technology revealed 17 methylated sequence motifs corresponding to one Type I and 16 Type II restriction-modification (R-M) systems. Previously unassigned methylation motifs were assigned to their respective MTase-coding genes. Furthermore, one gene that appears to be inactive in the H. pylori UM032 genome during normal growth was characterized by cloning. Consistent with previously studied H. pylori strains, we show that strain UM032 contains a relatively large number of R-M systems, including some MTase activities with novel specificities. Additional studies are underway to further elucidate the biological significance of the R-M systems in the physiology and pathogenesis of H. pylori.
Discovery Service for Jio Institute Digital Library