32 results on '"David R. Rank"'
Search Results
2. Highly accurate long-read HiFi sequencing data for five complex genomes
- Author
Steven J. Knapp, Jane M. Landolin, Nicholas Maurer, David Kudrna, David R. Rank, Beth Shapiro, Doreen Ware, Joseph W. Karalius, Ting Hon, Paul Peluso, Greg Young, Kristin Mars, Cynthia C. Steiner, Yu-Chih Tsai, and Michael A. Hardigan
- Subjects
0106 biological sciences; Statistics and Probability; Data Descriptor; Ranidae; Sequence analysis; Computer science; Sequencing data; Sequence assembly; Genomics; Computational biology; Biology; Library and Information Sciences; 01 natural sciences; Genome; Data publication and archiving; Zea mays; Fragaria; Education; 03 medical and health sciences; Mice; 0302 clinical medicine; Data sequences; Polyploid; Genome assembly algorithms; Genetics; Animals; lcsh:Science; 030304 developmental biology; 0303 health sciences; Haplotype; Human Genome; Structural variant; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; DNA; Plant; Computer Science Applications; Metagenomics; Next-generation sequencing; Metagenome; lcsh:Q; Generic health relevance; Statistics, Probability and Uncertainty; Sequence Analysis; 030217 neurology & neurosurgery; Genome, Plant; 010606 plant biology & botany; Information Systems
- Abstract
The PacBio® HiFi sequencing method yields highly accurate long-read sequencing datasets with read lengths averaging 10–25 kb and accuracies greater than 99.5%. These accurate long reads can be used to improve results for complex applications such as single-nucleotide and structural variant detection, genome assembly, assembly of difficult polyploid or highly repetitive genomes, and assembly of metagenomes. Sample datasets are currently needed both to evaluate the benefits of these long, accurate reads and to develop bioinformatic tools, including genome assemblers, variant callers, and haplotyping algorithms. We present deep-coverage HiFi datasets for five complex samples, including the two inbred model genomes Mus musculus and Zea mays and two complex genomes, the octoploid Fragaria × ananassa and the diploid anuran Rana muscosa. Additionally, we release sequence data from a mock metagenome community. The datasets reported here can be used without restriction to develop new algorithms and explore complex genome structure and evolution. Data were generated on the PacBio Sequel II System.
Measurement(s): DNA • genome • Metagenome
Technology Type(s): DNA sequencing • PacBio Sequel System
Factor Type(s): organism that had its genome sequenced
Sample Characteristic - Organism: Mus musculus • Rana muscosa • Fragaria x ananassa • Zea mays
Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.12855527
- Published
- 2020
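Per-base read accuracy such as "greater than 99.5%" is commonly expressed on the Phred scale, where Q = -10 * log10(error rate). A minimal Python sketch of the conversion in both directions; the function names are illustrative and not part of any PacBio tool.

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Convert per-base accuracy (strictly between 0 and 1) to a Phred quality value."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


def phred_to_accuracy(quality: float) -> float:
    """Inverse conversion: Phred quality value back to per-base accuracy."""
    return 1.0 - 10.0 ** (-quality / 10.0)


if __name__ == "__main__":
    # 99.5% is the accuracy threshold quoted in the abstract above.
    print(f"99.5% accuracy ~ Q{accuracy_to_phred(0.995):.1f}")
```

With these definitions, 99.5% accuracy works out to about Q23, and Q30 corresponds to 99.9% accuracy.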
3. Fully Phased Sequence of a Diploid Human Genome Determined de Novo from the DNA of a Single Individual
- Author
Matthew Sooknah, Nicole L. Fong, Paul Peluso, Margaret Ann Roy, Gregory T. Concepcion, Nelda Yi, Andrea T. Ireland, Irene Lam, J. Graham Ruby, David R. Rank, Jonathan S Paw, Ilya Soifer, Alex Hastie, Vladimir Jojic, and David Botstein
- Subjects
haplotype; phased genome; 0303 health sciences; Haplotype; Sequence assembly; phased sequencing; Genomics; Computational biology; QH426-470; Biology; Genome; 03 medical and health sciences; 0302 clinical medicine; human genome; chromosome sorting; Genetics; Homologous chromosome; Human genome; Ploidy; Molecular Biology; 030217 neurology & neurosurgery; Genetics (clinical); 030304 developmental biology; Reference genome
- Abstract
In recent years, improved sequencing technology and computational tools have made de novo genome assembly more accessible. Many approaches, however, generate either an unphased or only partially resolved representation of a diploid genome, in which polymorphisms are detected but not assigned to one or the other of the homologous chromosomes. Yet chromosomal phase information is invaluable for the understanding of phenotypic trait inheritance in the cases of compound heterozygosity, allele-specific expression or cis-acting variants. Here we use a combination of tools and sequencing technologies to generate a de novo diploid assembly of the human primary cell line WI-38. First, data from PacBio single molecule sequencing and Bionano Genomics optical mapping were combined to generate an unphased assembly. Next, 10x Genomics linked reads were combined with the hybrid assembly to generate a partially phased assembly. Lastly, we developed and optimized methods to use short-read (Illumina) sequencing of flow cytometry-sorted metaphase chromosomes to provide phase information. The final genome assembly was almost fully (94%) phased with the addition of approximately 2.5-fold coverage of Illumina data from the sequenced metaphase chromosomes. The diploid nature of the final de novo genome assembly improved the resolution of structural variants between the WI-38 genome and the human reference genome. The phased WI-38 sequence data are available for browsing and download at wi38.research.calicolabs.com. Our work shows that assembling a completely phased diploid genome de novo from the DNA of a single individual is now readily achievable.
- Published
- 2020
4. Blueprint for Phasing and Assembling the Genomes of Heterozygous Polyploids: Application to the Octoploid Genome of Strawberry
- Author
Christopher A. Saski, Randi A. Famula, Michaela V. Vachev, Alan E. Yocca, Paul Peluso, Glenn S. Cole, Michael A. Hardigan, Mitchell J. Feldmann, Kristin Mars, David R. Rank, Adrian E. Platts, Mary A. Madera, Shujun Ou, Philipp Zerbe, Steven J. Knapp, Charlotte B. Acharya, Dominique D A Pincot, and Patrick P. Edger
- Subjects
Genetics; Polyploid; Haplotype; Chromosome; Allele; Ploidy; Biology; Gene; Genome; Reference genome
- Abstract
The challenge of allelic diversity for assembling haplotypes is exemplified in polyploid genomes containing homoeologous chromosomes of identical ancestry, and significant homologous variation within their ancestral subgenomes. Cultivated strawberry (Fragaria × ananassa) and its wild progenitors are outbred octoploids (2n = 8x = 56) in which up to eight homologous and homoeologous alleles are preserved. This introduces significant risk of haplotype collapse, switching, and chimeric fusions during assembly. Using third-generation HiFi sequences from PacBio, we assembled the genome of the day-neutral octoploid F. × ananassa hybrid ‘Royal Royce’ from the University of California. Our goal was to produce subgenome- and haplotype-resolved assemblies of all 56 chromosomes, accurately reconstructing the parental haploid chromosome complements. Previous work has demonstrated that partitioning sequences by parental phase supports direct assembly of haplotypes in heterozygous diploid species. We leveraged the accuracy of HiFi sequence data with pedigree-informed sequencing to partition long-read sequences by phase and to reduce the downstream risk of subgenomic chimeras during assembly. We used an octoploid strawberry recombination breakpoint map containing 3.6 M variants to identify and break chimeric junctions and to scaffold the phase-1 and phase-2 octoploid assemblies. The N50 contiguity of the phase-1 and phase-2 assemblies prior to scaffolding and gap-filling was 11 Mb. The final haploid assembly represented seven of 28 chromosomes in a single contiguous sequence, and averaged fewer than three gaps per pseudomolecule. Additionally, we re-annotated the octoploid genome to produce a custom F. × ananassa repeat library and an improved set of gene models based on IsoSeq transcript data and an expansive RNA-seq expression atlas. Here we present ‘FaRR1’, a gold-standard reference genome of F. × ananassa cultivar ‘Royal Royce’, to assist future genomic research and molecular breeding of allo-octoploid strawberry.
- Published
- 2021
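The strawberry abstract reports contiguity as an N50 (11 Mb before scaffolding and gap-filling). As a reminder of what that statistic measures, here is a minimal Python sketch of the standard N50 calculation over a list of contig lengths; the function name and the contig lengths in the usage example are hypothetical.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig (or scaffold) lengths.

    N50 is the largest length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one contig length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running total always reaches half_total")


if __name__ == "__main__":
    # Hypothetical contig lengths in megabases, for illustration only.
    print(n50([2, 3, 4, 5, 6, 8, 10]))
```

For these made-up lengths (total 38), the three longest contigs (10 + 8 + 6 = 24) are the first to cover half of the bases, so the N50 is 6.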
5. A conditional density error model for the statistical analysis of microarray data
- Author
Brad Love, David R. Rank, Sharron G. Penn, David A. Jenkins, and Russell S. Thomas
- Published
- 2002
6. A draft phased assembly of the diploid Cascade hop (Humulus lupulus) genome
- Author
Justin Elser, Gregory T. Concepcion, Daniel Moore, Paul Peluso, Brent A. Kronmiller, Pankaj Jaiswal, David A. Hendrix, Sarah B. Kingan, David R. Rank, Lillian K. Padgitt-Cobb, John A. Henning, and Jackson Wells
- Subjects
0106 biological sciences; 0301 basic medicine; Humulus lupulus; lcsh:QH426-470; Retrotransposon; Genomics; Plant Science; Computational biology; lcsh:Plant culture; Biology; 01 natural sciences; Genome; Hop (networking); 03 medical and health sciences; Genetics; lcsh:SB1-1110; Humulus; Gene; food and beverages; Cascade hop; biology.organism_classification; Diploidy; lcsh:Genetics; 030104 developmental biology; Haplotypes; Ploidy; Agronomy and Crop Science; Genome, Plant; 010606 plant biology & botany
- Abstract
Hop (Humulus lupulus L. var Lupulus) is a diploid, dioecious plant with a history of cultivation spanning more than one thousand years. Hop cones are valued for their use in brewing and contain compounds of therapeutic interest including xanthohumol. Efforts to determine how biochemical pathways responsible for desirable traits are regulated have been challenged by the large (2.8 Gb), repetitive, and heterozygous genome of hop. We present a draft haplotype‐phased assembly of the Cascade cultivar genome. Our draft assembly and annotation of the Cascade genome is the most extensive representation of the hop genome to date. PacBio long‐read sequences from hop were assembled with FALCON and partially phased with FALCON‐Unzip. Comparative analysis of haplotype sequences provides insight into selective pressures that have driven evolution in hop. We discovered genes with greater sequence divergence enriched for stress‐response, growth, and flowering functions in the draft phased assembly. With improved resolution of long terminal repeat (LTR) retrotransposons due to long‐read sequencing, we found that hop is over 70% repetitive. We identified a homolog of cannabidiolic acid synthase (CBDAS) that is expressed in multiple tissues. The approaches we developed to analyze the draft phased assembly serve to deepen our understanding of the genomic landscape of hop and may have broader applicability to the study of other large, complex genomes.
- Published
- 2021
7. A phased, diploid assembly of the Cascade hop (Humulus lupulus) genome reveals patterns of selection and haplotype variation
- Author
Lillian K. Padgitt-Cobb, Paul Peluso, Daniel Moore, David A. Hendrix, Pankaj Jaiswal, Sarah B. Kingan, John A. Henning, Justin Elser, Gregory T. Concepcion, Brent A. Kronmiller, Jackson Wells, and David R. Rank
- Subjects
0106 biological sciences; 0303 health sciences; Humulus lupulus; biology; Haplotype; food and beverages; Context (language use); Retrotransposon; biology.organism_classification; 01 natural sciences; Genome; 3. Good health; Hop (networking); Structural variation; 03 medical and health sciences; Evolutionary biology; Gene; 030304 developmental biology; 010606 plant biology & botany
- Abstract
Hop (Humulus lupulus L. var Lupulus) is a diploid, dioecious plant with a history of cultivation spanning more than one thousand years. Hop cones are valued for their use in brewing, and around the world, hop has been used in traditional medicine to treat a variety of ailments. Efforts to determine how biochemical pathways responsible for desirable traits are regulated have been challenged by the large, repetitive, and heterozygous genome of hop. We present the first report of a haplotype-phased assembly of a large plant genome. Our assembly and annotation of the Cascade cultivar genome is the most extensive to date. PacBio long-read sequences from hop were assembled with FALCON and phased with FALCON-Unzip. Using the diploid assembly to assess haplotype variation, we discovered genes under positive selection enriched for stress-response, growth, and flowering functions. Comparative analysis of haplotypes provides insight into large-scale structural variation and the selective pressures that have driven hop evolution. Previous studies estimated repeat content at around 60%. With improved resolution of long terminal repeat (LTR) retrotransposons due to long-read sequencing, we found that hop is nearly 78% repetitive. Our quantification of repeat content provides context for the size of the hop genome, and supports the hypothesis of whole genome duplication (WGD), rather than expansion due to LTR retrotransposons. With our more complete assembly, we have identified a homolog of cannabidiolic acid synthase (CBDAS) that is expressed in multiple tissues. The approaches we developed to analyze a phased, diploid assembly serve to deepen our understanding of the genomic landscape of hop and may have broader applicability to the study of other large, complex genomes.
- Published
- 2019
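A figure such as "nearly 78% repetitive" is the fraction of assembled bases covered by repeat annotations. Because annotations from different repeat families or tools can overlap, intervals have to be merged before counting, or some bases are counted twice. A minimal Python sketch, assuming repeats are given as half-open (start, end) coordinates on a single sequence; the function name is hypothetical and this is not the authors' pipeline.

```python
from typing import Iterable, Tuple


def covered_fraction(intervals: Iterable[Tuple[int, int]], genome_length: int) -> float:
    """Fraction of a sequence covered by possibly overlapping half-open intervals [start, end).

    Overlapping or touching intervals are merged first so no base is counted twice.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current merged block: close it and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps or touches the current block: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length
```

For example, the hypothetical annotations (0, 10), (5, 20), and (30, 40) on a 100 bp sequence merge into 30 covered bases, a fraction of 0.3, whereas naively summing their lengths would give 0.35.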
8. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome
- Author
Armin Töpfer, Justin M. Zook, Heng Li, Gregory T. Concepcion, Medhat Mahmoud, Paul Peluso, Andrew Carroll, Aaron M. Wenger, Nathan D. Olson, Alexander Kolesnikov, Michael Alonge, Arkarachai Fungtammasan, Adam M. Phillippy, Michael C. Schatz, David R. Rank, Jue Ruan, Sergey Koren, Fritz J. Sedlazeck, Pi-Chuan Chang, Yufeng Qian, Gene Myers, William J Rowell, Mark A. DePristo, Richard Hall, Tobias Marschall, Chen-Shan Chin, Michael W. Hunkapiller, and Jana Ebler
- Subjects
Computer science; Biomedical Engineering; Sequence assembly; Bioengineering; Genomics; Third generation sequencing; Computational biology; Applied Microbiology and Biotechnology; Genome; DNA sequencing; Article; 03 medical and health sciences; 0302 clinical medicine; Humans; Base sequence; 030304 developmental biology; 0303 health sciences; Base Sequence; Genome, Human; Genetic Variation; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Haplotypes; Molecular Medicine; Human genome; DNA, Circular; 030217 neurology & neurosurgery; Biotechnology
- Abstract
The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions […] 15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.
- Published
- 2019
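Circular consensus sequencing reads the same circular template several times, and the abstract credits this for the 99.8% read accuracy. Why repeated passes help can be illustrated with a deliberately idealized model: errors are independent across passes, and the consensus at a base is wrong only if a strict majority of passes are wrong (treating all wrong passes as agreeing on the same wrong base, the worst case for voting). This is an assumption-laden sketch with a hypothetical function name, not the CCS algorithm; real CCS uses a more sophisticated consensus model, and real sequencing errors, including insertions and deletions, do not follow this simple model.

```python
from math import comb


def majority_error_probability(passes: int, per_pass_error: float) -> float:
    """Probability that a strict majority of independent passes are wrong at one base.

    Idealized model: errors are independent across passes, and all wrong passes
    are assumed to agree on the same wrong base (worst case for majority voting).
    """
    if passes < 1 or passes % 2 == 0:
        raise ValueError("use an odd number of passes so a strict majority always exists")
    if not 0.0 <= per_pass_error <= 1.0:
        raise ValueError("per_pass_error must be a probability")
    threshold = passes // 2 + 1
    return sum(
        comb(passes, k) * per_pass_error**k * (1 - per_pass_error) ** (passes - k)
        for k in range(threshold, passes + 1)
    )


if __name__ == "__main__":
    # Hypothetical per-pass error rate of 10%, for illustration only.
    for n in (1, 3, 5, 7):
        print(n, majority_error_probability(n, 0.10))
```

Under this model, a 10% per-pass error rate drops to 2.8% with three passes (3 × 0.1² × 0.9 + 0.1³), and keeps falling as passes are added.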
9. Highly-accurate long-read sequencing improves variant detection and assembly of a human genome
- Author
Gene Myers, Mark A. DePristo, Michael Alonge, Richard Hall, Michael W. Hunkapiller, Fritz J. Sedlazeck, Paul Peluso, Jana Ebler, Aaron M. Wenger, Pi-Chuan Chang, Sergey Koren, Alexander Kolesnikov, Medhat Mahmoud, Justin M. Zook, Yufeng Qian, Arkarachai Fungtammasan, Nathan D. Olson, Michael C. Schatz, David R. Rank, Jue Ruan, Tobias Marschall, Heng Li, William J Rowell, Chen-Shan Chin, Andrew Carroll, Adam M. Phillippy, Armin Töpfer, and Gregory T. Concepcion
- Subjects
0303 health sciences; Contig; Computer science; Haplotype; Sequence assembly; Computational biology; Genome; DNA sequencing; 03 medical and health sciences; 0302 clinical medicine; Human genome; Precision and recall; Indel; 030217 neurology & neurosurgery; 030304 developmental biology
- Abstract
The major DNA sequencing technologies in use today produce either highly-accurate short reads or noisy long reads. We developed a protocol based on single-molecule, circular consensus sequencing (CCS) to generate highly-accurate (99.8%) long reads averaging 13.5 kb and applied it to sequence the well-characterized human HG002/NA24385. We optimized existing tools to comprehensively detect variants, achieving precision and recall above 99.91% for SNVs, 95.98% for indels, and 95.99% for structural variants. We estimate that 2,434 discordances are correctable mistakes in the high-quality Genome in a Bottle benchmark. Nearly all (99.64%) variants are phased into haplotypes, which further improves variant detection. De novo assembly produces a highly contiguous and accurate genome with contig N50 above 15 Mb and concordance of 99.998%. CCS reads match short reads for small variant detection, while enabling structural variant detection and de novo assembly at similar contiguity and markedly higher concordance than noisy long reads.
- Published
- 2019
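The precision and recall figures above are computed by comparing a call set with a truth set such as the Genome in a Bottle benchmark. A minimal sketch of the definitions, assuming the counts of true positives, false positives, and false negatives are already known; the class name is hypothetical, and dedicated benchmarking tools also handle matching of differently represented variants, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantBenchmark:
    """Counts from comparing a call set against a truth set (hypothetical helper)."""

    true_positives: int   # calls that match a truth-set variant
    false_positives: int  # calls with no matching truth-set variant
    false_negatives: int  # truth-set variants that were not called

    @property
    def precision(self) -> float:
        return self.true_positives / (self.true_positives + self.false_positives)

    @property
    def recall(self) -> float:
        return self.true_positives / (self.true_positives + self.false_negatives)

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r)


if __name__ == "__main__":
    # Made-up counts, not results from the paper.
    bench = VariantBenchmark(true_positives=90, false_positives=10, false_negatives=30)
    print(bench.precision, bench.recall, bench.f1)
```

With the made-up counts, precision is 0.9 and recall is 0.75. The sketch does not guard against division by zero when there are no calls or no truth-set variants.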
10. Phased diploid genome assembly with single-molecule real-time sequencing
- Author
Grant R. Cramer, Chongyuan Luo, Maria Nattestad, Fritz J. Sedlazeck, Rosa Figueroa-Balderas, Ronan C. O'Malley, Dario Cantu, Massimo Delledonne, Gregory T. Concepcion, Michael C. Schatz, David R. Rank, Abraham Morales-Cruz, Alicia Clum, Joseph R. Ecker, Chen-Shan Chin, Christopher Dunn, and Paul Peluso
- Subjects
0106 biological sciences; 0301 basic medicine; Heterozygote; DNA, Plant; pacbio; Arabidopsis; Sequence assembly; Computational biology; Biology; Polymorphism, Single Nucleotide; 01 natural sciences; Biochemistry; Genome; Article; Structural variation; 03 medical and health sciences; vitis vinifera; 0302 clinical medicine; Homologous chromosome; Humans; Arabidopsis thaliana; Vitis; DNA, Fungal; Molecular Biology; 030304 developmental biology; Genetics; 0303 health sciences; Diploid genome; Basidiomycota; fungi; Haplotype; Genomics; Sequence Analysis, DNA; Cell Biology; biology.organism_classification; Diploidy; genome sequencing; 030104 developmental biology; Haplotypes; Genome, Fungal; Ploidy; Algorithms; Genome, Plant; 030217 neurology & neurosurgery; genome sequencing, vitis vinifera, pacbio; 010606 plant biology & botany; Biotechnology; Single molecule real time sequencing
- Abstract
While genome assembly projects have been successful in a number of haploid or inbred species, one of the current main challenges is assembling non-inbred or rearranged heterozygous genomes. To address this critical need, we introduce the open-source FALCON and FALCON-Unzip algorithms (https://github.com/PacificBiosciences/FALCON/) to assemble Single Molecule Real-Time (SMRT®) Sequencing data into highly accurate, contiguous, and correctly phased diploid genomes. We demonstrate the quality of this approach by assembling new reference sequences for three heterozygous samples, including an F1 hybrid of the model species Arabidopsis thaliana, the widely cultivated V. vinifera cv. Cabernet Sauvignon, and the coral fungus Clavicorona pyxidata that have challenged short-read assembly approaches. The FALCON-based assemblies were substantially more contiguous and complete than alternate short or long-read approaches. The phased diploid assembly enabled the study of haplotype structures and heterozygosities between the homologous chromosomes, including identifying widespread heterozygous structural variations within the coding sequences.
- Published
- 2016
11. Improved maize reference genome with single-molecule technologies
- Author
Xuehong Wei, Tiffany Y. Liang, Nathan M. Springer, Yinping Jiao, Doreen Ware, Thomas K. Wolfgruber, Jonathan I. Gent, Michael D. McMullen, Andrew Olson, Michelle C. Stitzer, Joshua C. Stein, Bo Wang, Paul Peluso, Chen-Shan Chin, Jinghua Shi, Eric Antoniou, Jeffrey Ross-Ibarra, Kevin L. Schneider, W. Richard McCombie, R. Kelly Dawe, Michael Regulski, Katherine E. Guill, Alex Hastie, Sunita Kumari, Gernot G. Presting, Michael R. May, Michael S. Campbell, and David R. Rank
- Subjects
0301 basic medicine; Optics and Photonics; Messenger; Genome informatics; Genome; Contig Mapping; Phylogeny; 2. Zero hunger; Genetics; Multidisciplinary; Contig; High-Throughput Nucleotide Sequencing; Reference Standards; Single Molecule Imaging; DNA, Intergenic; Genome, Plant; Crops, Agricultural; Transposable element; General Science & Technology; 1.1 Normal biological development and functioning; Centromere; Crops; Computational biology; Biology; Genes, Plant; Zea mays; Chromosomes, Plant; Article; Chromosomes; 03 medical and health sciences; Gene density; RNA, Messenger; Gene; Sorghum; Agricultural; Intergenic; Human Genome; Molecular Sequence Annotation; Gene Annotation; Plant; DNA; 030104 developmental biology; Genes; DNA Transposable Elements; RNA; Plant sciences; Reference genome
- Abstract
An improved reference genome for maize, using single-molecule sequencing and high-resolution optical mapping, enables characterization of structural variation and repetitive regions, and identifies lineage expansions of transposable elements that are unique to maize. Supplementary information: The online version of this article (doi:10.1038/nature22971) contains supplementary material, which is available to authorized users.
A better map of the maize genome: The maize genome was initially reported in 2009 but with some accuracy limitations. Doreen Ware and colleagues report a new reference genome for maize using single-molecule sequencing and high-resolution optical mapping. The technique shows improvements in the gene space, including resolution of gaps and misassemblies and correction of the order and orientation of genes. The authors characterize structural variation and repetitive regions, and identify transposable element lineage expansions unique to maize.
Complete and accurate reference genomes and annotations provide fundamental tools for characterization of genetic and functional variation [1]. These resources facilitate the determination of biological processes and support translation of research findings into improved and sustainable agricultural technologies. Many reference genomes for crop plants have been generated over the past decade, but these genomes are often fragmented and missing complex repeat regions [2]. Here we report the assembly and annotation of a reference genome of maize, a genetic and agricultural model species, using single-molecule real-time sequencing and high-resolution optical mapping. Relative to the previous reference genome [3], our assembly features a 52-fold increase in contig length and notable improvements in the assembly of intergenic spaces and centromeres. Characterization of the repetitive portion of the genome revealed more than 130,000 intact transposable elements, allowing us to identify transposable element lineage expansions that are unique to maize. Gene annotations were updated using 111,000 full-length transcripts obtained by single-molecule real-time sequencing [4]. In addition, comparative optical mapping of two other inbred maize lines revealed a prevalence of deletions in regions of low gene density and maize lineage-specific genes.
- Published
- 2017
12. Improved maize reference genome with single molecule technologies
- Author
Jinghua Shi, Xuehong Wei, Michael S. Campbell, Eric Antoniou, Katherine E. Guill, Alex Hastie, Michael D. McMullen, Michelle C. Stitzer, Bo Wang, Jeffrey Ross-Ibarra, Nathan M. Springer, Thomas K. Wolfgruber, Gernot G. Presting, Sunita Kumari, Tiffany Y. Liang, Jonathan I. Gent, Chen-Shan Chin, Kelly Dawe, Yinping Jiao, Paul Peluso, Doreen Ware, Michael Regulski, Andrew Olson, Richard W. McCombie, Joshua C. Stein, David R. Rank, Michael R. May, and Kevin L. Schneider
- Subjects
0106 biological sciences; Transposable element; 0303 health sciences; Contig; Genomics; Computational biology; Gene Annotation; Biology; 01 natural sciences; Genome; 03 medical and health sciences; Gene density; 030304 developmental biology; 010606 plant biology & botany; Reference genome; Single molecule real time sequencing
- Abstract
Complete and accurate reference genomes and annotations provide fundamental tools for characterization of genetic and functional variation. These resources facilitate elucidation of biological processes and support translation of research findings into improved and sustainable agricultural technologies. Many reference genomes for crop plants have been generated over the past decade, but these genomes are often fragmented and missing complex repeat regions. Here, we report the assembly and annotation of maize, a genetic and agricultural model species, using Single Molecule Real-Time (SMRT) sequencing and high-resolution optical mapping. Relative to the previous reference genome, our assembly features a 52-fold increase in contig length and significant improvements in the assembly of intergenic spaces and centromeres. Characterization of the repetitive portion of the genome revealed over 130,000 intact transposable elements (TEs), allowing us to identify TE lineage expansions unique to maize. Gene annotations were updated using 111,000 full-length transcripts obtained by SMRT sequencing. In addition, comparative optical mapping of two other inbreds revealed a prevalence of deletions in the low gene density region and maize lineage-specific genes.
- Published
- 2016
13. Sequencing the unsequenceable: Expanded CGG-repeat alleles of the fragile X gene
- Author
Paul J. Hagerman, Sarah McCalmon, Erick Loomis, Randi J Hagerman, David R. Rank, Luke Hickey, Jun Yin, Paul Peluso, Flora Tassone, and John Eid
- Subjects
congenital, hereditary, and neonatal diseases and abnormalities; DNA polymerase; Molecular Sequence Data; Method; DNA sequencing; Fragile X Mental Retardation Protein; Genetics; medicine; Humans; Genotyping; Gene; Alleles; Genetics (clinical); Base Sequence; biology; Sequence Analysis, DNA; medicine.disease; FMR1; nervous system diseases; Fragile X syndrome; Mutation; biology.protein; 5' Untranslated Regions; Trinucleotide Repeat Expansion; Trinucleotide repeat expansion; Single molecule real time sequencing
- Abstract
The human fragile X mental retardation 1 (FMR1) gene contains a (CGG)n trinucleotide repeat that is responsible for a number of heritable disorders affecting both early neurodevelopment and late-onset neurodegeneration (Willemsen et al. 2011; Leehey and Hagerman 2012). The repeat element is located in the 5′ untranslated region (5′UTR) of the gene and is thus transcribed into mRNA but not translated into the FMR1 protein product (FMRP). Expanded alleles in the premutation range (55–200 CGG repeats) result in elevated FMR1 mRNA expression (Tassone et al. 2000) and are associated with a number of disorders, including the adult-onset neurodegenerative disorder fragile X–associated tremor/ataxia syndrome (FXTAS) (Leehey and Hagerman 2012), fragile X–associated premature ovarian insufficiency (FXPOI) (Wittenberger et al. 2007; Sullivan et al. 2011), as well as learning disabilities, autism spectrum disorders, ADHD, and seizures (Farzin et al. 2006; Clifford et al. 2007; Chonchaiya et al. 2012). The molecular pathology of premutation expansion disorders is generally considered to be a toxic RNA gain of function resulting from the expanded CGG-repeat region in the mRNA (Garcia-Arocena and Hagerman 2010; Ross-Inta et al. 2010; Sellier et al. 2010). Alleles in this range also show a propensity to expand beyond 200 repeats (full mutation range) upon maternal transmission, in which case the FMR1 CpG-island promoter generally becomes hypermethylated and transcriptionally silenced (Willemsen et al. 2011). The resultant loss of FMRP expression disrupts early neurodevelopment and leads to fragile X syndrome (FXS), the most common heritable form of cognitive impairment and the most common single-gene mutation associated with autism (Willemsen et al. 2011; Hagerman et al. 2012). CGG-repeat expansions have been the focus of intense research since identification of the gene in 1991 (Verkerk et al. 1991); however, the inability to sequence repeat-expansion alleles in the disease-relevant size range has limited their complete genetic and epigenetic characterization. Indeed, investigators of the original gene-discovery study noted their inability to sequence the CGG repeats, and other early attempts to use sequencing to characterize the repeats describe an inability to fully traverse the region (Hornstra et al. 1993). Whereas PCR and Southern blotting can genotype repeat-expansion alleles on the basis of DNA fragment size (Nolin et al. 2003; Saluto et al. 2005; Filipovic-Sadic et al. 2010), and can even identify methylation status and AGG-repeat interruptions (Chen et al. 2010, 2011; Yrigollen et al. 2012), such methods lack the single-nucleotide resolution obtained with DNA sequencing and, more importantly, are severely limited in their ability to detect the presence of minor alleles. Furthermore, because dideoxy (chain-termination) sequencing strategies (Sanger et al. 1977) and most “next-generation” sequencing technologies (Metzker 2010) rely on reading signal from bulk DNA populations, they are limited by the loss of sequence phase coherence (a particular problem for GC-rich sequence) as well as by decreasing size resolution with increasing DNA length. As a consequence, it is generally not possible to sequence FMR1 alleles in excess of ∼100 CGG repeats, a limit that falls well short of the full mutation range that is responsible for fragile X syndrome. A fundamentally different sequencing approach, single-molecule, real-time (SMRT) sequencing, uses zero-mode waveguide (ZMW) nanowells to determine DNA sequence from individual DNA templates (Fig. 1; Eid et al. 2009). This is accomplished through real-time observation of individual nucleotide incorporation events catalyzed by a single DNA polymerase. This approach bypasses critical limitations of previous technologies in the context of highly repetitive sequences such as trinucleotide expansions.
In particular, measurement of the signal from isolated molecules overcomes the problems of sample heterogeneity (phase coherence) and diminishing resolution inherent in bulk sequencing approaches. Because SMRT sequencing reads are limited only by loss of activity of individual polymerase molecules, single-molecule read lengths approaching 15 kb (average read lengths approaching 3 kb) can be attained (Rasko et al. 2011; Sebra et al. 2012), with improved sequence accuracy achieved by iteratively sequencing the same circular SMRTbell template (circular consensus sequencing [CCS]) (Fig. 1; Travers et al. 2010).
[Figure 1. Schematic representation of SMRT sequencing. DNA polymerase synthesizes a nascent strand complementary to a closed-circular SMRTbell DNA template. Fluorescent phospholinked nucleotides produce real-time fluorescent pulse data for both basecalls and incorporation ...]
By utilizing the SMRT sequencing approach for the analysis of the CGG-repeat region of the FMR1 gene, we have demonstrated that it is possible to generate sequence data for FMR1 alleles in excess of 750 CGG repeats, which translates to over 2.25 kb of 100% CGG-repeat DNA. We show that this method produces repeat-size distributions reflective of the distributions expected from the input (e.g., cloned or PCR-amplified) DNA. We also demonstrate that by using CCS reads, we are able to identify AGG “interruptions” within the CGG-repeat tract that are of direct medical relevance (Yrigollen et al. 2012). Our approach should be broadly applicable to the analysis of other repetitive elements, particularly those containing high CG content and possessing short homopolymer runs (e.g., the GGGGCC motif in 9p-linked amyotrophic lateral sclerosis-frontal temporal dementia [ALS-FTLD]) (DeJesus-Hernandez et al. 2011; Renton et al. 2011).
Finally, SMRT sequencing possesses the unique capability of analyzing the kinetics of individual DNA polymerase molecules, and we show clear, strand-specific transitions within the CGG-repeat region, which should facilitate future studies of differential methylation.
- Published
- 2012
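The repeat-length arithmetic in the FMR1 entry (750 CGG triplets × 3 bp = 2,250 bp) and the idea of locating AGG interruptions in a CCS read can be illustrated with a minimal Python sketch. The function name `scan_cgg_tract` and its output fields are hypothetical, and this is not the authors' pipeline: it assumes an already error-corrected (e.g., CCS) read in the sense orientation and does not tolerate the insertion and deletion errors a real analysis would have to handle.

```python
import re


def scan_cgg_tract(read: str) -> dict:
    """Hypothetical helper: find the longest run of in-frame CGG/AGG triplets
    in a read and report its CGG count, AGG positions, and length in bp.
    Assumes an error-free, sense-strand read (toy model, no indels)."""
    read = read.upper()
    best = None
    # A tract is a maximal run of CGG or AGG triplets.
    for match in re.finditer(r"(?:[CA]GG)+", read):
        if best is None or len(match.group()) > len(best.group()):
            best = match
    if best is None:
        return {"cgg": 0, "agg_positions": [], "length_bp": 0}
    tract = best.group()
    triplets = [tract[i:i + 3] for i in range(0, len(tract), 3)]
    return {
        "cgg": triplets.count("CGG"),
        # 1-based triplet indices of AGG interruptions within the tract
        "agg_positions": [i + 1 for i, t in enumerate(triplets) if t == "AGG"],
        "length_bp": len(tract),
    }
```

For example, a read with 10 CGG, 1 AGG, 9 CGG, 1 AGG, and 10 CGG triplets between short flanks reports 29 CGG repeats with AGG interruptions at triplets 11 and 21, and a pure 750-repeat tract reports 2,250 bp.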
14. Real-Time DNA Sequencing from Single Polymerase Molecules
- Author
-
Jeremy Gray, Mark Trulson, Patrick Marks, John Dixon, Ravindra V. Dalal, Fred Christians, Adrian Fehr, Jon M. Sorenson, Stephen Turner, Alfred Gaertner, Sonya Clark, Geoff Otto, Gregory J. Kearns, John Lyle, Alex DeWinter, Brad Bettman, Ronald Kuse, Primo Baybayan, Steven Lin, Denis Zaccarin, John Vieceli, Joy Roy, Cheryl Heiner, David R. Rank, Kevin Travers, Robert Sebra, Mathieu Foquet, Thang Pham, Dawn Wu, Keith Bjornson, Michael Phillips, Arkadiusz Bibillo, Bidhan Chaudhuri, Gene Shen, Alicia Yang, Mark Maxham, Peter Zhao, Khai Luong, Paul Hardenbol, Insil Park, Jonas Korlach, Paul Lundquist, Jeffrey Wegener, Kevin Hester, David P. Holden, Paul Peluso, Congcong Ma, Frank Zhong, Yves Lacroix, Austin B. Tomaney, Devon Murphy, John Eid, Xiangxu Kong, and Ronald L. Cicero
- Subjects
DNA nanoball sequencing ,Deoxyribonucleotides ,DNA, Single-Stranded ,DNA-Directed DNA Polymerase ,Sequencing by hybridization ,Consensus Sequence ,Polymerase ,Fluorescent Dyes ,Multidisciplinary ,DNA clamp ,Base Sequence ,biology ,Oligonucleotide ,Multiple displacement amplification ,DNA ,Sequence Analysis, DNA ,Enzymes, Immobilized ,Molecular biology ,Nanostructures ,Sequencing by ligation ,Kinetics ,Spectrometry, Fluorescence ,biology.protein ,Biophysics ,DNA, Circular ,Single molecule real time sequencing - Abstract
We present single-molecule, real-time sequencing data obtained from a DNA polymerase performing uninterrupted template-directed synthesis using four distinguishable fluorescently labeled deoxyribonucleoside triphosphates (dNTPs). We detected the temporal order of their enzymatic incorporation into a growing DNA strand with zero-mode waveguide nanostructure arrays, which provide optical observation volume confinement and enable parallel, simultaneous detection of thousands of single-molecule sequencing reactions. Conjugation of fluorophores to the terminal phosphate moiety of the dNTPs allows continuous observation of DNA synthesis over thousands of bases without steric hindrance. The data report directly on polymerase dynamics, revealing distinct polymerization states and pause sites corresponding to DNA secondary structure. Sequence data were aligned with the known reference sequence to assay biophysical parameters of polymerization for each template position. Consensus sequences were generated from the single-molecule reads at 15-fold coverage, showing a median accuracy of 99.3%, with no systematic error beyond fluorophore-dependent error rates.
- Published
- 2009
15. EDGE: A Centralized Resource for the Comparison, Analysis, and Distribution of Toxicogenomic Information
- Author
-
Stevan B. Jovanovich, Gina M. Zastrow, Russell S. Thomas, Jacqueline A. Walisser, Kevin R. Hayes, Sharon Penn, David R. Rank, Mark Craven, Christopher A. Bradfield, Aaron L. Vollrath, Brian J. McMillan, and Janardan K. Reddy
- Subjects
Lipopolysaccharides ,Pharmacology ,Training set ,Computer science ,Gene Expression Profiling ,Suite ,computer.software_genre ,Bioinformatics ,Toxicogenetics ,Gene expression profiling ,Mice ,Liver ,Receptors, Aryl Hydrocarbon ,Informatics ,Databases, Genetic ,Animals ,Molecular Medicine ,Profiling (information science) ,PPAR alpha ,Data mining ,DNA microarray ,Toxicogenomics ,Cluster analysis ,computer ,Oligonucleotide Array Sequence Analysis - Abstract
Transcriptional profiling via microarrays holds great promise for toxicant classification and hazard prediction. Unfortunately, the use of different microarray platforms, protocols, and informatics often hinders the meaningful comparison of transcriptional profiling data across laboratories. One solution to this problem is to provide a low-cost and centralized resource that enables researchers to share toxicogenomic data that has been generated on a common platform. In an effort to create such a resource, we developed a standardized set of microarray reagents and reproducible protocols to simplify the analysis of liver gene expression in the mouse model. This resource, referred to as EDGE, was then used to generate a training set of 117 publicly accessible transcriptional profiles that can be accessed at http://edge.oncology.wisc.edu/. The Web-accessible database was also linked to an informatics suite that allows on-line clustering and K-means analyses as well as Boolean and sequence-based searches of the data. We propose that EDGE can serve as a prototype resource for the sharing of toxicogenomics information and be used to develop algorithms for efficient chemical classification and hazard prediction.
- Published
- 2005
16. Application of genomics to toxicology research
- Author
-
Gina M. Zastrow, Kalyan Pande, Stevan B. Jovanovich, Sharron G. Penn, Russell S. Thomas, Mark H. Lewis, Tianhua Hu, Christopher A. Bradfield, David R. Rank, and Kevin R. Hayes
- Subjects
Polymorphism, Genetic ,Health, Toxicology and Mutagenesis ,Public Health, Environmental and Occupational Health ,Genomics ,Biology ,Toxicology ,Disease Models, Animal ,Gene Expression Regulation ,Toxicity Tests ,Animals ,Humans ,Environmental Pollutants ,Genomic information ,Toxicogenomics ,Research Article ,Forecasting ,Oligonucleotide Array Sequence Analysis - Abstract
Traditional models of toxicity have relied on dissecting chemical action into pharmacokinetic and pharmacodynamic processes. However, the integration of genomic information with toxicology will enhance our basic understanding of these processes and significantly change the way we apply toxicological information to risk assessment and regulatory problems. In this article, we summarize the application of gene expression information and polymorphism discovery to four areas in toxicology: toxicity testing, cross-species extrapolation, understanding mechanism of action, and susceptibility.
- Published
- 2002
17. Sequence variation and phylogenetic history of the mouse Ahr gene
- Author
-
Sharron G. Penn, David R. Rank, Kevin Holden, Russell S. Thomas, and Christopher A. Bradfield
- Subjects
Genetic Linkage ,Molecular Sequence Data ,Mice, Inbred Strains ,Locus (genetics) ,Biology ,Evolution, Molecular ,Mice ,Species Specificity ,Phylogenetics ,Genetic variation ,Genetics ,Animals ,Amino Acid Sequence ,Selection, Genetic ,General Pharmacology, Toxicology and Pharmaceutics ,Allele ,Peptide sequence ,Gene ,Phylogeny ,Polymorphism, Genetic ,Sequence Homology, Amino Acid ,Nucleic acid sequence ,Genetic Variation ,Aryl hydrocarbon receptor ,Receptors, Aryl Hydrocarbon ,biology.protein - Abstract
The Ahr locus encodes the aryl hydrocarbon receptor (AHR), which plays important toxicological and developmental roles. Sequence variation in this gene was studied in 13 mouse lines comprising eight laboratory strains, two Mus musculus subspecies, and three additional Mus species. The data represent the largest study of sequence variation across multiple mouse lines in a single gene (approximately 15.9 kb per mouse line). Among all mice, the average frequency of all polymorphisms was 20.3 variants/kb in intronic regions and 14.1 variants/kb in exonic regions. For substitutions alone, the average intronic and exonic frequencies for all mice were 13.3 and 8.9 substitutions/kb, respectively. Among laboratory strains, the average intronic and exonic frequencies for all polymorphisms dropped to 5.4 and 2.9 variants/kb, respectively. There were 111 non-synonymous polymorphisms resulting in 42 different amino acid changes, of which only 10 had been previously identified. Based on the nucleotide sequence, the phylogenetic history of the gene placed mice carrying the Ahr(b2) and Ahr(d) alleles in separate branches, whereas mice carrying the Ahr(b1) and Ahr(b3) alleles exhibited a more complex history. Evolutionarily, the AHR protein as a whole appears to be under purifying selection (Ka/Ks ratio = 0.237). Despite significant functional constraint in the basic helix-loop-helix and PAS domains, ligand binding is not constrained to the high-affinity allele, which further supports a role for the AHR in development and its importance beyond the adaptive response to environmental toxicants.
- Published
- 2002
18. Identification of toxicologically predictive gene sets using cDNA microarrays
- Author
-
Kevin R. Hayes, Stevan B. Jovanovich, E.W.N. Glover, Russell S. Thomas, Gina M. Zastrow, Tomi Silander, Christopher A. Bradfield, Janardan K. Reddy, Kalyan Pande, David R. Rank, Sharron G. Penn, and Mark Craven
- Subjects
Male ,Drug-Related Side Effects and Adverse Reactions ,Gene Expression ,Mice ,Predictive Value of Tests ,Gene expression ,Animals ,Oligonucleotide Array Sequence Analysis ,Pharmacology ,Genetics ,CDNA Microarrays ,biology ,Peroxisome proliferator ,Gene Expression Profiling ,Gene sets ,Aryl hydrocarbon receptor ,Mice, Inbred C57BL ,Gene expression profiling ,Pharmaceutical Preparations ,Models, Animal ,biology.protein ,RNA ,Molecular Medicine ,Identification (biology) ,RNA biosynthesis ,Signal Transduction - Abstract
We have developed an approach to classify toxicants based upon their influence on profiles of mRNA transcripts. Changes in liver gene expression were examined after exposure of mice to 24 model treatments that fall into five well-studied toxicological categories: peroxisome proliferators, aryl hydrocarbon receptor agonists, noncoplanar polychlorinated biphenyls, inflammatory agents, and hypoxia-inducing agents. Analysis of 1200 transcripts using both a correlation-based approach and a probabilistic approach resulted in a classification accuracy of between 50 and 70%. However, with the use of a forward parameter selection scheme, a diagnostic set of 12 transcripts was identified that provided an estimated 100% predictive accuracy based on leave-one-out cross-validation. Expansion of this approach to additional chemicals of regulatory concern could serve as an important screening step in a new era of toxicological testing.
- Published
- 2001
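Leave-one-out cross-validation, which the toxicant-classification entry uses to estimate predictive accuracy, can be sketched in plain Python. As an assumption for illustration, each held-out expression profile is assigned to the class whose centroid (the mean profile of the remaining training samples) it correlates with most strongly. This stands in for, and is not, the correlation-based and probabilistic classifiers or the forward parameter selection scheme described in the study; all names and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def centroid(profiles):
    """Mean expression value per transcript across the given profiles."""
    return [sum(col) / len(profiles) for col in zip(*profiles)]


def loocv_accuracy(profiles, labels):
    """Hold out each profile in turn, build class centroids from the rest,
    and predict the class whose centroid is most correlated with it.
    Assumes every class has at least two samples."""
    correct = 0
    for i, (held_out, true_label) in enumerate(zip(profiles, labels)):
        train = [(p, l) for j, (p, l) in enumerate(zip(profiles, labels)) if j != i]
        classes = {l for _, l in train}
        cents = {c: centroid([p for p, l in train if l == c]) for c in classes}
        predicted = max(cents, key=lambda c: pearson(held_out, cents[c]))
        correct += int(predicted == true_label)
    return correct / len(profiles)
```

In a forward-selection setting, the same loop could score each candidate subset of transcripts, keeping the subset with the best leave-one-out accuracy.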
19. Mining the human genome using microarrays of open reading frames
- Author
-
David L. Barker, Sharron G. Penn, David K. Hanzel, and David R. Rank
- Subjects
Expressed Sequence Tags ,Genetics ,Expressed sequence tag ,Models, Genetic ,Genome, Human ,Gene Expression Profiling ,Exons ,Sequence Analysis, DNA ,Biology ,Polymerase Chain Reaction ,Genome ,Cell Line ,Open Reading Frames ,genomic DNA ,Open reading frame ,Organ Specificity ,Sequence Homology, Nucleic Acid ,Human Genome Project ,Humans ,Human genome ,DNA microarray ,ORFS ,Gene ,Oligonucleotide Array Sequence Analysis - Abstract
To test the hypothesis that the human genome project will uncover many genes not previously discovered by sequencing of expressed sequence tags (ESTs), we designed and produced a set of microarrays using probes based on open reading frames (ORFs) in 350 Mb of finished and draft human sequence. Our approach aims to identify all genes directly from genomic sequence by querying gene expression. We analysed genomic sequence with a suite of ORF prediction programs, selected approximately one ORF per gene, amplified the ORFs from genomic DNA and arrayed the amplicons onto treated glass slides. Of the first 10,000 arrayed ORFs, 31% are completely novel and 29% are similar, but not identical, to sequences in public databases. Approximately one-half of these are expressed in the tissues we queried by microarray. Subsequent verification by other techniques confirmed expression of several of the novel genes. Expressed sequence tags (ESTs) have yielded vast amounts of data [1,2], but our results indicate that many genes in the human genome will only be found by genomic sequencing.
- Published
- 2000
20. Analyzing Sequencing Reactions from Bacteriophage M13 by Matrix-assisted Laser Desorption/Ionization Mass Spectrometry
- Author
-
David R. Rank, Stéphane Mouradian, and Lloyd M. Smith
- Subjects
Gel electrophoresis ,Sanger sequencing ,Chromatography ,M13 bacteriophage ,biology ,Chemistry ,Oligonucleotide ,Organic Chemistry ,Analytical chemistry ,Mass spectrometry ,biology.organism_classification ,DNA sequencing ,Analytical Chemistry ,Matrix-assisted laser desorption/ionization ,symbols.namesake ,symbols ,Human genome ,Spectroscopy - Abstract
The current demand for improved DNA sequencing methodologies posed by the Human Genome Project has spurred the investigation of alternatives to gel electrophoresis. Matrix-assisted laser desorption/ionization (MALDI) mass spectrometry has great potential for the rapid analysis of DNA fragments. Mock Sanger sequencing mixtures have been successfully analyzed by MALDI by pooling synthesized oligonucleotides corresponding to the M13 bacteriophage sequence. More recently, analyses of Sanger sequencing fragments enzymatically generated from synthetic templates of 45 or 50 bases were reported. In the present study, these feasibility demonstrations are extended to show MALDI sequencing from the M13 bacteriophage DNA template commonly used in actual Sanger sequencing. The results show sequence determination for extension products up to 35 bases in length. Different desalting and purification procedures were investigated and it was found that salt could be efficiently reduced by removal of the template in a post-reaction step. Work in progress to stabilize DNA by chemical modification, employed in conjunction with the methods described here, should enable significant extension of the length of readable sequence.
- Published
- 1996
21. Origins of the E. coli Strain Causing an Outbreak of Hemolytic–Uremic Syndrome in Germany
- Author
-
Flemming Scheutz, Lawrence Lee, Eric E. Schadt, Dale R. Webster, Susan R. Steyert, Chen-Shan Chin, Jason W. Sahl, John Eid, Carsten Struve, Julia C. Redman, Susanna Wang, Paul Peluso, Aaron Klammer, David R. Rank, James P. Nataro, Robert Sebra, Nadia Boisen, Andrew Kasarskis, Andrey Kislyuk, Karen A. Krogfelt, Ali Bashir, Jakob Frimodt-Møller, Andreas Petersen, Matthew K. Waldor, Dimitris Iliopoulos, David A. Rasko, James H. Bullard, and Ellen E. Paxinos
- Subjects
Diarrhea ,Serotype ,Virulence ,Biology ,medicine.disease_cause ,Polymerase Chain Reaction ,Article ,Disease Outbreaks ,law.invention ,Microbiology ,Feces ,Escherichia coli O104:H4 ,law ,Germany ,medicine ,Humans ,Escherichia coli ,Escherichia coli Infections ,Phylogeny ,Polymerase chain reaction ,Whole genome sequencing ,Base Sequence ,Shiga-Toxigenic Escherichia coli ,Outbreak ,Sequence Analysis, DNA ,General Medicine ,Middle Aged ,Virology ,Bacterial Typing Techniques ,Hemolytic-Uremic Syndrome ,Female ,medicine.symptom ,Genome, Bacterial - Abstract
A large outbreak of diarrhea and the hemolytic-uremic syndrome caused by an unusual serotype of Shiga-toxin-producing Escherichia coli (O104:H4) began in Germany in May 2011. As of July 22, a large number of cases of diarrhea caused by Shiga-toxin-producing E. coli had been reported: 3167 without the hemolytic-uremic syndrome (16 deaths) and 908 with the hemolytic-uremic syndrome (34 deaths). These numbers indicate that this strain is notably more virulent than most Shiga-toxin-producing E. coli strains. Preliminary genetic characterization of the outbreak strain suggested that, unlike most of these strains, it should be classified within the enteroaggregative pathotype of E. coli. We used third-generation, single-molecule, real-time DNA sequencing to determine the complete genome sequence of the German outbreak strain, as well as the genome sequences of seven diarrhea-associated enteroaggregative E. coli serotype O104:H4 strains from Africa and four enteroaggregative E. coli reference strains belonging to other serotypes. Genomewide comparisons were performed using these enteroaggregative E. coli genomes, as well as those of 40 previously sequenced E. coli isolates. The enteroaggregative E. coli O104:H4 strains are closely related and form a distinct clade among E. coli and enteroaggregative E. coli strains. However, the genome of the German outbreak strain can be distinguished from those of other O104:H4 strains because it contains a prophage encoding Shiga toxin 2 and a distinct set of additional virulence and antibiotic-resistance factors. Our findings suggest that horizontal genetic exchange allowed for the emergence of the highly virulent Shiga-toxin-producing enteroaggregative E. coli O104:H4 strain that caused the German outbreak. More broadly, these findings highlight the way in which the plasticity of bacterial genomes facilitates the emergence of new pathogens.
- Published
- 2011
22. A flexible and efficient template format for circular consensus sequencing and SNP detection
- Author
-
David R. Rank, Stephen Turner, John Eid, Chen-Shan Chin, and Kevin Travers
- Subjects
Staphylococcus aureus ,Sequence analysis ,Oligonucleotides ,Computational biology ,Biology ,Polymorphism, Single Nucleotide ,Insert (molecular biology) ,03 medical and health sciences ,0302 clinical medicine ,Consensus Sequence ,Genetics ,Consensus sequence ,030304 developmental biology ,Sequence (medicine) ,0303 health sciences ,Base Sequence ,Oligonucleotide ,DNA ,Sequence Analysis, DNA ,Templates, Genetic ,A-site ,Sequence logo ,Template ,Methods Online ,030217 neurology & neurosurgery - Abstract
A novel template design for single-molecule sequencing is introduced, a structure we refer to as a SMRTbell™ template. This structure consists of a double-stranded portion, containing the insert of interest, and a single-stranded hairpin loop on either end, which provides a site for primer binding. Structurally, this format resembles a linear double-stranded molecule, and yet it is topologically circular. When placed into a single-molecule sequencing reaction, the SMRTbell template format enables a consensus sequence to be obtained from multiple passes on a single molecule. Furthermore, this consensus sequence is obtained from both the sense and antisense strands of the insert region. In this article, we present a universal method for constructing these templates, as well as an application of their use. We demonstrate the generation of high-quality consensus accuracy from single molecules, as well as the use of SMRTbell templates in the identification of rare sequence variants.
- Published
- 2010
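The multi-pass idea behind the SMRTbell template in the entry above can be sketched as a toy per-position majority vote. Because the polymerase circles the topologically closed template, successive passes alternate between the sense and antisense strands of the insert, so antisense passes are reverse-complemented before voting. The function names are hypothetical, and the sketch assumes passes that are already adapter-trimmed and free of insertions and deletions; a real circular consensus must align the passes and model indel errors, so this illustrates only the voting step.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def naive_ccs(passes: list[str]) -> str:
    """Hypothetical toy consensus. Passes alternate sense/antisense, starting
    with sense; they must be adapter-trimmed, gap-free, and equal length."""
    oriented = [p if i % 2 == 0 else revcomp(p) for i, p in enumerate(passes)]
    length = len(oriented[0])
    if any(len(p) != length for p in oriented):
        raise ValueError("toy model requires equal-length, gap-free passes")
    # Majority base at each position across all oriented passes.
    return "".join(
        Counter(p[j] for p in oriented).most_common(1)[0][0] for j in range(length)
    )
```

With four passes of a 7 bp insert, each of two passes carrying a single substitution error at a different position, the vote still recovers the insert, because every position has at least three agreeing passes.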
23. Microarrays: Use in Gene Identification
- Author
-
Sharron G. Penn, David K. Hanzel, Yonggang Ji, Yizhong Gu, Mark E. Shannon, Russell S. Thomas, David R. Rank, Tianhua Hu, Jinjiao Guo, David L. Barker, Wensheng Chen, Amy Corrigan, David Jenkins, and Jian Zhang
- Subjects
Genetics ,Comparative genomics ,Open reading frame ,Complementary DNA ,Gene prediction ,DNA microarray ,Biology ,GeneCalling ,Gene ,Functional genomics - Abstract
Gene prediction by computer algorithms and approaches based on comparative genomics identifies the presence of potential genes in a genomic sequence. By placing these predicted genes onto microarrays and searching for interactions with complementary DNA from different types of tissue or cell, novel genes are discovered. Keywords: gene prediction; comparative genomics; chromosome 22; myosin; open reading frame
- Published
- 2006
24. Comprehensive evaluation of the association between prostate cancer and genotypes/haplotypes in CYP17A1, CYP3A4, and SRD5A2
- Author
-
John S. Witte, Anu Loukola, Deborah J. Thompson, David V. Conti, David R. Rank, Mine S. Cicek, Graham Casey, Monica Chadha, Sharron G. Penn, David K. Hanzel, Qiner Yang, Brad Love, Yalin Jiang, Pamela L. Paris, Katherine Dains, and Vesna Bivolarevic
- Subjects
Genetics ,Male ,Linkage disequilibrium ,Genotype ,Haplotype ,Prostatic Neoplasms ,Steroid 17-alpha-Hydroxylase ,Single-nucleotide polymorphism ,Biology ,medicine.disease ,Polymorphism, Single Nucleotide ,Genetic determinism ,Prostate cancer ,3-Oxo-5-alpha-Steroid 4-Dehydrogenase ,Cytochrome P-450 Enzyme System ,Haplotypes ,Case-Control Studies ,medicine ,SNP ,Cytochrome P-450 CYP3A ,Humans ,Allele frequency ,Genetics (clinical) - Abstract
Genes involved in the testosterone biosynthetic pathway – such as CYP17A1, CYP3A4, and SRD5A2 – represent strong candidates for affecting prostate cancer. Previous work has detected associations between individual variants in these three genes and prostate cancer risk and aggressiveness. To more comprehensively evaluate CYP17A1, CYP3A4, and SRD5A2, we undertook a two-phase study of the relationship between their genotypes/haplotypes and prostate cancer. Phase I of the study first searched for single-nucleotide polymorphisms (SNPs) in these genes by resequencing 24 individuals from the Coriell Polymorphism Discovery Resource, 92–110 men from prostate cancer case–control sibships, and by leveraging public databases. In all, 87 SNPs were discovered and genotyped in 276 men from case–control sibships. Those SNPs exhibiting preliminary case–control allele frequency differences, or distinguishing (i.e., ‘tagging’) common haplotypes across the genes, were identified for further study (24 SNPs in total). In Phase II of the study, the 24 SNPs were genotyped in an additional 841 men from case–control sibships. Finally, associations between genotypes/haplotypes in CYP17A1, CYP3A4, and SRD5A2 and prostate cancer were evaluated in the total case–control sample of 1117 brothers from 506 sibships. Family-based analyses detected associations between prostate cancer risk or aggressiveness and a number of CYP3A4 SNPs (P-values between 0.006 and 0.05), a CYP3A4 haplotype (P-values 0.05 and 0.009 in nonstratified and stratified analysis, respectively), and two SRD5A2 SNPs in strong linkage disequilibrium (P=0.02). Undertaking a two-phase study comprising SNP discovery, haplotype tagging, and association analyses allowed us to more fully decipher the relation between CYP17A1, CYP3A4, and SRD5A2 and prostate cancer.
- Published
- 2003
25. Application of DNA microarrays for predicting toxicity and evaluating cross-species extrapolation
- Author
-
Karen Tran, Sharron G. Penn, Gina M. Zastrow, Christopher A. Bradfield, Russell S. Thomas, David R. Rank, and Kevin R. Hayes
- Subjects
Gene expression ,Toxicity ,Genomics ,Computational biology ,Biology ,DNA microarray ,Bioinformatics ,Toxicogenomics ,Genome ,Gene ,Organism - Abstract
The application of DNA microarray technology to the field of toxicology has increased significantly. In most cases, the research has monitored global changes in gene expression to provide insight into the cellular mechanisms of toxicity. Although assessing global gene expression changes may prove important when characterizing the action of a particular chemical, it is not necessarily predictive of toxicological behavior within an organism or across species. As a first step toward predictive toxicological models within a species, we developed a statistical model to classify, on the basis of gene expression, 24 model treatments that fall into five well-studied toxicological categories. Using all of the gene expression measurements resulted in relatively poor predictive accuracy; however, focusing on a diagnostic subset of genes greatly increased the predictive accuracy. To evaluate toxicological behavior across species, a set of orthologous microarrays was developed that allows direct comparison of gene expression changes in both organisms. To construct these microarrays, a genome-wide comparison of available human and mouse sequence was performed to identify putative orthologous genes. A subset of the orthologous genes was spotted on tandem microarrays (one human and one mouse) and used to evaluate conservation of expression patterns between organisms. Predictive statistical models and cross-genome comparisons of chemically induced gene expression are the next logical advancements in toxicogenomics, and their application has the potential to be extremely valuable in regulatory decisions and the risk assessment process.
- Published
- 2003
26. Developing toxicologically predictive gene sets using cDNA microarrays and Bayesian classification
- Author
-
Russell S. Thomas, David R. Rank, Sharron G. Penn, Mark W. Craven, Norman R. Drinkwater, and Christopher A. Bradfield
- Subjects
Expressed Sequence Tags ,Mice ,Gene Expression Regulation ,Liver ,Image Processing, Computer-Assisted ,Animals ,Bayes Theorem ,Toxicology ,Oligonucleotide Array Sequence Analysis ,Toxins, Biological
- Published
- 2002
27. Developing toxicologically predictive gene sets using cDNA microarrays and Bayesian classification
- Author
-
Norman R. Drinkwater, Mark Craven, David R. Rank, Christopher A. Bradfield, Russell S. Thomas, and Sharron G. Penn
- Subjects
CDNA Microarrays ,Naive Bayes classifier ,Microarray ,Microarray analysis techniques ,Gene sets ,Statistical model ,Computational biology ,Biology ,Toxicogenomics ,Bioinformatics ,Prenatal exposure - Abstract
Publisher Summary: The potential applications of predictive statistical models in toxicology based on gene expression measurements are numerous. For example, short-term studies measuring gene expression could be used to predict the outcome of long-term toxicity studies like those still performed by the National Toxicology Program (NTP). Other short-term gene expression studies could be used to predict which chemicals would be teratogenic or cause more subtle developmental changes after human prenatal exposure. In either case, the application of microarray analysis and predictive statistical models has the potential to be extremely useful from both economic and human health perspectives. The application of predictive statistical models to chemically induced gene expression is the next logical step in the developing field of toxicogenomics. The development of these models may eventually open the door to a new era of toxicological testing in which relatively short and inexpensive microarray studies allow assessment of the human health risks associated with a previously untested chemical. However, the accuracy and applicability of these models are highly dependent on the quality of the training sets used in their development. As the public gene expression database grows, more chemicals may be added to the training sets, and the models will become more predictive.
- Published
- 2002
28. Sub-microliter DNA sequencing for capillary array electrophoresis
- Author
-
Andrew G. Hadd, David R. Rank, Michael P Goard, and Stevan B. Jovanovich
- Subjects
Chromosomes, Artificial, Bacterial ,Chromatography ,Base Sequence ,DNA, Plant ,Chemistry ,Capillary action ,Organic Chemistry ,Analytical chemistry ,Arabidopsis ,Electrophoresis, Capillary ,General Medicine ,Sequence Analysis, DNA ,Biochemistry ,Polymerase Chain Reaction ,Fluorescence spectroscopy ,Analytical Chemistry ,Electrokinetic phenomena ,Electrophoresis ,Capillary electrophoresis ,Reagent ,Sample preparation ,Laser-induced fluorescence ,DNA Primers - Abstract
DNA sequencing from sub-microliter samples was demonstrated for capillary array electrophoresis by optimizing the analysis of 500 nl aliquots of full-volume reactions and by preparing 500 nl reactions within fused-silica capillaries. Sub-microliter aliquots were removed from the pooled reaction products of 10 µl dye-primer cycle-sequencing reactions and analyzed without modifying either the reagent concentrations or the instrument workflow. The impact of precipitation methods, resuspension buffers, and injection times on electrokinetic injection efficiency for 500 nl aliquots was determined from peak heights, signal-to-noise ratios, and changes in base-called readlengths. For 500 nl aliquots diluted to 5 µl in 60% formamide-1 mM EDTA and injected directly, increasing injection times from 10 to 80 s gave a five-fold increase in signal-to-noise ratios without a corresponding increase in peak widths or reduction in readlengths. For 500 nl aliquots precipitated in alcohol, 80 ± 5% template recovery and a two-fold decrease in conductivity were obtained, resulting in a two-fold increase in peak heights and a 50 to 100 base increase in readlengths. In a comparison of aliquot volumes and precipitation methods, equivalent readlengths were obtained for 500 nl, 4 µl, and 8 µl aliquots simply by adjusting the electrokinetic injection conditions. To ascertain the robustness of this methodology for genomic sequencing, 96 Arabidopsis thaliana subclones were sequenced, yielding 38,624 bases from 500 nl aliquots versus 30,764 bases from standard-scale reactions. To demonstrate 500 nl sample preparation, reactions were performed in fused-silica capillary reaction chambers using air-based thermal cycling. A readlength of 690 bases was obtained for the polymerase chain reaction product of an Arabidopsis subclone without modifying the reagent concentrations, post-reaction processing, or electrokinetic injection workflow.
These results demonstrated the fundamental feasibility of small-volume DNA sequencing for high-throughput capillary electrophoresis.
- Published
- 2000
29. Comparative analysis of tandem repeats from hundreds of species reveals unique insights into centromere evolution
- Author
-
Joseph L. DeRisi, David R. Rank, José Fernando Garcia, John Eid, Daniël P. Melters, Simon W. L. Chan, Christian M. Tobias, Michael R. May, Ian F Korf, Keith Bradnam, Natalie Telis, Paul Peluso, J. Graham Ruby, Timothy P. L. Smith, Jeffrey Ross-Ibarra, Hugh A. Young, and Robert Sebra
- Subjects
0106 biological sciences ,Centromere ,Molecular Sequence Data ,Biology ,01 natural sciences ,Genome ,DNA sequencing ,Evolution, Molecular ,03 medical and health sciences ,chemistry.chemical_compound ,Species Specificity ,Tandem repeat ,Animals ,Quantitative Biology - Genomics ,030304 developmental biology ,Sequence (medicine) ,Genomics (q-bio.GN) ,0303 health sciences ,Concerted evolution ,Base Sequence ,Phylogenetic tree ,Research ,Plants ,chemistry ,Tandem Repeat Sequences ,Evolutionary biology ,FOS: Biological sciences ,DNA ,010606 plant biology & botany - Abstract
Background: Centromeres are essential for chromosome segregation, yet their DNA sequences evolve rapidly. In most animals and plants that have been studied, centromeres contain megabase-scale arrays of tandem repeats. Despite their importance, very little is known about the degree to which centromere tandem repeats share common properties between different species across different phyla. We used bioinformatic methods to identify high-copy tandem repeats from 282 species using publicly available genomic sequence and our own data. Results: Our methods are compatible with all current sequencing technologies. Long Pacific Biosciences sequence reads allowed us to find tandem repeat monomers up to 1,419 bp. We assumed that the most abundant tandem repeat is the centromere DNA, which was true for most species whose centromeres have been previously characterized, suggesting this is a general property of genomes. High-copy centromere tandem repeats were found in almost all animal and plant genomes, but repeat monomers were highly variable in sequence composition and length. Furthermore, phylogenetic analysis of sequence homology showed little evidence of sequence conservation beyond approximately 50 million years of divergence. We find that despite an overall lack of sequence conservation, centromere tandem repeats from diverse species showed similar modes of evolution. Conclusions: While centromere position in most eukaryotes is epigenetically determined, our results indicate that tandem repeats are highly prevalent at centromeres of both animal and plant genomes.
This suggests a functional role for such repeats, perhaps in promoting concerted evolution of centromere DNA across chromosomes.
Affiliations: Univ Calif Davis, Dept Mol & Cell Biol, Davis, CA 95616 USA; Univ Calif Davis, Genome Ctr, Davis, CA 95616 USA; Univ Calif Davis, Dept Plant Biol, Davis, CA 95616 USA; USDA ARS, Western Reg Res Ctr, Albany, CA 94710 USA; Univ Calif Davis, Dept Ecol & Evolut, Davis, CA 95616 USA; Univ Calif San Francisco, Dept Biochem & Biophys, San Francisco, CA 94158 USA; Pacific Biosci, Menlo Pk, CA 94025 USA; Univ Estadual Paulista, Dept Anim Prod & Hlth, IAEA Collaborating Ctr Anim Genom & Bioinformat, BR-16050680 Aracatuba, SP, Brazil; Howard Hughes Med Inst, Chevy Chase, MD 20815 USA; USDA ARS, US Meat Anim Res Ctr, Clay Ctr, NE 68933 USA; Univ Calif Davis, Dept Plant Sci, Ctr Populat Biol, Davis, CA 95616 USA
Funding: Joint USDA/DOE Office of Science Feedstock genomics grant DE-AI02-09ER64829; National Science Foundation IOS-0922703; National Science Foundation IOS-1026094; NIH-NIGMS T32-GM008799
- Published
- 2013
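The abstract above does not spell out how the most abundant tandem repeat and its monomer length were detected. As a hedged illustration only (a generic technique, not the authors' pipeline), one simple way to estimate the repeat period of a single read is a self-match (autocorrelation) score: for each candidate period p, count how often a base equals the base p positions downstream. The function names `period_scores` and `dominant_period` and the `tolerance` parameter are hypothetical.

```python
def period_scores(seq: str, max_period: int) -> dict[int, float]:
    """Hypothetical helper: for each candidate period p, the fraction of
    positions i where seq[i] == seq[i + p]."""
    seq = seq.upper()
    # Only test periods up to half the read, so every score compares at least
    # half of the bases (very large p would compare too few positions).
    limit = min(max_period, len(seq) // 2)
    scores: dict[int, float] = {}
    for p in range(1, limit + 1):
        n = len(seq) - p
        matches = sum(1 for i in range(n) if seq[i] == seq[i + p])
        scores[p] = matches / n
    return scores


def dominant_period(seq: str, max_period: int = 2000, tolerance: float = 0.05) -> int:
    """Hypothetical helper: smallest period whose score is within `tolerance`
    of the best score.

    Multiples of the true monomer length (dimers, trimers) score about as well
    as the monomer itself, so taking the smallest near-best period avoids
    reporting a higher-order repeat unit.
    """
    scores = period_scores(seq, max_period)
    if not scores:
        raise ValueError("sequence too short to score any period")
    best = max(scores.values())
    return min(p for p, s in scores.items() if s >= best - tolerance)
```

For example, `dominant_period("ACGGT" * 10)` returns 5, and it still returns 5 after a single substitution is introduced into one copy. This quadratic scan is only meant to show the idea; it ignores insertions and deletions between copies and would be too slow for kilobase-scale monomers across whole genomes.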
30. M13-102: a vector for facilitating construction and improving quality of M13 shotgun libraries
- Author
-
Arthur F. Johnson, Danhua Chen, Richard A. Guilfoyle, David R. Rank, Jessica Severin, and Lloyd M. Smith
- Subjects
Cloning ,Genomic Library ,Base Sequence ,Shotgun sequencing ,Genetic Vectors ,Molecular Sequence Data ,General Medicine ,Computational biology ,DNA ,Biology ,Molecular biology ,Insert (molecular biology) ,Phosphoric Monoester Hydrolases ,chemistry.chemical_compound ,chemistry ,Genetics ,Nucleic acid ,Genomic library ,Primer (molecular biology) ,Gene ,Bacteriophage M13 ,DNA Primers ,Fluorescent Dyes - Abstract
A modified vector, M13-102, is described which utilizes the previously reported M13-100 direct selection strategy for shotgun cloning [Guilfoyle and Smith, Nucleic Acids Res. 22 (1994) 100-107]. In these vectors, direct selection replaces the need for phosphatase treatment of vector DNA and is achieved by insertional inactivation of M13 gene X. When gene X is not inactivated, the engineered overproduction of its product represses phage replication. M13-102 contains two new additions: (1) a sequence enabling triple-helix-mediated affinity capture (TAC) for purification of linearized vector DNA, and (2) universal primer sequences for wider compatibility with commercial instruments that support fluorescence-based sequencing. Using a biotinylated homopyrimidine oligodeoxyribonucleotide as a third-strand probe, TAC is performed on streptavidin-coated magnetic beads [Ji et al., Genetics Analysis: Techniques and Applications 11 (1994) 43-47], and serves as a rapid and efficient alternative to gel purification. To reduce tandem insertions, phosphatase treatment of insert DNA could be readily applied without sacrificing cloning efficiency. The combined capabilities of direct selection, TAC purification and phosphatase treatment of inserts should facilitate library construction and improve overall library quality.
- Published
- 1996
31. Automated DNA Sequencers: The Next Generation
- Author
-
David R. Rank
- Subjects
Massive parallel sequencing ,genetic structures ,Computer science ,2 base encoding ,Sequence assembly ,Hybrid genome assembly ,Computational biology ,Condensed Matter Physics ,Bioinformatics ,eye diseases ,Atomic and Molecular Physics, and Optics ,DNA sequencing ,Electronic, Optical and Magnetic Materials ,DNA sequencer ,Human genome ,Electrical and Electronic Engineering ,ABI Solid Sequencing - Abstract
Optically detected DNA sequencing technologies facilitate sequencing the 3 billion base pairs of the human genome. Faster, higher-throughput optically detected electrophoresis instrumentation will be necessary to bring down the cost of sequencing the entire genome and to complete the project by 2005.
- Published
- 1996
32. Sequencing the Unsequenceable: Expanded CGG Repeats in the Human FMR1 Gene
- Author
-
Erick Loomis, Luke Hickey, Paul Peluso, David R. Rank, Jun Yin, Paul J. Hagerman, Flora Tassone, and John Eid
- Subjects
Genetics ,biology ,DNA polymerase ,Biophysics ,medicine.disease ,FMR1 ,DNA sequencing ,Fragile X syndrome ,medicine ,biology.protein ,Consensus sequence ,Gene ,Exome sequencing ,Polymerase - Abstract
Alleles of the FMR1 gene with more than 200 CGG repeats generally undergo methylation-coupled gene silencing, resulting in fragile X syndrome, the leading heritable form of cognitive impairment. Smaller expansions (55-200 CGG repeats) result in elevated levels of FMR1 mRNA, which is directly responsible for the late-onset neurodegenerative disorder, fragile X-associated tremor/ataxia syndrome (FXTAS). Despite the importance of this gene, no existing DNA sequencing method is capable of sequencing through more than ∼100 CGG repeats, thus limiting the ability to precisely characterize the disease-causing alleles.
The recent development of single-molecule, real-time sequencing represents a novel approach to DNA sequencing that couples the intrinsic processivity of DNA polymerase with the ability to read polymerase activity on a single-molecule basis. Further, the accuracy of the method is improved through the use of circular templates, such that each molecule can be read multiple times to produce a circular consensus sequence (CCS). We have succeeded in generating CCS reads representing multiple passes through both strands of repeat tracts exceeding 700 CGGs (>2 kb of 100 percent CG) flanked by native FMR1 sequence, with single-molecule read lengths exceeding 12 kb. This sequencing approach thus enables us to fully characterize the previously intractable CGG-repeat sequence, leading to a better understanding of the distinct associated molecular pathologies. The method will enable us to study details of allele-expansion mosaicism and repeat instability in a manner not previously possible. Real-time kinetic data also provide insight into the activity of DNA polymerase inside this unique sequence.
The methodology should be widely applicable for studies of the molecular pathogenesis of an increasing number of repeat expansion-associated neurodegenerative and neurodevelopmental disorders, and for the efficient identification of such disorders in the clinical setting.
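The abstract above attributes the accuracy gain to reading each circular molecule multiple times, on both strands, and combining the passes into a circular consensus sequence. As a simplified illustration (not PacBio's actual CCS algorithm), the sketch below takes passes that are assumed to be already aligned and equal in length with substitution errors only, reverse-complements the opposite-strand passes, and takes a per-position majority vote. The names `reverse_complement` and `majority_consensus` are hypothetical.

```python
from collections import Counter

# Complement table for uppercase DNA; any other character passes through unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def majority_consensus(passes: list[tuple[str, bool]]) -> str:
    """Hypothetical helper: per-position majority vote over subreads.

    Each pass is (sequence, is_reverse_strand). Reverse-strand passes are
    reverse-complemented so that every pass is in forward orientation.
    Assumes pre-aligned, equal-length passes (no insertions or deletions).
    """
    oriented = [reverse_complement(s) if is_reverse else s for s, is_reverse in passes]
    if not oriented:
        raise ValueError("need at least one pass")
    length = len(oriented[0])
    if any(len(s) != length for s in oriented):
        raise ValueError("this sketch assumes pre-aligned, equal-length passes")
    # zip(*oriented) yields one column (a tuple of bases) per template position;
    # the most common base in each column becomes the consensus base.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*oriented))
```

For instance, three passes over `CGGCGGCGGTTA`, each carrying one error at a different position and one of them read from the opposite strand, vote back to the true sequence. Assuming independent errors with per-base probability e, a three-pass majority is wrong only when at least two passes err at the same position, which is at most about 3e² for small e. Real consensus calling must also align passes that differ by insertions and deletions, which this sketch deliberately leaves out.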
Discovery Service for Jio Institute Digital Library