1,275 results on "Salzberg, Steven L."
Search Results
2. The status of the human gene catalogue
- Author
Amaral, Paulo, Carbonell-Sala, Silvia, De La Vega, Francisco M., Faial, Tiago, Frankish, Adam, Gingeras, Thomas, Guigo, Roderic, Harrow, Jennifer L, Hatzigeorgiou, Artemis G., Johnson, Rory, Murphy, Terence D., Pertea, Mihaela, Pruitt, Kim D., Pujar, Shashikant, Takahashi, Hazuki, Ulitsky, Igor, Varabyou, Ales, Wells, Christine A., Yandell, Mark, Carninci, Piero, and Salzberg, Steven L.
- Subjects
Quantitative Biology - Genomics; Quantitative Biology - Quantitative Methods
- Abstract
Scientists have been trying to identify all of the genes in the human genome since the initial draft of the genome was published in 2001. Over the intervening years, much progress has been made in identifying protein-coding genes, and the estimated number has shrunk to fewer than 20,000, although the number of distinct protein-coding isoforms has expanded dramatically. The invention of high-throughput RNA sequencing and other technological breakthroughs have led to an explosion in the number of reported non-coding RNA genes, although most of them do not yet have any known function. A combination of recent advances offers a path forward to identifying these functions and towards eventually completing the human gene catalogue. However, much work remains to be done before we have a universal annotation standard that includes all medically significant genes, maintains their relationships with different reference genomes, and describes clinically relevant genetic variants.
Comment: 14 pages
- Published
- 2023
3. Semi-automated assembly of high-quality diploid human reference genomes
- Author
Jarvis, Erich D, Formenti, Giulio, Rhie, Arang, Guarracino, Andrea, Yang, Chentao, Wood, Jonathan, Tracey, Alan, Thibaud-Nissen, Francoise, Vollger, Mitchell R, Porubsky, David, Cheng, Haoyu, Asri, Mobin, Logsdon, Glennis A, Carnevali, Paolo, Chaisson, Mark JP, Chin, Chen-Shan, Cody, Sarah, Collins, Joanna, Ebert, Peter, Escalona, Merly, Fedrigo, Olivier, Fulton, Robert S, Fulton, Lucinda L, Garg, Shilpa, Gerton, Jennifer L, Ghurye, Jay, Granat, Anastasiya, Green, Richard E, Harvey, William, Hasenfeld, Patrick, Hastie, Alex, Haukness, Marina, Jaeger, Erich B, Jain, Miten, Kirsche, Melanie, Kolmogorov, Mikhail, Korbel, Jan O, Koren, Sergey, Korlach, Jonas, Lee, Joyce, Li, Daofeng, Lindsay, Tina, Lucas, Julian, Luo, Feng, Marschall, Tobias, Mitchell, Matthew W, McDaniel, Jennifer, Nie, Fan, Olsen, Hugh E, Olson, Nathan D, Pesout, Trevor, Potapova, Tamara, Puiu, Daniela, Regier, Allison, Ruan, Jue, Salzberg, Steven L, Sanders, Ashley D, Schatz, Michael C, Schmitt, Anthony, Schneider, Valerie A, Selvaraj, Siddarth, Shafin, Kishwar, Shumate, Alaina, Stitziel, Nathan O, Stober, Catherine, Torrance, James, Wagner, Justin, Wang, Jianxin, Wenger, Aaron, Xiao, Chuanle, Zimin, Aleksey V, Zhang, Guojie, Wang, Ting, Li, Heng, Garrison, Erik, Haussler, David, Hall, Ira, Zook, Justin M, Eichler, Evan E, Phillippy, Adam M, Paten, Benedict, Howe, Kerstin, and Miga, Karen H
- Subjects
Genetics; Human Genome; Biotechnology; Generic health relevance; Humans; Chromosome Mapping; Diploidy; Genome, Human; Haplotypes; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Reference Standards; Genomics; Chromosomes, Human; Genetic Variation; Human Pangenome Reference Consortium; General Science & Technology
- Abstract
The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.
- Published
- 2022
4. The complete sequence of a human Y chromosome
- Author
Rhie, Arang, Nurk, Sergey, Cechova, Monika, Hoyt, Savannah J., Taylor, Dylan J., Altemose, Nicolas, Hook, Paul W., Koren, Sergey, Rautiainen, Mikko, Alexandrov, Ivan A., Allen, Jamie, Asri, Mobin, Bzikadze, Andrey V., Chen, Nae-Chyun, Chin, Chen-Shan, Diekhans, Mark, Flicek, Paul, Formenti, Giulio, Fungtammasan, Arkarachai, Garcia Giron, Carlos, Garrison, Erik, Gershman, Ariel, Gerton, Jennifer L., Grady, Patrick G. S., Guarracino, Andrea, Haggerty, Leanne, Halabian, Reza, Hansen, Nancy F., Harris, Robert, Hartley, Gabrielle A., Harvey, William T., Haukness, Marina, Heinz, Jakob, Hourlier, Thibaut, Hubley, Robert M., Hunt, Sarah E., Hwang, Stephen, Jain, Miten, Kesharwani, Rupesh K., Lewis, Alexandra P., Li, Heng, Logsdon, Glennis A., Lucas, Julian K., Makalowski, Wojciech, Markovic, Christopher, Martin, Fergal J., Mc Cartney, Ann M., McCoy, Rajiv C., McDaniel, Jennifer, McNulty, Brandy M., Medvedev, Paul, Mikheenko, Alla, Munson, Katherine M., Murphy, Terence D., Olsen, Hugh E., Olson, Nathan D., Paulin, Luis F., Porubsky, David, Potapova, Tamara, Ryabov, Fedor, Salzberg, Steven L., Sauria, Michael E. G., Sedlazeck, Fritz J., Shafin, Kishwar, Shepelev, Valery A., Shumate, Alaina, Storer, Jessica M., Surapaneni, Likhitha, Taravella Oill, Angela M., Thibaud-Nissen, Françoise, Timp, Winston, Tomaszkiewicz, Marta, Vollger, Mitchell R., Walenz, Brian P., Watwood, Allison C., Weissensteiner, Matthias H., Wenger, Aaron M., Wilson, Melissa A., Zarate, Samantha, Zhu, Yiming, Zook, Justin M., Eichler, Evan E., O’Neill, Rachel J., Schatz, Michael C., Miga, Karen H., Makova, Kateryna D., and Phillippy, Adam M.
- Published
- 2023
5. CHESS 3: an improved, comprehensive catalog of human genes and transcripts based on large-scale expression data, phylogenetic analysis, and protein structure
- Author
Varabyou, Ales, Sommer, Markus J., Erdogdu, Beril, Shinder, Ida, Minkin, Ilia, Chao, Kuan-Hao, Park, Sukhwan, Heinz, Jakob, Pockrandt, Christopher, Shumate, Alaina, Rincon, Natalia, Puiu, Daniela, Steinegger, Martin, Salzberg, Steven L., and Pertea, Mihaela
- Published
- 2023
6. The complete sequence of a human genome
- Author
Nurk, Sergey, Koren, Sergey, Rhie, Arang, Rautiainen, Mikko, Bzikadze, Andrey V, Mikheenko, Alla, Vollger, Mitchell R, Altemose, Nicolas, Uralsky, Lev, Gershman, Ariel, Aganezov, Sergey, Hoyt, Savannah J, Diekhans, Mark, Logsdon, Glennis A, Alonge, Michael, Antonarakis, Stylianos E, Borchers, Matthew, Bouffard, Gerard G, Brooks, Shelise Y, Caldas, Gina V, Chen, Nae-Chyun, Cheng, Haoyu, Chin, Chen-Shan, Chow, William, de Lima, Leonardo G, Dishuck, Philip C, Durbin, Richard, Dvorkina, Tatiana, Fiddes, Ian T, Formenti, Giulio, Fulton, Robert S, Fungtammasan, Arkarachai, Garrison, Erik, Grady, Patrick GS, Graves-Lindsay, Tina A, Hall, Ira M, Hansen, Nancy F, Hartley, Gabrielle A, Haukness, Marina, Howe, Kerstin, Hunkapiller, Michael W, Jain, Chirag, Jain, Miten, Jarvis, Erich D, Kerpedjiev, Peter, Kirsche, Melanie, Kolmogorov, Mikhail, Korlach, Jonas, Kremitzki, Milinn, Li, Heng, Maduro, Valerie V, Marschall, Tobias, McCartney, Ann M, McDaniel, Jennifer, Miller, Danny E, Mullikin, James C, Myers, Eugene W, Olson, Nathan D, Paten, Benedict, Peluso, Paul, Pevzner, Pavel A, Porubsky, David, Potapova, Tamara, Rogaev, Evgeny I, Rosenfeld, Jeffrey A, Salzberg, Steven L, Schneider, Valerie A, Sedlazeck, Fritz J, Shafin, Kishwar, Shew, Colin J, Shumate, Alaina, Sims, Ying, Smit, Arian FA, Soto, Daniela C, Sović, Ivan, Storer, Jessica M, Streets, Aaron, Sullivan, Beth A, Thibaud-Nissen, Françoise, Torrance, James, Wagner, Justin, Walenz, Brian P, Wenger, Aaron, Wood, Jonathan MD, Xiao, Chunlin, Yan, Stephanie M, Young, Alice C, Zarate, Samantha, Surti, Urvashi, McCoy, Rajiv C, Dennis, Megan Y, Alexandrov, Ivan A, Gerton, Jennifer L, O’Neill, Rachel J, Timp, Winston, Zook, Justin M, Schatz, Michael C, Eichler, Evan E, Miga, Karen H, and Phillippy, Adam M
- Subjects
Biological Sciences; Bioinformatics and Computational Biology; Genetics; Human Genome; 1.1 Normal biological development and functioning; Underpinning research; Generic health relevance; Cell Line; Chromosomes, Artificial, Bacterial; Chromosomes, Human; Genome, Human; Human Genome Project; Humans; Reference Values; Sequence Analysis, DNA; General Science & Technology
- Abstract
Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.
- Published
- 2022
7. Investigating open reading frames in known and novel transcripts using ORFanage
- Author
Varabyou, Ales, Erdogdu, Beril, Salzberg, Steven L., and Pertea, Mihaela
- Published
- 2023
8. Detecting differential transcript usage in complex diseases with SPIT
- Author
Erdogdu, Beril, Varabyou, Ales, Hicks, Stephanie C., Salzberg, Steven L., and Pertea, Mihaela
- Published
- 2024
9. High-quality genome and methylomes illustrate features underlying evolutionary success of oaks
- Author
Sork, Victoria L, Cokus, Shawn J, Fitz-Gibbon, Sorel T, Zimin, Aleksey V, Puiu, Daniela, Garcia, Jesse A, Gugger, Paul F, Henriquez, Claudia L, Zhen, Ying, Lohmueller, Kirk E, Pellegrini, Matteo, and Salzberg, Steven L
- Subjects
Biological Sciences; Bioinformatics and Computational Biology; Genetics; Biotechnology; Human Genome; Biological Evolution; DNA Methylation; Epigenome; Evolution, Molecular; Humans; Quercus
- Abstract
The genus Quercus, which emerged ∼55 million years ago during globally warm temperatures, diversified into ∼450 extant species. We present a high-quality de novo genome assembly of a California endemic oak, Quercus lobata, revealing features consistent with oak evolutionary success. Effective population size remained large throughout history despite declining since the early Miocene. Analysis of 39,373 mapped protein-coding genes outlined copious duplications consistent with genetic and phenotypic diversity, both by retention of genes created during the ancient γ whole genome hexaploid duplication event and by tandem duplication within families, including numerous resistance genes and a very large block of duplicated DUF247 genes, which have been found to be associated with self-incompatibility in grasses. An additional surprising finding is that subcontext-specific patterns of DNA methylation associated with transposable elements reveal broadly-distributed heterochromatin in intergenic regions, similar to grasses. Collectively, these features promote genetic and phenotypic variation that would facilitate adaptability to changing environments.
- Published
- 2022
10. Dissecting the Polygenic Basis of Cold Adaptation Using Genome-Wide Association of Traits and Environmental Data in Douglas-fir
- Author
De La Torre, Amanda R, Wilhite, Benjamin, Puiu, Daniela, St. Clair, John Bradley, Crepeau, Marc W, Salzberg, Steven L, Langley, Charles H, Allen, Brian, and Neale, David B
- Subjects
Human Genome; Genetics; Biotechnology; 2.1 Biological and endogenous factors; Aetiology; Climate Action; Acclimatization; Genes, Plant; Genome-Wide Association Study; Polymorphism, Single Nucleotide; Pseudotsuga; cold adaptation; growth; phenology; cold hardiness; GWAS; GEA; Douglas-fir
- Abstract
Understanding the genomic and environmental basis of cold adaptation is key to understanding how plants survive and adapt to different environmental conditions across their natural range. Univariate and multivariate genome-wide association (GWAS) and genotype-environment association (GEA) analyses were used to test associations among genome-wide SNPs obtained from whole-genome resequencing, measures of growth, phenology, emergence, cold hardiness, and range-wide environmental variation in coastal Douglas-fir (Pseudotsuga menziesii). Results suggest a complex genomic architecture of cold adaptation, in which traits are either highly polygenic or controlled by both large and small effect genes. Newly discovered associations for cold adaptation in Douglas-fir included 130 genes involved in many important biological functions such as primary and secondary metabolism, growth and reproductive development, transcription regulation, stress and signaling, and DNA processes. These genes were related to growth, phenology and cold hardiness and strongly depend on variation in environmental variables such as degree days below 0 °C, precipitation, elevation and distance from the coast. This study is a step forward in our understanding of the complex interconnection between environment and genomics and their role in cold-associated trait variation in boreal tree species, providing a baseline for the species' predictions under climate change.
- Published
- 2021
11. Metagenome analysis using the Kraken software suite
- Author
Lu, Jennifer, Rincon, Natalia, Wood, Derrick E., Breitwieser, Florian P., Pockrandt, Christopher, Langmead, Ben, Salzberg, Steven L., and Steinegger, Martin
- Published
- 2022
12. High-quality chromosome-scale assembly of the walnut (Juglans regia L.) reference genome.
- Author
Marrano, Annarita, Britton, Monica, Zaini, Paulo A, Zimin, Aleksey V, Workman, Rachael E, Puiu, Daniela, Bianco, Luca, Pierro, Erica Adele Di, Allen, Brian J, Chakraborty, Sandeep, Troggio, Michela, Leslie, Charles A, Timp, Winston, Dandekar, Abhaya, Salzberg, Steven L, and Neale, David B
- Subjects
Chromosomes, Plant; Juglans; Proteomics; Computational Biology; Genomics; Species Specificity; Genome, Plant; Open Reading Frames; Genetic Variation; Genome-Wide Association Study; Molecular Sequence Annotation; Hi-C; Iso-Seq; Nanopore; allergens; gene prediction; genetic diversity; proteome; Biotechnology; Human Genome; Genetics
- Abstract
The release of the first reference genome of walnut (Juglans regia L.) enabled many achievements in the characterization of walnut genetic and functional variation. However, it is highly fragmented, preventing the integration of genetic, transcriptomic, and proteomic information to fully elucidate walnut biological processes. Here, we report the new chromosome-scale assembly of the walnut reference genome (Chandler v2.0) obtained by combining Oxford Nanopore long-read sequencing with chromosome conformation capture (Hi-C) technology. Relative to the previous reference genome, the new assembly features an 84.4-fold increase in N50 size, with the 16 chromosomal pseudomolecules assembled and representing 95% of its total length. Using full-length transcripts from single-molecule real-time sequencing, we predicted 37,554 gene models, with a mean gene length higher than the previous gene annotations. Most of the new protein-coding genes (90%) present both start and stop codons, which represents a significant improvement compared with Chandler v1.0 (only 48%). We then tested the potential impact of the new chromosome-level genome on different areas of walnut research. By studying the proteome changes occurring during male flower development, we observed that the virtual proteome obtained from Chandler v2.0 presents fewer artifacts than the previous reference genome, enabling the identification of a new potential pollen allergen in walnut. Also, the new chromosome-scale genome facilitates in-depth studies of intraspecies genetic diversity by revealing previously undetected autozygous regions in Chandler, likely resulting from inbreeding, and 195 genomic regions highly differentiated between Western and Eastern walnut cultivars. Overall, Chandler v2.0 will serve as a valuable resource to better understand and explore walnut biology.
- Published
- 2020
13. No Evidence of Chronic Infection in a Metagenomic Sequencing Study of the Keratoconus Corneal Epithelium
- Author
Kaur, Pritpal, Moon, Loren, Srikumaran, Divya, Salzberg, Steven L., Lu, Jennifer, Simner, Patricia J., and Soiberman, Uri S.
- Published
- 2024
14. Metagenomic next generation sequencing of plasma RNA for diagnosis of unexplained, acute febrile illness in Uganda.
- Author
Kandathil, Abraham J., Blair, Paul W., Lu, Jennifer, Anantharam, Raghavendran, Kobba, Kenneth, Robinson, Matthew L., Alharthi, Sultanah, Ndawula, Edgar C., Dumler, J. Stephen, Kakooza, Francis, Lamorde, Mohammed, Thomas, David L., Salzberg, Steven L., and Manabe, Yukari C.
- Subjects
Nucleotide sequencing; Helicobacter pylori; Neisseria gonorrhoeae; Platelet count; Metagenomics
- Abstract
Metagenomic next generation sequencing (mNGS) has proven to be a useful tool in the diagnosis and identification of novel human pathogens and pathogens not identified on routine clinical microbiologic tests. In this study, we applied mNGS to characterize plasma RNA isolated from 42 study participants with unexplained acute febrile illness (AFI) admitted to tertiary referral hospitals in Mubende and Arua, Uganda. Study participants were selected based on clinical criteria suggestive of viral infection (i.e., thrombocytopenia, leukopenia). The study population had a median age of 28 years (IQR: 24 to 38.5) and a median platelet count of 114 × 10^3 cells/mm^3 (IQR: 66,500 to 189,800). An average of 25 million 100 bp reads were generated per sample. We identified strong signals from diverse viruses, bacteria, fungi, or parasites in 10 (23.8%) of the study participants. These included well-recognized pathogens like Helicobacter pylori, human herpesvirus 8, Plasmodium falciparum, Neisseria gonorrhoeae, and Rickettsia conorii. We further confirmed Rickettsia conorii infection, the cause of Mediterranean Spotted Fever (MSF), using PCR assays and Sanger sequencing. mNGS was a useful addition for detection of otherwise undetected pathogens and well-recognized non-pathogens. This is the first report to describe the molecular confirmation of a hospitalized case of MSF in sub-Saharan Africa (SSA). Further studies are needed to determine the utility of mNGS for disease surveillance in similar settings.
Author summary: Unbiased molecular approaches like metagenomic sequencing have improved our ability to identify not only novel microbes but also known microbes in new settings. We report the results of a metagenomic next generation sequencing approach to identify viral and cell-free plasma RNA from a curated panel of 42 unexplained, acute febrile, hospitalized study participants from tertiary referral hospitals in Mubende and Arua, Uganda. In ten study participants, metagenomic sequencing allowed us to identify pathogens, including Helicobacter pylori and Rickettsia conorii, that were missed on routine clinical microbiologic testing. Sequence-specific targeted PCR assays and Sanger sequencing confirmed Rickettsia conorii infection in the first hospitalized case of Mediterranean Spotted Fever diagnosed in sub-Saharan Africa. Using appropriate controls, we observed that metagenomic sequencing could not consistently detect microbial sequences when plasma circulation levels were below 10,000 copies/mL. These results highlight the need for more studies to determine the utility of metagenomic next generation sequencing approaches for disease diagnosis and surveillance in similar settings.
- Published
- 2024
15. Genomic Variation Among and Within Six Juglans Species.
- Author
Stevens, Kristian A, Woeste, Keith, Chakraborty, Sandeep, Crepeau, Marc W, Leslie, Charles A, Martínez-García, Pedro J, Puiu, Daniela, Romero-Severson, Jeanne, Coggeshall, Mark, Dandekar, Abhaya M, Kluepfel, Daniel, Neale, David B, Salzberg, Steven L, and Langley, Charles H
- Subjects
Juglans; Computational Biology; Genomics; Evolution, Molecular; Phylogeny; Genotype; Polymorphism, Single Nucleotide; Genome, Plant; Genetic Variation; Molecular Sequence Annotation; High-Throughput Nucleotide Sequencing; Genome Size; genomic variation; polyphenol oxidase; reference genomes; walnut; Genetics
- Abstract
Genomic analysis in Juglans (walnuts) is expected to transform the breeding and agricultural production of both nuts and lumber. To that end, we report here the determination of reference sequences for six additional relatives of Juglans regia: Juglans sigillata (also from section Dioscaryon), Juglans nigra, Juglans microcarpa, Juglans hindsii (from section Rhysocaryon), Juglans cathayensis (from section Cardiocaryon), and the closely related Pterocarya stenoptera. While these are 'draft' genomes, ranging in size between 640 Mbp and 990 Mbp, their contiguities and accuracies can support powerful annotations of genomic variation that are often the foundation of new avenues of research and breeding. We annotated nucleotide divergence and synteny by creating complete pairwise alignments of each reference genome to the remaining six. In addition, we have re-sequenced a sample of accessions from four Juglans species (including regia). The variation discovered in these surveys comprises a critical resource for experimentation and breeding, as well as a solid complementary annotation. To demonstrate the potential of these resources, the structural and sequence variation in and around the polyphenol oxidase loci, PPO1 and PPO2, were investigated. As reported for other seed crops, variation in this gene is implicated in the domestication of walnuts. The apparently Juglandaceae-specific PPO1 duplicate shows accelerated divergence and an excess of amino acid replacement on the lineage leading to accessions of the domesticated nut crop species, Juglans regia and sigillata.
- Published
- 2018
16. Comprehensive analysis of microbial content in whole-genome sequencing samples from The Cancer Genome Atlas project
- Author
Ge, Yuchen, Lu, Jennifer, Revsine, Mahler, Puiu, Daniela, and Salzberg, Steven L.
- Published
- 2024
17. Implementing governmental oversight of enhanced potential pandemic pathogen research
- Author
Ebright, Richard H., MacIntyre, Raina, Dudley, Joseph P., Butler, Colin D., Goffinet, Andre, Hammond, Edward, Harris, Elisa D., Kakeya, Hideki, Lambrinidou, Yanna, Leitenberg, Milton, Newman, Stuart A., Nickels, Bryce E., Rahalkar, Monali C., Ridley, Matt W., Salzberg, Steven L., Seshadri, Harish, Theißen, Günter, VanDongen, Antonius M., and Washburne, Alex
- Published
- 2024
18. A genome sequence for the threatened whitebark pine
- Author
Neale, David B, Zimin, Aleksey V, Meltzer, Amy, Bhattarai, Akriti, Amee, Maurice, Figueroa Corona, Laura, Allen, Brian J, Puiu, Daniela, Wright, Jessica, De La Torre, Amanda R, McGuire, Patrick E, Timp, Winston, Salzberg, Steven L, and Wegrzyn, Jill L
- Published
- 2024
19. Genome sequence of the progenitor of the wheat D genome Aegilops tauschii
- Author
Luo, Ming-Cheng, Gu, Yong Q, Puiu, Daniela, Wang, Hao, Twardziok, Sven O, Deal, Karin R, Huo, Naxin, Zhu, Tingting, Wang, Le, Wang, Yi, McGuire, Patrick E, Liu, Shuyang, Long, Hai, Ramasamy, Ramesh K, Rodriguez, Juan C, Van, Sonny L, Yuan, Luxia, Wang, Zhenzhong, Xia, Zhiqiang, Xiao, Lichan, Anderson, Olin D, Ouyang, Shuhong, Liang, Yong, Zimin, Aleksey V, Pertea, Geo, Qi, Peng, Bennetzen, Jeffrey L, Dai, Xiongtao, Dawson, Matthew W, Müller, Hans-Georg, Kugler, Karl, Rivarola-Duarte, Lorena, Spannagl, Manuel, Mayer, Klaus FX, Lu, Fu-Hao, Bevan, Michael W, Leroy, Philippe, Li, Pingchuan, You, Frank M, Sun, Qixin, Liu, Zhiyong, Lyons, Eric, Wicker, Thomas, Salzberg, Steven L, Devos, Katrien M, and Dvořák, Jan
- Subjects
Biological Sciences; Genetics; Chromosome Mapping; Diploidy; Evolution, Molecular; Gene Duplication; Genes, Plant; Genome, Plant; Genomics; Phylogeny; Poaceae; Recombination, Genetic; Sequence Analysis, DNA; Triticum; General Science & Technology
- Abstract
Aegilops tauschii is the diploid progenitor of the D genome of hexaploid wheat (Triticum aestivum, genomes AABBDD) and an important genetic resource for wheat. The large size and highly repetitive nature of the Ae. tauschii genome has until now precluded the development of a reference-quality genome sequence. Here we use an array of advanced technologies, including ordered-clone genome sequencing, whole-genome shotgun sequencing, and BioNano optical genome mapping, to generate a reference-quality genome sequence for Ae. tauschii ssp. strangulata accession AL8/78, which is closely related to the wheat D genome. We show that compared to other sequenced plant genomes, including a much larger conifer genome, the Ae. tauschii genome contains unprecedented amounts of very similar repeated sequences. Our genome comparisons reveal that the Ae. tauschii genome has a greater number of dispersed duplicated genes than other sequenced genomes and its chromosomes have been structurally evolving an order of magnitude faster than those of other grass genomes. The decay of colinearity with other grass genomes correlates with recombination rates along chromosomes. We propose that the vast amounts of very similar repeated sequences cause frequent errors in recombination and lead to gene duplications and structural chromosome changes that drive fast genome evolution.
- Published
- 2017
20. Erratum to: An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing
- Author
Zimin, Aleksey V, Stevens, Kristian A, Crepeau, Marc W, Puiu, Daniela, Wegrzyn, Jill L, Yorke, James A, Langley, Charles H, Neale, David B, and Salzberg, Steven L
- Subjects
Biological Sciences; Bioinformatics and Computational Biology; Genetics; Human Genome
- Abstract
The 22-gigabase genome of loblolly pine (Pinus taeda) is one of the largest ever sequenced. The draft assembly published in 2014 was built entirely from short Illumina reads, with lengths ranging from 100 to 250 base pairs (bp). The assembly was quite fragmented, containing over 11 million contigs whose weighted average (N50) size was 8206 bp. To improve this result, we generated approximately 12-fold coverage in long reads using the Single Molecule Real Time sequencing technology developed at Pacific Biosciences. We assembled the long and short reads together using the MaSuRCA mega-reads assembly algorithm, which produced a substantially better assembly, P. taeda version 2.0. The new assembly has an N50 contig size of 25,361 bp, more than three times as large as achieved in the original assembly, and an N50 scaffold size of 107,821 bp, 61% larger than the previous assembly.
- Published
- 2017
21. The Douglas-Fir Genome Sequence Reveals Specialization of the Photosynthetic Apparatus in Pinaceae.
- Author
Neale, David B, McGuire, Patrick E, Wheeler, Nicholas C, Stevens, Kristian A, Crepeau, Marc W, Cardeno, Charis, Zimin, Aleksey V, Puiu, Daniela, Pertea, Geo M, Sezen, U Uzay, Casola, Claudio, Koralewski, Tomasz E, Paul, Robin, Gonzalez-Ibeas, Daniel, Zaman, Sumaira, Cronn, Richard, Yandell, Mark, Holt, Carson, Langley, Charles H, Yorke, James A, Salzberg, Steven L, and Wegrzyn, Jill L
- Subjects
Pinaceae; Pseudotsuga; Proteomics; Computational Biology; Genomics; Adaptation, Biological; Evolution, Molecular; Phylogeny; Photosynthesis; Gene Duplication; Repetitive Sequences, Nucleic Acid; Genome, Plant; Multigene Family; Gene Regulatory Networks; Molecular Sequence Annotation; Whole Genome Sequencing; annotation; conifer; genome assembly; gymnosperm; mega-genome; shade tolerance; Genetics
- Abstract
A reference genome sequence for Pseudotsuga menziesii var. menziesii (Mirb.) Franco (Coastal Douglas-fir) is reported, thus providing a reference sequence for a third genus of the family Pinaceae. The contiguity and quality of the genome assembly far exceeds that of other conifer reference genome sequences (contig N50 = 44,136 bp and scaffold N50 = 340,704 bp). Incremental improvements in sequencing and assembly technologies are in part responsible for the higher quality reference genome, but it may also be due to a slightly lower exact repeat content in Douglas-fir vs. pine and spruce. Comparative genome annotation with angiosperm species reveals gene-family expansion and contraction in Douglas-fir and other conifers which may account for some of the major morphological and physiological differences between the two major plant groups. Notable differences in the size of the NDH-complex gene family and genes underlying the functional basis of shade tolerance/intolerance were observed. This reference genome sequence not only provides an important resource for Douglas-fir breeders and geneticists but also sheds additional light on the evolutionary processes that have led to the divergence of modern angiosperms from the more ancient gymnosperms.
- Published
- 2017
22. An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing
- Author
Zimin, Aleksey V, Stevens, Kristian A, Crepeau, Marc W, Puiu, Daniela, Wegrzyn, Jill L, Yorke, James A, Langley, Charles H, Neale, David B, and Salzberg, Steven L
- Subjects
Biological Sciences; Bioinformatics and Computational Biology; Genetics; Human Genome; Algorithms; Contig Mapping; Genome, Plant; Genomics; High-Throughput Nucleotide Sequencing; Pinus taeda; Sequence Analysis, DNA; Genome assembly; Next-gen sequencing; Conifers; Pine genomes
- Abstract
The 22-gigabase genome of loblolly pine (Pinus taeda) is one of the largest ever sequenced. The draft assembly published in 2014 was built entirely from short Illumina reads, with lengths ranging from 100 to 250 base pairs (bp). The assembly was quite fragmented, containing over 11 million contigs whose weighted average (N50) size was 8206 bp. To improve this result, we generated approximately 12-fold coverage in long reads using the Single Molecule Real Time sequencing technology developed at Pacific Biosciences. We assembled the long and short reads together using the MaSuRCA mega-reads assembly algorithm, which produced a substantially better assembly, P. taeda version 2.0. The new assembly has an N50 contig size of 25,361 bp, more than three times as large as achieved in the original assembly, and an N50 scaffold size of 107,821 bp, 61% larger than the previous assembly.
- Published
- 2017
23. Sequence of the Sugar Pine Megagenome.
- Author
-
Stevens, Kristian A, Wegrzyn, Jill L, Zimin, Aleksey, Puiu, Daniela, Crepeau, Marc, Cardeno, Charis, Paul, Robin, Gonzalez-Ibeas, Daniel, Koriabine, Maxim, Holtz-Morris, Ann E, Martínez-García, Pedro J, Sezen, Uzay U, Marçais, Guillaume, Jermstad, Kathy, McGuire, Patrick E, Loopstra, Carol A, Davis, John M, Eckert, Andrew, de Jong, Pieter, Yorke, James A, Salzberg, Steven L, Neale, David B, and Langley, Charles H
- Subjects
Basidiomycota ,Pinus ,DNA Transposable Elements ,Genome ,Plant ,Genetic Variation ,Plant Immunity ,Genome Size ,conifer genome ,transposable elements ,white pine blister rust ,Genetics ,Human Genome ,Biotechnology ,Developmental Biology
- Abstract
Until very recently, complete characterization of the megagenomes of conifers has remained elusive. The diploid genome of sugar pine (Pinus lambertiana Dougl.) is highly repetitive and spans 31 billion bp. It is the largest genome sequenced and assembled to date, and the first from the subgenus Strobus, or white pines, a group that is notable for having the largest genomes among the pines. The genome represents a unique opportunity to investigate genome "obesity" in conifers and white pines. Comparative analysis of P. lambertiana and P. taeda L. reveals new insights into the conservation, age, and diversity of the highly abundant transposable elements, the primary factor determining genome size. Like most North American white pines, the principal pathogen of P. lambertiana is white pine blister rust (Cronartium ribicola J.C. Fischer ex Raben.). Identification of candidate genes for resistance to this pathogen is of great ecological importance. The genome sequence afforded us the opportunity to make substantial progress on locating the major dominant gene for simple resistance hypersensitive response, Cr1. We describe new markers and gene annotation that are both tightly linked to Cr1 in a mapping population and associated with Cr1 in unrelated sugar pine individuals sampled throughout the species' range, creating a solid foundation for future mapping. The genomic variation and annotated candidate genes characterized in our study of the Cr1 region are resources for future marker-assisted breeding efforts as well as for investigations of fundamental mechanisms of invasive disease and evolutionary response.
- Published
- 2016
24. Novel metagenomics analysis of stony coral tissue loss disease
- Author
-
Heinz, Jakob M., Lu, Jennifer, Huebner, Lindsay K., Salzberg, Steven L., Sommer, Markus, and Rosales, Stephanie M.
- Published
- 2024
25. First Draft Assembly and Annotation of the Genome of a California Endemic Oak Quercus lobata Née (Fagaceae).
- Author
-
Sork, Victoria L, Fitz-Gibbon, Sorel T, Puiu, Daniela, Crepeau, Marc, Gugger, Paul F, Sherman, Rachel, Stevens, Kristian, Langley, Charles H, Pellegrini, Matteo, and Salzberg, Steven L
- Subjects
GenPred ,Genomic Selection ,Quercus ,Shared Data Resources ,adaptation ,annotation ,chloroplast ,nuclear genome assembly ,Genetics
- Abstract
Oak represents a valuable natural resource across Northern Hemisphere ecosystems, attracting a large research community studying its genetics, ecology, conservation, and management. Here we introduce a draft genome assembly of valley oak (Quercus lobata) using Illumina sequencing of adult leaf tissue of a tree found in an accessible, well-studied, natural southern California population. Our assembly includes a nuclear genome and a complete chloroplast genome, along with annotation of encoded genes. The assembly contains 94,394 scaffolds totaling 1.17 Gb, including 18,512 scaffolds of 2 kb or longer with a total length of 1.15 Gb, and has an N50 scaffold size of 278,077 bp. The k-mer histograms indicate a diploid genome size of ∼720-730 Mb, which is smaller than the total assembly length because of high heterozygosity, estimated at 1.25%. A comparison with a recently published European oak (Q. robur) nuclear sequence indicates 93% similarity. The Q. lobata chloroplast genome has 99% identity with another North American oak, Q. rubra. Preliminary annotation yielded an estimate of 61,773 predicted protein-coding genes, of which 71% had similarity to known protein domains. We searched 956 Benchmarking Universal Single-Copy Orthologs and found 863 complete orthologs, of which 450 were present in more than one copy. We also examined an earlier version (v0.5) from which duplicate haplotypes were removed to discover variants. These additional sources indicate that the predicted gene count in Version 1.0 is overestimated by 37-52%. Nonetheless, this first draft valley oak genome assembly represents a high-quality, well-annotated genome that provides a tool for forest restoration and management practices.
- Published
- 2016
26. Genomic architecture of complex traits in loblolly pine
- Author
-
De La Torre, Amanda R., Puiu, Daniela, Crepeau, Marc W., Stevens, Kristian, Salzberg, Steven L., Langley, Charles H., and Neale, David B.
- Published
- 2019
27. Pan-genomics in the human genome era
- Author
-
Sherman, Rachel M. and Salzberg, Steven L.
- Published
- 2020
28. Sequencing and Assembly of the 22-Gb Loblolly Pine Genome
- Author
-
Zimin, Aleksey, Stevens, Kristian A, Crepeau, Marc W, Holtz-Morris, Ann, Koriabine, Maxim, Marçais, Guillaume, Puiu, Daniela, Roberts, Michael, Wegrzyn, Jill L, de Jong, Pieter J, Neale, David B, Salzberg, Steven L, Yorke, James A, and Langley, Charles H
- Subjects
Genetics ,Human Genome ,Biotechnology ,Generic health relevance ,Genome ,Plant ,Genomics ,Haploidy ,Ovule ,Pinus taeda ,Sequence Analysis ,DNA ,Transcriptome ,placeholder ,denovo assembly ,loblolly pine ,(see cocoa genome) ,Pinus taeda L. ,conifer ,de novo assembly ,fosmid ,genome ,Developmental Biology
- Abstract
Conifers are the predominant gymnosperm. The size and complexity of their genomes have presented formidable technical challenges for whole-genome shotgun sequencing and assembly. We employed novel strategies that allowed us to determine the loblolly pine (Pinus taeda) reference genome sequence, the largest genome assembled to date. Most of the sequence data were derived from whole-genome shotgun sequencing of a single megagametophyte, the haploid tissue of a single pine seed. Although that constrained the quantity of available DNA, the resulting haploid sequence data were well suited for assembly. The haploid sequence was augmented with multiple linking long-fragment mate pair libraries from the parental diploid DNA. For the longest fragments, we used novel fosmid DiTag libraries. Sequences from the linking libraries that did not match the megagametophyte were identified and removed. Assembly of the sequence data was aided by condensing the enormous number of paired-end reads into a much smaller set of longer "super-reads," rendering subsequent assembly with an overlap-based assembly algorithm computationally feasible. To further improve the contiguity and biological utility of the genome sequence, additional scaffolding methods utilizing independent genome and transcriptome assemblies were implemented. The combination of these strategies resulted in a draft genome sequence of 20.15 billion bases, with an N50 scaffold size of 66.9 kbp.
- Published
- 2014
29. Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies.
- Author
-
Neale, David B, Wegrzyn, Jill L, Stevens, Kristian A, Zimin, Aleksey V, Puiu, Daniela, Crepeau, Marc W, Cardeno, Charis, Koriabine, Maxim, Holtz-Morris, Ann E, Liechty, John D, Martínez-García, Pedro J, Vasquez-Gross, Hans A, Lin, Brian Y, Zieve, Jacob J, Dougherty, William M, Fuentes-Soriano, Sara, Wu, Le-Shin, Gilbert, Don, Marçais, Guillaume, Roberts, Michael, Holt, Carson, Yandell, Mark, Davis, John M, Smith, Katherine E, Dean, Jeffrey FD, Lorenz, W Walter, Whetten, Ross W, Sederoff, Ronald, Wheeler, Nicholas, McGuire, Patrick E, Main, Doreen, Loopstra, Carol A, Mockaitis, Keithanne, deJong, Pieter J, Yorke, James A, Salzberg, Steven L, and Langley, Charles H
- Subjects
Pinus taeda ,DNA ,Plant ,Contig Mapping ,Sequence Analysis ,DNA ,Haploidy ,Genome ,Plant ,Human Genome ,Biotechnology ,Genetics ,Generic health relevance ,Environmental Sciences ,Biological Sciences ,Information and Computing Sciences ,Bioinformatics
- Abstract
Background: The size and complexity of conifer genomes have, until now, prevented full genome sequencing and assembly. The large research community and economic importance of loblolly pine, Pinus taeda L., made it an early candidate for reference sequence determination. Results: We develop a novel strategy to sequence the genome of loblolly pine that combines unique aspects of pine reproductive biology and genome assembly methodology. We use a whole-genome shotgun approach relying primarily on next-generation sequence data generated from a single haploid seed megagametophyte from a loblolly pine tree, 20-1010, that has been used in industrial forest tree breeding. The resulting sequence and assembly were used to generate a draft genome spanning 23.2 Gbp and containing 20.1 Gbp, with an N50 scaffold size of 66.9 kbp, making it a significant improvement over available conifer genomes. The long scaffold lengths allow the annotation of 50,172 gene models with intron lengths averaging over 2.7 kbp and sometimes exceeding 100 kbp. Analysis of orthologous gene sets identifies gene families that may be unique to conifers. We further characterize and expand the existing repeat library based on de novo analysis of the repetitive content, estimated to encompass 82% of the genome. Conclusions: In addition to its value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrate a novel approach to sequencing the large and complex genomes of this important group of plants that can now be widely applied.
- Published
- 2014
30. Unique Features of the Loblolly Pine (Pinus taeda L.) Megagenome Revealed Through Sequence Annotation
- Author
-
Wegrzyn, Jill L, Liechty, John D, Stevens, Kristian A, Wu, Le-Shin, Loopstra, Carol A, Vasquez-Gross, Hans A, Dougherty, William M, Lin, Brian Y, Zieve, Jacob J, Martínez-García, Pedro J, Holt, Carson, Yandell, Mark, Zimin, Aleksey V, Yorke, James A, Crepeau, Marc W, Puiu, Daniela, Salzberg, Steven L, de Jong, Pieter J, Mockaitis, Keithanne, Main, Doreen, Langley, Charles H, and Neale, David B
- Subjects
Biological Sciences ,Genetics ,Biotechnology ,DNA ,Plant ,Evolution ,Molecular ,Genes ,Plant ,Genome ,Plant ,Molecular Sequence Annotation ,Multigene Family ,Phylogeny ,Pinus taeda ,Sequence Alignment ,introns ,gene family ,repeats ,retrotransposons ,conifer ,Developmental Biology ,Biochemistry and cell biology
- Abstract
The largest genus in the conifer family Pinaceae is Pinus, with over 100 species. The size and complexity of their genomes (∼20-40 Gb, 2n = 24) have delayed the arrival of a well-annotated reference sequence. In this study, we present the annotation of the first whole-genome shotgun assembly of loblolly pine (Pinus taeda L.), which comprises 20.1 Gb of sequence. The MAKER-P annotation pipeline combined evidence-based alignments and ab initio predictions to generate 50,172 gene models, of which 15,653 are classified as high confidence. Clustering these gene models with 13 other plant species resulted in 20,646 gene families, of which 1554 are predicted to be unique to conifers. Among the conifer gene families, 159 are composed exclusively of loblolly pine members. The gene models for loblolly pine have the highest median and mean intron lengths among 24 fully sequenced plant genomes. Conifer genomes are full of repetitive DNA, with the most significant contributions from long-terminal-repeat retrotransposons. In-depth analysis of the tandem and interspersed repetitive content yielded a combined estimate of 82%.
- Published
- 2014
31. A Genome Sequence for the Threatened Whitebark Pine
- Author
-
Neale, David B., Zimin, Aleksey V., Meltzer, Amy, Bhattarai, Akriti, Amee, Maurice, Corona, Laura Figueroa, Allen, Brian J., Puiu, Daniela, Wright, Jessica, Torre, Amanda R. De La, McGuire, Patrick E., Timp, Winston, Salzberg, Steven L., and Wegrzyn, Jill L.
- Published
- 2023
32. Major data analysis errors invalidate cancer microbiome findings
- Author
-
Gihawi, Abraham, Ge, Yuchen, Lu, Jennifer, Puiu, Daniela, Xu, Amanda, Cooper, Colin S., Brewer, Daniel S., Pertea, Mihaela, and Salzberg, Steven L.
- Published
- 2023
33. The status of the human gene catalogue
- Author
-
Amaral, Paulo, Carbonell-Sala, Silvia, De La Vega, Francisco M., Faial, Tiago, Frankish, Adam, Gingeras, Thomas, Guigo, Roderic, Harrow, Jennifer L., Hatzigeorgiou, Artemis G., Johnson, Rory, Murphy, Terence D., Pertea, Mihaela, Pruitt, Kim D., Pujar, Shashikant, Takahashi, Hazuki, Ulitsky, Igor, Varabyou, Ales, Wells, Christine A., Yandell, Mark, Carninci, Piero, and Salzberg, Steven L.
- Published
- 2023
34. Next-generation sequencing: insights to advance clinical investigations of the microbiome
- Author
-
Wensel, Caroline R., Pluznick, Jennifer L., Salzberg, Steven L., and Sears, Cynthia L.
- Subjects
Genomics -- Methods ,Microbiota (Symbiotic organisms) -- Research ,Dysbiosis -- Genetic aspects -- Development and progression ,RNA sequencing -- Methods ,Translational research ,Health care industry
- Abstract
Next-generation sequencing (NGS) technology has advanced our understanding of the human microbiome by allowing for the discovery and characterization of unculturable microbes with prediction of their function. Key NGS methods include 16S rRNA gene sequencing, shotgun metagenomic sequencing, and RNA sequencing. The choice of which NGS methodology to pursue for a given purpose is often unclear for clinicians and researchers. In this Review, we describe the fundamentals of NGS, with a focus on 16S rRNA and shotgun metagenomic sequencing. We also discuss pros and cons of each methodology as well as important concepts in data variability, study design, and clinical metadata collection. We further present examples of how NGS studies of the human microbiome have advanced our understanding of human disease pathophysiology across diverse clinical contexts, including the development of diagnostics and therapeutics. Finally, we share insights as to how NGS might further be integrated into and advance microbiome research and clinical care in the coming years. Introduction: The number of microbial cells that reside on and in us rivals the number of our own cells (1). In health, we, the host, and microbes live in symbiosis. [...]
- Published
- 2022
35. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions
- Author
-
Kim, Daehwan, Pertea, Geo, Trapnell, Cole, Pimentel, Harold, Kelley, Ryan, and Salzberg, Steven L
- Abstract
TopHat is a popular spliced aligner for RNA-sequence (RNA-seq) experiments. In this paper, we describe TopHat2, which incorporates many significant enhancements to TopHat. TopHat2 can align reads of various lengths produced by the latest sequencing technologies, while allowing for variable-length indels with respect to the reference genome. In addition to de novo spliced alignment, TopHat2 can align reads across fusion breaks, which can occur after genomic translocations. TopHat2 combines the ability to identify novel splice sites with direct mapping to known transcripts, producing sensitive and accurate alignments, even for highly repetitive genomes or in the presence of pseudogenes. TopHat2 is available at http://ccb.jhu.edu/software/tophat.
- Published
- 2013
36. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype
- Author
-
Kim, Daehwan, Paggi, Joseph M., Park, Chanhee, Bennett, Christopher, and Salzberg, Steven L.
- Published
- 2019
37. Assembly of a pan-genome from deep sequencing of 910 humans of African descent
- Author
-
Sherman, Rachel M., Forman, Juliet, Antonescu, Valentin, Puiu, Daniela, Daya, Michelle, Rafaels, Nicholas, Boorgula, Meher Preethi, Chavan, Sameer, Vergara, Candelaria, Ortega, Victor E., Levin, Albert M., Eng, Celeste, Yazdanbakhsh, Maria, Wilson, James G., Marrugo, Javier, Lange, Leslie A., Williams, L. Keoki, Watson, Harold, Ware, Lorraine B., Olopade, Christopher O., Olopade, Olufunmilayo, Oliveira, Ricardo R., Ober, Carole, Nicolae, Dan L., Meyers, Deborah A., Mayorga, Alvaro, Knight-Madden, Jennifer, Hartert, Tina, Hansel, Nadia N., Foreman, Marilyn G., Ford, Jean G., Faruque, Mezbah U., Dunston, Georgia M., Caraballo, Luis, Burchard, Esteban G., Bleecker, Eugene R., Araujo, Maria I., Herrera-Paz, Edwin F., Campbell, Monica, Foster, Cassandra, Taub, Margaret A., Beaty, Terri H., Ruczinski, Ingo, Mathias, Rasika A., Barnes, Kathleen C., and Salzberg, Steven L.
- Published
- 2019
38. Butterfly genome reveals promiscuous exchange of mimicry adaptations among species
- Author
-
Dasmahapatra, Kanchon K, Walters, James R, Briscoe, Adriana D, Davey, John W, Whibley, Annabel, Nadeau, Nicola J, Zimin, Aleksey V, Hughes, Daniel ST, Ferguson, Laura C, Martin, Simon H, Salazar, Camilo, Lewis, James J, Adler, Sebastian, Ahn, Seung-Joon, Baker, Dean A, Baxter, Simon W, Chamberlain, Nicola L, Chauhan, Ritika, Counterman, Brian A, Dalmay, Tamas, Gilbert, Lawrence E, Gordon, Karl, Heckel, David G, Hines, Heather M, Hoff, Katharina J, Holland, Peter WH, Jacquin-Joly, Emmanuelle, Jiggins, Francis M, Jones, Robert T, Kapan, Durrell D, Kersey, Paul, Lamas, Gerardo, Lawson, Daniel, Mapleson, Daniel, Maroja, Luana S, Martin, Arnaud, Moxon, Simon, Palmer, William J, Papa, Riccardo, Papanicolaou, Alexie, Pauchet, Yannick, Ray, David A, Rosser, Neil, Salzberg, Steven L, Supple, Megan A, Surridge, Alison, Tenger-Trolander, Ayse, Vogel, Heiko, Wilkinson, Paul A, Wilson, Derek, Yorke, James A, Yuan, Furong, Balmuth, Alexi L, Eland, Cathlene, Gharbi, Karim, Thomson, Marian, Gibbs, Richard A, Han, Yi, Jayaseelan, Joy C, Kovar, Christie, Mathew, Tittu, Muzny, Donna M, Ongeri, Fiona, Pu, Ling-Ling, Qu, Jiaxin, Thornton, Rebecca L, Worley, Kim C, Wu, Yuan-Qing, Linares, Mauricio, Blaxter, Mark L, Ffrench-Constant, Richard H, Joron, Mathieu, Kronforst, Marcus R, Mullen, Sean P, Reed, Robert D, Scherer, Steven E, Richards, Stephen, Mallet, James, McMillan, W Owen, and Jiggins, Chris D
- Subjects
Adaptation ,Physiological ,Animals ,Bombyx ,Butterflies ,Chromosomes ,Insect ,Evolution ,Molecular ,Gene Flow ,Genes ,Homeobox ,Genes ,Insect ,Genome ,Insect ,Genomics ,Hybridization ,Genetic ,Molecular Mimicry ,Molecular Sequence Data ,Phylogeny ,Pigmentation ,Sequence Analysis ,DNA ,Species Specificity ,Synteny ,Wings ,Animal ,Heliconius Genome Consortium ,General Science & Technology
- Abstract
The evolutionary importance of hybridization and introgression has long been debated. Hybrids are usually rare and unfit, but even infrequent hybridization can aid adaptation by transferring beneficial traits between species. Here we use genomic tools to investigate introgression in Heliconius, a rapidly radiating genus of neotropical butterflies widely used in studies of ecology, behaviour, mimicry and speciation. We sequenced the genome of Heliconius melpomene and compared it with other taxa to investigate chromosomal evolution in Lepidoptera and gene flow among multiple Heliconius species and races. Among 12,669 predicted genes, biologically important expansions of families of chemosensory and Hox genes are particularly noteworthy. Chromosomal organization has remained broadly conserved since the Cretaceous period, when butterflies split from the Bombyx (silkmoth) lineage. Using genomic resequencing, we show hybrid exchange of genes between three co-mimics, Heliconius melpomene, Heliconius timareta and Heliconius elevatus, especially at two genomic regions that control mimicry pattern. We infer that closely related Heliconius species exchange protective colour-pattern genes promiscuously, implying that hybridization has an important role in adaptive radiation.
- Published
- 2012
39. Ultrafast and accurate 16S rRNA microbial community analysis using Kraken 2
- Author
-
Lu, Jennifer and Salzberg, Steven L.
- Published
- 2020
40. Two new complete genome sequences offer insight into host and tissue specificity of plant pathogenic Xanthomonas spp.
- Author
-
Bogdanove, Adam J, Koebnik, Ralf, Lu, Hong, Furutani, Ayako, Angiuoli, Samuel V, Patil, Prabhu B, Van Sluys, Marie-Anne, Ryan, Robert P, Meyer, Damien F, Han, Sang-Wook, Aparna, Gudlur, Rajaram, Misha, Delcher, Arthur L, Phillippy, Adam M, Puiu, Daniela, Schatz, Michael C, Shumway, Martin, Sommer, Daniel D, Trapnell, Cole, Benahmed, Faiza, Dimitrov, George, Madupu, Ramana, Radune, Diana, Sullivan, Steven, Jha, Gopaljee, Ishihara, Hiromichi, Lee, Sang-Won, Pandey, Alok, Sharma, Vikas, Sriariyanun, Malinee, Szurek, Boris, Vera-Cruz, Casiana M, Dorman, Karin S, Ronald, Pamela C, Verdier, Valérie, Dow, J Maxwell, Sonti, Ramesh V, Tsuge, Seiji, Brendel, Volker P, Rabinowicz, Pablo D, Leach, Jan E, White, Frank F, and Salzberg, Steven L
- Abstract
Xanthomonas is a large genus of bacteria that collectively cause disease on more than 300 plant species. The broad host range of the genus contrasts with stringent host and tissue specificity for individual species and pathovars. Whole-genome sequences of Xanthomonas campestris pv. raphani strain 756C and X. oryzae pv. oryzicola strain BLS256, pathogens that infect the mesophyll tissue of the leading models for plant biology, Arabidopsis thaliana and rice, respectively, were determined and provided insight into the genetic determinants of host and tissue specificity. Comparisons were made with genomes of closely related strains that infect the vascular tissue of the same hosts and across a larger collection of complete Xanthomonas genomes. The results suggest a model in which complex sets of adaptations at the level of gene content account for host specificity and subtler adaptations at the level of amino acid or noncoding regulatory nucleotide sequence determine tissue specificity.
- Published
- 2011
41. Comparative Genomics of Trypanosomatid Parasitic Protozoa
- Author
-
El-Sayed, Najib M., Myler, Peter J., Blandin, Gaëlle, Berriman, Matthew, Crabtree, Jonathan, Aggarwal, Gautam, Caler, Elisabet, Renauld, Hubert, Worthey, Elizabeth A., Hertz-Fowler, Christiane, Ghedin, Elodie, Peacock, Christopher, Bartholomeu, Daniella C., Haas, Brian J., Tran, Anh-Nhi, Wortman, Jennifer R., Angiuoli, Samuel, Anupama, Atashi, Badger, Jonathan, Bringaud, Frederic, Cadag, Eithon, Carlton, Jane M., Cerqueira, Gustavo C., Creasy, Todd, Delcher, Arthur L., Djikeng, Appolinaire, Embley, T. Martin, Hauser, Christopher, Ivens, Alasdair C., Kummerfeld, Sarah K., Pereira-Leal, Jose B., Nilsson, Daniel, Peterson, Jeremy, Salzberg, Steven L., Shallom, Joshua, Silva, Joana C., Sundaram, Jaideep, Westenberger, Scott, White, Owen, Melville, Sara E., Donelson, John E., Andersson, Björn, Stuart, Kenneth D., and Hall, Neil
- Published
- 2005
42. The Genome of the African Trypanosome Trypanosoma brucei
- Author
-
Berriman, Matthew, Ghedin, Elodie, Hertz-Fowler, Christiane, Blandin, Gaëlle, Renauld, Hubert, Bartholomeu, Daniella C., Lennard, Nicola J., Caler, Elisabet, Hamlin, Nancy E., Haas, Brian, Böhme, Ulrike, Hannick, Linda, Aslett, Martin A., Shallom, Joshua, Marcello, Lucio, Hou, Lihua, Wickstead, Bill, Arrowsmith, Claire, Atkin, Rebecca J., Barron, Andrew J., Bringaud, Frederic, Brooks, Karen, Carrington, Mark, Cherevach, Inna, Chillingworth, Tracey-Jane, Churcher, Carol, Clark, Louise N., Corton, Craig H., Cronin, Ann, Davies, Rob M., Doggett, Jonathon, Djikeng, Appolinaire, Feldblyum, Tamara, Field, Mark C., Fraser, Audrey, Goodhead, Ian, Hance, Zahra, Harper, David, Harris, Barbara R., Hauser, Heidi, Hostetler, Jessica, Ivens, Al, Jagels, Kay, Johnson, David, Johnson, Justin, Jones, Kristine, Kerhornou, Arnaud X., Koo, Hean, Larke, Natasha, Landfear, Scott, Larkin, Christopher, Leech, Vanessa, Line, Alexandra, Lord, Angela, MacLeod, Annette, Mooney, Paul J., Moule, Sharon, Morgan, Gareth W., Mungall, Karen, Norbertczak, Halina, Ormond, Doug, Pai, Grace, Peacock, Chris S., Peterson, Jeremy, Quail, Michael A., Rabbinowitsch, Ester, Rajandream, Marie-Adele, Reitter, Chris, Salzberg, Steven L., Sanders, Mandy, Schobel, Seth, Sharp, Sarah, Simmonds, Mark, Simpson, Anjana J., Tallon, Luke, Tait, Andrew, Tivey, Adrian R., Van Aken, Susan, Walker, Danielle, Wanless, David, Wang, Shiliang, White, Brian, White, Owen, Whitehead, Sally, Woodward, John, Wortman, Jennifer, Adams, Mark D., Embley, T. Martin, Gull, Keith, Ullu, Elisabetta, Barry, J. David, Fairlamb, Alan H., Opperdoes, Fred, Barrell, Barclay G., Donelson, John E., Hall, Neil, Fraser, Claire M., Melville, Sara E., and El-Sayed, Najib M.
- Published
- 2005
43. The Genome Sequence of Trypanosoma cruzi, Etiologic Agent of Chagas Disease
- Author
-
El-Sayed, Najib M., Myler, Peter J., Bartholomeu, Daniella C., Nilsson, Daniel, Aggarwal, Gautam, Tran, Anh-Nhi, Ghedin, Elodie, Worthey, Elizabeth A., Delcher, Arthur L., Blandin, Gaëlle, Westenberger, Scott J., Caler, Elisabet, Cerqueira, Gustavo C., Anupama, Atashi, Arner, Erik, Åslund, Lena, Attipoe, Philip, Bontempi, Esteban, Bringaud, Frédéric, Burton, Peter, Cadag, Eithon, Campbell, David A., Carrington, Mark, Crabtree, Jonathan, Darban, Hamid, da Silveira, Jose Franco, de Jong, Pieter, Edwards, Kimberly, Englund, Paul T., Fazelina, Gholam, Feldblyum, Tamara, Ferella, Marcela, Frasch, Alberto Carlos, Gull, Keith, Horn, David, Hou, Lihua, Huang, Yiting, Kindlund, Ellen, Klingbeil, Michele, Kluge, Sindy, Koo, Hean, Lacerda, Daniela, Levin, Mariano J., Lorenzi, Hernan, Louie, Tin, Machado, Carlos Renato, McCulloch, Richard, McKenna, Alan, Mizuno, Yumi, Mottram, Jeremy C., Nelson, Siri, Ochaya, Stephen, Osoegawa, Kazutoyo, Pai, Grace, Parsons, Marilyn, Pentony, Martin, Pettersson, Ulf, Pop, Mihai, Ramirez, Jose Luis, Rinta, Joel, Robertson, Laura, Salzberg, Steven L., Sanchez, Daniel O., Seyler, Amber, Sharma, Reuben, Shetty, Jyoti, Simpson, Anjana J., Sisk, Ellen, Tammi, Martti T., Tarleton, Rick, Teixeira, Santuza, Van Aken, Susan, Vogt, Christy, Ward, Pauline N., Wickstead, Bill, Wortman, Jennifer, White, Owen, Fraser, Claire M., Stuart, Kenneth D., and Andersson, Björn
- Published
- 2005
44. Genome Sequence of Theileria parva, a Bovine Pathogen That Transforms Lymphocytes
- Author
-
Gardner, Malcolm J., Bishop, Richard, Shah, Trushar, de Villiers, Etienne P., Carlton, Jane M., Hall, Neil, Ren, Qinghu, Paulsen, Ian T., Pain, Arnab, Berriman, Matthew, Sato, Shigeharu, Ralph, Stuart A., Mann, David J., Xiong, Zikai, Shallom, Shamira J., Weidman, Janice, Jiang, Lingxia, Lynn, Jeffery, Weaver, Bruce, Shoaibi, Azadeh, Domingo, Alexander R., Wasawo, Delia, Crabtree, Jonathan, Wortman, Jennifer R., Haas, Brian, Angiuoli, Samuel V., Creasy, Todd H., Lu, Charles, Suh, Bernard, Silva, Joana C., Utterback, Teresa R., Feldblyum, Tamara V., Pertea, Mihaela, Allen, Jonathan, Nierman, William C., Salzberg, Steven L., White, Owen R., Fitzhugh, Henry A., Morzaria, Subhash, Venter, J. Craig, Fraser, Claire M., and Nene, Vishvanath
- Published
- 2005
45. The Genome of the Basidiomycetous Yeast and Human Pathogen Cryptococcus neoformans
- Author
-
Loftus, Brendan J., Fung, Eula, Roncaglia, Paola, Rowley, Don, Amedeo, Paolo, Bruno, Dan, Vamathevan, Jessica, Miranda, Molly, Anderson, Iain J., Fraser, James A., Allen, Jonathan E., Bosdet, Ian E., Brent, Michael R., Chiu, Readman, Doering, Tamara L., Donlin, Maureen J., D'Souza, Cletus A., Fox, Deborah S., Grinberg, Viktoriya, Fu, Jianmin, Fukushima, Marilyn, Haas, Brian J., Huang, James C., Janbon, Guilhem, Koo, Hean L., Krzywinski, Martin I., Kwon-Chung, June K., Lengeler, Klaus B., Maiti, Rama, Marra, Marco A., Marra, Robert E., Mathewson, Carrie A., Mitchell, Thomas G., Pertea, Mihaela, Riggs, Florenta R., Salzberg, Steven L., Schein, Jacqueline E., Shvartsbeyn, Alla, Shin, Heesun, Shumway, Martin, Specht, Charles A., Suh, Bernard B., Tenney, Aaron, Utterback, Terry R., Wickes, Brian L., Wortman, Jennifer R., Wye, Natasja H., Kronstad, James W., Lodge, Jennifer K., Heitman, Joseph, Davis, Ronald W., Fraser, Claire M., and Hyman, Richard W.
- Published
- 2005
46. Splam: a deep-learning-based splice site predictor that improves spliced alignments
- Author
-
Chao, Kuan-Hao, Mao, Alan, Salzberg, Steven L, and Pertea, Mihaela
- Published
- 2023
47. Genome sequence and rapid evolution of the rice pathogen Xanthomonas oryzae pv. oryzae PXO99A.
- Author
-
Salzberg, Steven L, Sommer, Daniel D, Schatz, Michael C, Phillippy, Adam M, Rabinowicz, Pablo D, Tsuge, Seiji, Furutani, Ayako, Ochiai, Hirokazu, Delcher, Arthur L, Kelley, David, Madupu, Ramana, Puiu, Daniela, Radune, Diana, Shumway, Martin, Trapnell, Cole, Aparna, Gudlur, Jha, Gopaljee, Pandey, Alok, Patil, Prabhu B, Ishihara, Hiromichi, Meyer, Damien F, Szurek, Boris, Verdier, Valerie, Koebnik, Ralf, Dow, J Maxwell, Ryan, Robert P, Hirata, Hisae, Tsuyumu, Shinji, Won Lee, Sang, Seo, Young-Su, Sriariyanun, Malinee, Ronald, Pamela C, Sonti, Ramesh V, Van Sluys, Marie-Anne, Leach, Jan E, White, Frank F, and Bogdanove, Adam J
- Abstract
Xanthomonas oryzae pv. oryzae causes bacterial blight of rice (Oryza sativa L.), a major disease that constrains production of this staple crop in many parts of the world. We report here on the complete genome sequence of strain PXO99A and its comparison to two previously sequenced strains, KACC10331 and MAFF311018, which are highly similar to one another. The PXO99A genome is a single circular chromosome of 5,240,075 bp, considerably longer than the genomes of the other strains (4,941,439 bp and 4,940,217 bp, respectively), and it contains 5083 protein-coding genes, including 87 not found in KACC10331 or MAFF311018. PXO99A contains a greater number of virulence-associated transcription activator-like effector genes and has at least ten major chromosomal rearrangements relative to KACC10331 and MAFF311018. PXO99A contains numerous copies of diverse insertion sequence elements, members of which are associated with 7 out of 10 of the major rearrangements. A rapidly evolving CRISPR (clustered regularly interspaced short palindromic repeats) region contains evidence of dozens of phage infections unique to the PXO99A lineage. PXO99A also contains a unique, near-perfect tandem repeat of 212 kilobases close to the replication terminus. Our results provide striking evidence of genome plasticity and rapid evolution within Xanthomonas oryzae pv. oryzae. The comparisons point to sources of genomic variation and candidates for strain-specific adaptations of this pathogen that help to explain the extraordinary diversity of Xanthomonas oryzae pv. oryzae genotypes and races that have been isolated from around the world.
- Published
- 2008
48. The Genome Sequence of the Malaria Mosquito Anopheles gambiae
- Author
-
Holt, Robert A., Subramanian, G. Mani, Halpern, Aaron, Sutton, Granger G., Charlab, Rosane, Nusskern, Deborah R., Wincker, Patrick, Clark, Andrew G., Wides, Ron, Salzberg, Steven L., Loftus, Brendan, Yandell, Mark, Majoros, William H., Rusch, Douglas B., Lai, Zhongwu, Kraft, Cheryl L., Abril, Josep F., Anthouard, Veronique, Arensburger, Peter, Atkinson, Peter W., Baden, Holly, de Berardinis, Veronique, Baldwin, Danita, Benes, Vladimir, Biedler, Jim, Blass, Claudia, Bolanos, Randall, Boscus, Didier, Barnstead, Mary, Cai, Shuang, Chatuverdi, Kabir, Christophides, George K., Chrystal, Mathew A., Clamp, Michele, Cravchik, Anibal, Curwen, Val, Dana, Ali, Delcher, Art, Dew, Ian, Evans, Cheryl A., Flanigan, Michael, Grundschober-Freimoser, Anne, Friedli, Lisa, Gu, Zhiping, Guan, Ping, Guigo, Roderic, Hillenmeyer, Maureen E., Hladun, Susanne L., Hogan, James R., Hong, Young S., Hoover, Jeffrey, Jaillon, Olivier, Ke, Zhaoxi, Kodira, Chinnappa, Kokoza, Elena, Koutsos, Anastasios, Letunic, Ivica, Levitsky, Alex, Liang, Yong, Lin, Jhy-Jhu, Lobo, Neil F., Lopez, John R., Malek, Joel A., McIntosh, Tina C., Meister, Stephan, Miller, Jason, Mobarry, Clark, Mongin, Emmanuel, Murphy, Sean D., O'Brochta, David A., Pfannkoch, Cynthia, Qi, Rong, Regier, Megan A., Remington, Karin, Shao, Hongguang, Sharakhova, Maria V., Sitter, Cynthia D., Shetty, Jyoti, Smith, Thomas J., Strong, Renee, Sun, Jingtao, Thomasova, Dana, Ton, Lucas Q., Topalis, Pantelis, Tu, Zhijian, Unger, Maria F., Walenz, Brian, Wang, Aihui, Wang, Jian, Wang, Mei, Wang, Xuelan, Woodford, Kerry J., Wortman, Jennifer R., Wu, Martin, Yao, Alison, Zdobnov, Evgeny M., Zhang, Hongyu, Zhao, Qi, Zhao, Shaying, Zhu, Shiaoping C., Zhimulev, Igor, Coluzzi, Mario, della Torre, Alessandra, Roth, Charles W., Louis, Christos, Kalush, Francis, Mural, Richard J., Myers, Eugene W., Adams, Mark D., Smith, Hamilton O., Broder, Samuel, Gardner, Malcolm J., Fraser, Claire M., Birney, Ewan, Bork, Peer, Brey, Paul T., Venter, J. Craig, Weissenbach, Jean, Kafatos, Fotis C., Collins, Frank H., and Hoffman, Stephen L.
- Published
- 2002
49. Comparative Genome and Proteome Analysis of Anopheles gambiae and Drosophila melanogaster
- Author
-
Zdobnov, Evgeny M., von Mering, Christian, Letunic, Ivica, Torrents, David, Suyama, Mikita, Copley, Richard R., Christophides, George K., Thomasova, Dana, Holt, Robert A., Subramanian, G. Mani, Mueller, Hans-Michael, Dimopoulos, George, Law, John H., Wells, Michael A., Birney, Ewan, Charlab, Rosane, Halpern, Aaron L., Kokoza, Elena, Kraft, Cheryl L., Lai, Zhongwu, Lewis, Suzanna, Louis, Christos, Barillas-Mury, Carolina, Nusskern, Deborah, Rubin, Gerald M., Salzberg, Steven L., Sutton, Granger G., Topalis, Pantelis, Wides, Ron, Wincker, Patrick, Yandell, Mark, Collins, Frank H., Ribeiro, Jose, Gelbart, William M., Kafatos, Fotis C., and Bork, Peer
- Published
- 2002
50. The Brucella suis Genome Reveals Fundamental Similarities between Animal and Plant Pathogens and Symbionts
- Author
-
Paulsen, Ian T., Seshadri, Rekha, Nelson, Karen E., Eisen, Jonathan A., Heidelberg, John F., Read, Timothy D., Dodson, Robert J., Umayam, Lowell, Brinkac, Lauren M., Beanan, Maureen J., Daugherty, Sean C., Deboy, Robert T., Durkin, A. Scott, Kolonay, James F., Madupu, Ramana, Nelson, William C., Ayodeji, Bola, Kraul, Margaret, Shetty, Jyoti, Malek, Joel, van Aken, Susan E., Riedmuller, Steven, Tettelin, Herve, Gill, Steven R., White, Owen, Salzberg, Steven L., Hoover, David L., Lindler, Luther E., Halling, Shirley M., Boyle, Stephen M., and Fraser, Claire M.
- Published
- 2002
Catalog
Discovery Service for Jio Institute Digital Library