41 results on '"Jeremy D. Selengut"'
Search Results
2. Expanding the scope of plant genome engineering with Cas12a orthologs and highly multiplexable editing systems
- Author
-
Yanhao Cheng, Stephen M. Mount, Yiping Qi, Desuo Yin, Aimee Malzahn, Yingxiao Zhang, Jeremy D. Selengut, Mingzhu Yuan, Lan Huang, Changtian Pan, Xuelian Zheng, Bailey McCoy, Jianping Zhou, Qing Fang, Yong Zhang, Lidiya Franklin, Qiurong Ren, Jiaheng Wang, Han Yang, Li Tian, Ysa Le, Shishi Liu, Qiudeng Que, Xu Tang, and Yuxin Zhao
- Subjects
Crops, Agricultural ,CRISPR-Cas9 genome editing ,0106 biological sciences ,0301 basic medicine ,Science ,CRISPR-Associated Proteins ,Arabidopsis ,General Physics and Astronomy ,Sequence alignment ,Computational biology ,01 natural sciences ,Genome ,Article ,General Biochemistry, Genetics and Molecular Biology ,Genome engineering ,03 medical and health sciences ,Bacterial Proteins ,Genome editing ,CRISPR-Associated Protein 9 ,Humans ,Clustered Regularly Interspaced Short Palindromic Repeats ,Multiplex ,Gene ,Alleles ,Gene Editing ,Endodeoxyribonucleases ,Multidisciplinary ,Base Sequence ,biology ,Scope (project management) ,food and beverages ,Oryza ,Genomics ,General Chemistry ,Plants, Genetically Modified ,biology.organism_classification ,Isoenzymes ,030104 developmental biology ,Agrobacterium tumefaciens ,Genetic engineering ,CRISPR-Cas Systems ,Sequence Alignment ,Genome, Plant ,RNA, Guide, Kinetoplastida ,010606 plant biology & botany
- Abstract
CRISPR-Cas12a is a promising genome editing system for targeting AT-rich genomic regions. Comprehensive genome engineering requires simultaneous targeting of multiple genes at defined locations. Here, to expand the targeting scope of Cas12a, we screen nine Cas12a orthologs that have not been demonstrated in plants, and identify six, ErCas12a, Lb5Cas12a, BsCas12a, Mb2Cas12a, TsCas12a and MbCas12a, that possess high editing activity in rice. Among them, Mb2Cas12a stands out with high editing efficiency and tolerance to low temperature. An engineered Mb2Cas12a-RVRR variant enables editing with more relaxed PAM requirements in rice, yielding two times higher genome coverage than the wild type SpCas9. To enable large-scale genome engineering, we compare 12 multiplexed Cas12a systems and identify a potent system that exhibits nearly 100% biallelic editing efficiency with the ability to target as many as 16 sites in rice. This is the highest level of multiplex edits in plants to date using Cas12a. Two compact single transcript unit CRISPR-Cas12a interference systems are also developed for multi-gene repression in rice and Arabidopsis. This study greatly expands the targeting scope of Cas12a for crop genome engineering.
CRISPR-Cas12a is a promising system for targeting AT-rich regions of the genome. Here the authors identify Cas12a orthologs with expanded targeting scope and develop a highly multiplexable editing system in rice.
- Published
- 2021
3. Whole genome analysis of Leptospira licerasiae provides insight into leptospiral evolution and pathogenicity.
- Author
-
Jessica N Ricaldi, Derrick E Fouts, Jeremy D Selengut, Derek M Harkins, Kailash P Patra, Angelo Moreno, Jason S Lehmann, Janaki Purushe, Ravi Sanka, Michael Torres, Nicholas J Webster, Joseph M Vinetz, and Michael A Matthias
- Subjects
Arctic medicine. Tropical medicine ,RC955-962 ,Public aspects of medicine ,RA1-1270
- Abstract
The whole genome analysis of two strains of the first intermediately pathogenic leptospiral species to be sequenced (Leptospira licerasiae strains VAR010 and MMD0835) provides insight into their pathogenic potential and deepens our understanding of leptospiral evolution. Comparative analysis of eight leptospiral genomes shows the existence of a core leptospiral genome comprising 1547 genes and 452 conserved genes restricted to infectious species (including L. licerasiae) that are likely to be pathogenicity-related. Comparisons of the functional content of the genomes suggests that L. licerasiae retains several proteins related to nitrogen, amino acid and carbohydrate metabolism which might help to explain why these Leptospira grow well in artificial media compared with pathogenic species. L. licerasiae strains VAR010(T) and MMD0835 possess two prophage elements. While one element is circular and shares homology with LE1 of L. biflexa, the second is cryptic and homologous to a previously identified but unnamed region in L. interrogans serovars Copenhageni and Lai. We also report a unique O-antigen locus in L. licerasiae comprised of a 6-gene cluster that is unexpectedly short compared with L. interrogans in which analogous regions may include >90 such genes. Sequence homology searches suggest that these genes were acquired by lateral gene transfer (LGT). Furthermore, seven putative genomic islands ranging in size from 5 to 36 kb are present also suggestive of antecedent LGT. How Leptospira become naturally competent remains to be determined, but considering the phylogenetic origins of the genes comprising the O-antigen cluster and other putative laterally transferred genes, L. licerasiae must be able to exchange genetic material with non-invasive environmental bacteria. The data presented here demonstrate that L. licerasiae is genetically more closely related to pathogenic than to saprophytic Leptospira and provide insight into the genomic bases for its infectiousness and its unique antigenic characteristics.
- Published
- 2012
- Full Text
- View/download PDF
4. Correction: Comparative Genomics of Emerging Human Ehrlichiosis Agents.
- Author
-
Julie C Dunning Hotopp, Mingqun Lin, Ramana Madupu, Jonathan Crabtree, Samuel V Angiuoli, Jonathan A Eisen, Rekha Seshadri, Qinghu Ren, Martin Wu, Teresa R Utterback, Shannon Smith, Matthew Lewis, Hoda Khouri, Chunbin Zhang, Hua Niu, Quan Lin, Norio Ohashi, Ning Zhi, William Nelson, Lauren M Brinkac, Robert J Dodson, M. J Rosovitz, Jaideep Sundaram, Sean C Daugherty, Tanja Davidsen, Anthony S Durkin, Michelle Gwinn, Daniel H Haft, Jeremy D Selengut, Steven A Sullivan, Nikhat Zafar, Liwei Zhou, Faiza Benahmed, Heather Forberger, Rebecca Halpin, Stephanie Mulligan, Jeffrey Robinson, Owen White, Yasuko Rikihisa, and Hervé Tettelin
- Subjects
Genetics ,QH426-470
- Published
- 2006
- Full Text
- View/download PDF
5. Comparative genomics of emerging human ehrlichiosis agents.
- Author
-
Julie C Dunning Hotopp, Mingqun Lin, Ramana Madupu, Jonathan Crabtree, Samuel V Angiuoli, Jonathan A Eisen, Rekha Seshadri, Qinghu Ren, Martin Wu, Teresa R Utterback, Shannon Smith, Matthew Lewis, Hoda Khouri, Chunbin Zhang, Hua Niu, Quan Lin, Norio Ohashi, Ning Zhi, William Nelson, Lauren M Brinkac, Robert J Dodson, M J Rosovitz, Jaideep Sundaram, Sean C Daugherty, Tanja Davidsen, Anthony S Durkin, Michelle Gwinn, Daniel H Haft, Jeremy D Selengut, Steven A Sullivan, Nikhat Zafar, Liwei Zhou, Faiza Benahmed, Heather Forberger, Rebecca Halpin, Stephanie Mulligan, Jeffrey Robinson, Owen White, Yasuko Rikihisa, and Hervé Tettelin
- Subjects
Genetics ,QH426-470
- Abstract
Anaplasma (formerly Ehrlichia) phagocytophilum, Ehrlichia chaffeensis, and Neorickettsia (formerly Ehrlichia) sennetsu are intracellular vector-borne pathogens that cause human ehrlichiosis, an emerging infectious disease. We present the complete genome sequences of these organisms along with comparisons to other organisms in the Rickettsiales order. Ehrlichia spp. and Anaplasma spp. display a unique large expansion of immunodominant outer membrane proteins facilitating antigenic variation. All Rickettsiales have a diminished ability to synthesize amino acids compared to their closest free-living relatives. Unlike members of the Rickettsiaceae family, these pathogenic Anaplasmataceae are capable of making all major vitamins, cofactors, and nucleotides, which could confer a beneficial role in the invertebrate vector or the vertebrate host. Further analysis identified proteins potentially involved in vacuole confinement of the Anaplasmataceae, a life cycle involving a hematophagous vector, vertebrate pathogenesis, human pathogenesis, and lack of transovarial transmission. These discoveries provide significant insights into the biology of these obligate intracellular pathogens.
- Published
- 2006
- Full Text
- View/download PDF
6. Expanding plant genome-editing scope by an engineered iSpyMacCas9 system that targets A-rich PAM sequences
- Author
-
Yiping Qi, Simon Sretenovic, Desuo Yin, Jeremy D. Selengut, Adam Levav, and Stephen M. Mount
- Subjects
Computer science ,Mutagenesis (molecular biology technique) ,Plant Science ,Computational biology ,Zea mays ,Biochemistry ,cytosine base editing ,Genome editing ,adenine base editing ,CRISPR ,Resource Article ,Molecular Biology ,Triticum ,Gateway cloning ,Gene Editing ,CRISPR interference ,Scope (project management) ,Oryza ,Cell Biology ,plant genome editing ,Rna expression ,PAM ,CRISPR-Cas Systems ,iSpyMacCas9 ,Genome, Plant ,Biotechnology
- Abstract
The most popular CRISPR-SpCas9 system recognizes canonical NGG protospacer adjacent motifs (PAMs). Previously engineered SpCas9 variants, such as Cas9-NG, favor G-rich PAMs in genome editing. In this manuscript, we describe a new plant genome-editing system based on a hybrid iSpyMacCas9 platform that allows for targeted mutagenesis, C to T base editing, and A to G base editing at A-rich PAMs. This study fills a major technology gap in the CRISPR-Cas9 system for editing NAAR PAMs in plants, which greatly expands the targeting scope of CRISPR-Cas9. Finally, our vector systems are fully compatible with Gateway cloning and will work with all existing single-guide RNA expression systems, facilitating easy adoption of the systems by others. We anticipate that more tools, such as prime editing, homology-directed repair, CRISPR interference, and CRISPR activation, will be further developed based on our promising iSpyMacCas9 platform.
This study reports the development of an iSpyMacCas9 genome-editing toolbox for targeted mutagenesis, C to T base editing, and A to G base editing at A-rich PAM sites. While the tools are demonstrated in rice, they are expected to aid genome-editing applications in other plant species as well with a broadened targeting scope.
- Published
- 2021
- Full Text
- View/download PDF
7. InterPro: the integrative protein signature database
- Author
-
Elizabeth Kelly, Paul Thomas, Amos Marc Bairoch, Daniel Kahn, Teresa K. Attwood, Manjula Thimma, John Maslen, Corin Yeats, Darren A. Natale, Ivica Letunic, Nicola Mulder, Martin Madera, David Binns, Derek Wilson, Alex L. Mitchell, Julian Gough, Craig McAnulla, Rodrigo Lopez, Christian J. A. Sigrist, Lauranne Duquenne, Cathy H. Wu, Nicolas Hulo, Daniel H. Haft, Aurélie Laugraud, Jaina Mistry, Peer Bork, Sarah Hunter, Robert D. Finn, Rolf Apweiler, David M. Lonsdale, Ujjwal Das, Louise C. Daugherty, Christine A. Orengo, Franck Valentin, Alex Bateman, Jennifer McDowall, Jeremy D. Selengut, Antony F. Quinn, Bioinformatique, phylogénie et génomique évolutive (BPGE), Département PEGASE [LBBE] (PEGASE), Laboratoire de Biométrie et Biologie Evolutive - UMR 5558 (LBBE), Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire de Biométrie et Biologie Evolutive - UMR 5558 (LBBE), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS), Biotechnology and Biological Sciences Research Council BB/F010508/1, National Institute of Health GM081084, European Project: 213037, and MDC Library
- Subjects
0106 biological sciences ,InterPro ,[SDV.OT]Life Sciences [q-bio]/Other [q-bio.OT] ,Interface (Java) ,computer.internet_protocol ,Protein Sequence Analysis ,570 Life Sciences ,Biology ,PROSITE ,computer.software_genre ,01 natural sciences ,610 Medical Sciences, Medicine ,03 medical and health sciences ,Sequence Analysis, Protein ,TIGRFAMs ,Genetics ,Protein Databases ,ddc:576 ,Databases, Protein ,030304 developmental biology ,Proteins/chemistry/classification ,0303 health sciences ,Database ,Databases ,Protein ,Proteins ,Articles ,Sequence Analysis ,Systems Integration ,Cardiovascular and Metabolic Diseases ,UniProt ,Web service ,computer ,XML ,Autre (Sciences du Vivant) ,010606 plant biology & botany ,InterProScan
- Abstract
The InterPro database (http://www.ebi.ac.uk/interpro/) integrates together predictive models or 'signatures' representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually and approximately half of the total approximately 58,000 signatures available in the source databases belong to an InterPro entry. Recently, we have started to also display the remaining un-integrated signatures via our web interface. Other developments include the provision of non-signature data, such as structural data, in new XML files on our FTP site, as well as the inclusion of matchless UniProtKB proteins in the existing match XML files. The web interface has been extended and now links out to the ADAN predicted protein-protein interaction database and the SPICE and Dasty viewers. The latest public release (v18.0) covers 79.8% of UniProtKB (v14.1) and consists of 16 549 entries. InterPro data may be accessed either via the web address above, via web services, by downloading files by anonymous FTP or by using the InterProScan search software (http://www.ebi.ac.uk/Tools/InterProScan/).
- Published
- 2017
- Full Text
- View/download PDF
8. Genome sequence of the human malaria parasite Plasmodium falciparum
- Author
-
Man Suen Chan, Alan H. Fairlamb, Jeremy D. Selengut, Leda M. Cummings, Arnab Pain, Vishvanath Nene, Martin Fraunholz, G. Mani Subramanian, Christopher J. Mungall, Akhil B. Vaidya, Bart Barrell, Jane M. Carlton, Jeremy Peterson, Chris I. Newbold, Richard W. Hyman, Daniel J. Carucci, Claire M. Fraser, Neil Hall, Matthew Berriman, Michael W. Mather, David S. Roos, Jonathan A. Eisen, Ronald W. Davis, Sue Kyes, Jonathan E. Allen, Eula Fung, Kim Rutherford, Geoffrey I. McFadden, Samuel V. Angiuoli, Karen E. Nelson, Owen White, J. Craig Venter, Alister Craig, Steven L. Salzberg, Shamira J. Shallom, Mihaela Pertea, David M. A. Martin, Stuart A. Ralph, Bernard B. Suh, Daniel H. Haft, Keith D. James, Ian T. Paulsen, Stephen L. Hoffman, Sharen Bowman, and Malcolm J. Gardner
- Subjects
DNA Replication ,Genome evolution ,DNA Repair ,Proteome ,Molecular Sequence Data ,Plasmodium falciparum ,Protozoan Proteins ,Genomics ,Biology ,Genome ,Article ,Evolution, Molecular ,parasitic diseases ,Malaria Vaccines ,Animals ,Humans ,Plastids ,Malaria, Falciparum ,Gene ,Genetics ,Recombination, Genetic ,Pregnancy-associated malaria ,Apicoplast ,Multidisciplinary ,Membrane Transport Proteins ,Sequence Analysis, DNA ,DNA, Protozoan ,biology.organism_classification ,Chromosome Structures ,Genome, Protozoan
- Abstract
The parasite Plasmodium falciparum is responsible for hundreds of millions of cases of malaria, and kills more than one million African children annually. Here we report an analysis of the genome sequence of P. falciparum clone 3D7. The 23-megabase nuclear genome consists of 14 chromosomes, encodes about 5,300 genes, and is the most (A + T)-rich genome sequenced to date. Genes involved in antigenic variation are concentrated in the subtelomeric regions of the chromosomes. Compared to the genomes of free-living eukaryotic microbes, the genome of this intracellular parasite encodes fewer enzymes and transporters, but a large proportion of genes are devoted to immune evasion and host-parasite interactions. Many nuclear-encoded proteins are targeted to the apicoplast, an organelle involved in fatty-acid and isoprenoid metabolism. The genome sequence provides the foundation for future studies of this organism, and is being exploited in the search for new drugs and vaccines to fight malaria.
- Published
- 2016
9. Genomic insights to SAR86, an abundant and uncultivated marine bacterial lineage
- Author
-
Robert Friedman, Jeremy D. Selengut, R. Alexander Richter, Daniel H. Haft, Aaron L. Halpern, Roger S. Lasken, J. Craig Venter, Mary-Jane Lombardo, Mark Novotny, Douglas B. Rusch, Christopher L. Dupont, Shibu Yooseph, Ruben E. Valas, Joyclyn Yee-Greenbaum, and Kenneth H. Nealson
- Subjects
Rhodopsin ,proteorhodopsin ,tonB receptors ,Oceans and Seas ,Lineage (evolution) ,SAR86 ,Biology ,Microbiology ,Genome ,Phylogenetics ,RNA, Ribosomal, 16S ,Rhodopsins, Microbial ,Seawater ,Genomic library ,single cell genomics ,Phylogeny ,Ecology, Evolution, Behavior and Systematics ,Genomic Library ,Proteorhodopsin ,Ecology ,SAR11 ,Computational Biology ,Ribosomal RNA ,Plankton ,Evolutionary biology ,Metagenomics ,biology.protein ,Original Article ,metagenomic assembly ,Bacterial outer membrane ,Gammaproteobacteria ,Genome, Bacterial
- Abstract
Bacteria in the 16S rRNA clade SAR86 are among the most abundant uncultivated constituents of microbial assemblages in the surface ocean for which little genomic information is currently available. Bioinformatic techniques were used to assemble two nearly complete genomes from marine metagenomes and single-cell sequencing provided two more partial genomes. Recruitment of metagenomic data shows that these SAR86 genomes substantially increase our knowledge of non-photosynthetic bacteria in the surface ocean. Phylogenomic analyses establish SAR86 as a basal and divergent lineage of γ-proteobacteria, and the individual genomes display a temperature-dependent distribution. Modestly sized at 1.25-1.7 Mbp, the SAR86 genomes lack several pathways for amino-acid and vitamin synthesis as well as sulfate reduction, trends commonly observed in other abundant marine microbes. SAR86 appears to be an aerobic chemoheterotroph with the potential for proteorhodopsin-based ATP generation, though the apparent lack of a retinal biosynthesis pathway may require it to scavenge exogenously-derived pigments to utilize proteorhodopsin. The genomes contain an expanded capacity for the degradation of lipids and carbohydrates acquired using a wealth of tonB-dependent outer membrane receptors. Like the abundant planktonic marine bacterial clade SAR11, SAR86 exhibits metabolic streamlining, but also a distinct carbon compound specialization, possibly avoiding competition.
- Published
- 2011
- Full Text
- View/download PDF
10. Three Genomes from the Phylum Acidobacteria Provide Insight into the Lifestyles of These Microorganisms in Soils
- Author
-
Anuradha Ganapathy, Naomi L. Ward, Qinghu Ren, Susmita Shrivastava, Cliff Han, Jonathan H. Badger, Pedro M. Coutinho, Chunhui Yu, Kevin Penn, Liwei Zhou, Gary Xie, Jeremy D. Selengut, M. J. Rosovitz, Thomas Brettin, Bernard Henrissat, Todd Creasy, Martin Wu, Lauren M. Brinkac, William C. Nelson, A. Scott Durkin, Robert T. DeBoy, Brent Bradley, Karen E. Nelson, Peter H. Janssen, Hoda Khouri, Hajnalka Kiss, Steven A. Sullivan, J. Chris Detter, Jean F. Challacombe, Roxanne Tapia, Kisha Watkins, Cheryl R. Kuske, Tanja M. Davidsen, Ramana Madupu, David Bruce, Robert J. Dodson, L. Sue Thompson, Michelle Sait, Daniel H. Haft, Ian T. Paulsen, Ravi D. Barabote, Qi Yang, Sean C. Daugherty, Sagar Kothari, Nikhat Zafar, and Michelle Gwinn-Giglio
- Subjects
DNA, Bacterial ,Siderophore ,Nitrogen ,Molecular Sequence Data ,Sequence Homology ,Cyanobacteria ,Applied Microbiology and Biotechnology ,Genome ,Phylogenetics ,Proteobacteria ,Evolutionary and Genomic Microbiology ,Phylogeny ,Soil Microbiology ,Genetics ,Bacteria ,Ecology ,biology ,Phylum ,Fungi ,Biological Transport ,Sequence Analysis, DNA ,biology.organism_classification ,Major facilitator superfamily ,Anti-Bacterial Agents ,Biochemistry ,Carbohydrate Metabolism ,Macrolides ,Soil microbiology ,Genome, Bacterial ,Food Science ,Biotechnology ,Acidobacteria
- Abstract
The complete genomes of three strains from the phylum Acidobacteria were compared. Phylogenetic analysis placed them as a unique phylum. They share genomic traits with members of the Proteobacteria, the Cyanobacteria, and the Fungi. The three strains appear to be versatile heterotrophs. Genomic and culture traits indicate the use of carbon sources that span simple sugars to more complex substrates such as hemicellulose, cellulose, and chitin. The genomes encode low-specificity major facilitator superfamily transporters and high-affinity ABC transporters for sugars, suggesting that they are best suited to low-nutrient conditions. They appear capable of nitrate and nitrite reduction but not N2 fixation or denitrification. The genomes contained numerous genes that encode siderophore receptors, but no evidence of siderophore production was found, suggesting that they may obtain iron via interaction with other microorganisms. The presence of cellulose synthesis genes and a large class of novel high-molecular-weight excreted proteins suggests potential traits for desiccation resistance, biofilm formation, and/or contribution to soil structure. Polyketide synthase and macrolide glycosylation genes suggest the production of novel antimicrobial compounds. Genes that encode a variety of novel proteins were also identified. The abundance of acidobacteria in soils worldwide and the breadth of potential carbon use by the sequenced strains suggest significant and previously unrecognized contributions to the terrestrial carbon cycle. Combining our genomic evidence with available culture traits, we postulate that cells of these isolates are long-lived, divide slowly, exhibit slow metabolic rates under low-nutrient conditions, and are well equipped to tolerate fluctuations in soil hydration.
- Published
- 2009
- Full Text
- View/download PDF
11. Genome sequence and identification of candidate vaccine antigens from the animal pathogen Dichelobacter nodosus
- Author
-
Hervé Tettelin, Hoda Khouri, Keith Al-Hasani, Ramana Madupu, Ben Adler, Xiaoyan Han, Julian I. Rood, Jonathan H. Badger, Ian T. Paulsen, Steven P. Bottomley, J. Glenn Songer, Victoria McCarl, Tara Holley, Garry S. A. Myers, Richard Whittington, Qinghu Ren, Jeremy D. Selengut, William C. Nelson, Robert T. DeBoy, Ruth M. Kennan, John D. Boyce, Yasmin Ali Mohamoud, Nadia Fedorova, Torsten Seemann, and Dane Parker
- Subjects
Genetics ,Whole genome sequencing ,Genome evolution ,biology ,Intracellular parasite ,Biomedical Engineering ,Chromosome Mapping ,Virulence ,Bioengineering ,Sequence Analysis, DNA ,Dichelobacter nodosus ,biology.organism_classification ,Applied Microbiology and Biotechnology ,Genome ,Horizontal gene transfer ,Animals ,Molecular Medicine ,Antigens ,Foot Rot ,Pathogen ,Genome, Bacterial ,Biotechnology
- Abstract
Dichelobacter nodosus causes ovine footrot, a disease that leads to severe economic losses in the wool and meat industries. We sequenced its 1.4-Mb genome, the smallest known genome of an anaerobe. It differs markedly from small genomes of intracellular bacteria, retaining greater biosynthetic capabilities and lacking any evidence of extensive ongoing genome reduction. Comparative genomic microarray studies and bioinformatic analysis suggested that, despite its small size, almost 20% of the genome is derived from lateral gene transfer. Most of these regions seem to be associated with virulence. Metabolic reconstruction indicated unsuspected capabilities, including carbohydrate utilization, electron transfer and several aerobic pathways. Global transcriptional profiling and bioinformatic analysis enabled the prediction of virulence factors and cell surface proteins. Screening of these proteins against ovine antisera identified eight immunogenic proteins that are candidate antigens for a cross-protective vaccine. © 2007 Nature Publishing Group.
- Published
- 2007
- Full Text
- View/download PDF
12. New developments in the InterPro database
- Author
-
David Lonsdale, Rodrigo Lopez, Rolf Apweiler, Richard R. Copley, Emmanuel Courcelle, Robert Petryszak, Alberto Labarga, Alex Bateman, Jennifer McDowall, Anastasia N. Nikolskaya, Christine A. Orengo, Nicolas Hulo, Martin Madera, Franck Valentin, Daniel H. Haft, Robert D. Finn, David Binns, Petra S. Langendijk-Genevaux, Ujjwal Das, Nicola Mulder, Louise C. Daugherty, Anish Kejariwal, Julian Gough, Virginie Buillard, Wolfgang Fleischmann, Amos Marc Bairoch, Daniel Kahn, Mark Dibley, Peer Bork, Craig McAnulla, Teresa K. Attwood, Cathy H. Wu, Lorenzo Cerutti, Alexander Kanapin, Christian J. A. Sigrist, John Maslen, Corin Yeats, Sarah Hunter, Paul Thomas, Ivica Letunic, Derek Wilson, Jaina Mistry, Sandra Orchard, Alex L. Mitchell, Jeremy D. Selengut, Bioinformatique, phylogénie et génomique évolutive (BPGE), Département PEGASE [LBBE] (PEGASE), Laboratoire de Biométrie et Biologie Evolutive - UMR 5558 (LBBE), Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire de Biométrie et Biologie Evolutive - UMR 5558 (LBBE), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS), Computer science and genomics (HELIX), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de Biométrie et Biologie Evolutive - UMR 5558 (LBBE), and Université de Lyon-Université de Lyon-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
InterPro ,Protein Structure ,[SDV.OT]Life Sciences [q-bio]/Other [q-bio.OT] ,Web server ,computer.internet_protocol ,Biology ,PROSITE ,computer.software_genre ,Databases ,User-Computer Interface ,03 medical and health sciences ,0302 clinical medicine ,Sequence Analysis, Protein ,TIGRFAMs ,Genetics ,ddc:576 ,Databases, Protein ,030304 developmental biology ,Internet ,0303 health sciences ,Database ,Protein ,Proteins ,Articles ,Protein Structure, Tertiary ,Systems Integration ,Cardiovascular and Metabolic Diseases ,[INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR] ,030220 oncology & carcinogenesis ,Proteins/chemistry/classification/physiology ,Web service ,UniProt ,Sequence Analysis ,computer ,Tertiary ,XML ,Autre (Sciences du Vivant) ,InterProScan
- Abstract
InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two new member databases have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on the website and in the match XML file on the FTP site. Additional matches to proteins in UniParc (UniProt archive) are now available for download in the new match XML files only. The latest InterPro release (13.0) contains more than 13 000 entries, covering over 78% of all proteins in UniProtKB. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro). The InterProScan search tool is now also available via a web service at http://www.ebi.ac.uk/Tools/webservices/WSInterProScan.html.
- Published
- 2007
- Full Text
- View/download PDF
13. TIGRFAMs and Genome Properties: tools for the assignment of molecular function and biological process in prokaryotic genomes
- Author
-
Anuradha Ganapathy, William C. Nelson, Owen White, Daniel H. Haft, R. Alexander Richter, Tanja M. Davidsen, Michelle Gwinn-Giglio, and Jeremy D. Selengut
- Subjects
Protein family ,Archaeal Proteins ,Genomics ,Context (language use) ,Computational biology ,Biology ,Genome ,User-Computer Interface ,03 medical and health sciences ,Annotation ,Bacterial Proteins ,TIGRFAMs ,Genetics ,Databases, Protein ,Hidden Markov model ,Phylogeny ,030304 developmental biology ,Comparative genomics ,Internet ,0303 health sciences ,030306 microbiology ,Articles ,Genome, Bacterial ,Software
- Abstract
TIGRFAMs is a collection of protein family definitions built to aid in high-throughput annotation of specific protein functions. Each family is based on a hidden Markov model (HMM), where both cutoff scores and membership in the seed alignment are chosen so that the HMMs can classify numerous proteins according to their specific molecular functions. Most TIGRFAMs models describe 'equivalog' families, where both orthology and lateral gene transfer may be part of the evolutionary history, but where a single molecular function has been conserved. The Genome Properties system contains a queriable set of metabolic reconstructions, genome metrics and extractions of information from the scientific literature. Its genome-by-genome assertions of whether or not specific structures, pathways or systems are present provide high-level conceptual descriptions of genomic content. These assertions enable comparative genomics, provide a meaningful biological context to aid in manual annotation, support assignments of Gene Ontology (GO) biological process terms and help validate HMM-based predictions of protein function. The Genome Properties system is particularly useful as a generator of phylogenetic profiles, through which new protein family functions may be discovered. The TIGRFAMs and Genome Properties systems can be accessed at http://www.tigr.org/TIGRFAMs and http://www.tigr.org/Genome_Properties.
- Published
- 2007
- Full Text
- View/download PDF
14. Meeting Report: eGenomics: Cataloguing Our Complete Genome Collection II
- Author
-
Dawn Field, Norman Morrison, Jeremy D. Selengut, and Peter Sterk
- Subjects
Genetics ,Molecular Medicine ,Library science ,Computational biology ,Biology ,Molecular Biology ,Biochemistry ,Genome ,Biotechnology
- Abstract
This article summarizes the proceedings of the "eGenomics: Cataloguing our Complete Genome Collection II" workshop held November 10–11, 2005, at the European Bioinformatics Institute. This explorat...
- Published
- 2006
15. eGenomics: Cataloguing our Complete Genome Collection
- Author
-
Tatiana Tatusova, Dawn Field, Jeremy D. Selengut, Norman Morrison, George M. Garrity, Peter Sterk, and Nicholas R. Thomson
- Subjects
Whole genome sequencing ,0303 health sciences ,lcsh:QH426-470 ,030306 microbiology ,Library science ,computer.software_genre ,Genome ,Genomic databases ,Metadata ,03 medical and health sciences ,lcsh:Genetics ,Geography ,lcsh:Biology (General) ,Metagenomics ,Genetics ,lcsh:Q ,Data mining ,lcsh:Science ,Molecular Biology ,computer ,lcsh:QH301-705.5 ,030304 developmental biology ,Biotechnology ,Research Article - Abstract
This meeting report summarizes the proceedings of the “eGenomics: Cataloguing our Complete Genome Collection III” workshop held September 11–13, 2006, at the National Institute for Environmental eScience (NIEeS), Cambridge, United Kingdom. This 3rd workshop of the Genomic Standards Consortium was divided into two parts. The first half of the three-day workshop was dedicated to reviewing the genomic diversity of our current and future genome and metagenome collection, and exploring linkages to a series of existing projects through formal presentations. The second half was dedicated to strategic discussions. Outcomes of the workshop include a revised “Minimum Information about a Genome Sequence” (MIGS) specification (v1.1), consensus on a variety of features to be added to the Genome Catalogue (GCat), agreement by several researchers to adopt MIGS for imminent genome publications, and an agreement by the EBI and NCBI to input their genome collections into GCat for the purpose of quantifying the amount of optional data already available (e.g., for geographic location coordinates) and working towards a single, global list of all public genomes and metagenomes.
- Published
- 2005
16. Genome sequence of Silicibacter pomeroyi reveals adaptations to the marine environment
- Author
-
Jane M. Carlton, Clay Fuqua, James R. Henriksen, Steven A. Sullivan, Qinghu Ren, Wenying Ye, Robert Belas, Robert T. DeBoy, Grace Pai, William C. Nelson, John F. Heidelberg, Mary Ann Moran, Elisha Rahe, Ian T. Paulsen, William B. Whitman, Jeremy D. Selengut, Ronald P. Kiene, Daniel H. Haft, Jonathan A. Eisen, Wade M. Sheldon, Matthew R. Lewis, A. Scott Durkin, Robert J. Dodson, Sean C. Daugherty, Lauren M. Brinkac, Shivani Johri, Ramana Madupu, Bruce Weaver, Gary M. King, Alison Buchan, Todd R. Miller, David A. Rasko, M. J. Rosovitz, José M. González, and Naomi L. Ward
- Subjects
Genetics ,Whole genome sequencing ,Multidisciplinary ,biology ,Oceans and Seas ,Ruegeria ,Molecular Sequence Data ,fungi ,Marine Biology ,Bacterioplankton ,Plankton ,Roseobacter ,biology.organism_classification ,Adaptation, Physiological ,Genome ,Genes, Bacterial ,RNA, Ribosomal, 16S ,Seawater ,Carrier Proteins ,Silicibacter pomeroyi ,Gene ,Genome, Bacterial ,Phylogeny - Abstract
Since the recognition of prokaryotes as essential components of the oceanic food web, bacterioplankton have been acknowledged as catalysts of most major biogeochemical processes in the sea. Studying heterotrophic bacterioplankton has been challenging, however, as most major clades have never been cultured or have only been grown to low densities in sea water. Here we describe the genome sequence of Silicibacter pomeroyi, a member of the marine Roseobacter clade (Fig. 1), the relatives of which comprise approximately 10-20% of coastal and oceanic mixed-layer bacterioplankton. This first genome sequence from any major heterotrophic clade consists of a chromosome (4,109,442 base pairs) and megaplasmid (491,611 base pairs). Genome analysis indicates that this organism relies upon a lithoheterotrophic strategy that uses inorganic compounds (carbon monoxide and sulphide) to supplement heterotrophy. Silicibacter pomeroyi also has genes advantageous for associations with plankton and suspended particles, including genes for uptake of algal-derived compounds, use of metabolites from reducing microzones, rapid growth and cell-density-dependent regulation. This bacterium has a physiology distinct from that of marine oligotrophs, adding a new strategy to the recognized repertoire for coping with a nutrient-poor ocean.
- Published
- 2004
17. Structural flexibility in the Burkholderia mallei genome
- Author
-
Tanja Davidsen, Robert J. Dodson, Hervé Tettelin, Michelle L. Gwinn, James F. Kolonay, Diana Radune, A. Scott Durkin, David DeShazer, Yan Yu, Jeremy D. Selengut, Karen E. Nelson, Saul H Sarria, Daniel H. Haft, Yasmin Mohammoud, Sean C. Daugherty, Lauren M. Brinkac, Owen White, Claire M. Fraser, Robert T. DeBoy, Catherine M. Ronning, George Dimitrov, Hoda Khouri, Nikhat Zafar, Christine Shamblin, Liwei Zhou, Steven A. Sullivan, William C. Nelson, H. Stanley Kim, Ramana Madupu, Claudia M. Romero, Ricky L. Ulrich, Tamara Feldblyum, and William C. Nierman
- Subjects
Genome evolution ,Molecular Sequence Data ,Burkholderia mallei ,Genome ,Open Reading Frames ,Cricetinae ,Antigenic variation ,medicine ,Animals ,Insertion sequence ,Oligonucleotide Array Sequence Analysis ,Genetics ,Whole genome sequencing ,Base Composition ,Multidisciplinary ,Base Sequence ,Mesocricetus ,Virulence ,biology ,Glanders ,Genome project ,Biological Sciences ,Chromosomes, Bacterial ,biology.organism_classification ,medicine.disease ,Liver ,Multigene Family ,Genome, Bacterial - Abstract
The complete genome sequence of Burkholderia mallei ATCC 23344 provides insight into this highly infectious bacterium's pathogenicity and evolutionary history. B. mallei, the etiologic agent of glanders, has come under renewed scientific investigation as a result of recent concerns about its past and potential future use as a biological weapon. Genome analysis identified a number of putative virulence factors whose function was supported by comparative genome hybridization and expression profiling of the bacterium in hamster liver in vivo. The genome contains numerous insertion sequence elements that have mediated extensive deletions and rearrangements of the genome relative to Burkholderia pseudomallei. The genome also contains a vast number (>12,000) of simple sequence repeats. Variation in simple sequence repeats in key genes can provide a mechanism for generating antigenic variation that may account for the mammalian host's inability to mount a durable adaptive immune response to a B. mallei infection.
- Published
- 2004
18. X-ray Crystal Structure of the Hypothetical Phosphotyrosine Phosphatase MDP-1 of the Haloacid Dehalogenase Superfamily
- Author
-
Debra Dunaway-Mariano, Ezra Peisach, Jeremy D. Selengut, and Karen N. Allen
- Subjects
Models, Molecular ,Hydrolases ,Molecular Sequence Data ,Crystal structure ,Crystallography, X-Ray ,Biochemistry ,Substrate Specificity ,Conserved sequence ,Mice ,Protein Phosphatase 1 ,Hydrolase ,Phosphoprotein Phosphatases ,Animals ,Humans ,Magnesium ,Amino Acid Sequence ,Phosphotyrosine ,Protein Structure, Quaternary ,Protein secondary structure ,chemistry.chemical_classification ,Binding Sites ,biology ,Chemistry ,Active site ,Substrate (chemistry) ,Hydrogen-Ion Concentration ,Ligand (biochemistry) ,Protein Structure, Tertiary ,Protein Phosphatase 2C ,Crystallography ,Enzyme ,Solvents ,biology.protein ,Sequence Alignment - Abstract
The haloacid dehalogenase (HAD) superfamily is comprised of structurally homologous enzymes that share several conserved sequence motifs (loops I-IV) in their active site. The majority of HAD members are phosphohydrolases and may be divided into three subclasses depending on domain organization. In classes I and II, a mobile "cap" domain reorients upon substrate binding, closing the active site to bulk solvent. Members of the third class lack this additional domain. Herein, we report the 1.9 Å X-ray crystal structures of a member of the third subclass, magnesium-dependent phosphatase-1 (MDP-1) both in its unliganded form and with the product analogue, tungstate, bound to the active site. The secondary structure of MDP-1 is similar to that of the "core" domain of other type I and type II HAD members with the addition of a small, 28-amino acid insert that does not close down to exclude bulk solvent in the presence of ligand. In addition, the monomeric oligomeric state of MDP-1 does not allow the participation of a second subunit in the formation and solvent protection of the active site. The binding sites for the phosphate portion of the substrate and Mg(II) cofactor are also similar to those of other HAD members, with all previously observed contacts conserved. Unlike other subclass III HAD members, MDP-1 appears to be equally able to dephosphorylate phosphotyrosine and closed-ring phosphosugars. Modeling of possible substrates in the active site of MDP-1 reveals very few potential interactions with the substrate leaving group. The mapping of conserved residues in sequences of MDP-1 from different eukaryotic organisms reveals that they colocalize to a large region on the surface of the protein outside the active site. This observation combined with the modeling studies suggests that the target of MDP-1 is most likely a phosphotyrosine in an unknown protein rather than a small sugar-based substrate.
- Published
- 2004
19. Genome Properties: a system for the investigation of prokaryotic genetic content for microbiology, genome annotation and comparative genomics
- Author
-
Nikhat Zafar, Daniel H. Haft, Jeremy D. Selengut, Lauren M. Brinkac, and Owen White
- Subjects
Microbiological Techniques ,Statistics and Probability ,Proteome ,Information Storage and Retrieval ,Context (language use) ,Genomics ,Documentation ,Computational biology ,Biology ,Biochemistry ,Genome ,User-Computer Interface ,Protein Annotation ,Databases, Genetic ,Molecular Biology ,Gene ,Natural Language Processing ,Comparative genomics ,Genetics ,Gene Expression Profiling ,Chromosome Mapping ,Genome project ,Structural Classification of Proteins database ,Computer Science Applications ,Computational Mathematics ,Gene Expression Regulation ,Prokaryotic Cells ,Vocabulary, Controlled ,Computational Theory and Mathematics ,Database Management Systems ,Software ,Signal Transduction - Abstract
Motivation: The presence or absence of metabolic pathways and structures provides a context that makes protein annotation far more reliable. Compiling such information across microbial genomes improves the functional classification of proteins and provides a valuable resource for comparative genomics. Results: We have created a Genome Properties system to present key aspects of prokaryotic biology using standardized computational methods and controlled vocabularies. Properties reflect gene content, phenotype, phylogeny and computational analyses. The results of searches using hidden Markov models allow many properties to be deduced automatically, especially for families of proteins (equivalogs) conserved in function since their last common ancestor. Additional properties are derived from curation, published reports and other forms of evidence. The Genome Properties system was applied to 156 complete prokaryotic genomes, and is easily mined to find differences between species, correlations between metabolic features and families of uncharacterized proteins, or relationships among properties. Availability: Genome Properties can be found at http://www.tigr.org/Genome_Properties Contact: selengut@tigr.org Supplementary information: http://www.tigr.org/tigr-scripts/CMR2/genome_properties_references.spl
- Published
- 2004
20. Whole genome comparisons of serotype 4b and 1/2a strains of the food-borne pathogen Listeria monocytogenes reveal new insights into the core genome components of this species
- Author
-
David A. Rasko, Nadia Fedorova, John B. Luchansky, Laura D. Wonderling, Derrick E. Fouts, A. Scott Durkin, Robert J. Dodson, Samuel V. Angiuoli, Sophia Kathariou, William C. Nelson, Robert T. DeBoy, Bao Tran, Jacques Ravel, Darrell O. Bayles, Steven R. Gill, Owen White, Sean C. Daugherty, Heather Forberger, Ian T. Paulsen, William C. Nierman, Susan Van Aken, Gaylen A. Uhlich, Daniel H. Haft, Jeremy Peterson, Jeremy D. Selengut, Hoda Khouri, Claire M. Fraser, Maureen J. Beanan, Lauren M. Brinkac, James F. Kolonay, Karen E. Nelson, Emmanuel F. Mongodin, and Ramana Madupu
- Subjects
Serotype ,Meat ,Operon ,Prophages ,Single-nucleotide polymorphism ,Biology ,medicine.disease_cause ,Polymorphism, Single Nucleotide ,Synteny ,Genome ,Microbiology ,Open Reading Frames ,Species Specificity ,Listeria monocytogenes ,Genetics ,medicine ,Serotyping ,Gene ,Base Composition ,Virulence ,Strain (biology) ,Genomics ,Articles ,Chromosomes, Bacterial ,Physical Chromosome Mapping ,Genes, Bacterial ,DNA Transposable Elements ,Food Microbiology ,Genome, Bacterial - Abstract
The genomes of three strains of Listeria monocytogenes that have been associated with food-borne illness in the USA were subjected to whole genome comparative analysis. A total of 51, 97 and 69 strain-specific genes were identified in L. monocytogenes strains F2365 (serotype 4b, cheese isolate), F6854 (serotype 1/2a, frankfurter isolate) and H7858 (serotype 4b, meat isolate), respectively. Eighty-three genes were restricted to serotype 1/2a and 51 to serotype 4b strains. These strain- and serotype-specific genes probably contribute to observed differences in pathogenicity, and the ability of the organisms to survive and grow in their respective environmental niches. The serotype 1/2a-specific genes include an operon that encodes the rhamnose biosynthetic pathway that is associated with teichoic acid biosynthesis, as well as operons for five glycosyl transferases and an adenine-specific DNA methyltransferase. A total of 8603 and 105 050 high quality single nucleotide polymorphisms (SNPs) were found on the draft genome sequences of strain H7858 and strain F6854, respectively, when compared with strain F2365. Whole genome comparative analyses revealed that the L. monocytogenes genomes are essentially syntenic, with the majority of genomic differences consisting of phage insertions, transposable elements and SNPs.
- Published
- 2004
21. The genome sequence of the anaerobic, sulfate-reducing bacterium Desulfovibrio vulgaris Hildenborough
- Author
-
Jonathan A. Eisen, George Dimitrov, Naomi L. Ward, A. Scott Durkin, Kevin Tran, Barbara A. Methé, Robert J. Dodson, Sean C. Daugherty, Tanja M. Davidsen, Robert T. DeBoy, Shelley A. Haveman, T. Utterback, Jeremy Peterson, James F. Kolonay, Derrick E. Fouts, Nikhat Zafar, Diana Radune, Claire M. Fraser, Rekha Seshadri, Mark Hance, Gerrit Voordouw, Daniel H. Haft, Ramana Madupu, Ian T. Paulsen, Christopher L. Hemme, Hoda Khouri, William C. Nelson, Steven A. Sullivan, Jeremy D. Selengut, John Gill, Lauren M. Brinkac, John F. Heidelberg, Liwei Zhou, Judy D. Wall, and Tamara Feldblyum
- Subjects
Anaerobic respiration ,biology ,Molecular Sequence Data ,Biomedical Engineering ,Bioengineering ,Periplasmic space ,biology.organism_classification ,Applied Microbiology and Biotechnology ,Genome ,Bioremediation ,Biochemistry ,Molecular Medicine ,Desulfovibrio vulgaris ,Sulfate-reducing bacteria ,Energy Metabolism ,Gene ,Genome, Bacterial ,Bacteria ,Biotechnology - Abstract
Desulfovibrio vulgaris Hildenborough is a model organism for studying the energy metabolism of sulfate-reducing bacteria (SRB) and for understanding the economic impacts of SRB, including biocorrosion of metal infrastructure and bioremediation of toxic metal ions. The 3,570,858 base pair (bp) genome sequence reveals a network of novel c-type cytochromes, connecting multiple periplasmic hydrogenases and formate dehydrogenases, as a key feature of its energy metabolism. The relative arrangement of genes encoding enzymes for energy transduction, together with inferred cellular location of the enzymes, provides a basis for proposing an expansion to the 'hydrogen-cycling' model for increasing energy efficiency in this bacterium. Plasmid-encoded functions include modification of cell surface components, nitrogen fixation and a type-III protein secretion system. This genome sequence represents a substantial step toward the elucidation of pathways for reduction (and bioremediation) of pollutants such as uranium and chromium and offers a new starting point for defining this organism's complex anaerobic respiration.
- Published
- 2004
22. Comparison of the genome of the oral pathogen Treponema denticola with other spirochete genomes
- Author
-
Jeremy D. Selengut, Jonathan A. Eisen, Rekha Seshadri, Lauren M. Brinkac, Robert J. Dodson, Keita Geer, Steven J. Norris, Sofiya Shatsman, Derrick E. Fouts, Hervé Tettelin, Garry S. A. Myers, Daniel H. Haft, Pankaj Vashisth, John F. Heidelberg, Elizabeth Gebregeorgis, Jyoti Shetty, Erica Sodergren, Getahun Tsegaye, Qin Xiang, George M. Weinstock, Jerrilyn K. Howell, Jamie Kolonay, David Šmajs, Ernesto Baca, Bola Ayodeji, Ramana Madupu, Sean C. Daugherty, Sangita Pal, Scott Durkin, Alla Shvartsbeyn, Tanja M. Davidsen, Robert T. DeBoy, Joel A. Malek, Claire M. Fraser, Thomas Z. McNeill, Ian T. Paulsen, Michael P. McLeod, Qinghu Ren, and Anita G. Amin
- Subjects
Molecular Sequence Data ,Genome ,Microbiology ,Bacterial Proteins ,Treponema ,Treponema pallidum ,Borrelia burgdorferi ,Gene ,Genetics ,Whole genome sequencing ,Mouth ,Multidisciplinary ,Base Sequence ,Sequence Homology, Amino Acid ,Models, Genetic ,biology ,Treponema denticola ,Biological Sciences ,biology.organism_classification ,stomatognathic diseases ,Genes, Bacterial ,Horizontal gene transfer ,ATP-Binding Cassette Transporters ,Leptospira interrogans ,Genome, Bacterial - Abstract
We present the complete 2,843,201-bp genome sequence of Treponema denticola (ATCC 35405), an oral spirochete associated with periodontal disease. Analysis of the T. denticola genome reveals factors mediating coaggregation, cell signaling, stress protection, and other competitive and cooperative measures, consistent with its pathogenic nature and lifestyle within the mixed-species environment of subgingival dental plaque. Comparisons with previously sequenced spirochete genomes revealed specific factors contributing to differences and similarities in spirochete physiology as well as pathogenic potential. The T. denticola genome is considerably larger in size than the genome of the related syphilis-causing spirochete Treponema pallidum. The differences in gene content appear to be attributable to a combination of three phenomena: genome reduction, lineage-specific expansions, and horizontal gene transfer. Genes lost due to reductive evolution appear to be largely involved in metabolism and transport, whereas some of the genes that have arisen due to lineage-specific expansions are implicated in various pathogenic interactions, and genes acquired via horizontal gene transfer are largely phage-related or of unknown function.
- Published
- 2004
23. The transcription factor Eyes absent is a protein tyrosine phosphatase
- Author
-
Ishara A. Mills, Jeremy D. Selengut, Ilaria Rebay, Robert R. Latek, Beth E. W. Parlikar, Tina L. Tootle, Serena J. Silver, Erin L. Davies, and Victoria Newman
- Subjects
Transactivation ,Multidisciplinary ,biology ,Biochemistry ,Phosphatase ,Transcriptional regulation ,Phosphorylation ,Protein phosphatase 2 ,Protein tyrosine phosphatase ,Drosophila melanogaster ,biology.organism_classification ,Transcription factor - Abstract
Post-translational modifications provide sensitive and flexible mechanisms to dynamically modulate protein function in response to specific signalling inputs [1]. In the case of transcription factors, changes in phosphorylation state can influence protein stability, conformation, subcellular localization, cofactor interactions, transactivation potential and transcriptional output [1]. Here we show that the evolutionarily conserved transcription factor Eyes absent (Eya) [2,3] belongs to the phosphatase subgroup of the haloacid dehalogenase (HAD) superfamily [4,5], and propose a function for it as a non-thiol-based protein tyrosine phosphatase. Experiments performed in cultured Drosophila cells and in vitro indicate that Eyes absent has intrinsic protein tyrosine phosphatase activity and can autocatalytically dephosphorylate itself. Confirming the biological significance of this function, mutations that disrupt the phosphatase active site severely compromise the ability of Eyes absent to promote eye specification and development in Drosophila. Given the functional importance of phosphorylation-dependent modulation of transcription factor activity, this evidence for a nuclear transcriptional coactivator with intrinsic phosphatase activity suggests an unanticipated method of fine-tuning transcriptional regulation.
- Published
- 2003
24. The complete genome sequence of the Arabidopsis and tomato pathogen Pseudomonas syringae pv. tomato DC3000
- Author
-
Alan Collmer, Liwei Zhou, Jia Liu, Nikhat Zafar, Bao Tran, William C. Nelson, Kristi Berry, Magdalen Lindeberg, Wen Ling Deng, Nadia Fedorova, David J. Schneider, Hoda Khouri, Daniel A. Russell, Owen White, Mark D'Ascenzo, Vinita Joardar, Tamara Feldblyum, Carol L. Bender, Terrence P. Delaney, C. Robin Buell, Jeremy D. Selengut, Gregory B. Martin, Samuel W. Cartinhour, Arun K. Chatterjee, Robert T. DeBoy, Robert J. Dodson, Sondra G. Lazarowitz, James R. Alfano, Daniel H. Haft, Lauren M. Brinkac, Sean C. Daugherty, Adela R. Ramos, Ian T. Paulsen, Qiaoping Yuan, Michelle L. Gwinn, James F. Kolonay, Ramana Madupu, Xiaoyan Tang, Maureen J. Beanan, A. Scott Durkin, Claire M. Fraser, Tanja M. Davidsen, Teresa Utterback, and Susan Van Aken
- Subjects
Molecular Sequence Data ,Arabidopsis ,Siderophores ,Virulence ,Genome ,Microbiology ,Solanum lycopersicum ,Plant Growth Regulators ,Pseudomonas ,Pseudomonas syringae ,Arabidopsis thaliana ,Gene ,Genetics ,Multidisciplinary ,Base Sequence ,biology ,fungi ,Biological Transport ,Biological Sciences ,biology.organism_classification ,Pseudomonas putida ,Mobile genetic elements ,Reactive Oxygen Species ,Genome, Bacterial ,Plasmids - Abstract
We report the complete genome sequence of the model bacterial pathogen Pseudomonas syringae pathovar tomato DC3000 (DC3000), which is pathogenic on tomato and Arabidopsis thaliana. The DC3000 genome (6.5 megabases) contains a circular chromosome and two plasmids, which collectively encode 5,763 ORFs. We identified 298 established and putative virulence genes, including several clusters of genes encoding 31 confirmed and 19 predicted type III secretion system effector proteins. Many of the virulence genes were members of paralogous families and also were proximal to mobile elements, which collectively comprise 7% of the DC3000 genome. The bacterium possesses a large repertoire of transporters for the acquisition of nutrients, particularly sugars, as well as genes implicated in attachment to plant surfaces. Over 12% of the genes are dedicated to regulation, which may reflect the need for rapid adaptation to the diverse environments encountered during epiphytic growth and pathogenesis. Comparative analyses confirmed a high degree of similarity with two sequenced pseudomonads, Pseudomonas putida and Pseudomonas aeruginosa, yet revealed 1,159 genes unique to DC3000, of which 811 lack a known function.
- Published
- 2003
25. The InterPro Database, 2003 brings increased coverage and new features
- Author
-
Emmanuel Courcelle, Teresa K. Attwood, Evgeny M. Zdobnov, Nicolas Hulo, Sandra Orchard, Chris P. Ponting, Margaret Biswas, Sam Griffiths-Jones, Paul Bradley, Ivica Letunic, Robert M. Vaughan, David M. Lonsdale, Richard R. Copley, Marco Pagni, Rodrigo Lopez, Daniel H. Haft, Peer Bork, Jeremy D. Selengut, Nicola Mulder, Ujjwal Das, Wolfgang Fleischmann, Richard Durbin, Laurent Falquet, Rolf Apweiler, Phillip Bucher, Florence Servant, Ville Silventoinen, Alexander Kanapin, Alex Bateman, David Peyruc, Nicola Harte, Christian J. A. Sigrist, Daniel Barrell, Amos Marc Bairoch, Daniel Kahn, David Binns, and Maria Krestyaninova
- Subjects
Proteins/chemistry/genetics/metabolism ,Repetitive Sequences, Amino Acid ,0106 biological sciences ,InterPro ,Protein Structure ,[SDV.OT]Life Sciences [q-bio]/Other [q-bio.OT] ,Web server ,570 Life Sciences ,PROSITE ,Biology ,computer.software_genre ,01 natural sciences ,Repetitive Sequences ,610 Medical Sciences, Medicine ,Databases ,User-Computer Interface ,03 medical and health sciences ,TIGRFAMs ,Computer Graphics ,Genetics ,Animals ,Protein Databases ,ddc:576 ,Tertiary Protein Structure ,Databases, Protein ,Protein Processing ,030304 developmental biology ,0303 health sciences ,Database ,Protein ,Post-Translational ,Proteins ,Articles ,Protein Structure, Tertiary ,Amino Acid ,Post translational ,Cardiovascular and Metabolic Diseases ,Amino Acid Repetitive Sequences ,Protein processing ,Protein signature ,Protein Processing, Post-Translational ,computer ,Tertiary ,Post-Translational Protein Processing ,Autre (Sciences du Vivant) ,010606 plant biology & botany - Abstract
InterPro, an integrated documentation resource of protein families, domains and functional sites, was created in 1999 as a means of amalgamating the major protein signature databases into one comprehensive resource. PROSITE, Pfam, PRINTS, ProDom, SMART and TIGRFAMs have been manually integrated and curated and are available in InterPro for text- and sequence-based searching. The results are provided in a single format that rationalises the results that would be obtained by searching the member databases individually. The latest release of InterPro contains 5629 entries describing 4280 families, 1239 domains, 95 repeats and 15 post-translational modifications. Currently, the combined signatures in InterPro cover more than 74% of all proteins in SWISS-PROT and TrEMBL, an increase of nearly 15% since the inception of InterPro. New features of the database include improved searching capabilities and enhanced graphical user interfaces for visualisation of the data. The database is available via a webserver (http://www.ebi.ac.uk/interpro) and anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).
- Published
- 2003
26. Sequence of Plasmodium falciparum chromosomes 2, 10, 11 and 14
- Author
-
Daniel J. Carucci, Malcolm J. Gardner, Claire Fujii, Steven L. Salzberg, Stephen L. Hoffman, Jane M. Carlton, Owen White, Leda M. Cummings, Mark Raymond Adams, David Granger, Shamira J. Shallom, Mihaela Pertea, Behnam Jarrahi, Tamara Feldblyum, Azita Moazzez, James Pederson, Bernard B. Suh, J. Craig Venter, Babak Parvizi, Jonathan E. Allen, Cheryl L. Hansen, Luke J. Tallon, Bruce Weaver, Jeremy Peterson, Jeremy D. Selengut, Vishvanath Nene, Samuel V. Angiuoli, Claire M. Fraser, Anne Ciecko, Jeffery Lynn, Hamilton O. Smith, Michael B. Brenner, Michael Rizzo, and Azadeh Shoaibi
- Subjects
Multidisciplinary ,biology ,Sequence analysis ,Chromosome ,Plasmodium falciparum ,biology.organism_classification ,medicine.disease ,Genome ,Virology ,parasitic diseases ,Proteome ,medicine ,Parasite hosting ,Malaria ,Sequence (medicine) - Abstract
The mosquito-borne malaria parasite Plasmodium falciparum kills an estimated 0.7-2.7 million people every year, primarily children in sub-Saharan Africa. Without effective interventions, a variety of factors, including the spread of parasites resistant to antimalarial drugs and the increasing insecticide resistance of mosquitoes, may cause the number of malaria cases to double over the next two decades. To stimulate basic research and facilitate the development of new drugs and vaccines, the genome of Plasmodium falciparum clone 3D7 has been sequenced using a chromosome-by-chromosome shotgun strategy. We report here the nucleotide sequences of chromosomes 10, 11 and 14, and a re-analysis of the chromosome 2 sequence. These chromosomes represent about 35% of the 23-megabase P. falciparum genome.
- Published
- 2002
27. TIGRFAMs and Genome Properties in 2013
- Author
-
Daniel H. Haft, Derek M. Harkins, Roland A. Richter, Erin Beck, Jeremy D. Selengut, and Malay Kumar Basu
- Subjects
Protein family ,Sequence alignment ,Genomics ,Computational biology ,Biology ,Genome ,03 medical and health sciences ,Annotation ,TIGRFAMs ,Genome, Archaeal ,Representative sequences ,Genetics ,Databases, Protein ,030304 developmental biology ,0303 health sciences ,Internet ,030306 microbiology ,Proteins ,Molecular Sequence Annotation ,Articles ,Markov Chains ,Sequence Alignment ,Genome, Bacterial - Abstract
TIGRFAMs, available online at http://www.jcvi.org/tigrfams, is a database of protein family definitions. Each entry features a seed alignment of trusted representative sequences, a hidden Markov model (HMM) built from that alignment, cutoff scores that let automated annotation pipelines decide which proteins are members, and annotations for transfer onto member proteins. Most TIGRFAMs models are designated equivalog, meaning they assign a specific name to proteins conserved in function from a common ancestral sequence. Models describing more functionally heterogeneous families are designated subfamily or domain, and assign less specific but more widely applicable annotations. The Genome Properties database, available at http://www.jcvi.org/genome-properties, specifies how computed evidence, including TIGRFAMs HMM results, should be used to judge whether an enzymatic pathway, a protein complex or another type of molecular subsystem is encoded in a genome. TIGRFAMs and Genome Properties content are developed in concert because subsystems reconstruction for large numbers of genomes guides selection of seed alignment sequences and cutoff values during protein family construction. Both databases specialize heavily in bacterial and archaeal subsystems. At present, 4284 models appear in TIGRFAMs, while 628 systems are described by Genome Properties. Content derives both from subsystem discovery work and from biocuration of the scientific literature.
- Published
- 2012
28. Archaeosortases and Exosortases Are Widely Distributed Systems Linking Membrane Transit with Posttranslational Modification
- Author
-
Daniel H. Haft, Jeremy D. Selengut, and Samuel H. Payne
- Subjects
Glycosylation ,Archaeal Proteins ,Molecular Sequence Data ,Exosortase ,Halobacterium ,Proteomics ,Microbiology ,Gene Expression Regulation, Enzymologic ,chemistry.chemical_compound ,Bacterial Proteins ,Amino Acid Sequence ,Molecular Biology ,Gene ,Comparative genomics ,chemistry.chemical_classification ,Genetics ,biology ,Cell Membrane ,Alphaproteobacteria ,Articles ,Gene Expression Regulation, Bacterial ,biology.organism_classification ,Aminoacyltransferases ,Cysteine Endopeptidases ,chemistry ,Gene Expression Regulation, Archaeal ,Glycoprotein ,Protein Processing, Post-Translational - Abstract
Multiple new prokaryotic C-terminal protein-sorting signals were found that reprise the tripartite architecture shared by LPXTG and PEP-CTERM: motif, TM helix, basic cluster. Defining hidden Markov models were constructed for all. PGF-CTERM occurs in 29 archaeal species, some of which have more than 50 proteins that share the domain. PGF-CTERM proteins include the major cell surface protein in Halobacterium, a glycoprotein with a partially characterized diphytanylglyceryl phosphate linkage near its C terminus. Comparative genomics identifies a distant exosortase homolog, designated archaeosortase A (ArtA), as the likely protein-processing enzyme for PGF-CTERM. Proteomics suggests that the PGF-CTERM region is removed. Additional systems include VPXXXP-CTERM/archaeosortase B in two of the same archaea and PEF-CTERM/archaeosortase C in four others. Bacterial exosortases often fall into subfamilies that partner with very different cohorts of extracellular polymeric substance biosynthesis proteins; several species have multiple systems. Variant systems include the VPDSG-CTERM/exosortase C system unique to certain members of the phylum Verrucomicrobia, VPLPA-CTERM/exosortase D in several alpha- and deltaproteobacterial species, and a dedicated (single-target) VPEID-CTERM/exosortase E system in alphaproteobacteria. Exosortase-related families XrtF in the class Flavobacteria and XrtG in Gram-positive bacteria mark distinctive conserved gene neighborhoods. A picture emerges of an ancient and now well-differentiated superfamily of deeply membrane-embedded protein-processing enzymes. Their target proteins are destined to transit cellular membranes during their biosynthesis, during which most undergo additional posttranslational modifications such as glycosylation.
- Published
- 2012
29. Whole Genome Analysis of Leptospira licerasiae Provides Insight into Leptospiral Evolution and Pathogenicity
- Author
-
Derrick E. Fouts, Ravi Sanka, Joseph M. Vinetz, Jason S. Lehmann, Janaki Purushe, Jessica N. Ricaldi, Derek M. Harkins, Michael Torres, Angelo C. Moreno, Nicholas J. G. Webster, Jeremy D. Selengut, Kailash P. Patra, and Michael A. Matthias
- Subjects
Applied Microbiology ,Prophages ,Veterinary Microbiology ,O antigen ,gene regulatory network ,Leptospira licerasiae ,Genome ,bacterial genome ,Pathology ,cobalamin ,genome analysis ,Genetics ,Leptospira ,0303 health sciences ,genetic recombination ,lcsh:Public aspects of medicine ,mass fragmentography ,3. Good health ,non polymerase chain reaction ,Infectious Diseases ,Multigene Family ,Horizontal gene transfer ,Medicine ,Leptospira interrogans ,purl.org/pe-repo/ocde/ford#3.03.06 [https] ,Research Article ,DNA, Bacterial ,saprotroph ,lcsh:Arctic medicine. Tropical medicine ,Clinical Pathology ,gene locus ,Gene Transfer, Horizontal ,Genomic Islands ,lcsh:RC955-962 ,Virulence Factors ,riboswitch ,Molecular Sequence Data ,Locus (genetics) ,gene sequence ,Biology ,Microbiology ,antitoxin ,Evolution, Molecular ,03 medical and health sciences ,bacterium lipopolysaccharide ,Diagnostic Medicine ,Humans ,Gene ,Prophage ,030304 developmental biology ,Comparative genomics ,antigen structure ,030306 microbiology ,molecular evolution ,bacterial virulence ,Public Health, Environmental and Occupational Health ,lcsh:RA1-1270 ,Sequence Analysis, DNA ,sequence homology ,biology.organism_classification ,amino acid sequence ,Clinical Microbiology ,Veterinary Science ,Genome, Bacterial - Abstract
The whole genome analysis of two strains of the first intermediately pathogenic leptospiral species to be sequenced (Leptospira licerasiae strains VAR010 and MMD0835) provides insight into their pathogenic potential and deepens our understanding of leptospiral evolution. Comparative analysis of eight leptospiral genomes shows the existence of a core leptospiral genome comprising 1547 genes and 452 conserved genes restricted to infectious species (including L. licerasiae) that are likely to be pathogenicity-related. Comparisons of the functional content of the genomes suggest that L. licerasiae retains several proteins related to nitrogen, amino acid and carbohydrate metabolism which might help to explain why these Leptospira grow well in artificial media compared with pathogenic species. L. licerasiae strains VAR010T and MMD0835 possess two prophage elements. While one element is circular and shares homology with LE1 of L. biflexa, the second is cryptic and homologous to a previously identified but unnamed region in L. interrogans serovars Copenhageni and Lai. We also report a unique O-antigen locus in L. licerasiae comprised of a 6-gene cluster that is unexpectedly short compared with L. interrogans in which analogous regions may include >90 such genes. Sequence homology searches suggest that these genes were acquired by lateral gene transfer (LGT). Furthermore, seven putative genomic islands ranging in size from 5 to 36 kb are present, also suggestive of antecedent LGT. How Leptospira become naturally competent remains to be determined, but considering the phylogenetic origins of the genes comprising the O-antigen cluster and other putative laterally transferred genes, L. licerasiae must be able to exchange genetic material with non-invasive environmental bacteria. The data presented here demonstrate that L. licerasiae is genetically more closely related to pathogenic than to saprophytic Leptospira and provide insight into the genomic bases for its infectiousness and its unique antigenic characteristics. Author Summary: Leptospirosis is one of the most common diseases transmitted by animals worldwide and is important because it is a major cause of febrile illness in tropical areas and also occurs in epidemic form associated with natural disasters and flooding. The mechanisms through which Leptospira cause disease are not well understood. In this study we have sequenced the genomes of two strains of Leptospira licerasiae isolated from a person and a marsupial in the Peruvian Amazon. These strains were thought to be able to cause only mild disease in humans. We have compared these genomes with other leptospires that can cause severe illness and death and another leptospire that does not infect humans or animals. These comparisons have allowed us to demonstrate similarities among the disease-causing Leptospira. Studying genes that are common among infectious strains will allow us to identify genetic factors necessary for infecting, causing disease and determining the severity of disease. We have also found that L. licerasiae seems to be able to take up and incorporate genetic information from other bacteria found in the environment. This information will allow us to begin to understand how Leptospira species have evolved.
- Published
- 2012
30. ProPhylo: partial phylogenetic profiling to guide protein family construction and assignment of biological process
- Author
-
Malay Kumar Basu, Jeremy D. Selengut, and Daniel H. Haft
- Subjects
Protein family ,Archaeal Proteins ,Computational biology ,Biology ,lcsh:Computer applications to medicine. Medical informatics ,Biochemistry ,Set (abstract data type) ,Structural Biology ,Phylogenetics ,Taxonomic rank ,lcsh:QH301-705.5 ,Molecular Biology ,Phylogeny ,computer.programming_language ,Genetics ,Applied Mathematics ,DNA ,Protein superfamily ,Archaea ,Computer Science Applications ,lcsh:Biology (General) ,lcsh:R858-859.7 ,Phylogenetic profiling ,Perl ,DNA microarray ,computer ,Methane ,Algorithms ,Software - Abstract
Background: Phylogenetic profiling is a technique of scoring co-occurrence between a protein family and some other trait, usually another protein family, across a set of taxonomic groups. In spite of several refinements in recent years, the technique still invites significant improvement. To be its most effective, a phylogenetic profiling algorithm must be able to examine co-occurrences among protein families whose boundaries are uncertain within large homologous protein superfamilies. Results: Partial Phylogenetic Profiling (PPP) is an iterative algorithm that scores a given taxonomic profile against the taxonomic distribution of families for all proteins in a genome. The method works through optimizing the boundary of each protein family, rather than by relying on prebuilt protein families or fixed sequence similarity thresholds. Double Partial Phylogenetic Profiling (DPPP) is a related procedure that begins with a single sequence and searches for optimal granularities for its surrounding protein family in order to generate the best query profiles for PPP. We present ProPhylo, a high-performance software package for phylogenetic profiling studies through creating individually optimized protein family boundaries. ProPhylo provides precomputed databases for immediate use and tools for manipulating the taxonomic profiles used as queries. Conclusion: ProPhylo results show universal markers of methanogenesis, a new DNA phosphorothioation-dependent restriction enzyme, and efficacy in guiding protein family construction. The software and the associated databases are freely available under the open source Perl Artistic License from ftp://ftp.jcvi.org/pub/data/ppp/.
- Published
- 2011
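The boundary-optimizing idea behind partial phylogenetic profiling, as summarized in the ProPhylo abstract above, can be sketched roughly as follows. This is a simplified illustration and not ProPhylo's implementation (which is written in Perl): the binomial tail score, the function names, and the toy genome labels are assumptions chosen to show how a family boundary can be picked by scanning down a similarity-ranked hit list instead of using a fixed threshold.

```python
from math import comb


def binomial_tail(successes: int, trials: int, p: float) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def best_family_boundary(ranked_genomes, profile, n_genomes):
    """Scan a hit list ordered from most to least similar.

    ranked_genomes: genome label of each hit, best hit first.
    profile: set of genomes that carry the query trait.
    n_genomes: total number of genomes searched.
    Returns (score, depth): the lowest tail probability found and the
    number of distinct genomes included at that boundary.
    """
    p = len(profile) / n_genomes  # chance that a random genome is in the profile
    seen = set()
    in_profile = 0
    best = (1.0, 0)
    for genome in ranked_genomes:
        if genome in seen:
            continue  # count each genome once, at its best-scoring hit
        seen.add(genome)
        if genome in profile:
            in_profile += 1
        score = binomial_tail(in_profile, len(seen), p)
        if score < best[0]:
            best = (score, len(seen))
    return best


ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
score, depth = best_family_boundary(ranked, {"g1", "g2", "g3"}, n_genomes=10)
print(depth, round(score, 4))
```

In this toy run the score is best when the boundary is drawn after the third genome, because the next hits come from genomes outside the profile and only dilute the match.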
31. Unexpected Abundance of Coenzyme F420-Dependent Enzymes in Mycobacterium tuberculosis and Other Actinobacteria
- Author
-
Jeremy D. Selengut and Daniel H. Haft
- Subjects
Tuberculosis ,Protein Conformation ,Riboflavin ,Molecular Sequence Data ,Coenzymes ,Flavin mononucleotide ,Microbiology ,Cofactor ,Mycobacterium tuberculosis ,chemistry.chemical_compound ,Nitroreductase ,medicine ,Amino Acid Sequence ,Molecular Biology ,Phylogeny ,Genetics ,Flavonoids ,Cofactor binding ,Binding Sites ,biology ,Molecular Structure ,Gene Expression Profiling ,Computational Biology ,Gene Expression Regulation, Bacterial ,biology.organism_classification ,medicine.disease ,Coenzyme F420 ,Actinobacteria ,chemistry ,Biochemistry ,biology.protein ,Phylogenetic profiling ,Genome, Bacterial - Abstract
Regimens targeting Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), require long courses of treatment and a combination of three or more drugs. An increase in drug-resistant strains of M. tuberculosis demonstrates the need for additional TB-specific drugs. A notable feature of M. tuberculosis is coenzyme F420, which is distributed sporadically and sparsely among prokaryotes. This distribution allows for comparative genomics-based investigations. Phylogenetic profiling (comparison of differential gene content) based on F420 biosynthesis nominated many actinobacterial proteins as candidate F420-dependent enzymes. Three such families dominated the results: the luciferase-like monooxygenase (LLM), pyridoxamine 5′-phosphate oxidase (PPOX), and deazaflavin-dependent nitroreductase (DDN) families. The DDN family was determined to be limited to F420-producing species. The LLM and PPOX families were observed in F420-producing species as well as species lacking F420 but were particularly numerous in many actinobacterial species, including M. tuberculosis. Partitioning the LLM and PPOX families based on an organism's ability to make F420 allowed the application of the SIMBAL (sites inferred by metabolic background assertion labeling) profiling method to identify F420-correlated subsequences. These regions were found to correspond to flavonoid cofactor binding sites. Significantly, these results showed that M. tuberculosis carries at least 28 separate F420-dependent enzymes, most of unknown function, and a paucity of flavin mononucleotide (FMN)-dependent proteins in these families. While prevalent in mycobacteria, markers of F420 biosynthesis appeared to be absent from the normal human gut flora. These findings suggest that M. tuberculosis relies heavily on coenzyme F420 for its redox reactions. This dependence and the cofactor's rarity may make F420-related proteins promising drug targets.
- Published
- 2010
32. The minimum information about a genome sequence (MIGS) specification
- Author
-
Dawn Field, Renzo Kottmann, Sandra L. Baldauf, Eugene Kolker, Phillip Lord, Inigo San Gil, George M. Garrity, Norman Morrison, Gareth A. Wilson, Nadeem Faruque, Bob Vaughan, Owen White, Tanya Gray, James R. Cole, Phil Hugenholtz, David W. Ussery, Peter Sterk, Robert Edwards, Henning Hermjakob, Barbara A. Methé, Naomi L. Ward, Samuel V. Angiuoli, Ilene Mizrachi, Anil Wipat, Paul De Vos, Andrew J. Spiers, Kelvin Li, Chris F. Taylor, Allyson L. Lister, Julian Parkhill, Frank Oliver Glöckner, George A. Kowalchuk, Jeffrey L. Boore, Leonid Kagan, Suzanna E. Lewis, Victor Markowitz, Robert G. Feldman, Paul Swift, Nikos C. Kyrpides, Guy Cochrane, Jack A. Gilbert, Daniel H. Haft, Natalia Maltsev, Trish Whetzel, Jennifer B. H. Martiny, Matthew D. Kane, Peter Dawyndt, Lita M. Proctor, Tatiana Tatusova, Claude W. dePamphilis, Saul A. Kravitz, Yoshio Tateno, Michael J. Allen, Nicholas R. Thomson, Nelson Axelrod, Karen E. Nelson, Ian Joint, Jim Leebens-Mack, Jeremy D. Selengut, Michael Ashburner, Robert P. Guralnick, Sarah L. Turner, Adrian Tett, David Hancock, S. Ballard, Christiane Hertz-Fowler, Richard Moxon, Susanna-Assunta Sansone, Philip Goldstein, Robert Stevens, Paul Gilna, Jessie Kennedy, and Terrestrial Microbial Ecology (TME)
- Subjects
Genetics ,Whole genome sequencing ,Internationality ,Standardization ,Databases, Factual ,Information Dissemination ,Biomedical Engineering ,Information Theory ,Chromosome Mapping ,Information Storage and Retrieval ,Bioengineering ,Genomics ,Biology ,Information theory ,Applied Microbiology and Biotechnology ,Transparency (behavior) ,Data science ,Genome ,Article ,Metadata ,Molecular Medicine ,Biotechnology - Abstract
With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed within the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the 'transparency' of the information contained in existing genomic databases.
- Published
- 2008
33. Comparative Genomics of Emerging Human Ehrlichiosis Agents
- Author
-
Julie C. Dunning Hotopp, Mingqun Lin, Ramana Madupu, Jonathan Crabtree, Samuel V. Angiuoli, Jonathan A. Eisen, Rekha Seshadri, Qinghu Ren, Martin Wu, Teresa R. Utterback, Shannon Smith, Matthew Lewis, Hoda Khouri, Chunbin Zhang, Hua Niu, Quan Lin, Norio Ohashi, Ning Zhi, William Nelson, Lauren M. Brinkac, Robert J. Dodson, M. J. Rosovitz, Jaideep Sundaram, Sean C. Daugherty, Tanja Davidsen, Anthony S. Durkin, Michelle Gwinn, Daniel H. Haft, Jeremy D. Selengut, Steven A. Sullivan, Nikhat Zafar, Liwei Zhou, Faiza Benahmed, Heather Forberger, Rebecca Halpin, Stephanie Mulligan, Jeffrey Robinson, Owen White, Yasuko Rikihisa, Hervé Tettelin, and Paul M. Richardson
- Subjects
Cancer Research ,Neorickettsia ,DNA Repair ,animal diseases ,Genetics/Functional Genomics ,Ticks ,Models ,Ehrlichia chaffeensis ,Rickettsia ,Genetics (clinical) ,Phylogeny ,Genetics/Genomics ,0303 health sciences ,Genome ,biology ,Ehrlichia ,Genomics ,Anaplasmataceae ,Infectious Diseases ,Infection ,Research Article ,Biotechnology ,Human monocytotropic ehrlichiosis ,Ehrlichiosis ,lcsh:QH426-470 ,Evolution ,Biotin ,Models, Biological ,Microbiology ,Genetics/Comparative Genomics ,03 medical and health sciences ,Rare Diseases ,parasitic diseases ,medicine ,Genetics ,Animals ,Humans ,Anaplasma ,Molecular Biology ,Ecology, Evolution, Behavior and Systematics ,030304 developmental biology ,030306 microbiology ,Prevention ,biology.organism_classification ,medicine.disease ,bacterial infections and mycoses ,Biological ,Vector-Borne Diseases ,Eubacteria ,lcsh:Genetics ,Emerging Infectious Diseases ,Good Health and Well Being ,bacteria ,Rickettsiales ,Developmental Biology - Abstract
Anaplasma (formerly Ehrlichia) phagocytophilum, Ehrlichia chaffeensis, and Neorickettsia (formerly Ehrlichia) sennetsu are intracellular vector-borne pathogens that cause human ehrlichiosis, an emerging infectious disease. We present the complete genome sequences of these organisms along with comparisons to other organisms in the Rickettsiales order. Ehrlichia spp. and Anaplasma spp. display a unique large expansion of immunodominant outer membrane proteins facilitating antigenic variation. All Rickettsiales have a diminished ability to synthesize amino acids compared to their closest free-living relatives. Unlike members of the Rickettsiaceae family, these pathogenic Anaplasmataceae are capable of making all major vitamins, cofactors, and nucleotides, which could confer a beneficial role in the invertebrate vector or the vertebrate host. Further analysis identified proteins potentially involved in vacuole confinement of the Anaplasmataceae, a life cycle involving a hematophagous vector, vertebrate pathogenesis, human pathogenesis, and lack of transovarial transmission. These discoveries provide significant insights into the biology of these obligate intracellular pathogens. Synopsis: Ehrlichiosis is an acute disease that triggers flu-like symptoms in both humans and animals. It is caused by a range of bacteria transmitted by ticks or flukes. Because these bacteria are difficult to culture, however, the organisms are poorly understood. The genomes of three emerging human pathogens causing ehrlichiosis were sequenced. A database was designed to allow the comparison of these three genomes to sixteen other bacteria with similar lifestyles. Analysis from this database reveals new species-specific and disease-specific genes indicating niche adaptations, pathogenic traits, and other features. In particular, one of the organisms contains more than 100 copies of a single gene involved in interactions with the host(s). These comparisons also enabled a reconstruction of the metabolic potential of five representative genomes from these bacteria and their close relatives. With this work, scientists can study these emerging pathogens in earnest.
- Published
- 2006
34. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial 'pan-genome'
- Author
-
Jeremy D. Selengut, Robert T. DeBoy, Lauren M. Brinkac, Christopher R. Hauser, Maria Scarselli, Rino Rappuoli, M. J. Rosovitz, Tanja M. Davidsen, George Dimitrov, Craig E. Rubens, Steven A. Sullivan, Claudio Donati, Naomi L. Ward, Michael J. Cieslewicz, Amanda L. Jones, Lawrence C. Madoff, Immaculada Margarit Y Ros, Hoda Khouri, Guido Grandi, Jaideep P. Sundaram, Shannon Smith, Michael R. Wessels, Liwei Zhou, William C. Nelson, Hervé Tettelin, Dennis L. Kasper, Jeremy Peterson, Samuel V. Angiuoli, Daniel H. Haft, Nikhat Zafar, Vega Masignani, Sean C. Daugherty, Jonathan Crabtree, Claire M. Fraser, Marirosa Mora, Kisha Watkins, Owen White, John L. Telford, A. Scott Durkin, Robert J. Dodson, Kevin J. B. O'Connor, Duccio Medini, Teresa Utterback, Michelle L. Gwinn, Diana Radune, and Ramana Madupu
- Subjects
group B Streptococcus ,AD-HOC-COMMITTEE ,Sequence analysis ,GENE IDENTIFICATION ,Molecular Sequence Data ,VACCINE ,PROTEIN ,Gene Expression ,comparative genomics ,Biology ,medicine.disease_cause ,Genome ,SEQUENCE ,DNA sequencing ,GROUP-B STREPTOCOCCUS ,Streptococcus agalactiae ,medicine ,BACILLUS-ANTHRACIS ,bacterial species ,Amino Acid Sequence ,Gene ,PROTECTIVE ANTIBODIES ,Bacterial Capsules ,Phylogeny ,Comparative genomics ,Genetics ,Multidisciplinary ,Base Sequence ,Virulence ,Pan-genome ,Genetic Variation ,Genome project ,Sequence Analysis, DNA ,SPECIES DEFINITION ,Biological Sciences ,Genes, Bacterial ,SEROTYPE ,Sequence Alignment ,Genome, Bacterial - Abstract
The development of efficient and inexpensive genome sequencing methods has revolutionized the study of human bacterial pathogens and improved vaccine design. Unfortunately, the sequence of a single genome does not reflect how genetic variability drives pathogenesis within a bacterial species and also limits genome-wide screens for vaccine candidates or for antimicrobial targets. We have generated the genomic sequence of six strains representing the five major disease-causing serotypes of Streptococcus agalactiae, the main cause of neonatal infection in humans. Analysis of these genomes and those available in databases showed that the S. agalactiae species can be described by a pan-genome consisting of a core genome shared by all isolates, accounting for ≈80% of any single genome, plus a dispensable genome consisting of partially shared and strain-specific genes. Mathematical extrapolation of the data suggests that the gene reservoir available for inclusion in the S. agalactiae pan-genome is vast and that unique genes will continue to be identified even after sequencing hundreds of genomes.
- Published
- 2005
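The pan-genome decomposition described in the abstract above (a core genome shared by all isolates plus a dispensable genome of partially shared and strain-specific genes) reduces to set operations once genes have been grouped into families. The sketch below is only an illustration under that assumption; the strain and gene labels are made up, and the orthology clustering that a real analysis needs is not shown.

```python
from collections import Counter


def partition_pan_genome(genomes):
    """Split gene families into core, dispensable and strain-specific sets.

    genomes: mapping of strain name -> iterable of gene family labels.
    """
    gene_sets = [set(genes) for genes in genomes.values()]
    pan = set().union(*gene_sets)           # every family seen in any strain
    core = set.intersection(*gene_sets)     # families present in all strains
    dispensable = pan - core                # partially shared + strain-specific
    counts = Counter(g for s in gene_sets for g in s)
    strain_specific = {g for g, n in counts.items() if n == 1}
    return core, dispensable, strain_specific


# Toy input with hypothetical strain and gene family names.
toy = {
    "strainA": {"geneA", "geneB", "geneC", "geneX"},
    "strainB": {"geneA", "geneB", "geneC", "geneY"},
    "strainC": {"geneA", "geneB", "geneY", "geneZ"},
}
core, dispensable, unique = partition_pan_genome(toy)
print(sorted(core), sorted(dispensable), sorted(unique))
```

Repeating this partition as strains are added one at a time gives the counts of new genes per genome that an extrapolation like the one mentioned in the abstract would be fitted to.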
35. A guild of 45 CRISPR-associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes
- Author
-
Emmanuel F. Mongodin, Karen E. Nelson, Daniel H. Haft, and Jeremy D. Selengut
- Subjects
Haloarcula marismortui ,Yersinia pestis ,Evolution ,Genes, Fungal ,Biology ,Microbiology ,CRISPR Spacers ,Genes, Archaeal ,Cellular and Molecular Neuroscience ,Genetics ,Direct repeat ,CRISPR ,lcsh:QH301-705.5 ,Molecular Biology ,Dyad symmetry ,Ecology, Evolution, Behavior and Systematics ,Phylogeny ,Oligonucleotide Array Sequence Analysis ,Repetitive Sequences, Nucleic Acid ,Trans-activating crRNA ,CRISPR interference ,Genome ,Ecology ,Fungal genetics ,Proteins ,Archaea ,Markov Chains ,Eubacteria ,lcsh:Biology (General) ,Computational Theory and Mathematics ,Prokaryotic Cells ,Genes, Bacterial ,Modeling and Simulation ,Multigene Family ,CRISPR Loci ,Genome, Bacterial ,Bioinformatics - Computational Biology ,Research Article - Abstract
Clustered regularly interspaced short palindromic repeats (CRISPRs) are a family of DNA direct repeats found in many prokaryotic genomes. Repeats of 21–37 bp typically show weak dyad symmetry and are separated by regularly sized, nonrepetitive spacer sequences. Four CRISPR-associated (Cas) protein families, designated Cas1 to Cas4, are strictly associated with CRISPR elements and always occur near a repeat cluster. Some spacers originate from mobile genetic elements and are thought to confer “immunity” against the elements that harbor these sequences. In the present study, we have systematically investigated uncharacterized proteins encoded in the vicinity of these CRISPRs and found many additional protein families that are strictly associated with CRISPR loci across multiple prokaryotic species. Multiple sequence alignments and hidden Markov models have been built for 45 Cas protein families. These models identify family members with high sensitivity and selectivity and classify key regulators of development, DevR and DevS, in Myxococcus xanthus as Cas proteins. These identifications show that CRISPR/cas gene regions can be quite large, with up to 20 different, tandem-arranged cas genes next to a repeat cluster or filling the region between two repeat clusters. Distinctive subsets of the collection of Cas proteins recur in phylogenetically distant species and correlate with characteristic repeat periodicity. The analyses presented here support initial proposals of mobility of these units, along with the likelihood that loci of different subtypes interact with one another as well as with host cell defensive, replicative, and regulatory systems. It is evident from this analysis that CRISPR/cas loci are larger, more complex, and more heterogeneous than previously appreciated. Synopsis: The family of clustered regularly interspaced short palindromic repeats (CRISPRs) describes a class of DNA repeats found in nearly half of all bacterial and archaeal genomes. These DNA repeat regions have a remarkably regular structure: unique sequences of constant size, called spacers, sit between each pair of repeats. The DNA repeats do not encode proteins, but appear to be transcribed and processed into small RNAs that may have any number of functions, including resistance to any phage (i.e., virus of bacteria) whose sequence matches a spacer; spacers change rapidly as microbial strains evolve. This work describes 41 new CRISPR-associated (cas) gene families, which are always found near these repeats, in addition to the four previously known. It shows that CRISPR systems belong to different classes, with different repeat patterns, sets of genes, and species ranges. Most of these seem to come and go rather rapidly from their host genomes. These possibly beneficial mobile genetic elements may play an important role in driving prokaryotic evolution.
- Published
- 2005
36. InterPro, progress and status in 2005
- Author
-
Ivica Letunic, Jeremy D. Selengut, Paul Bradley, Alex L. Mitchell, Ujjwal Das, David Binns, Julian Gough, John Maslen, Teresa K. Attwood, Robert M. Vaughan, Cathy H. Wu, Christian J. A. Sigrist, David J. Studholme, Anastasia N. Nikolskaya, Rodrigo Lopez, Martin Madera, Emmanuel Courcelle, Daniel H. Haft, Nicola Harte, Alexander Kanapin, Marco Pagni, Maria Krestyaninova, Rolf Apweiler, Nicolas Hulo, Richard R. Copley, Sandra Orchard, David M. Lonsdale, Chris P. Ponting, Alex Bateman, Lorenzo Cerutti, Amos Marc Bairoch, Daniel Kahn, Richard Durbin, Phillip Bucher, Peer Bork, Jennifer McDowall, Nicola Mulder, Wolfgang Fleischmann, Emmanuel Quevillon, Ville Silventoinen, Laboratoire de Biométrie et Biologie Evolutive - UMR 5558 (LBBE), Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS), Bioinformatique, phylogénie et génomique évolutive (BPGE), Département PEGASE [LBBE] (PEGASE), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire de Biométrie et Biologie Evolutive - UMR 5558 (LBBE), and MDC Library
- Subjects
InterPro ,[SDV.OT]Life Sciences [q-bio]/Other [q-bio.OT] ,Architecture domain ,Protein family ,Simple Modular Architecture Research Tool ,Protein Sequence Analysis ,Databases, Protein/trends ,570 Life Sciences ,Computational biology ,PROSITE ,Biology ,Bioinformatics ,610 Medical Sciences, Medicine ,03 medical and health sciences ,Annotation ,TIGRFAMs ,Sequence Analysis, Protein ,Genetics ,Humans ,Protein Databases ,ddc:576 ,Tertiary Protein Structure ,Databases, Protein ,030304 developmental biology ,Proteins/chemistry/classification ,0303 health sciences ,030302 biochemistry & molecular biology ,Proteins ,Articles ,Protein Structure, Tertiary ,Systems Integration ,Cardiovascular and Metabolic Diseases ,UniProt ,Sequence Alignment ,Autre (Sciences du Vivant) - Abstract
InterPro, an integrated documentation resource of protein families, domains and functional sites, was created to integrate the major protein signature databases. Currently, it includes PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, PIRSF and SUPERFAMILY. Signatures are manually integrated into InterPro entries that are curated to provide biological and functional information. Annotation is provided in an abstract, Gene Ontology mapping and links to specialized databases. New features of InterPro include extended protein match views, taxonomic range information and protein 3D structure data. One of the new match views is the InterPro Domain Architecture view, which shows the domain composition of protein matches. Two new entry types were introduced to better describe InterPro entries: these are active site and binding site. PIRSF and the structure-based SUPERFAMILY are the latest member databases to join InterPro, and CATH and PANTHER are soon to be integrated. InterPro release 8.0 contains 11 007 entries, representing 2573 domains, 8166 families, 201 repeats, 26 active sites, 21 binding sites and 20 post-translational modification sites. InterPro covers over 78% of all proteins in the Swiss-Prot and TrEMBL components of UniProt. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).
- Published
- 2005
37. Genome of Geobacter sulfurreducens: metal reduction in subsurface environments
- Author
-
Bao Tran, Jeremy D. Selengut, H. Khouri, R. Madupu, S. van Aken, Nikhat Zafar, Sean C. Daugherty, Lauren M. Brinkac, Ian T. Paulsen, James F. Kolonay, T. Utterback, John F. Heidelberg, Michelle L. Gwinn, Dongying Wu, Robert T. DeBoy, Tanja M. Davidsen, Tamara Feldblyum, Daniel H. Haft, Claudia M. Romero, J. Weidman, Karen E. Nelson, Naomi L. Ward, Derek R. Lovley, William C. Nelson, Claire M. Fraser, Jonathan A. Eisen, Robert J. Dodson, Martin Wu, Barbara A. Methé, Maureen J. Beanan, Owen White, Heather Forberger, Steven A. Sullivan, and Anthony S. Durkin
- Subjects
Movement ,chemistry.chemical_element ,Computational biology ,Acetates ,Genome ,Metal ,Electron Transport ,Open Reading Frames ,Bioremediation ,Bacterial Proteins ,Acetyl Coenzyme A ,Genes, Regulator ,Anaerobiosis ,Geobacter sulfurreducens ,Organism ,Phylogeny ,Multidisciplinary ,biology ,Ecology ,Chemotaxis ,Cytochromes c ,Chromosomes, Bacterial ,biology.organism_classification ,Electron transport chain ,Aerobiosis ,Carbon ,chemistry ,Genes, Bacterial ,Metals ,visual_art ,visual_art.visual_art_medium ,Energy Metabolism ,Geobacter ,Oxidation-Reduction ,Genome, Bacterial ,Hydrogen - Abstract
The complete genome sequence of Geobacter sulfurreducens, a δ-proteobacterium, reveals unsuspected capabilities, including evidence of aerobic metabolism, one-carbon and complex carbon metabolism, motility, and chemotactic behavior. These characteristics, coupled with the possession of many two-component sensors and many c-type cytochromes, reveal an ability to create alternative, redundant, electron transport networks and offer insights into the process of metal ion reduction in subsurface environments. As well as playing roles in the global cycling of metals and carbon, this organism clearly has the potential for use in bioremediation of radioactive metals and in the generation of electricity.
- Published
- 2003
38. The TIGRFAMs database of protein families
- Author
-
Daniel H. Haft, Owen White, and Jeremy D. Selengut
- Subjects
InterPro ,Database ,Protein family ,Sequence Homology, Amino Acid ,Proteins ,Sequence alignment ,Articles ,Protein superfamily ,Biology ,computer.software_genre ,Genome ,Markov Chains ,Mixed Function Oxygenases ,Set (abstract data type) ,Annotation ,TIGRFAMs ,Genetics ,Animals ,Databases, Protein ,computer ,Phylogeny ,Pyruvate Carboxylase - Abstract
TIGRFAMs is a collection of manually curated protein families consisting of hidden Markov models (HMMs), multiple sequence alignments, commentary, Gene Ontology (GO) assignments, literature references and pointers to related TIGRFAMs, Pfam and InterPro models. These models are designed to support both automated and manually curated annotation of genomes. TIGRFAMs contains models of full-length proteins and shorter regions at the levels of superfamilies, subfamilies and equivalogs, where equivalogs are sets of homologous proteins conserved with respect to function since their last common ancestor. The scope of each model is set by raising or lowering cutoff scores and choosing members of the seed alignment to group proteins sharing specific function (equivalog) or more general properties. The overall goal is to provide information with maximum utility for the annotation process. TIGRFAMs is thus complementary to Pfam, whose models typically achieve broad coverage across distant homologs but end at the boundaries of conserved structural domains. The database currently contains over 1600 protein families. TIGRFAMs is available for searching or downloading at www.tigr.org/TIGRFAMs.
- Published
- 2003
39. Genome sequence and comparative analysis of the model rodent malaria parasite Plasmodium yoelii yoelii
- Author
John R. Yates, Lawrence W. Bergman, Leo H.M. van Lin, Martha Sedegah, Tamara Feldblyum, Jane M. Carlton, Azadeh Shoaibi, Bernard B. Suh, Jeremy Peterson, Jonathan E. Allen, Steven L. Salzberg, Jennifer Cho, Shelby L. Bidwell, Chris J. Janse, Michael Harris, Mihaela Pertea, Leda M. Cummings, Shamira J. Shallom, Mihai Pop, J. Dale Raine, Malcolm J. Gardner, Akhil B. Vaidya, Hamilton O. Smith, Maria D. Ermolaeva, Owen White, Steven B. Riedmuller, Daniel J. Carucci, J. Craig Venter, Robert E. Sinden, Stephen L. Hoffman, Joana C. Silva, Andrew P. Waters, Martin Shumway, Hean L. Koo, Jeremy D. Selengut, Taco W. A. Kooij, Claire M. Fraser, Susan Van Aken, Deirdre A. Cunningham, Daniel S. Kosack, John Quackenbush, Peter R. Preiser, Laurence Florens, and Samuel V. Angiuoli
- Subjects
Sequence analysis; Plasmodium falciparum; Rodentia; Genome; Plasmodium; Synteny; Species Specificity; parasitic diseases; Animals; Humans; Genetics; Whole genome sequencing; Recombination, Genetic; Multidisciplinary; biology; Plasmodium yoelii; Sequence Analysis, DNA; DNA, Protozoan; Telomere; Subtelomere; Malaria; Disease Models, Animal; Multigene Family; Genome, Protozoan; Sequence Alignment
- Abstract
Species of malaria parasite that infect rodents have long been used as models for malaria disease research. Here we report the whole-genome shotgun sequence of one species, Plasmodium yoelii yoelii, and comparative studies with the genome of the human malaria parasite Plasmodium falciparum clone 3D7. A synteny map of 2,212 P. y. yoelii contiguous DNA sequences (contigs) aligned to 14 P. falciparum chromosomes reveals marked conservation of gene synteny within the body of each chromosome. Of about 5,300 P. falciparum genes, more than 3,300 P. y. yoelii orthologues of predominantly metabolic function were identified. Over 800 copies of a variant antigen gene located in subtelomeric regions were found. This is the first genome sequence of a model eukaryotic parasite, and it provides insight into the use of such systems in the modelling of Plasmodium biology and disease.
- Published
- 2002
40. Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL): adapting the Partial Phylogenetic Profiling algorithm to scan sequences for signatures that predict protein function
- Author
Douglas B. Rusch, Jeremy D. Selengut, and Daniel H. Haft
- Subjects
Protein family; Molecular Sequence Data; Computational biology; Biology; lcsh:Computer applications to medicine. Medical informatics; Biochemistry; 03 medical and health sciences; Structure-Activity Relationship; Structural Biology; Sequence Analysis, Protein; Subsequence; Amino Acid Sequence; Molecular Biology; Peptide sequence; lcsh:QH301-705.5; Phylogeny; 030304 developmental biology; Sequence (medicine); Comparative genomics; Genetics; 0303 health sciences; Biological data; Applied Mathematics; Methodology Article; Gene Expression Profiling; 030302 biochemistry & molecular biology; Proteins; Computer Science Applications; lcsh:Biology (General); lcsh:R858-859.7; Phylogenetic profiling; DNA microarray; Algorithms
- Abstract
Background: Comparative genomics methods such as phylogenetic profiling can mine powerful inferences from inherently noisy biological data sets. We introduce Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL), a method that applies the Partial Phylogenetic Profiling (PPP) approach locally within a protein sequence to discover short sequence signatures associated with functional sites. The approach is based on the basic scoring mechanism employed by PPP, namely the use of binomial distribution statistics to optimize sequence similarity cutoffs during searches of partitioned training sets. Results: Here we illustrate and validate the ability of the SIMBAL method to find functionally relevant short sequence signatures by application to two well-characterized protein families. In the first example, we partitioned a family of ABC permeases using a metabolic background property (urea utilization). Thus, the TRUE set for this family comprised members whose genome of origin encoded a urea utilization system. By moving a sliding window across the sequence of a permease, and searching each subsequence in turn against the full set of partitioned proteins, the method found which local sequence signatures best correlated with the urea utilization trait. Mapping of SIMBAL "hot spots" onto crystal structures of homologous permeases reveals that the significant sites are gating determinants on the cytosolic face rather than, say, docking sites for the substrate-binding protein on the extracellular face. In the second example, we partitioned a protein methyltransferase family using gene proximity as a criterion. In this case, the TRUE set comprised those methyltransferases encoded near the gene for the substrate RF-1. SIMBAL identifies sequence regions that map onto the substrate-binding interface while ignoring regions involved in the methyltransferase reaction mechanism in general. Neither method for training set construction requires any prior experimental characterization. Conclusions: SIMBAL shows that, in functionally divergent protein families, selected short sequences often significantly outperform their full-length parent sequence for making functional predictions by sequence similarity, suggesting avenues for improved functional classifiers. When combined with structural data, SIMBAL affords the ability to localize and model functional sites.
- Published
- 2010
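Illustrative note on record 40: the sliding-window scan described in the SIMBAL abstract can be sketched roughly as follows. This is a toy approximation and not the authors' implementation. It assumes a simple ungapped identity count in place of a real sequence similarity search, breaks ties by input order, and uses hypothetical function names (`binom_tail`, `score_window`, `simbal_scan`). For each window it ranks the partitioned proteins by similarity, then picks the cutoff depth at which the TRUE set is most enriched according to a binomial tail probability.

```python
from math import comb


def binom_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def best_window_identity(window, seq):
    """Toy similarity (an assumption): best ungapped identity count of window in seq."""
    w = len(window)
    if len(seq) < w:
        return 0
    return max(
        sum(a == b for a, b in zip(window, seq[i:i + w]))
        for i in range(len(seq) - w + 1)
    )


def score_window(window, labeled):
    """labeled is a list of (sequence, is_true) pairs.

    Rank proteins by similarity to the window, then return the smallest
    binomial tail probability over all cutoff depths (lower = stronger
    association between the window and the TRUE set).
    """
    p = sum(is_true for _, is_true in labeled) / len(labeled)
    ranked = sorted(
        labeled,
        key=lambda item: best_window_identity(window, item[0]),
        reverse=True,
    )
    best = 1.0
    hits = 0
    for depth, (_, is_true) in enumerate(ranked, start=1):
        hits += is_true
        best = min(best, binom_tail(hits, depth, p))
    return best


def simbal_scan(query, labeled, width):
    """Slide a window of the given width along query; return (start, score) pairs."""
    return [
        (i, score_window(query[i:i + width], labeled))
        for i in range(len(query) - width + 1)
    ]
```

Windows with the lowest scores would correspond to the "hot spots" mentioned in the abstract. A realistic version would replace the identity count with a proper similarity search and handle tied scores explicitly.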
41. Whole-genome sequence analysis of Pseudomonas syringae pv. phaseolicola 1448A reveals divergence among pathovars in genes involved in virulence and transposition
- Author
John W. Mansfield, Rebecca A. Halpin, Arun K. Chatterjee, Magdalen Lindeberg, Owen White, Sean C. Daugherty, Hoda Khouri, Liwei Zhou, Todd Creasy, Sam Cartinhour, Steven A. Sullivan, Michelle G. Giglio, Robert T. DeBoy, William C. Nelson, Vinita Joardar, Robert W. Jackson, A. Scott Durkin, Robert J. Dodson, Daniel H. Haft, Alan Collmer, Jeremy D. Selengut, Lauren M. Brinkac, Jonathan Crabtree, Claire M. Fraser, David J. Schneider, Tanja M. Davidsen, M. J. Rosovitz, Tamara Feldblyum, C. Robin Buell, Tara Holley, Nikhat Zafar, and Ramana Madupu
- Subjects
DNA, Bacterial; Transposable element; Whole genome sequencing; Genetics; Virulence; Genomics and Proteomics; biology; Molecular Sequence Data; Pseudomonas syringae; Halo blight; Microbiology; Genome; Bacterial Proteins; Species Specificity; Genes, Bacterial; Pathovar; ORFS; Molecular Biology; Gene; Genome, Bacterial
- Abstract
Pseudomonas syringae pv. phaseolicola, a gram-negative bacterial plant pathogen, is the causal agent of halo blight of bean. In this study, we report on the genome sequence of P. syringae pv. phaseolicola isolate 1448A, which encodes 5,353 open reading frames (ORFs) on one circular chromosome (5,928,787 bp) and two plasmids (131,950 bp and 51,711 bp). Comparative analyses with a phylogenetically divergent pathovar, P. syringae pv. tomato DC3000, revealed a strong degree of conservation at the gene and genome levels. In total, 4,133 ORFs were identified as putative orthologs in these two pathovars using a reciprocal best-hit method, with 3,941 ORFs present in conserved, syntenic blocks. Although these two pathovars are highly similar at the physiological level, they have distinct host ranges; 1448A causes disease in beans, and DC3000 is pathogenic on tomato and Arabidopsis. Examination of the complement of ORFs encoding virulence, fitness, and survival factors revealed a substantial, but not complete, overlap between these two pathovars. Another distinguishing feature between the two pathovars is their distinctive sets of transposable elements. With access to a fifth complete pseudomonad genome sequence, we were able to identify 3,567 ORFs that likely comprise the core Pseudomonas genome and 365 ORFs that are P. syringae specific.
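Illustrative note on record 41: the abstract names a reciprocal best-hit method for calling putative orthologs but does not give the search tool or thresholds. The sketch below shows only the general idea, assuming similarity scores have already been computed in both directions. The function names and the ORF identifiers in the usage example are hypothetical.

```python
def best_hits(scores):
    """scores maps (query, subject) -> similarity score.

    Return a dict mapping each query to its highest-scoring subject.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Usage with made-up ORF identifiers and scores.
a_vs_b = {("a1", "b1"): 90, ("a1", "b2"): 50, ("a2", "b2"): 80, ("a3", "b1"): 60}
b_vs_a = {("b1", "a1"): 88, ("b2", "a2"): 75, ("b2", "a1"): 40}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In this toy input, `a3` finds `b1` as its best hit, but `b1` prefers `a1`, so `a3` is not reported as a putative ortholog.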