6,739 results on '"Genome project"'
Search Results
2. Molecular tools to monitor health and disease – and lucky coincidences
- Author
Ulf Landegren
- Subjects
padlock probes; proximity ligation; proximity extension; in situ pla; superrca; molecular genetics; career; synthetic oligonucleotides; genome project; patent; commercialization; Medicine
- Abstract
Improved methods for molecular analyses are obviously central for medical research. I will describe herein our work developing tools to reveal molecular states in health and disease. I will recount how I got started in this endeavor, and how our early work characterizing genetic variation led onto high-throughput protein measurements and to techniques for imaging the distribution of proteins and their activity states in tissues. I will also describe a more recent technique to measure even exceedingly rare genetic variants in order to monitor recurrence of disease for tumor patients.
- Published
- 2022
3. Molecular tools to monitor health and disease - and lucky coincidences.
- Author
Landegren, Ulf
- Subjects
*GENETIC variation; *COINCIDENCE; *MOLECULAR genetics; *DISEASE progression; *DISEASE relapse
- Abstract
Improved methods for molecular analyses are obviously central for medical research. I will describe herein our work developing tools to reveal molecular states in health and disease. I will recount how I got started in this endeavor, and how our early work characterizing genetic variation led onto high-throughput protein measurements and to techniques for imaging the distribution of proteins and their activity states in tissues. I will also describe a more recent technique to measure even exceedingly rare genetic variants in order to monitor recurrence of disease for tumor patients. [ABSTRACT FROM AUTHOR]
- Published
- 2022
4. The Pioneer Advantage: Filling the blank spots on the map of genome diversity in Europe.
- Author
Oleksyk, Taras K, Wolfsberger, Walter W, Schubelka, Khrystyna, Mangul, Serghei, and O'Brien, Stephen J
- Subjects
*GENE mapping; *HUMAN gene mapping; *NUCLEOTIDE sequencing; *GENOMES; DEVELOPED countries
- Abstract
Documenting genome diversity is important for the local biomedical communities and instrumental in developing precision and personalized medicine. Currently, tens of thousands of whole-genome sequences from Europe are publicly available, but most of these represent populations of developed countries of Europe. The uneven distribution of the available data is further impaired by the lack of data sharing. Recent whole-genome studies in Eastern Europe, one in Ukraine and one in Russia, demonstrated that local genome diversity and population structure from Eastern Europe historically had not been fully represented. An unexpected wealth of genomic variation uncovered in these studies was not so much a consequence of high variation within their population, but rather due to the "pioneer advantage." We discovered more variants because we were the first to prospect in the Eastern European genome pool. This simple comparison underscores the importance of removing the remaining geographic genome deserts from the rest of the world map of the human genome diversity. [ABSTRACT FROM AUTHOR]
- Published
- 2022
5. Chromosome-level de novo genome assemblies of over 100 plant species.
- Author
Kenta Shirasawa, Daijiro Harada, Hideki Hirakawa, Sachiko Isobe, and Kole, Chittaranjan
- Subjects
*PLANT species; *GENOMES; *GENOME size; *SEQUENCE analysis; *NUCLEOTIDE sequencing; *PLANT genomes
- Abstract
Genome sequence analysis in higher plants began with the whole-genome sequencing of Arabidopsis thaliana. Owing to the great advances in sequencing technologies, also known as next-generation sequencing (NGS) technologies, genomes of more than 400 plant species have been sequenced to date. Long-read sequencing technologies, together with sequence scaffolding methods, have enabled the synthesis of chromosome-level de novo genome sequence assemblies, which has further allowed comparative analysis of the structural features of multiple plant genomes, thus elucidating the evolutionary history of plants. However, the quality of the assembled chromosome-level sequences varies among plant species. In this review, we summarize the status of chromosome-level assemblies of 114 plant species, with genome sizes ranging from 125 Mb to 16.9 Gb. While the average genome coverage of the assembled sequences reached up to 89.1%, the average coverage of chromosome-level pseudomolecules was 73.3%. Thus, further improvements in sequencing technologies and scaffolding, and data analysis methods, are required to establish gap-free telomere-to-telomere genome sequence assemblies. With the forthcoming new technologies, we are going to enter into a new genomics era where pan-genomics and the >1,000 or >1 million genomes' project will be routine in higher plants. [ABSTRACT FROM AUTHOR]
- Published
- 2021
6. Polymicrogyria-associated epilepsy: a multicenter phenotypic study from the Epilepsy Phenome/Genome Project.
- Author
Shain, Catherine, Ramgopal, Sriram, Fallil, Zianka, Parulkar, Isha, Alongi, Richard, Knowlton, Robert, Poduri, Annapurna, and EPGP Investigators
- Subjects
EPGP Investigators; Cerebral Cortex; Humans; Epilepsy; Magnetic Resonance Imaging; Electroencephalography; Retrospective Studies; Cohort Studies; Age of Onset; Phenotype; Adolescent; Adult; Middle Aged; Child; Child, Preschool; Infant; Female; Male; Functional Laterality; Malformations of Cortical Development; Young Adult; Epilepsy Phenome/Genome Project; Perisylvian; Polymicrogyria; Neurosciences; Neurodegenerative; Pediatric; Brain Disorders; Clinical Research; 2.1 Biological and endogenous factors; Aetiology; Neurological; Epilepsy Phenome; Genome Project; Clinical Sciences; Neurology & Neurosurgery
- Abstract
Purpose: Polymicrogyria (PMG) is an epileptogenic malformation of cortical development. We describe the clinical epilepsy and imaging features of a large cohort with PMG-related epilepsy. Methods: Participants were recruited through the Epilepsy Phenome/Genome Project, a multicenter collaborative effort to collect detailed phenotypic data on individuals with epilepsy. We reviewed phenotypic data from participants with epilepsy and PMG. Key findings: We identified 87 participants, 43 female and 44 male, with PMG and epilepsy. Median age of seizure onset was 3 years (range
- Published
- 2013
7. Genomic and physiological characterization of Novosphingobium terrae sp. nov., an alphaproteobacterium isolated from Cerrado soil containing a mega-sized chromid
- Author
Carla Simone Vizzotto, Marcos Rogério Tótola, Georgios J. Pappas, Marcelo Henrique Soller Ramada, Aline Belmok, Felipe Marques de Almeida, Rodrigo Theodoro Rocha, Ricardo Henrique Krüger, and Cynthia Maria Kyaw
- Subjects
Genetics; Novosphingobium; Extrachromosomal DNA; Strain (biology); Media Technology; Sequence assembly; Genome project; Biology; 16S ribosomal RNA; biology.organism_classification; Genome; Gene; Microbiology
- Abstract
A novel bacterial strain, designated GeG2T, was isolated from soils of native Cerrado, a highly biodiverse savanna-like Brazilian biome. 16S rRNA gene sequence analysis of strain GeG2T revealed high sequence identity (100%) to the alphaproteobacterium Novosphingobium rosa, however, comparisons with N. rosa DSM7285T showed several distinctive features, prompting a full characterization of the new strain in terms of growth, morphology, biochemistry and, ultimately, its genome. GeG2T cells were Gram-stain negative bacilli, facultatively anaerobic, motile, positive for catalase and oxidase activities and for starch hydrolysis. Strain GeG2T presented planktonic-sessile dimorphism and cell aggregates surrounded by extracellular matrix and nanometric spherical structures were observed in liquid cultures, suggesting the production of exopolysaccharides (EPS) and outer membrane vesicles (OMVs). Whole genome assembly revealed four circular replicons: a 4.1 Mb chromosome, a 2.7 Mb extrachromosomal megareplicon and two plasmids (212.7 and 68.6 kb). The megareplicon contains few core genes and plasmid-type replication/maintenance systems, consistent with its classification as a chromid. Genome annotation shows a vast repertoire of carbohydrate active enzymes and genes involved in the degradation of aromatic compounds, highlighting the biotechnological potential of the new isolate obtained from Cerrado soils, especially regarding EPS production and biodegradation of recalcitrant compounds. Chemotaxonomic features, including polar lipid and fatty acid profiles, as well as physiological, molecular and whole genome comparisons showed significant differences between strain GeG2T and a N. rosa, clearly indicating that it represents a novel species, for which the name Novosphingobium terrae is proposed. 
The type strain is GeG2T (=CBMAI 2313T =CBAS 753T). IMPORTANCE: Novosphingobium is an alphaproteobacterial genus presenting diverse physiological profiles and broad biotechnological applications. However, many aspects regarding the biology of this important bacterial group remain elusive. A novel Novosphingobium strain was isolated from soils of Cerrado, an important Brazilian biome. Despite 100% 16S rRNA gene identity with Novosphingobium rosa, polyphasic characterizations, including physiological, chemotaxonomic, and whole genome-based analyses revealed significant differences between GeG2T and N. rosa DSM7285T, reinforcing resolution limitations in phylogenetic analysis based solely on 16S RNA and highlighting the importance of employing different approaches for the description of bacterial species. Using short and long read sequencing approaches, a high-quality fully resolved genome assembly was generated and one of the largest chromids reported to date was identified. A comprehensive characterization of environmental isolates allows us to better elucidate the diversity and biology of members of this bacterial group with potential biotechnological importance, guiding future bioprospecting efforts and genomic studies.
- Published
- 2023
8. Opportunities and Challenges for Molecular Understanding of Ciliopathies–The 100,000 Genomes Project
- Author
Gabrielle Wheway and Hannah M. Mitchison
- Subjects
Genome Project; ciliopathies; cilia; genomics; genetics; Genetics; QH426-470
- Abstract
Cilia are highly specialized cellular organelles that serve multiple functions in human development and health. Their central importance in the body is demonstrated by the occurrence of a diverse range of developmental disorders that arise from defects of cilia structure and function, caused by a range of different inherited mutations found in more than 150 different genes. Genetic analysis has rapidly advanced our understanding of the cell biological basis of ciliopathies over the past two decades, with more recent technological advances in genomics rapidly accelerating this progress. The 100,000 Genomes Project was launched in 2012 in the UK to improve diagnosis and future care for individuals affected by rare diseases like ciliopathies, through whole genome sequencing (WGS). In this review we discuss the potential promise and medical impact of WGS for ciliopathies and report on current progress of the 100,000 Genomes Project, reviewing the medical, technical and ethical challenges and opportunities that new, large scale initiatives such as this can offer.
- Published
- 2019
9. 2 Genomics to Study Basal Lineage Fungal Biology: Phylogenomics Suggests a Common Origin
- Author
Shelest, Ekaterina, Voigt, Kerstin, Esser, Karl, Series editor, and Nowrousian, Minou, editor
- Published
- 2014
10. Genomics of Subtelomeres: Technical Problems, Solutions and the Future
- Author
Becker, Marion M., Louis, Edward J., Louis, Edward J, editor, and Becker, Marion M, editor
- Published
- 2014
11. Iranome: A catalog of genomic variations in the Iranian population.
- Author
Fattahi, Zohreh, Beheshtian, Maryam, Mohseni, Marzieh, Poustchi, Hossein, Sellars, Erin, Nezhadi, Sayyed Hossein, Amini, Amir, Arzhangi, Sanaz, Jalalvand, Khadijeh, Jamali, Peyman, Mohammadi, Zahra, Davarnia, Behzad, Nikuei, Pooneh, Oladnabi, Morteza, Mohammadzadeh, Akbar, Zohrehvand, Elham, Nejatizadeh, Azim, Shekari, Mohammad, Bagherzadeh, Maryam, and Shamsi‐Gooshki, Ehsan
- Abstract
Considering the application of human genome variation databases in precision medicine, population‐specific genome projects are continuously being developed. However, the Middle Eastern population is underrepresented in current databases. Accordingly, we established Iranome database (www.iranome.com) by performing whole exome sequencing on 800 individuals from eight major Iranian ethnic groups representing the second largest population of Middle East. We identified 1,575,702 variants of which 308,311 were novel (19.6%). Also, by presenting higher frequency for 37,384 novel or known rare variants, Iranome database can improve the power of molecular diagnosis. Moreover, attainable clinical information makes this database a good resource for classifying pathogenicity of rare variants. Principal components analysis indicated that, apart from Iranian‐Baluchs, Iranian‐Turkmen, and Iranian‐Persian Gulf Islanders, who form their own clusters, rest of the population were genetically linked, forming a super‐population. Furthermore, only 0.6% of novel variants showed counterparts in "Greater Middle East Variome Project", emphasizing the value of Iranome at national level by releasing a comprehensive catalog of Iranian genomic variations and also filling another gap in the catalog of human genome variations at international level. We introduce Iranome as a resource which may also be applicable in other countries located in neighboring regions historically called Greater Iran (Persia). [ABSTRACT FROM AUTHOR]
- Published
- 2019
12. Opportunities and Challenges for Molecular Understanding of Ciliopathies–The 100,000 Genomes Project.
- Author
Wheway, Gabrielle and Mitchison, Hannah M.
- Subjects
CILIA & ciliary motion; GENOMES; COMPREHENSION
- Abstract
Cilia are highly specialized cellular organelles that serve multiple functions in human development and health. Their central importance in the body is demonstrated by the occurrence of a diverse range of developmental disorders that arise from defects of cilia structure and function, caused by a range of different inherited mutations found in more than 150 different genes. Genetic analysis has rapidly advanced our understanding of the cell biological basis of ciliopathies over the past two decades, with more recent technological advances in genomics rapidly accelerating this progress. The 100,000 Genomes Project was launched in 2012 in the UK to improve diagnosis and future care for individuals affected by rare diseases like ciliopathies, through whole genome sequencing (WGS). In this review we discuss the potential promise and medical impact of WGS for ciliopathies and report on current progress of the 100,000 Genomes Project, reviewing the medical, technical and ethical challenges and opportunities that new, large scale initiatives such as this can offer. [ABSTRACT FROM AUTHOR]
- Published
- 2019
13. Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies
- Author
Alaina Shumate, Jonathan Wood, Benedict Paten, Karen H. Miga, Giulio Formenti, Daniela C. Soto, Ivan Sović, Andrey Bzikadze, Arang Rhie, Kishwar Shafin, Adam M. Phillippy, Glennis A. Logsdon, Chirag Jain, Sergey Koren, Michael Alonge, Justin M. Zook, Alla Mikheenko, Arkarachai Fungtammasan, Kerstin Howe, and Ann M Mc Cartney
- Subjects
Genome, Human; Computer science; High-Throughput Nucleotide Sequencing; Sequence assembly; Polishing; Sequence Analysis, DNA; Genome project; Computational biology; Cell Biology; Telomere; Genome; Biochemistry; Nanopores; Tandem repeat; Pregnancy; Humans; Female; Human genome; Nanopore sequencing; Molecular Biology; Segmental duplication; Biotechnology
- Abstract
Advances in long-read sequencing technologies and genome assembly methods have enabled the recent completion of the first telomere-to-telomere human genome assembly, which resolves complex segmental duplications and large tandem repeats, including centromeric satellite arrays in a complete hydatidiform mole (CHM13). Although derived from highly accurate sequences, evaluation revealed evidence of small errors and structural misassemblies in the initial draft assembly. To correct these errors, we designed a new repeat-aware polishing strategy that made accurate assembly corrections in large repeats without overcorrection, ultimately fixing 51% of the existing errors and improving the assembly quality value from 70.2 to 73.9 measured from PacBio high-fidelity and Illumina k-mers. By comparing our results to standard automated polishing tools, we outline common polishing errors and offer practical suggestions for genome projects with limited resources. We also show how sequencing biases in both high-fidelity and Oxford Nanopore Technologies reads cause signature assembly errors that can be corrected with a diverse panel of sequencing technologies.
- Published
- 2022
14. Computational methods for inferring location and genealogy of overlapping genes in virus genomes: approaches and applications
- Author
Angelo Pavesi
- Subjects
Coronavirus disease 2019 (COVID-19); SARS-CoV-2; Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2); Reading frame; COVID-19; Computational Biology; Genome, Viral; Genome project; Computational biology; Biology; Genome; Article; Virus; Evolution, Molecular; Open Reading Frames; Virology; Codon usage bias; Genes, Overlapping; Humans; Pandemics; Gene
- Abstract
Viruses may evolve to increase the amount of encoded genetic information by means of overlapping genes, which utilize several reading frames. Such overlapping genes may be especially impactful for genomes of small size, often serving a source of novel accessory proteins, some of which play a crucial role in viral pathogenicity or in promoting the systemic spread of virus. Diverse genome-based metrics were proposed to facilitate recognition of overlapping genes that otherwise may be overlooked during genome annotation. They can detect the atypical codon bias associated with the overlap (e.g. a statistically significant reduction in variability at synonymous sites) or other sequence-composition features peculiar to overlapping genes. In this review, I compare nine computational methods, discuss their strengths and limitations, and survey how they were applied to detect candidate overlapping genes in the genome of SARS-CoV-2, the etiological agent of COVID-19 pandemic.
- Published
- 2022
15. Improving the Genome Annotation of Rhizoctonia solani Using Proteogenomics
- Author
Li Ming, Ge Feng, Shu Jiantao, Yang Pingfang, Yang Mingkun, and Zhang Cheng
- Subjects
Rhizoctonia solani; biology; Genetics; food and beverages; Computational biology; Genome project; Proteogenomics; biology.organism_classification; Genetics (clinical)
- Abstract
Background: Rhizoctonia solani is a pathogenic fungus that causes serious diseases in many crops, including rice, wheat, and soybeans. In crop production, it is very important to understand the pathogenicity of this fungus, which is still elusive. It might be helpful to comprehensively understand its genomic information using different genome annotation strategies. Methods: Aiming to improve the genome annotation of R. solani, we performed a proteogenomic study based on the existing data. Based on our study, a total of 1060 newly identified genes, 36 revised genes, 139 single amino acid variants (SAAVs), 8 alternative splicing genes, and diverse post-translational modifications (PTMs) events were identified in R. solani AG3. Further functional annotation on these 1060 newly identified genes was performed through homology analysis with its 5 closest relative fungi. Results: Based on this, 2 novel candidate pathogenic genes, which might be associated with pathogen-host interaction, were discovered. In addition, in order to increase the reliability and novelty of the newly identified genes in R. solani AG3, 1060 newly identified genes were compared with the newly published available R. solani genome sequences of AG1, AG2, AG4, AG5, AG6, and AG8. There are 490 homologous sequences. We combined the proteogenomic results with the genome alignment results and finally identified 570 novel genes in R. solani. Conclusion: These findings extended R. solani genome annotation and provided a wealth of resources for research on R. solani.
- Published
- 2021
16. Phage Commander, an Application for Rapid Gene Identification in Bacteriophage Genomes Using Multiple Programs
- Author
Geordie Ryder, Sarah L. Harris, Philippos K. Tsourkas, and Matt Lazeroff
- Subjects
Bacteriophage; Genomics; Identification (biology); macromolecular substances; Genome project; Computational biology; Biology; biology.organism_classification; Genome; Gene
- Abstract
The number of sequenced bacteriophage genomes is growing at an exponential rate. The majority of sequenced bacteriophage genomes are annotated by one or more of several freely available gene identification programs (Glimmer, GeneMark, RAST, Prodigal, etc.). No program has been shown to consistently outperform the others; thus, the choice of which program to use is not obvious. We present the Phage Commander application for rapid identification of bacteriophage genes using multiple gene identification programs. Phage Commander runs a bacteriophage genome sequence through nine gene identification programs (and an additional program for identification of tRNAs) and integrates the results within a single output table. Phage Commander also generates formatted output files for direct export to National Center for Biotechnology Information GenBank or genome visualization programs such as DNA Master. Users can select the threshold for which genes to export (genes identified by at least one program, genes identified by at least two programs, etc.). Phage Commander was benchmarked using eight high-quality bacteriophage genomes whose genes are backed by experimental data. Our results show that the most accurate annotations are obtained by exporting genes identified by at least two or three programs. Many groups opt to manually curate the annotations obtained from gene identification programs, and Phage Commander was designed to facilitate manual curation of genome annotations. Our benchmarking results show that manual curation does indeed produce more accurate annotations than any individual gene identification program. The authors thus recommend manually curating the output of Phage Commander to generate maximally accurate annotations. Phage Commander is currently being used in the corresponding author's bacteriophage genome annotation class and has reduced the labor cost and improved the quality of genome annotations.
- Published
- 2021
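The export threshold mentioned in the Phage Commander abstract above (keep genes identified by at least one program, at least two programs, and so on) can be illustrated as a simple vote count. This is a minimal sketch, not Phage Commander's actual code: the function name `consensus_genes`, the `(start, stop, strand)` tuple used for a gene call, the exact-match rule, and the sample data are all assumptions made for illustration.

```python
from collections import Counter


def consensus_genes(predictions, min_programs=2):
    """Return gene calls reported by at least `min_programs` programs.

    `predictions` maps a (hypothetical) program name to its gene calls,
    each call being a (start, stop, strand) tuple. Two calls count as the
    same gene only if they match exactly, which is a simplifying assumption.
    """
    votes = Counter()
    for calls in predictions.values():
        # set() so that one program cannot vote twice for the same call
        votes.update(set(calls))
    return sorted(gene for gene, n in votes.items() if n >= min_programs)


# Made-up example data, not real program output.
example = {
    "program_a": [(100, 400, "+"), (500, 900, "-")],
    "program_b": [(100, 400, "+"), (950, 1200, "+")],
    "program_c": [(100, 400, "+"), (500, 900, "-")],
}

print(consensus_genes(example, min_programs=2))
print(consensus_genes(example, min_programs=3))
```

Raising `min_programs` trades sensitivity for agreement: with the sample data, a threshold of 2 keeps two calls and a threshold of 3 keeps only the call all three hypothetical programs share.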
17. Draft genome sequence of Staphylococcus agnetis 4244, a strain with gene clusters encoding distinct post-translationally modified antimicrobial peptides
- Author
K.M. Towle, Maria do Carmo de Freire Bastos, Marco J. van Belkum, Andreza Freitas de Souza Duarte, Sorina Chiorean, Marcus Lívio Varella Coelho, Ingolf F. Nes, Márcia Silva Francisco, Gabriela Silva Almeida, and John C. Vederas
- Subjects
Methicillin-Resistant Staphylococcus aureus; Microbiology (medical); Staphylococcus aureus; Antibiotic resistance; Staphylococcus; Immunology; Biology; Microbiology; Genome; DNA sequencing; Thiopeptide; Bacteriocin; Animals; Immunology and Allergy; Gene; Prophage; Whole genome sequencing; Genetics; Genome project; biochemical phenomena, metabolism, and nutrition; Draft genome sequence; QR1-502; Multigene Family; Staphylococcus agnetis; Cattle; Female; Sactipeptide; Antimicrobial Peptides
- Abstract
Objectives: Here we report the draft genome sequence of Staphylococcus agnetis 4244, a strain involved in bovine mastitis, and its ability to inhibit different species of antibiotic-resistant Gram-positive bacteria owing to bacteriocin production. Methods: An Illumina MiSeq platform was used for genome sequencing. De novo genome assembly was done using the A5-miseq pipeline. Genome annotation was performed by the RAST server, and mining of bacteriocinogenic gene clusters was done using the BAGEL4 and antiSMASH v.5.0 platforms. Investigation of the spectrum of activity of S. agnetis 4244 was performed on BHI agar by deferred antagonism assay. Results: The total scaffold size was determined to be 2 511 708 bp featuring a G+C content of 35.6%. The genome contains 2431 protein-coding sequences and 80 RNA sequences. Genome analyses revealed three prophage sequences inserted in the genome as well as several genes involved in drug resistance and two bacteriocin gene clusters (encoding a thiopeptide and a sactipeptide) encoded on the bacterial chromosome. Staphylococcus agnetis 4244 was able to inhibit all 44 strains of antibiotic-resistant Gram-positive bacteria tested in this study, including vancomycin-resistant enterococci (VRE), methicillin-resistant Staphylococcus aureus (MRSA) and other antibiotic-resistant staphylococcal strains. Conclusion: This study emphasises the potential biotechnological application of this strain for production of bacteriocins that could be used in the food industry as biopreservatives and/or in medicine as alternative therapeutic options against VRE, MRSA, vancomycin-intermediate S. aureus and other antibiotic-resistant Gram-positive bacteria, including biofilm-forming isolates. It also provides some genetic features of the draft genome of S. agnetis 4244.
- Published
- 2021
18. Managing Trust and Risk in New Biotechnologies: The Case of Population Genome Project and Organ Transplantation in Latvia
- Author
Putnina, Aivita, Robbins, Peter T., editor, and Huzair, Farah, editor
- Published
- 2012
19. Long-read assembly and comparative evidence-based reanalysis of Cryptosporidium genome sequences reveal expanded transporter repertoire and duplication of entire chromosome ends including subtelomeric regions
- Author
Garrett W. Cooper, Peter Georgeson, Brendan R E Ansell, Mandy Sanders, Rui Xiao, Ethan D. Smith, Bernard J. Pope, Boris Striepen, Rodrigo P. Baptista, Jennifer E. Dumaine, Aaron R. Jex, Karen Brooks, Alan Tracey, Jessica C. Kissinger, James Cotton, Matthew Berriman, Yiran Li, and Adam Sateriale
- Subjects
Cryptosporidium parvum; Genetics; Sequence assembly; Copy-number variation; Computational biology; Genome project; Biology; biology.organism_classification; Genome; Gene; Cryptosporidium hominis; Genetics (clinical); Reference genome
- Abstract
Cryptosporidiosis is a leading cause of waterborne diarrheal disease globally and an important contributor to mortality in infants and the immunosuppressed. Despite its importance, the Cryptosporidium community has only had access to a good, but incomplete, Cryptosporidium parvum IOWA reference genome sequence. Incomplete reference sequences hamper annotation, experimental design, and interpretation. We have generated a new C. parvum IOWA genome assembly supported by Pacific Biosciences (PacBio) and Oxford Nanopore long-read technologies and a new comparative and consistent genome annotation for three closely related species: C. parvum, Cryptosporidium hominis, and Cryptosporidium tyzzeri. We made 1926 C. parvum annotation updates based on experimental evidence. They include new transporters, ncRNAs, introns, and altered gene structures. The new assembly and annotation revealed a complete Dnmt2 methylase ortholog. Comparative annotation between C. parvum, C. hominis, and C. tyzzeri revealed that most “missing” orthologs are found, suggesting that the biological differences between the species must result from gene copy number variation, differences in gene regulation, and single-nucleotide variants (SNVs). Using the new assembly and annotation as reference, 190 genes are identified as evolving under positive selection, including many not detected previously. The new C. parvum IOWA reference genome assembly is larger, gap free, and lacks ambiguous bases. This chromosomal assembly recovers all 16 chromosome ends, 13 of which are contiguously assembled. The three remaining chromosome ends are provisionally placed. These ends represent duplication of entire chromosome ends including subtelomeric regions revealing a new level of genome plasticity that will both inform and impact future research.
- Published
- 2021
20. Chromosome‐scale genome assembly of Castanopsis tibetana provides a powerful comparative framework to study the evolution and adaptation of Fagaceae trees
- Author
Yi Feng, Xiaorong Zeng, Risheng Chen, Jianling Guo, Shuang Chen, Kai Yang, and Ye Sun
- Subjects
Comparative genomics; Genome; biology; Sequence assembly; Molecular Sequence Annotation; Genomics; Genome project; Castanopsis; Fagaceae; biology.organism_classification; Chromosomes; Trees; Evolutionary biology; Genetics; Genome size; Phylogeny; Ecology, Evolution, Behavior and Systematics; Biotechnology
- Abstract
Fagaceae species are increasingly used as models to elucidate the process and mechanism of adaptation and speciation by integrating ecology, evolution and genomics. The genus Castanopsis belongs to the family Fagaceae and is mainly distributed across subtropical and tropical Asia. In the present study, we reported the first chromosome-scale genome assembly of Castanopsis tibetana, a common species of evergreen broadleaved forests in subtropical China. The combination of Nanopore sequencing and Hi-C technologies enabled a high-quality genome assembly. The final assembled genome size of C. tibetana was 878.6 Mb (97.6% of the estimated genome size), consisting of 477 contigs with an N50 length of 3.3 Mb. The benchmarking universal single-copy orthologue (BUSCO) assessment indicated a completeness of 93.0%. Hi-C scaffolding generated 12 pseudochromosomes, representing 98.7% of the assembled genome. Subsequently, 40,937 protein-coding genes were predicted and 90.04% of them were functionally annotated. More than 476.9 Mb of repetitive sequences (54.3% of the genome) were identified, and the percentage of the genome covered by TE elements was 39.98%. Comparative genomics analysis revealed that C. tibetana was most closely related to Castanea mollissima and diverged at 18.48 Ma, and that C. tibetana has undergone considerable gene family expansion and contraction. Evidence of positive selection was detected in 53 genes, which showed different arrangement pattern compared to Quercus robur. The chromosome-scale genome assembly of C. tibetana will expand Fagaceae genome resources across the family and provide a powerful comparative framework to study the adaptation and evolution of Fagaceae trees.
- Published
- 2021
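The Castanopsis tibetana abstract above reports contiguity as an N50 length. As a reminder of what that statistic means, the sketch below computes N50 under its usual definition: the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions; no figures from the paper are reproduced.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the summed
    length of all contigs.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contig lengths given")


# Toy contig lengths (arbitrary units), made up for illustration.
toy_contigs = [2, 3, 4, 5, 6]
print(n50(toy_contigs))
```

For the toy input the total is 20, and the two longest contigs (6 and 5) together pass half of it, so the value printed is 5.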
21. The Normal Human Adult Hypothalamus Proteomic Landscape: Rise of Neuroproteomics in Biological Psychiatry and Systems Biology
- Author
Jayshree Advani, Thottethodi Subrahmanya Keshava Prasad, Bipin G. Nair, Oishi Chatterjee, Susarla K. Shankar, Praseeda Mol, Lathika Gopalakrishnan, and Anita Mahadevan
- Subjects
Adult; Proteomics; Proteome; Proteomic Profiling; Systems Biology; Pseudogene; Systems biology; Hypothalamus; Computational biology; Genome project; Biology; Biochemistry; Neuroproteomics; Genetics; Humans; Molecular Medicine; Protein Processing, Post-Translational; Molecular Biology; Gene; Biological Psychiatry; Biotechnology
- Abstract
The human hypothalamus is central to the regulation of neuroendocrine and neurovegetative systems, as well as modulation of chronobiology and behavioral aspects in human health and disease. Surprisingly, a deep proteomic analysis of the normal human hypothalamic proteome has been missing for such an important organ so far. In this study, we delineated the human hypothalamus proteome using a high-resolution mass spectrometry approach which resulted in the identification of 5349 proteins, while a multiple post-translational modification (PTM) search identified 191 additional proteins, which were missed in the first search. A proteogenomic analysis resulted in the discovery of multiple novel protein-coding regions as we identified proteins from noncoding regions (pseudogenes) and proteins translated from short open reading frames that can be missed using the traditional pipeline of prediction of protein-coding genes as a part of genome annotation. We also identified several PTMs of hypothalamic proteins that may be required for normal hypothalamic functions. Moreover, we observed an enrichment of proteins pertaining to autophagy and adult neurogenesis in the proteome data. We believe that the hypothalamic proteome reported herein would help to decipher the molecular basis for the diverse range of physiological functions attributed to it, as well as its role in neurological and psychiatric diseases. Extensive proteomic profiling of the hypothalamic nuclei would further elaborate on the role and functional characterization of several hypothalamus-specific proteins and pathways to inform future research and clinical discoveries in biological psychiatry, neurology, and system biology.
- Published
- 2021
22. Genome assisted probiotic characterization and application of Bacillus velezensis ZBG17 as an alternative to antibiotic growth promoters in broiler chickens
- Author
-
Hareshkumar Keharia, Anjali Bose, S. Paul, S.V. Rama Rao, Ninad Pandit, Jayraj Doshi, M.V.L.N. Raju, and Riteshri Soni
- Subjects
Probiotics ,Broiler ,Bacillus ,Genome project ,Biology ,medicine.disease_cause ,biology.organism_classification ,Animal Feed ,Genome ,Cell aggregation ,Anti-Bacterial Agents ,Diet ,Microbiology ,law.invention ,Probiotic ,Salmonella enterica ,law ,Genetics ,medicine ,Animals ,Chickens ,Gene ,Escherichia coli - Abstract
The present study describes genome annotation and phenotypic characterization of Bacillus velezensis ZBG17 and evaluation of its performance as an antibiotic growth promoter substitute in broiler chickens. ZBG17 comprises a 3.89 Mbp genome with a GC content of 46.5%. ZBG17 could tolerate simulated gastrointestinal juices prevalent in the animal gut. Some adhesion-associated genomic features of ZBG17 supported the experimentally determined cell surface hydrophobicity and cell aggregation results. ZBG17 encoded multiple secondary metabolite gene clusters correlating with its broad-spectrum antibacterial activity. Interestingly, ZBG17 completely inhibited Salmonella enterica and Escherichia coli within 6 h and 8 h, respectively, in a liquid co-culture assay. ZBG17 genome analysis did not reveal any genetic determinant associated with reported safety hazards for use as a poultry direct-fed microbial. Dietary supplementation of ZBG17 significantly improved feed utilization efficiency and humoral immune response in broiler chickens, suggesting its prospective application as a direct-fed microbial.
- Published
- 2021
23. Single-molecule real-time transcript sequencing of developing cotton anthers facilitates genome annotation and fertility restoration candidate gene discovery
- Author
-
Huini Tang, Juanjuan Feng, Liping Guo, Zhidan Zuo, Jianyong Wu, Ting Li, Xuexian Zhang, Bingbing Zhang, Xiuqin Qiao, Chaozhu Xing, Tingxiang Qi, Hailin Wang, Yongjie Zhang, and Meng Zhang
- Subjects
Candidate gene ,Cytoplasmic male sterility ,Alternative splicing ,Genome project ,Computational biology ,Biology ,Transcriptome ,Fertility ,Genetics ,RNA, Long Noncoding ,RNA-Seq ,ORFS ,Gene ,Genetic Association Studies ,Single molecule real time sequencing - Abstract
Heterosis refers to the superior phenotypes observed in hybrids. The cytoplasmic male sterility (CMS) system plays an important role in cotton heterosis utilization. However, the global gene expression patterns of CMS-D2 and its interaction with the restorer gene Rf1 remain unclear. Here, full-length transcript sequencing was performed in anthers of the CMS-D2 restorer line using PacBio single-molecule real-time (SMRT) sequencing technology. Combining PacBio SMRT long-read isoforms and Illumina RNA-seq data, 107,066 isoforms from 44,338 loci were obtained, including 10,086 novel isoforms of novel genes and 66,419 new isoforms of known genes. In total, 56,572 alternative splicing (AS) events, 1146 lncRNAs, 61 fusion transcripts, 10,466 genes exhibiting alternative polyadenylation (APA), and 60,995 novel isoforms with predicted open reading frames (ORFs) were further identified. Furthermore, the genes specifically expressed in the restorer line were selected and confirmed by qRT-PCR. These findings provide a basis for upland cotton genome annotation and transcriptome research, and will help to reveal the molecular mechanism of interaction between Rf1 and CMS-D2 cytoplasm.
- Published
- 2021
24. Annotation depth confounds direct comparison of gene expression across species
- Author
-
Matthew Martin, Elias M. Oziolor, and Seda Arat
- Subjects
Normalization (statistics) ,QH301-705.5 ,Annotation ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Computational biology ,Biology ,TPM ,Biochemistry ,Transcriptome ,Mice ,Dogs ,Structural Biology ,Abundance (ecology) ,Gene expression ,Animals ,Biology (General) ,Molecular Biology ,Gene ,Genome ,Sequence Analysis, RNA ,Pre-clinical species ,Methodology Article ,Applied Mathematics ,Molecular Sequence Annotation ,Genome project ,RNAseq ,Rats ,Computer Science Applications ,Macaca fascicularis ,Cross-species comparisons ,DNA microarray - Abstract
Background: Comparisons of the molecular framework among organisms can be done on both structural and functional levels. One of the most common top-down approaches for functional comparisons is RNA sequencing. This estimation of organismal transcriptional responses is of interest for understanding evolution of molecular activity, which is used for answering a diversity of questions ranging from basic biology to pre-clinical species selection and translation. However, direct comparison between species is often hindered by evolutionary divergence in the structure of the molecular framework, as well as large differences in the depth of our understanding of the genetic background between humans and other species. Here, we focus on the latter. We attempt to understand how differences in transcriptome annotation affect direct gene abundance comparisons between species. Results: We examine and suggest some straightforward approaches for direct comparison given the currently available tools and using a sample dataset from human, cynomolgus monkey, dog, rat and mouse with a common quantitation and normalization approach. In addition, we examine how variation in genome annotation depth and quality across species may affect these direct comparisons. Conclusions: Our findings suggest that further efforts for better genome annotation or computational normalization tools may be of strong interest.
- Published
- 2021
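Note on entry 24: the annotation-depth effect the abstract describes can be illustrated with TPM (transcripts per million), which is listed among the entry's subject terms. The following is a minimal sketch and not the authors' pipeline; the function name `tpm` and all counts and gene lengths are hypothetical values chosen for illustration. Because TPM divides by the sum over all annotated genes, the same gene with the same read count gets a lower TPM in a species whose annotation contains more expressed genes.

```python
def tpm(counts, lengths_kb):
    """Transcripts per million from raw read counts and gene lengths in kb."""
    if len(counts) != len(lengths_kb):
        raise ValueError("counts and lengths_kb must have the same length")
    # Length-normalised read rate for every annotated gene.
    rates = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Scale so that the values of all annotated genes sum to one million.
    return [r / total * 1e6 for r in rates]


# Hypothetical data: gene 0 has 100 reads and is 1 kb long in both species.
# The "sparse" annotation has two genes, the "deep" one has a third gene.
sparse = tpm([100, 100], [1.0, 1.0])
deep = tpm([100, 100, 200], [1.0, 1.0, 1.0])

print(sparse[0], deep[0])  # 500000.0 250000.0
```

One simple mitigation, stated here as an assumption rather than as the paper's recommendation, is to recompute TPM over only the genes shared by all species being compared.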
25. Parallel subgenome structure and divergent expression evolution of allo-tetraploid common carp and goldfish
- Author
-
Zai-Jie Dong, Qi Wang, Ming-Shu Cui, Hong-Wei Wang, Ju-Hua Yu, Ya-Xin Wang, Jiong-Tang Li, Chen-Ru Yang, Yan Zhang, Ran Zhao, Qing-Song Li, Mei-Di Huang Yang, Zhi-Ying Jia, Yu-Jie Zhao, Xiyin Wang, and Xiao-Qing Sun
- Subjects
Candidate gene ,Carps ,Karyotype ,Sequence assembly ,Biology ,Synteny ,Article ,Evolution, Molecular ,Common carp ,Negative selection ,Species Specificity ,Goldfish ,Genetics ,Animals ,Selection, Genetic ,Phylogeny ,Likelihood Functions ,Genome ,Dosage compensation ,Base Sequence ,Genetic Variation ,food and beverages ,Molecular Sequence Annotation ,Genomics ,Genome project ,Tetraploidy ,Alternative Splicing ,Gene Expression Regulation ,Evolutionary biology ,Gene expression ,Ploidy ,Zoology - Abstract
How two subgenomes in allo-tetraploids adapt to coexistence and coordinate through structure and expression evolution requires extensive studies. In the present study, we report an improved genome assembly of allo-tetraploid common carp, an updated genome annotation of allo-tetraploid goldfish and the chromosome-scale assemblies of a progenitor-like diploid Puntius tetrazona and an outgroup diploid Paracanthobrama guichenoti. Parallel subgenome structure evolution in the allo-tetraploids was featured with equivalent chromosome components, higher protein identities, similar transposon divergence and contents, homoeologous exchanges, better synteny level, strong sequence compensation and symmetric purifying selection. Furthermore, we observed subgenome expression divergence processes in the allo-tetraploids, including inter-/intrasubgenome trans-splicing events, expression dominance, decreased expression levels, dosage compensation, stronger expression correlation, dynamic functionalization and balancing of differential expression. The potential disorders introduced by different progenitors in the allo-tetraploids were hypothesized to be alleviated by increasing structural homogeneity and performing versatile expression processes. Resequencing three common carp strains revealed two major ecotypes and uncovered candidate genes relevant to growth and survival rate. Genomic analysis of allo-tetraploid common carp and goldfish identifies parallel subgenome structure and divergent expression processes.
- Published
- 2021
26. Genome of Bifidobacterium longum NCIM 5672 provides insights into its acid-tolerance mechanism and probiotic properties
- Author
-
Kanika Bansal, Aravind Sundararaman, Prakash M. Halami, Jameema Sidhic, and Prabhu B. Patil
- Subjects
Whole genome sequencing ,Genetics ,Candidate gene ,Bifidobacterium longum ,Probiotics ,food and beverages ,General Medicine ,Genome project ,Biology ,biology.organism_classification ,Biochemistry ,Microbiology ,Genome ,law.invention ,Feces ,Probiotic ,law ,Humans ,Bifidobacterium ,Molecular Biology ,Gene ,Genome, Bacterial ,GC-content - Abstract
Bifidobacterium longum NCIM 5672 is a probiotic strain isolated from Indian infant feces. The probiotic efficacy of Bifidobacteria is largely affected by their acid tolerance. This study determined the probiotic properties and acid-tolerance mechanism of B. longum NCIM 5672 using whole-genome sequencing. The genome annotation was carried out using the RAST web server and NCBI PGAAP. The draft genome sequence of this strain, assembled in 63 contigs, consists of 2,246,978 base pairs and 1900 coding sequences, with a GC content of 59.6%. The genome annotation revealed that seven candidate genes might be involved in regulating the acid tolerance of B. longum NCIM 5672. Furthermore, the presence of genes associated with immunomodulation and cell adhesion supports the probiotic background of the strain. The analysis of candidate acid-tolerance-associated genes revealed that three genes, argC, argH, and dapA, may play an essential role in high acid tolerance in B. longum NCIM 5672. The results of RT-qPCR supported this conclusion. Altogether, the results presented here supply an effective way to select acid-resistant strains for the food industry and provide new strategies to enhance this species' industrial applications and health-promoting properties.
- Published
- 2021
27. Syn Wiki: Functional annotation of the first artificial organism Mycoplasma mycoides JCVI‐syn3A
- Author
-
Neil Singh, Jörg Stülke, Christoph Elfmann, and Tiago Pedreira
- Subjects
genome annotation ,Computational biology ,Biochemistry ,03 medical and health sciences ,Synthetic biology ,Bacterial Proteins ,Databases, Genetic ,essential genes ,Molecular Biology ,Gene ,Organism ,030304 developmental biology ,0303 health sciences ,biology ,Tools for Protein Science ,030306 microbiology ,SynWiki ,Mycoplasma mycoides ,Molecular Sequence Annotation ,Genome project ,biology.organism_classification ,Functional annotation ,Synthetic Biology ,Function (biology) ,Genome, Bacterial ,Software - Abstract
The new field of synthetic biology aims at the creation of artificially designed organisms. A major breakthrough in the field was the generation of the artificial synthetic organism Mycoplasma mycoides JCVI‐syn3A. This bacterium possesses only 452 protein‐coding genes, the smallest number for any organism that is viable independent of a host cell. However, about one third of the proteins have no known function indicating major gaps in our understanding of simple living cells. To facilitate the investigation of the components of this minimal bacterium, we have generated the database SynWiki (http://synwiki.uni-goettingen.de/). SynWiki is based on a relational database and gives access to published information about the genes and proteins of M. mycoides JCVI‐syn3A. To gain a better understanding of the functions of the genes and proteins of the artificial bacteria, protein–protein interactions that may provide clues for the protein functions are included in an interactive manner. SynWiki is an important tool for the synthetic biology community that will support the comprehensive understanding of a minimal cell as well as the functional annotation of so far uncharacterized proteins.
- Published
- 2021
28. Advances in Genome-Scale Metabolic Modeling toward Microbial Community Analysis of the Human Microbiome
- Author
-
Kutlu O. Ulgen and Elif Esvap
- Subjects
endocrine system diseases ,Systems biology ,Biomedical Engineering ,Computational biology ,Biology ,Models, Biological ,Biochemistry, Genetics and Molecular Biology (miscellaneous) ,Machine Learning ,Humans ,Microbiome ,Precision Medicine ,Organism ,business.industry ,Microbiota ,Human microbiome ,Parkinson Disease ,Genomics ,General Medicine ,Genome project ,Inflammatory Bowel Diseases ,Gastrointestinal Microbiome ,Microbial population biology ,Metagenomics ,Personalized medicine ,business ,Metabolic Networks and Pathways - Abstract
A genome-scale metabolic model (GEM) represents metabolic pathways of an organism in a mathematical form and can be built using biochemistry and genome annotation data. GEMs are invaluable for understanding organisms since they analyze the metabolic capabilities and behaviors quantitatively and can predict phenotypes. The development of high-throughput data collection techniques led to an immense increase in omics data such as metagenomics, which expand our knowledge on the human microbiome, but this also created a need for systematic analysis of these data. In recent years, GEMs have also been reconstructed for microbial species, including human gut microbiota, and methods for the analysis of microbial communities have been developed to examine the interaction between the organisms or the host. The purpose of this review is to provide a comprehensive guide for the applications of GEMs in microbial community analysis. Starting with GEM repositories, automatic GEM reconstruction tools, and quality control of models, this review will give insights into microbe-microbe and microbe-host interaction predictions and optimization of microbial community models. Recent studies that utilize microbial GEMs and personalized models to infer the influence of microbiota on human diseases such as inflammatory bowel diseases (IBD) or Parkinson's disease are exemplified. Being powerful system biology tools for both species-level and community-level analysis of microbes, GEMs integrated with omics data and machine learning techniques will be indispensable for studying the microbiome and their effects on human physiology as well as for deciphering the mechanisms behind human diseases.
- Published
- 2021
29. Genome-wide DNA polymorphisms of Citrus unshiu Marc. cv. Miyagawa-wase cultivated in different regions based on whole-genome re-sequencing
- Author
-
Chang-Ho Eun and In-Jung Kim
- Subjects
Citrus unshiu ,Genetics ,biology ,Genetic variation ,Single-nucleotide polymorphism ,Plant Science ,Genome project ,Indel ,biology.organism_classification ,Genome ,Gene ,Biotechnology ,Reference genome - Abstract
Citrus unshiu Marc. cv. Miyagawa-wase is the most widely cultivated citrus variety in Korea. To determine whether the C. unshiu genome used in this study shows genetic variation compared to the published reference genome (C. unshiu Marc. Miyagawa-wase), we conducted genome re-sequencing of two C. unshiu cultivars (Miyagawa-1 and Miyagawa-2) cultivated on Jeju in Korea. Compared with the reference genome, 1,198,650 and 1,207,084 single-nucleotide polymorphisms (SNPs) and 172,259 and 172,391 insertion/deletion polymorphisms (InDels) were detected in the Miyagawa-1 and -2 genomes, respectively. In SNP and InDel classifications by genome annotation, 367,591 and 369,068 SNPs and 45,362 and 45,464 InDels were located in the genic regions of Miyagawa-1 and -2, respectively. Among the SNPs of Miyagawa-1 and -2, transitions were more frequent than transversions. The majority of InDels in both cultivars were 1 bp in length. The total number of SNPs between Miyagawa-1 and -2 was smaller than the number of SNPs between the reference genome and Miyagawa-1 or -2. Gene ontology (GO) analysis showed that 23,164 and 23,049 genes with SNPs and 16,830 and 16,774 genes with InDels were annotated in the GO database. Taken together, Miyagawa-1 and -2 show genome-wide variation, including SNPs and InDels, compared to the published C. unshiu Marc. cv. Miyagawa-wase genome. This study suggests it would be more accurate to use the Miyagawa-1 and -2 genome sequences as a reference when conducting research using C. unshiu cultivated in Korea.
- Published
- 2021
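Note on entry 29: the abstract reports that transitions were more frequent than transversions among the detected SNPs. As a minimal sketch of how that classification works (not the authors' code; the function names and the example SNP list are hypothetical), a substitution is a transition when both bases are purines (A, G) or both are pyrimidines (C, T), and a transversion otherwise.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a valid SNP: {ref}>{alt}")
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def ts_tv_ratio(snps):
    """Transition/transversion ratio for a list of (ref, alt) pairs."""
    ts = sum(1 for ref, alt in snps if classify_snp(ref, alt) == "transition")
    tv = len(snps) - ts
    if tv == 0:
        raise ValueError("no transversions; ratio is undefined")
    return ts / tv


# Hypothetical example: two transitions and one transversion.
example = [("A", "G"), ("C", "T"), ("A", "T")]
print(ts_tv_ratio(example))  # 2.0
```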
30. Genome Warehouse: A Public Repository Housing Genome-scale Data
- Author
-
Yingke Ma, Wenming Zhao, Hongen Kang, Song Wu, Zhaohua Li, Xingjian Xu, Xinchang Zheng, Jingfa Xiao, Meili Chen, Zhang Zhang, Zheng Gong, Yiming Bao, Jian Sang, and Lili Hao
- Subjects
China ,Computer science ,Data management ,Sequence assembly ,Genomics ,Biochemistry ,Genome ,World Wide Web ,03 medical and health sciences ,Annotation ,0302 clinical medicine ,Databases, Genetic ,Genetics ,Molecular Biology ,030304 developmental biology ,Whole genome sequencing ,0303 health sciences ,business.industry ,Genome project ,Metadata ,Computational Mathematics ,Housing ,business ,030217 neurology & neurosurgery - Abstract
The Genome Warehouse (GWH) is a public repository housing genome assembly data for a wide range of species and delivering a series of web services for genome data submission, storage, release, and sharing. As one of the core resources in the National Genomics Data Center (NGDC), part of the China National Center for Bioinformation (CNCB, https://ngdc.cncb.ac.cn), GWH accepts both full genome and partial genome (chloroplast, mitochondrion, and plasmid) sequences with different assembly levels, as well as an update of existing genome assemblies. For each assembly, GWH collects detailed genome-related metadata of biological project, biological sample, and genome assembly, in addition to genome sequence and annotation. To archive high-quality genome sequences and annotations, GWH is equipped with a uniform and standardized procedure for quality control. Besides basic browse and search functionalities, all released genome sequences and annotations can be visualized with JBrowse. By May 21, 2021, GWH has received 19,124 direct submissions covering a diversity of 1108 species and has released 8772 of them. Collectively, GWH serves as an important resource for genome-scale data management and provides free and publicly accessible data to support research activities throughout the world. GWH is publicly accessible at https://ngdc.cncb.ac.cn/gwh.
- Published
- 2021
31. Multi-locus genome-wide association studies for five yield-related traits in rice
- Author
-
Hua Zhong, Shuai Liu, Weilong Kong, Tong Sun, Zhaohua Peng, Xiaoxiao Deng, and Yangsheng Li
- Subjects
Yield ,MLM ,Quantitative Trait Loci ,Single-nucleotide polymorphism ,Genome-wide association study ,Locus (genetics) ,Oryza sativa ,Plant Science ,Quantitative trait locus ,Biology ,Polymorphism, Single Nucleotide ,Linkage Disequilibrium ,Gene ,Genetic association ,Genetics ,Botany ,food and beverages ,Oryza ,QTNs ,Genome project ,Plant Breeding ,Phenotype ,QK1-989 ,ML-GWAS ,Research Article ,Genome-Wide Association Study - Abstract
Background: Improving the overall production of rice with high quality is a major target of breeders. Mining potential yield-related loci has been geared towards developing efficient rice breeding strategies. In this study, one single-locus genome-wide association study (SL-GWAS) method (MLM) in conjunction with five multi-locus genome-wide association study (ML-GWAS) approaches (mrMLM, FASTmrMLM, pLARmEB, pKWmEB, and ISIS EM-BLASSO) was applied to a panel consisting of 529 rice core varieties with 607,201 SNPs. Results: A total of 152, 106, 12, 111, and 64 SNPs were detected by the MLM model as associated with the five yield-related traits, namely grain length (GL), grain width (GW), grain thickness (GT), thousand-grain weight (TGW), and yield per plant (YPP), respectively. Furthermore, 74 significant quantitative trait nucleotides (QTNs) were found by at least two ML-GWAS methods to be associated with the above five traits. Finally, 20 common QTNs were discovered by both SL-GWAS and ML-GWAS methods. Based on genome annotation, gene expression analysis, and previous studies, two candidate key genes (LOC_Os09g02830 and LOC_Os07g31450) were characterized as affecting GW and TGW, respectively. Conclusions: These outcomes will provide an indication for breeding high-yielding rice varieties in the immediate future.
- Published
- 2021
32. COMPUTATIONAL IDENTIFICATION OF PROMOTER REGIONS IN PROKARYOTES AND EUKARYOTES
- Author
-
Gopal P. Agarwal, Shanmughavel Piramanayakam, and Sudheer Menon
- Subjects
chemistry.chemical_compound ,chemistry ,Drug discovery ,Regulatory sequence ,Sequence analysis ,Transcriptional regulation ,Promoter ,Genome project ,Computational biology ,Biology ,Gene ,DNA - Abstract
Promoters are modular DNA structures that contain complex regulatory elements required for the initiation of gene transcription. Therefore, the use of machine learning methods to identify promoters is very important for improving genome annotation and understanding transcriptional regulation. In recent years, many methods for predicting eukaryotic and prokaryotic promoters have been proposed. However, the performance of these methods is still far from satisfactory. In this article, we have developed a hybrid method (called IPMD) that combines a position correlation score function and diversity increment with modified Mahalanobis Discriminant to predict eukaryotic and prokaryotic promoters. The precise calculation and identification of promoters remains a challenge because these key DNA regulatory regions have variable structures composed of functional motifs that can provide gene-specific transcription initiation. The promoter is a regulatory DNA region, which is very important for gene transcription regulation. It is located near the transcription start site (TSS) upstream of the corresponding gene. In the post-genomics era, the availability of data makes it possible to build computational models to detect promoters robustly, because these models are expected to be helpful to academia and drug discovery. Until recently, the developed models only focused on classifying sequences into promoters and non-promoters. However, by considering the classification of weak and strong promoters, promoter predictors can be further improved. Index terms: deep learning, DNA sequence analysis, promoter prediction, promoters, promoter elements
- Published
- 2021
33. Whole-Genome Sequence Data Analysis of Anoxybacillus kamchatkensis NASTPD13 Isolated from Hot Spring of Myagdi, Nepal
- Author
-
Girish Sahni, Tribikram Bhattarai, Punam Yadav, G. S. Prasad, Lakshmaiah Sreerama, Shikha Sharma, and Jyoti Maharjan
- Subjects
Data Analysis ,0301 basic medicine ,Xylose isomerase ,Article Subject ,Glycoside Hydrolases ,030106 microbiology ,Biology ,Genome ,Hot Springs ,General Biochemistry, Genetics and Molecular Biology ,Open Reading Frames ,03 medical and health sciences ,Bacterial Proteins ,Nepal ,Xylose metabolism ,Amino Acid Sequence ,Gene ,Phylogeny ,Xylose ,Whole Genome Sequencing ,General Immunology and Microbiology ,Molecular Sequence Annotation ,General Medicine ,Genome project ,Ribosomal RNA ,Pentose transport ,030104 developmental biology ,Biochemistry ,Xylulokinase ,Medicine ,DNA, Circular ,Anoxybacillus ,Genome, Bacterial ,Research Article - Abstract
Morphological and biochemical analysis revealed Anoxybacillus kamchatkensis NASTPD13, isolated from the Paudwar hot spring of Myagdi, Nepal, to be a Gram-positive, straight or slightly curved, rod-shaped, spore-forming, catalase- and oxidase-positive facultative anaerobe. It grows over a wide range of pH (5.0-11) and temperature (37-75°C) and showed growth on different reduced carbon sources such as starch, raffinose, glucose, fructose, inositol, trehalose, sorbitol, melibiose, and mannitol in aerobic conditions. Furthermore, the partial sequence obtained upon sequencing showed 99% similarity in the 16S rRNA gene sequence with A. kamchatkensis JW/VK-KG4, and the isolate was suggested to be Anoxybacillus kamchatkensis. Moreover, whole-genome analysis of NASTPD13 revealed a 2,866,796 bp genome with a G+C content of 41.6%. Analysis of the genome revealed the presence of 102 RNA genes, which includes sequences coding for 19 rRNA and 79 tRNA genes. While the 16S rRNA gene sequence of strain NASTPD13 showed high similarity (>99%) to that of A. kamchatkensis JW/VK-KG4, RAST analysis of the NASTPD13 genome suggested that A. kamchatkensis G10 is actually the closest neighbor in terms of sequence similarity. The genome annotation by RAST revealed various genes encoding glycoside hydrolases, supporting the observation that the isolate can utilize several reduced carbon sources; these genes could be important for carbohydrate-related industries. Analysis of the xylanase pathway, particularly the genomic region encoding key enzymes for xylan depolymerization and xylose metabolism, further confirmed the presence of the complete gene set for xylan metabolism. In addition, analysis of the complete xylose utilization gene locus of the NASTPD13 genome revealed all of its genes, including D-xylose transport ATP-binding protein XylG and XylF, the xylose isomerase encoding gene XylA, and the gene XylB coding for a xylulokinase, supporting the conclusion that the isolate contains a complete set of genes related to xylan degradation, pentose transport, and metabolism. The results of the present study suggest that the isolated A. kamchatkensis NASTPD13 containing xylanase-producing genes could be useful in lignocellulosic biomass-utilizing industries where pentose polymers could also be utilized along with the hexose polymers.
- Published
- 2021
34. Draft de novo Genome Assembly of the Elusive Jaguarundi, Puma yagouaroundi
- Author
-
Natalia A. Serdyukova, Aleksey Komissarov, Ksenia Krasheninnikova, Polina L. Perelman, Stephen J. O'Brien, Daria V. Zhernakova, Pavel Dobrynin, Klaus-Peter Koepfli, Sergei Kliver, David W. Mohr, Gaik Tamazian, Alan F. Scott, Nikolay Cherkasov, Alexander S. Graphodatsky, and Anna S. Zhuk
- Subjects
Male ,AcademicSubjects/SCI01140 ,0106 biological sciences ,Felidae ,genome annotation ,Jhered/6 ,Genome Resources ,Sequence assembly ,Genomics ,010603 evolutionary biology ,01 natural sciences ,Genome ,linked reads ,03 medical and health sciences ,Puma ,biology.animal ,Genetics ,Animals ,Acinonyx jubatus ,10X Genomics Chromium ,Molecular Biology ,Genetics (clinical) ,Puma yagouaroundi ,030304 developmental biology ,0303 health sciences ,biology ,Contig ,Molecular Sequence Annotation ,Genome project ,biology.organism_classification ,whole genome assembly ,Evolutionary biology ,Female ,Biotechnology - Abstract
The Puma lineage within the family Felidae consists of 3 species that last shared a common ancestor around 4.9 million years ago. Whole-genome sequences of 2 species from the lineage were previously reported: the cheetah (Acinonyx jubatus) and the mountain lion (Puma concolor). The present report describes a whole-genome assembly of the remaining species, the jaguarundi (Puma yagouaroundi). We sequenced the genome of a male jaguarundi with 10X Genomics linked reads and assembled the whole-genome sequence. The assembled genome contains a series of scaffolds that reach the length of chromosome arms and is similar in scaffold contiguity to the genome assemblies of cheetah and puma, with a contig N50 = 100.2 kbp and a scaffold N50 = 49.27 Mbp. We assessed the assembled sequence of the jaguarundi genome using BUSCO, aligned reads of the sequenced individual and another published female jaguarundi to the assembled genome, annotated protein-coding genes, repeats, genomic variants and their effects with respect to the protein-coding genes, and analyzed differences of the 2 jaguarundis from the reference mitochondrial genome. The jaguarundi genome assembly and its annotation were compared in quality, variants, and features to the previously reported genome assemblies of puma and cheetah. Computational analyses used in the study were implemented in a transparent and reproducible way to allow their further reuse and modification.
- Published
- 2021
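Note on entry 34: the abstract summarizes assembly contiguity with contig and scaffold N50 values. As a minimal sketch of the standard definition (not the authors' tooling; the function name `n50` and the example lengths are hypothetical), N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length.

```python
def n50(lengths):
    """N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (total 290, so half is 145; 80 + 70 = 150).
print(n50([80, 70, 50, 40, 30, 20]))  # 70
```

The same function applies to contigs and scaffolds alike; only the input list differs.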
35. Insights into the Polyhydroxybutyrate Biosynthesis in Ralstonia solanacearum Using Parallel 13C Tracers and Comparative Genome Analysis
- Author
-
Nitin Patil, Shyam K. Masakapalli, and Poonam Jyoti
- Subjects
0301 basic medicine ,Ralstonia solanacearum ,biology ,010405 organic chemistry ,Chemistry ,General Medicine ,Genome project ,biology.organism_classification ,01 natural sciences ,Biochemistry ,Genome ,0104 chemical sciences ,Polyhydroxybutyrate ,03 medical and health sciences ,chemistry.chemical_compound ,030104 developmental biology ,Biosynthesis ,Proton NMR ,Fluorescence microscope ,Molecular Medicine ,Gene - Abstract
Bacterial accumulation of poly(3-hydroxybutyrate) [P(3HB)] is a metabolic strategy often adopted to cope with challenging surroundings. Ralstonia solanacearum, a phytopathogen, seems to be an ideal candidate with an inherent ability to accumulate this biodegradable polymer of high industrial relevance. This study is focused on investigating the metabolic networks that channel glucose into P(3HB) using comparative genome analysis, 13C tracers, microscopy, gas chromatography-mass spectrometry (GC-MS), and proton nuclear magnetic resonance (1H NMR). Comparative genome annotation of 87 R. solanacearum strains confirmed the presence of conserved P(3HB) biosynthetic pathway genes in the chromosome. Parallel 13C glucose feeding ([1-13C], [1,2-13C]) analysis mapped the glucose oxidation to 3-hydroxybutyrate (3HB), the metabolic precursor of P(3HB), via the Entner-Doudoroff pathway (ED pathway), potentially to meet the NADPH demands. Fluorescence microscopy, GC-MS, and 1H NMR analysis further confirmed the ability of R. solanacearum to accumulate P(3HB) granules. In addition, it is demonstrated that the carbon/nitrogen (C/N) ratio influences the P(3HB) yields, thereby highlighting the need to further optimize the bioprocessing parameters. This study provided key insights into the biosynthetic abilities of R. solanacearum as a promising P(3HB) producer.
- Published
- 2021
36. Phocaeicola faecalis sp. nov., a strictly anaerobic bacterial strain adapted to the human gut ecosystem
- Author
-
Leilei Yu, Sijia Li, Wei Chen, Chen Wang, Fengwei Tian, Zhendong Zhang, Qixiao Zhai, and Zhiming Yu
- Subjects
DNA, Bacterial ,0301 basic medicine ,030106 microbiology ,Biology ,Microbiology ,Genome ,03 medical and health sciences ,Phylogenetics ,RNA, Ribosomal, 16S ,Humans ,Anaerobiosis ,Molecular Biology ,Gene ,Ecosystem ,Phospholipids ,Phylogeny ,Genetics ,Base Composition ,Phylogenetic tree ,Strain (chemistry) ,Fatty Acids ,Nucleic Acid Hybridization ,Sequence Analysis, DNA ,General Medicine ,Genome project ,Ribosomal RNA ,biology.organism_classification ,Bacterial Typing Techniques ,030104 developmental biology ,Female ,Bacteria - Abstract
A novel strictly anaerobic, Gram-negative bacterium, designated as strain FXJYN30E22T, was isolated from the feces of a healthy woman in Yining county, Xinjiang province, China. This strain was non-spore-forming, bile-resistant, non-motile and rod-shaped. It was found to belong to a single separate group in the Phocaeicola genus based on its 16S ribosomal RNA (rRNA) gene sequence. Alignments of 16S rRNA gene sequences showed only a low sequence identity (≤ 95.5 %) between strain FXJYN30E22T and all other Phocaeicola strains in public databases. The genome (43.0% GC) of strain FXJYN30E22T was sequenced and used for phylogenetic analysis, which showed that strain FXJYN30E22T was most closely related to the type strain Phocaeicola massiliensis JCM 13223T. The average nucleotide identity (ANI) value and digital DNA–DNA hybridization (dDDH) value between FXJYN30E22T and P. massiliensis JCM 13223T were 90.4 and 41.9 %, which were lower than the generally accepted species boundaries (94.0 and 70 %, respectively). The major cellular fatty acids and polar lipids were anteiso-branched C15:0 and phosphatidylethanolamine, respectively. The results of genome annotation and KEGG analysis showed that strain FXJYN30E22T contains a number of genes in polysaccharide and fatty acid synthesis that indicated adaptation to the human gut system. Furthermore, a pbpE (penicillin-binding protein) gene was found in the genome of strain FXJYN30E22T but in no other Phocaeicola species, which suggested that this gene might contribute to the adaptive capacity of strain FXJYN30E22T. Based on our data, strain FXJYN30E22T (= CGMCC1.17870T/KCTC25195T) was classified as a novel Phocaeicola species, and the name Phocaeicola faecalis sp. nov. was proposed.
- Published
- 2021
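The species-boundary reasoning in the abstract above can be sketched as a simple threshold check. This is only an illustration: the function name and constants are hypothetical, and the cutoffs are the ones quoted in the abstract (94.0 % ANI, 70 % dDDH), not values taken from any particular tool.

```python
# Species boundaries as quoted in the abstract (assumed here, not tool defaults).
ANI_BOUNDARY = 94.0    # average nucleotide identity, percent
DDDH_BOUNDARY = 70.0   # digital DNA-DNA hybridization, percent


def below_species_boundary(ani: float, dddh: float) -> bool:
    """Return True when both ANI and dDDH fall below the quoted boundaries."""
    return ani < ANI_BOUNDARY and dddh < DDDH_BOUNDARY


# Values reported for FXJYN30E22T versus P. massiliensis JCM 13223T.
is_candidate_new_species = below_species_boundary(90.4, 41.9)
```

With the reported values (90.4 % and 41.9 %), both measures fall below the quoted boundaries, which is consistent with the abstract's proposal of a novel species.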
37. Assembly and characterization of the genome of chard (Beta vulgaris ssp. vulgaris var. cicla)
- Author
-
Heinz Himmelbauer, Lisa Blazek, Juliane C. Dohm, André E. Minoche, and Reinhard Lehner
- Subjects
Crops, Agricultural ,0106 biological sciences ,0301 basic medicine ,Retroelements ,Sequence assembly ,Bioengineering ,Retrotransposon ,Genomics ,01 natural sciences ,Applied Microbiology and Biotechnology ,Genome ,03 medical and health sciences ,010608 biotechnology ,Botany ,Gene ,Synteny ,biology ,fungi ,food and beverages ,General Medicine ,Genome project ,biology.organism_classification ,030104 developmental biology ,Sugar beet ,Beta vulgaris ,Genome, Plant ,Biotechnology - Abstract
Chard (Beta vulgaris ssp. vulgaris var. cicla) is a member of one of four different cultigroups of beets. While the genome of sugar beet, the most prominent beet crop, has been studied extensively, molecular data on other beet cultivars is scant. Here, we present a genome assembly of chard, a vegetable crop grown for its fleshy leaves. We report a de novo genome assembly of 604 Mbp, slightly larger than sugar beet assemblies presented so far. About 57 % of the assembly was annotated as repetitive sequence, of which LTR retrotransposons were the most abundant. Based on the presence of conserved genes, the chard assembly was estimated to be at least 96 % complete regarding its gene space. We predicted 34,521 genes of which 27,582 genes were supported by evidence from transcriptomic sequencing reads, and 5503 of the evidence-supported genes had multiple isoforms. We compared the chard gene set with gene sets from sugar beet and two wild beets (i.e. Beta vulgaris ssp. maritima and Beta patula) to find orthology relationships and identified genome-wide syntenic regions between chard and sugar beet. Lastly, we determined genomic variants that distinguish sugar beet and chard. Assessing the variation distribution along the chard chromosomes, we found extensive haplotype sharing between the two cultivars. In summary, our work provides a foundation for the molecular analysis of Beta vulgaris cultigroups as a basis for chard genomics and to unravel the domestication history of beet crops.
- Published
- 2021
38. Genomic and phylogenetic analysis of a multidrug-resistant mcr-1-carrying Klebsiella pneumoniae recovered from a urinary tract infection in China
- Author
-
Zhi Ruan, Hangfei Chen, Qiuying Zha, Jianyong Wu, Qingyang Sun, Tian Jiang, and Shurui Jin
- Subjects
Microbiology (medical) ,Klebsiella pneumoniae ,Immunology ,Multidrug resistance ,mcr-1 ,Microbiology ,Genome ,Drug Resistance, Multiple, Bacterial ,medicine ,Humans ,Immunology and Allergy ,Phylogeny ,Genetics ,Whole genome sequencing ,Whole-genome sequencing ,Urinary tract infection ,biology ,Genomics ,Genome project ,biology.organism_classification ,QR1-502 ,Multiple drug resistance ,Urinary Tract Infections ,Colistin ,Multilocus sequence typing ,MCR-1 ,medicine.drug - Abstract
Objectives The emergence and dissemination of colistin-resistant Enterobacterales has become a major global public-health threat. Here we investigated the genomic and phylogenetic characteristics of a multidrug-resistant Klebsiella pneumoniae strain (KP4823) carrying the mcr-1 gene recovered from a urinary tract infection in China. Methods Antimicrobial susceptibility of K. pneumoniae KP4823 was determined by broth microdilution. Whole genomic DNA was extracted and sequenced using Oxford Nanopore MinION and Illumina NovaSeq 6000 platforms. Hybrid assembly with long and short reads was performed using Unicycler, and the genome was annotated using the NCBI Prokaryotic Genome Annotation Pipeline (PGAP). The sequence type (ST), capsular type, and antimicrobial resistance and virulence genes were identified from the genome sequence. Core genome multilocus sequence typing (cgMLST) analysis was performed by BacWGSTdb 2.0 server. Results Klebsiella pneumoniae KP4823 was resistant to colistin, ceftazidime, cefepime, cefotaxime, fosfomycin and aztreonam. The complete genome sequence of KP4823 consists of five contigs comprising 5 445 519 bp, including one chromosome and four plasmids. The isolate was assigned to ST101 with capsular serotype KL106. Several antimicrobial resistance genes were identified, including the colistin resistance gene mcr-1, which was located on a 34 685-bp IncX4 plasmid. The closest relative of K. pneumoniae KP4823 was another ST101 isolate (08EU827) recovered from Sweden in 2008, which differed by 191 cgMLST loci. Conclusion Our study reports the genome sequence of a multidrug-resistant mcr-1-carrying K. pneumoniae in China. These data may help to understand the antimicrobial resistance mechanisms, genomic features and transmission dynamics of colistin resistance in clinical settings.
- Published
- 2021
39. Gauging the trends of pseudogenes in plants
- Author
-
Kashmir Singh, Neetu Goyal, Shivalika Pathania, Jagdeep Kaur, and Naina Garewal
- Subjects
0106 biological sciences ,Regulation of gene expression ,0303 health sciences ,Genome ,Pseudogene ,General Medicine ,Genome project ,Biology ,Biological Evolution ,01 natural sciences ,Applied Microbiology and Biotechnology ,Mice ,03 medical and health sciences ,Order (biology) ,Gene Expression Regulation ,Evolutionary biology ,010608 biotechnology ,Animals ,Functional significance ,Identification (biology) ,Gene ,Pseudogenes ,030304 developmental biology ,Biotechnology - Abstract
Pseudogenes, the debilitated parts of ancient genes, were previously scrapped off as junk or discarded genes with no functional significance. Pseudogenes have come under scrutiny for their functionality, since recent studies have unveiled their importance in the regulation of their corresponding parent genes and various biological mechanisms. Despite the enormous occurrence of pseudogenes in plants, the lack of experimental validation has contributed toward their unresolved roles in gene regulation. Contrarily, most of the studies associated with gene regulation have been mainly reported for humans, mice, and other mammalian genomes. Consequently, in order to present a cumulative report on plant-based pseudogenes research, an attempt has been made to assemble multiple studies presenting the pseudogene classification, the prediction and the determination of comparative accuracies of various computational pipelines, and recent trends in analyzing their biological functions, and regulatory mechanisms. This review represents the classical, as well as the recent advances on pseudogene identification and their potential roles in transcriptional regulation, which could possibly invigorate the quality of genome annotation, evolutionary analysis, and complexity surrounding the regulatory pathways in plants. Thus, when the ambiguous boundary girdling the pseudogenes eventually recedes on account of their explicit orchestration role, research in flora would no longer saunter compared to that on fauna.
- Published
- 2021
40. Liftoff: accurate mapping of gene annotations
- Author
-
Steven L. Salzberg and Alaina Shumate
- Subjects
Statistics and Probability ,0303 health sciences ,Computer science ,Genome project ,Computational biology ,Gene Annotation ,Original Papers ,Biochemistry ,Genome ,DNA sequencing ,Sequence identity ,Computer Science Applications ,Chimpanzee genome project ,03 medical and health sciences ,Computational Mathematics ,Exon ,0302 clinical medicine ,Computational Theory and Mathematics ,Molecular Biology ,Gene ,030217 neurology & neurosurgery ,030304 developmental biology ,Reference genome - Abstract
Motivation Improvements in DNA sequencing technology and computational methods have led to a substantial increase in the creation of high-quality genome assemblies of many species. To understand the biology of these genomes, annotation of gene features and other functional elements is essential; however, for most species, only the reference genome is well-annotated. Results One strategy to annotate new or improved genome assemblies is to map or ‘lift over’ the genes from a previously annotated reference genome. Here, we describe Liftoff, a new genome annotation lift-over tool capable of mapping genes between two assemblies of the same or closely related species. Liftoff aligns genes from a reference genome to a target genome and finds the mapping that maximizes sequence identity while preserving the structure of each exon, transcript and gene. We show that Liftoff can accurately map 99.9% of genes between two versions of the human reference genome with an average sequence identity >99.9%. We also show that Liftoff can map genes across species by successfully lifting over 98.3% of human protein-coding genes to a chimpanzee genome assembly with 98.2% sequence identity. Availability and implementation Liftoff can be installed via bioconda and PyPI. In addition, the source code for Liftoff is available at https://github.com/agshumate/Liftoff. Supplementary information Supplementary data are available at Bioinformatics online.
- Published
- 2021
41. Isolation, characterization, and genomic analysis of the novel T4-like bacteriophage ΦCJ20
- Author
-
Jong Pyo Chae, Na-Gyeong Lee, Sung-Sik Yoon, Gyeong-Hwuii Kim, Kim Jae Won, Jae-Gon Kim, and Jun-Ok Moon
- Subjects
0106 biological sciences ,Gel electrophoresis ,biology ,Myoviridae ,04 agricultural and veterinary sciences ,Genome project ,medicine.disease_cause ,biology.organism_classification ,040401 food science ,01 natural sciences ,Applied Microbiology and Biotechnology ,Genome ,Microbiology ,Bacteriophage ,0404 agricultural biotechnology ,Pathogenic Escherichia coli ,010608 biotechnology ,medicine ,Escherichia coli ,Genome size ,Research Article ,Food Science ,Biotechnology - Abstract
Pathogenic Escherichia coli infections have been consistently reported annually. The basic characteristics and genome of the newly isolated ΦCJ20 from swine feces were analyzed. To determine basic characteristics, dotting assays and double-layer agar assays were conducted. Bacteriophage particles were analyzed via transmission electron microscopy. Sodium dodecyl sulfate–polyacrylamide gel electrophoresis was performed to determine the sizes of major structural proteins. The complete genome of the phage was analyzed. Bacteriophage particles were identified as Myoviridae, with a head measuring 110.57 ± 1.89 nm and a contractile tail measuring 107.97 ± 3.20 nm, and were found to infect E. coli. Major structural proteins of ΦCJ20 showed two well-pronounced bands of approximately 53.6 and 70.9 kDa. The genome size of ΦCJ20 was 169,884 bp, and 118 of 307 open reading frames were annotated. This study provides a baseline for the development of E. coli infection treatment strategies. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s10068-021-00906-y.
- Published
- 2021
42. The Genome Sequence of the Citrus Melanose Pathogen Diaporthe citri and Two Citrus-Related Diaporthe Species
- Author
-
Yunpeng Gai, Lei Li, Xiaoe Xiao, Yating Zeng, Hongye Li, Tao Xiong, Brendan K. Riely, and Pudong Li
- Subjects
0106 biological sciences ,Genetics ,Comparative genomics ,Whole genome sequencing ,0303 health sciences ,education.field_of_study ,biology ,Population ,Diaporthe citri ,Plant Science ,Genome project ,biology.organism_classification ,01 natural sciences ,Genome ,03 medical and health sciences ,Diaporthe ,education ,Agronomy and Crop Science ,Genome size ,030304 developmental biology ,010606 plant biology & botany - Abstract
Melanose disease is one of the most widely distributed and economically important fungal diseases of citrus worldwide. The causative agent is the filamentous fungus Diaporthe citri (syn. Phomopsis citri). Here, we report the genome assemblies of three strains of D. citri, namely strains ZJUD2, ZJUD14, and Q7, which were generated using a combination of PacBio Sequel long-read and Illumina paired-end sequencing data. The assembled genomes of D. citri ranged from 52.06 to 63.61 Mb in genome size, containing 15,977 to 16,622 protein-coding genes. We also sequenced and annotated the genome sequences of two citrus-related Diaporthe species, D. citriasiana and D. citrichinensis. In addition, a database for citrus-related Diaporthe genomes was established to provide a public platform to access genome sequences, genome annotation and comparative genomics data of these Diaporthe species. The described genome sequences and the citrus-related Diaporthe genomes database provide a useful resource for the study of fungal biology, pathogen−host interaction, molecular diagnostic marker development, and population genomic analyses of Diaporthe species. The database will be updated regularly when the genomes of newly isolated Diaporthe species are sequenced. The citrus-related Diaporthe genomes database is freely available for nonprofit use at zjudata.com/blast/diaporthe.php.
- Published
- 2021
43. Genome‐wide analysis of European sea bass provides insights into the evolution and functions of single‐exon genes
- Author
-
Mbaye Tine, Richard Reinhardt, Heiner Kuhl, and Peter R. Teske
- Subjects
0106 biological sciences ,Nonsynonymous substitution ,comparative genomics ,Biology ,010603 evolutionary biology ,01 natural sciences ,Genome ,behavioral disciplines and activities ,Homology (biology) ,03 medical and health sciences ,Gene density ,evolution ,Dicentrarchus labrax ,European sea bass ,Gene ,Ecology, Evolution, Behavior and Systematics ,QH540-549.5 ,030304 developmental biology ,Nature and Landscape Conservation ,Original Research ,Comparative genomics ,0303 health sciences ,promoter ,Ecology ,Chromosome ,Genome project ,single‐exon gene ,nervous system ,Evolutionary biology ,psychological phenomena and processes - Abstract
Several studies have attempted to understand the origin and evolution of single‐exon genes (SEGs) in eukaryotic organisms, including fishes, but few have examined the functional and evolutionary relationships between SEGs and multiple‐exon gene (MEG) paralogs, in particular the conservation of promoter regions. Given that SEGs originate via the reverse transcription of mRNA from a “parental” MEG, such comparisons may enable identifying evolutionarily‐related SEG/MEG paralogs, which might fulfill equivalent physiological functions. Here, the relationship of SEG proportion with MEG count, gene density, intron count, and chromosome size was assessed for the genome of the European sea bass, Dicentrarchus labrax. Then, SEGs with an MEG parent were identified, and promoter sequences of SEG/MEG paralogs were compared, to identify highly conserved functional motifs. The results revealed a total count of 1,585 (8.3% of total genes) SEGs in the European sea bass genome, which was correlated with MEG count but not with gene density. The significant correlation of SEG content with the number of MEGs suggests that SEGs were continuously and independently generated over evolutionary time following species divergence through retrotranscription events, followed by tandem duplications. Functional annotation showed that the majority of SEGs are functional, as is evident from their expression in RNA‐seq data used to support homology‐based genome annotation. Differences in 5′UTR and 3′UTR lengths between SEG/MEG paralogs observed in this study may contribute to gene expression divergence between them and therefore lead to the emergence of new SEG functions. The comparison of nonsynonymous to synonymous changes (Ka/Ks) between SEG/MEG parents showed that 74 of them are under positive selection (Ka/Ks > 1; p = .0447). An additional fifteen SEGs with an MEG parent have a common promoter, which implies that they are under the influence of common regulatory networks. This study investigated the relationship of SEG proportion with MEG count, gene density, intron count, and chromosome size for the genome of sea bass, Dicentrarchus labrax. Then, SEGs with an MEG parent were identified, and promoter sequences of SEG/MEG orthologs were compared, to identify highly conserved functional motifs. The results revealed a significant correlation between SEG and MEG counts over the genome and allowed identifying SEG/MEG orthologs that share the same promoter sequence, suggesting that they are under the influence of common regulatory networks.
- Published
- 2021
44. Uncovering transcriptional dark matter via gene annotation independent single-cell RNA sequencing analysis
- Author
-
Charles G. Danko, Gaetano J. Scuderi, Michael F. Z. Wang, Iwijn De Vlaminck, Madhav Mantri, David W. McKellar, Jonathan T. Butcher, and Shao-Pei Chou
- Subjects
0301 basic medicine ,Genetic Markers ,Cell type ,Transcription, Genetic ,Science ,Cell ,General Physics and Astronomy ,Computational biology ,Chick Embryo ,Biology ,Genome informatics ,Genome ,General Biochemistry, Genetics and Molecular Biology ,Article ,Transcriptome ,03 medical and health sciences ,Annotation ,0302 clinical medicine ,Gene expression ,medicine ,Animals ,Humans ,natural sciences ,RNA, Messenger ,Gene ,Transcriptional activity ,Multidisciplinary ,Sequence Analysis, RNA ,RNA ,Heart ,Molecular Sequence Annotation ,General Chemistry ,Genome project ,Gene Annotation ,030104 developmental biology ,medicine.anatomical_structure ,ComputingMethodologies_PATTERNRECOGNITION ,Sequence annotation ,Gene Expression Regulation ,030220 oncology & carcinogenesis ,Single-Cell Analysis - Abstract
Conventional scRNA-seq expression analyses rely on the availability of a high-quality genome annotation. Yet, as we show here with scRNA-seq experiments and analyses spanning human, mouse, chicken, mole rat, lemur and sea urchin, genome annotations are often incomplete, in particular for organisms that are not routinely studied. To overcome this hurdle, we created a scRNA-seq analysis routine that recovers biologically relevant transcriptional activity beyond the scope of the best available genome annotation by performing scRNA-seq analysis on any region in the genome for which transcriptional products are detected. Our tool generates a single-cell expression matrix for all transcriptionally active regions (TARs), performs single-cell TAR expression analysis to identify biologically significant TARs, and then annotates TARs using gene homology analysis. This procedure uses single-cell expression analyses as a filter to direct annotation efforts to biologically significant transcripts and thereby uncovers biology to which scRNA-seq would otherwise be in the dark. Conventional single-cell RNA sequencing analyses rely on genome annotations that may be incomplete or inaccurate, especially for understudied organisms. Here the authors present a bioinformatic tool that leverages single-cell data to uncover biologically relevant transcripts beyond the best available genome annotation.
- Published
- 2021
45. Chromosome-level de novo genome assemblies of over 100 plant species
- Author
-
Shirasawa, Kenta, Harada, Daijiro, Hirakawa, Hideki, Isobe, Sachiko, and Kole, Chittaranjan
- Subjects
genome project ,Invited Review ,long-read sequencing technology ,pseudomolecule sequence ,next-generation sequencing technology ,scaffolding technology - Abstract
Genome sequence analysis in higher plants began with the whole-genome sequencing of Arabidopsis thaliana. Owing to the great advances in sequencing technologies, also known as next-generation sequencing (NGS) technologies, genomes of more than 400 plant species have been sequenced to date. Long-read sequencing technologies, together with sequence scaffolding methods, have enabled the synthesis of chromosome-level de novo genome sequence assemblies, which has further allowed comparative analysis of the structural features of multiple plant genomes, thus elucidating the evolutionary history of plants. However, the quality of the assembled chromosome-level sequences varies among plant species. In this review, we summarize the status of chromosome-level assemblies of 114 plant species, with genome sizes ranging from 125 Mb to 16.9 Gb. While the average genome coverage of the assembled sequences reached up to 89.1%, the average coverage of chromosome-level pseudomolecules was 73.3%. Thus, further improvements in sequencing technologies and scaffolding, and data analysis methods, are required to establish gap-free telomere-to-telomere genome sequence assemblies. With the forthcoming new technologies, we are going to enter into a new genomics era where pan-genomics and the >1,000 or >1 million genomes’ project will be routine in higher plants.
- Published
- 2021
46. Identification, Structure Analyses and Expression Pattern of the ERF Transcription Factor Family in Coffea arabica
- Author
-
Luiz Filipe Protasio Pereira, Mondher Bouzayen, Valéria Carpentieri-Pípolo, Tiago Benedito dos Santos, Douglas Silva Domingues, Anne Bernadac, Silvia Graciele Hulse de Souza, and Giuliano Degrassi
- Subjects
Genetics ,biology ,Abiotic stress ,Coffea arabica ,In silico ,fungi ,food and beverages ,General Medicine ,General Chemistry ,Genome project ,biology.organism_classification ,Gene expression profiling ,Arabidopsis ,Gene ,Transcription factor - Abstract
Members of the ERF family of transcription factors play an important role in plant development and gene expression that regulates responses to biotic and abiotic stress. This work identified 36 ERF family genes in Coffea arabica within the AP2/ERF full domain, using the EST-based genomic resource of the Brazilian Coffee Genome Project. The ERF family genes were classified into nine of the ten existing groups through phylogenetic analysis of the deduced amino acid sequences and comparison with the sequences of the ERF family genes in Arabidopsis. In addition to the AP2 domain, other conserved domains were identified, typical of members of each group. The in silico analysis and expression profiling showed high levels of expression for libraries derived from tissues of fruits, leaves and flowers as well as for libraries subjected to water stress. These results suggest the participation of the ERF family genes of C. arabica in distinct biological functions, such as control of development, maturation, and responses to water stress. The results of this work support the selection of promising genes for further functional characterization that will provide a better understanding of the complex regulatory networks related to plant development and responses to stress, opening up opportunities for coffee breeding programs.
- Published
- 2021
47. Phylogenic position and marker studies using cpDNA of C. wightii: A critically endangered and medicinally important plant in India
- Author
-
Vrinda S. Thaker and Jagdishchandra Monpara
- Subjects
0106 biological sciences ,0301 basic medicine ,Inverted repeat ,food and beverages ,Plant Science ,Genome project ,Biology ,01 natural sciences ,Genome ,DNA sequencing ,Nucleotide diversity ,03 medical and health sciences ,030104 developmental biology ,Chloroplast DNA ,Evolutionary biology ,Microsatellite ,Gene ,010606 plant biology & botany - Abstract
Complete chloroplast genome of Commiphora wightii (family Burseraceae), a medicinally important-critically endangered plant, was assembled using next generation sequencing. Genome annotation was conducted using CpGAVAS web server. The genome is 156,064 bp in length, presenting a typical quadripartite structure of large (LSC; 93,841 bp) and small (SSC; 19,897 bp) single-copy regions separated by a pair of inverted repeats (IRs; 21,163 bp). The genome encodes 125 genes including 86 protein genes, 29 tRNAs, and eight rRNAs. Total 16 simple sequence repeats (SSR), detected in the plastome include the di, tri, tetra, and pentanucleotide repeat. When compared with the plastome of other members of Burseraceae it showed one unique SSR. Total twenty-two hypervariable regions of loci were found in the genome, which could serve as DNA barcodes for species identification. The newly sequenced complete cp genome identified SSRs, nucleotide diversity, phylogenetic analysis, and IR contraction will help in understanding the plastome evolution and genetic conservation of this critically endangered medicinal plant in the future.
- Published
- 2021
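The quadripartite structure reported in the abstract above lends itself to a quick consistency check: the total plastome length should equal the large single-copy region plus the small single-copy region plus two copies of the inverted repeat. The sketch below is illustrative only; the function name is hypothetical and the region sizes are the ones quoted in the abstract.

```python
def plastome_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


# Region sizes (bp) as reported for Commiphora wightii in the abstract.
total_bp = plastome_length(lsc=93_841, ssc=19_897, ir=21_163)
```

Working it through by hand, 93,841 + 19,897 + 2 × 21,163 = 156,064 bp, which matches the genome length stated in the abstract.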
48. Long‐range assembly of sequences helps to unravel the genome structure and small variation of the wheat– Haynaldia villosa translocated chromosome 6VS.6AL
- Author
-
Chen Peidu, Qiang Wang, Liu Jiaqian, Liping Xing, Jan Bartoš, Cao Shuqi, Aizhong Cao, Chunhong Yin, Jan Vrána, Jaroslav Doležel, Zengshuai Lv, Miroslava Karafiátová, Zhenpu Huang, Ruiqi Zhang, and Lu Yuan
- Subjects
0106 biological sciences ,0301 basic medicine ,genome annotation ,InDel markers ,Gene prediction ,Genomics ,Chromosomal translocation ,Plant Science ,Computational biology ,Biology ,Poaceae ,physical bin map ,01 natural sciences ,Chromosomes, Plant ,Translocation, Genetic ,03 medical and health sciences ,Centromere ,Indel ,Gene ,Triticum ,Research Articles ,Chromosome ,Genome project ,Chicago long‐range linkage assembly ,Plant Breeding ,030104 developmental biology ,wheat–Haynaldia villosa translocation line T6VS·6AL ,Agronomy and Crop Science ,Research Article ,010606 plant biology & botany ,Biotechnology - Abstract
Summary Genomics studies in wild species of wheat have been limited due to the lack of references; however, new technologies and bioinformatics tools have much potential to promote genomic research. The wheat–Haynaldia villosa translocation line T6VS·6AL has been widely used as a backbone parent of wheat breeding in China. Therefore, revealing the genome structure of translocation chromosome 6VS·6AL will clarify how this chromosome formed and will help to determine how it affects agronomic traits. In this study, chromosome flow sorting, NGS sequencing and Chicago long‐range linkage assembly were innovatively used to produce the assembled sequences of 6VS·6AL, and gene prediction and genome structure characterization at the molecular level were effectively performed. The analysis discovered that the short arm of 6VS·6AL was actually composed of a large distal segment of 6VS, a small proximal segment of 6AS and the centromere of 6A, while the collinear region in 6VS corresponding to 230–260 Mb of 6AS‐Ta was deleted when the recombination between 6VS and 6AS occurred. In addition to the molecular mechanism of the increased grain weight and enhanced spike length produced by the translocation chromosome, it may be correlated with missing GW2‐V and an evolved NRT‐V cluster. Moreover, a fine physical bin map of 6VS was constructed by the high‐throughput developed 6VS‐specific InDel markers and a series of newly identified small fragment translocation lines involving 6VS. This study will provide essential information for mining of new alien genes carried by the 6VS·6AL translocation chromosome.
- Published
- 2021
49. A Hybrid Supervised Approach to Human Population Identification Using Genomics Data
- Author
-
Sahar Araghi and Thanh Nguyen
- Subjects
0206 medical engineering ,Population ,Single-nucleotide polymorphism ,Genomics ,02 engineering and technology ,Computational biology ,Polymorphism, Single Nucleotide ,Multiclass classification ,Statistics::Machine Learning ,Lasso (statistics) ,Genetics ,Humans ,Statistics::Methodology ,education ,Principal Component Analysis ,education.field_of_study ,business.industry ,Applied Mathematics ,Genome project ,Quantitative Biology::Genomics ,Genetics, Population ,Principal component analysis ,Supervised Machine Learning ,Personalized medicine ,business ,020602 bioinformatics ,Biotechnology - Abstract
Single nucleotide polymorphisms (SNPs) are one type of genetic variation, and each SNP represents a difference in a single DNA building block, namely a nucleotide. Previous research demonstrated that SNPs can be used to identify the correct source population of an individual. In addition, variations in DNA sequences have an influence on human diseases. In this regard, SNP studies are helpful for personalized medicine and treatment. In the literature, unsupervised clustering methods, especially principal component analysis (PCA), have been popular for studying population structure. In this study, we investigate supervised approaches, particularly the LASSO multinomial regression classification method, for recognizing an individual's genetic population of origin. Then, we introduce PCA-LASSO as an extension of the LASSO method that benefits from advantageous characteristics of both PCA and LASSO regression. The experimental results obtained on the 1000 Genomes Project dataset show PCA-LASSO's significantly high accuracy in predicting an individual's population of origin.
- Published
- 2021
50. FindNonCoding: rapid and simple detection of non-coding RNAs in genomes
- Author
-
Erik S. Wright
- Subjects
Statistics and Probability ,False discovery rate ,Computer science ,RNA ,Genome project ,Computational biology ,Applications Notes ,Biochemistry ,Genome ,Computer Science Applications ,Bioconductor ,Computational Mathematics ,Computational Theory and Mathematics ,DECIPHER ,Sequence motif ,Molecular Biology ,Gene - Abstract
Summary Non-coding RNAs are often neglected during genome annotation due to their difficulty of detection relative to protein coding genes. FindNonCoding takes a pattern mining approach to capture the essential sequence motifs and hairpin loops representing a non-coding RNA family and quickly identify matches in genomes. FindNonCoding was designed for ease of use and accurately finds non-coding RNAs with a low false discovery rate. Availability and implementation FindNonCoding is implemented within the DECIPHER package (v2.19.3) for R (v4.1) available from Bioconductor. Pre-trained models of common non-coding RNA families are included for bacteria, archaea and eukarya. Supplementary information Supplementary data are available at Bioinformatics online.
- Published
- 2021