81 results on '"Thibaud-Nissen F"'
Search Results
2. A joint NCBI and EMBL-EBI transcript set for clinical genomics and research
- Author
-
Morales, J, Pujar, S, Loveland, JE, Astashyn, A, Bennett, R, Berry, A, Cox, E, Davidson, C, Ermolaeva, O, Farrell, CM, Fatima, R, Gil, L, Goldfarb, T, Gonzalez, JM, Haddad, D, Hardy, M, Hunt, T, Jackson, J, Joardar, VS, Kay, M, Kodali, VK, McGarvey, KM, McMahon, A, Mudge, JM, Murphy, DN, Murphy, MR, Rajput, B, Rangwala, SH, Riddick, LD, Thibaud-Nissen, F, Threadgold, G, Vatsan, AR, Wallin, C, Webb, D, Flicek, P, Birney, E, Pruitt, KD, Frankish, A, Cunningham, F, and Murphy, TD
- Abstract
Comprehensive genome annotation is essential to understand the impact of clinically relevant variants. However, the absence of a standard for clinical reporting and browser display complicates the process of consistent interpretation and reporting. To address these challenges, Ensembl/GENCODE [1] and RefSeq [2] launched a joint initiative, the Matched Annotation from NCBI and EMBL-EBI (MANE) collaboration, to converge on human gene and transcript annotation and to jointly define a high-value set of transcripts and corresponding proteins. Here, we describe the MANE transcript sets for use as universal standards for variant reporting and browser display. The MANE Select set identifies a representative transcript for each human protein-coding gene, whereas the MANE Plus Clinical set provides additional transcripts at loci where the Select transcripts alone are not sufficient to report all currently known clinical variants. Each MANE transcript represents an exact match between the exonic sequences of an Ensembl/GENCODE transcript and its counterpart in RefSeq such that the identifiers can be used synonymously. We have now released MANE Select transcripts for 97% of human protein-coding genes, including all American College of Medical Genetics and Genomics Secondary Findings list v3.0 (ref. 3) genes. MANE transcripts are accessible from major genome browsers and key resources. Widespread adoption of these transcript sets will increase the consistency of reporting, facilitate the exchange of data regardless of the annotation source and help to streamline clinical interpretation.
- Published
- 2022
3. Semi-automated assembly of high-quality diploid human reference genomes
- Author
-
Jarvis, E.D., Formenti, G., Rhie, A., Guarracino, A., Yang, C., Wood, J., Tracey, A., Thibaud-Nissen, F., Vollger, M.R., Porubsky, D., Cheng, H., Asri, M., Logsdon, G.A., Carnevali, P., Chaisson, M.J.P., Chin, C.S., Cody, S., Collins, J., Ebert, P., Escalona, M., Fedrigo, O., Fulton, R.S., Fulton, L.L., Garg, S., Gerton, J.L., Ghurye, J., Granat, A., Green, R.E., Harvey, W., Hasenfeld, P., Hastie, A., Haukness, M., Jaeger, E.B., Jain, M., Kirsche, M., Kolmogorov, M., Korbel, J.O., Koren, S., Korlach, J., Lee, J., Li, D., Lindsay, T., Lucas, J., Luo, F., Marschall, T., Mitchell, M.W., McDaniel, J., Nie, F., Olsen, H.E., Olson, N.D., Pesout, T., Potapova, T., Puiu, D., Regier, A., Ruan, J., Salzberg, S.L., Sanders, A.D., Schatz, M.C., Schmitt, A., Schneider, V.A., Selvaraj, S., Shafin, K., Shumate, A., Stitziel, N.O., Stober, C., Torrance, J., Wagner, J., Wang, J., Wenger, A., Xiao, C., Zimin, A.V., Zhang, G., Wang, T., Li, H., Garrison, E., Haussler, D., Hall, I., Zook, J.M., Eichler, E.E., Phillippy, A.M., Paten, B., Howe, K., and Miga, K.H.
- Subjects
Cancer Research; Haplotypes; Genome, Human; Humans; Chromosome Mapping; High-Throughput Nucleotide Sequencing; Chromosomes, Human; Genetic Variation; Sequence Analysis, DNA; Genomics; Reference Standards; Diploidy
- Abstract
The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.
- Published
- 2021
4. Population genomics of the critically endangered kākāpō.
- Author
-
Dussex, N, van der Valk, T, Morales, HE, Wheat, CW, Díez-Del-Molino, D, von Seth, J, Foster, Y, Kutschera, VE, Guschanski, K, Rhie, A, Phillippy, AM, Korlach, J, Howe, K, Chow, W, Pelan, S, Mendes Damas, JD, Lewin, HA, Hastie, AR, Formenti, G, Fedrigo, O, Guhlin, J, Harrop, TWR, Le Lec, MF, Dearden, PK, Haggerty, L, Martin, FJ, Kodali, V, Thibaud-Nissen, F, Iorns, D, Knapp, M, Gemmell, NJ, Robertson, F, Moorhouse, R, Digby, A, Eason, D, Vercoe, D, Howard, J, Jarvis, ED, Robertson, BC, and Dalén, L
- Abstract
The kākāpō is a flightless parrot endemic to New Zealand. Once common in the archipelago, only 201 individuals remain today, most of them descending from an isolated island population. We report the first genome-wide analyses of the species, including a high-quality genome assembly for kākāpō, one of the first chromosome-level reference genomes sequenced by the Vertebrate Genomes Project (VGP). We also sequenced and analyzed 35 modern genomes from the sole surviving island population and 14 genomes from the extinct mainland population. While theory suggests that such a small population is likely to have accumulated deleterious mutations through genetic drift, our analyses on the impact of the long-term small population size in kākāpō indicate that present-day island kākāpō have a reduced number of harmful mutations compared to mainland individuals. We hypothesize that this reduced mutational load is due to the island population having been subjected to a combination of genetic drift and purging of deleterious mutations, through increased inbreeding and purifying selection, since its isolation from the mainland ∼10,000 years ago. Our results provide evidence that small populations can survive even when isolated for hundreds of generations. This work provides key insights into kākāpō breeding and recovery and more generally into the application of genetic tools in conservation efforts for endangered species.
- Published
- 2021
5. Improved reference genome of the arboviral vector Aedes albopictus
- Author
-
Palatini, U., Masri, R.A., Cosme, L.V., Koren, S., Thibaud-Nissen, F., Biedler, J.K., Krsticevic, F., Johnston, J.S., Halbach, R., Crawford, J.E., Antoshechkin, I., Failloux, A., Pischedda, E., Marconcini, M., Ghurye, J., Rhie, A., Sharma, A., Karagodin, D.A., Jenrette, J., Gamez, S., Miesen, P., Caccone, A., Sharakhova, M.V., Tu, Z., Papathanos, P.A., Van Rij, R.P., Akbari, O.S., Powell, J., Phillippy, A.M., and Bonizzoni, M.
- Published
- 2020
6. A liquid-medium-based protocol for rapid regeneration from embryogenic soybean cultures
- Author
-
Samoylov, V. M., Tucker, D. M., Thibaud-Nissen, F., and Parrott, W. A.
- Published
- 1998
7. P8008 The NCBI Eukaryotic Genome Annotation Pipeline
- Author
-
Thibaud-Nissen, F., DiCuccio, M., Hlavina, W., Kimchi, A., Kitts, P. A., Murphy, T. D., Pruitt, K. D., and Souvorov, A.
- Published
- 2016
8. P8007 RefSeq and Gene—NCBI resources to support comparative genomics
- Author
-
Pruitt, K. D., Murphy, T. D., Thibaud-Nissen, F., and Kitts, P. A.
- Published
- 2016
9. Genome sequence of the pea aphid Acyrthosiphon pisum
- Author
-
Richards, S, Gibbs, RA, Gerardo, NM, Moran, N, Nakabachi, A, Stern, D, Tagu, D, Wilson, ACC, Muzny, D, Kovar, C, Cree, A, Chacko, J, Chandrabose, MN, Dao, MD, Dinh, HH, Gabisi, RA, Hines, S, Hume, J, Jhangian, SN, Joshi, V, Lewis, LR, Liu, Y-S, Lopez, J, Morgan, MB, Nguyen, NB, Okwuonu, GO, Ruiz, SJ, Santibanez, J, Wright, RA, Fowler, GR, Hitchens, ME, Lozado, RJ, Moen, C, Steffen, D, Warren, JT, Zhang, J, Nazareth, LV, Chavez, D, Davis, C, Lee, SL, Patel, BM, Pu, L-L, Bell, SN, Johnson, AJ, Vattathil, S, Jr, WRL, Shigenobu, S, Dang, PM, Morioka, M, Fukatsu, T, Kudo, T, Miyagishima, S-Y, Jiang, H, Worley, KC, Legeai, F, Gauthier, J-P, Collin, O, Zhang, L, Chen, H-C, Ermolaeva, O, Hlavina, W, Kapustin, Y, Kiryutin, B, Kitts, P, Maglott, D, Murphy, T, Pruitt, K, Sapojnikov, V, Souvorov, A, Thibaud-Nissen, F, Camara, F, Guigo, R, Stanke, M, Solovyev, V, Kosarev, P, Gilbert, D, Gabaldon, T, Huerta-Cepas, J, Marcet-Houben, M, Pignatelli, M, Moya, A, Rispe, C, Ollivier, M, Quesneville, H, Permal, E, Llorens, C, Futami, R, Hedges, D, Robertson, HM, Alioto, T, Mariotti, M, Nikoh, N, McCutcheon, JP, Burke, G, Kamins, A, Latorre, A, Moran, NA, Ashton, P, Calevro, F, Charles, H, Colella, S, Douglas, A, Jander, G, Jones, DH, Febvay, G, Kamphuis, LG, Kushlan, PF, Macdonald, S, Ramsey, J, Schwartz, J, Seah, S, Thomas, G, Vellozo, A, Cass, B, Degnan, P, Hurwitz, B, Leonardo, T, Koga, R, Altincicek, B, Anselme, C, Atamian, H, Barribeau, SM, de Vos, M, Duncan, EJ, Evans, J, Ghanim, M, Heddi, A, Kaloshian, I, Vincent-Monegat, C, Parker, BJ, Perez-Brocal, V, Rahbe, Y, Spragg, CJ, Tamames, J, Tamarit, D, Tamborindeguy, C, Vilcinskas, A, Bickel, RD, Brisson, JA, Butts, T, Chang, C-C, Christiaens, O, Davis, GK, Duncan, E, Ferrier, D, Iga, M, Janssen, R, Lu, H-L, McGregor, A, Miura, T, Smagghe, G, Smith, J, van der Zee, M, Velarde, R, Wilson, M, Dearden, P, Edwards, OR, Gordon, K, Hilgarth, RS, Jr, RSD, Srinivasan, D, Walsh, TK, Ishikawa, A, Jaubert-Possamai, S, Fenton, B, Huang, W, 
Rizk, G, Lavenier, D, Nicolas, J, Smadja, C, Zhou, J-J, Vieira, FG, He, X-L, Liu, R, Rozas, J, Field, LM, Ashton, PD, Campbell, P, Carolan, JC, Douglas, AE, Fitzroy, CIJ, Reardon, KT, Reeck, GR, Singh, K, Wilkinson, TL, Huybrechts, J, Abdel-latief, M, Robichon, A, Veenstra, JA, Hauser, F, Cazzamali, G, Schneider, M, Williamson, M, Stafflinger, E, Hansen, KK, Grimmelikhuijzen, CJP, Price, DRG, Caillaud, M, van Fleet, E, Ren, Q, Gatehouse, JA, Brault, V, Monsion, B, Diaz, J, Hunnicutt, L, Ju, H-J, Pechuan, X, Aguilar, J, Cortes, T, Ortiz-Rivas, B, Martinez-Torres, D, Dombrovsky, A, Dale, RP, Davies, TGE, Williamson, MS, Jones, A, Sattelle, D, Williamson, S, Wolstenholme, A, Cottret, L, Sagot, MF, Heckel, DG, Hunter, W, International Aphid Genomics Consortium (IAGC), and Eisen, Jonathan A.
- Subjects
0106 biological sciences; TANDEM REPEATS; Genome, Insect; Gene Transfer; RRES175; Sequència genòmica; Faculty of Science\Computer Science; CPG METHYLATION; 01 natural sciences; Genome; Medical and Health Sciences; International Aphid Genomics Consortium; Biologiska vetenskaper; Biology (General); GENE-EXPRESSION; 2. Zero hunger; Genetics; 0303 health sciences; Aphid; Afídids; General Neuroscience; GENOME SEQUENCE; food and beverages; DROSOPHILA CIRCADIAN CLOCK; Biological Sciences; Genetics and Genomics/Microbial Evolution and Genomics; INSECTE; Genètica microbiana; puceron; APIS-MELLIFERA; General Agricultural and Biological Sciences; Infection; symbiose; Biotechnology; Research Article; VIRUS VECTORING; 175_Genetics; SYMBIOTIC BACTERIA; Gene Transfer, Horizontal; QH301-705.5; ACYRTHOSIPHON PISUM; Biology; HOLOMETABOLOUS INSECTS; HOST-PLANT; 010603 evolutionary biology; PEA APHID; INSECT-PLANT; PHENOTYPIC PLASTICITY; RAVAGEUR DES CULTURES; SOCIAL INSECT; General Biochemistry, Genetics and Molecular Biology; Horizontal; 03 medical and health sciences; Buchnera; Gene family; Life Science; Animals; Symbiosis; Gene; 030304 developmental biology; Whole genome sequencing; General Immunology and Microbiology; Annotation; Genome sequence; Agricultural and Veterinary Sciences; 175_Entomology; Genètica animal; Bacteriocyte; génome; gène; Human Genome; Biology and Life Sciences; 15. Life on land; biochemical phenomena, metabolism, and nutrition; biology.organism_classification; REPETITIVE ELEMENTS; DNA-SEQUENCES; Acyrthosiphon pisum; Genome Sequence; Genetics and Genomics/Genome Projects; Aphids; PHEROMONE-BINDING; Insect; Developmental Biology; [SDV.EE.IEO]Life Sciences [q-bio]/Ecology, environment/Symbiosis
- Abstract
The genome of the pea aphid shows remarkable levels of gene duplication and equally remarkable gene absences that shed light on aspects of aphid biology, most especially its symbiosis with Buchnera.
Aphids are important agricultural pests and also biological models for studies of insect-plant interactions, symbiosis, virus vectoring, and the developmental causes of extreme phenotypic plasticity. Here we present the 464 Mb draft genome assembly of the pea aphid Acyrthosiphon pisum. This first published whole genome sequence of a basal hemimetabolous insect provides an outgroup to the multiple published genomes of holometabolous insects. Pea aphids are host-plant specialists, they can reproduce both sexually and asexually, and they have coevolved with an obligate bacterial symbiont. Here we highlight findings from whole genome analysis that may be related to these unusual biological features. These findings include discovery of extensive gene duplication in more than 2000 gene families as well as loss of evolutionarily conserved genes. Gene family expansions relative to other published genomes include genes involved in chromatin modification, miRNA synthesis, and sugar transport. Gene losses include genes central to the IMD immune pathway, selenoprotein utilization, purine salvage, and the entire urea cycle. The pea aphid genome reveals that only a limited number of genes have been acquired from bacteria; thus the reduced gene count of Buchnera does not reflect gene transfer to the host genome. The inventory of metabolic genes in the pea aphid genome suggests that there is extensive metabolite exchange between the aphid and Buchnera, including sharing of amino acid biosynthesis between the aphid and Buchnera. The pea aphid genome provides a foundation for post-genomic studies of fundamental biological questions and applied agricultural problems.
Author Summary: Aphids are common pests of crops and ornamental plants. Facilitated by their ancient association with intracellular symbiotic bacteria that synthesize essential amino acids, aphids feed on phloem (sap). Exploitation of a diversity of long-lived woody and short-lived herbaceous hosts by many aphid species is a result of specializations that allow aphids to discover and exploit suitable host plants. Such specializations include production by a single genotype of multiple alternative phenotypes including asexual, sexual, winged, and unwinged forms. We have generated a draft genome sequence of the pea aphid, an aphid that is a model for the study of symbiosis, development, and host plant specialization. Some of the many highlights of our genome analysis include an expanded total gene set with remarkable levels of gene duplication, as well as aphid-lineage-specific gene losses. We find that the pea aphid genome contains all genes required for epigenetic regulation by methylation, that genes encoding the synthesis of a number of essential amino acids are distributed between the genomes of the pea aphid and its symbiont, Buchnera aphidicola, and that many genes encoding immune system components are absent. These genome data will form the basis for future aphid research and have already underpinned a variety of genome-wide approaches to understanding aphid biology.
- Published
- 2010
10. Genome of the house fly, Musca domestica L., a global vector of diseases with adaptations to a septic environment
- Author
-
Scott, JG, Warren, WC, Beukeboom, LW, Bopp, D, Clark, AG, Giers, SD, Hediger, M, Jones, A, Kasai, S, Leichter, CA, Li, M, Meisel, RP, Minx, P, Murphy, TD, Nelson, DR, Reid, WR, Rinkevich, FD, Robertson, HM, Sackton, TB, Sattelle, DB, Thibaud-Nissen, F, Tomlinson, C, van de Zande, L, Walden, KK, Wilson, RK, and Liu, N
- Abstract
BACKGROUND: Adult house flies, Musca domestica L., are mechanical vectors of more than 100 devastating diseases that have severe consequences for human and animal health. House fly larvae play a vital role as decomposers of animal wastes, and thus live in intimate association with many animal pathogens. RESULTS: We have sequenced and analyzed the genome of the house fly using DNA from female flies. The sequenced genome is 691 Mb. Compared with Drosophila melanogaster, the genome contains a rich resource of shared and novel protein coding genes, a significantly higher amount of repetitive elements, and substantial increases in copy number and diversity of both the recognition and effector components of the immune system, consistent with life in a pathogen-rich environment. There are 146 P450 genes, plus 11 pseudogenes, in M. domestica, representing a significant increase relative to D. melanogaster and suggesting the presence of enhanced detoxification in house flies. Relative to D. melanogaster, M. domestica has also evolved an expanded repertoire of chemoreceptors and odorant binding proteins, many associated with gustation. CONCLUSIONS: This represents the first genome sequence of an insect that lives in intimate association with abundant animal pathogens. The house fly genome provides a rich resource for enabling work on innovative methods of insect control, for understanding the mechanisms of insecticide resistance, genetic adaptation to high pathogen loads, and for exploring the basic biology of this important pest. The genome of this species will also serve as a close out-group to Drosophila in comparative genomic studies.
- Published
- 2014
11. Functional analysis of a TGA factor-binding site located in the promoter region controlling salicylic acid-induced NIMIN-1 expression in Arabidopsis
- Author
-
Fonseca, J.P., Menossi, M., Thibaud-Nissen, F., and Town, C.D.
- Published
- 2010
12. The TIGR Rice Genome Annotation Resource: improvements and new features
- Author
-
Ouyang, S., Zhu, W., Hamilton, J., Lin, H., Campbell, M., Childs, K., Thibaud-Nissen, F., Malek, R. L., Lee, Y., Zheng, L., Orvis, J., Haas, B., Wortman, J., and Buell, C. R.
- Published
- 2007
13. Increased sulfur amino acids in soybean plants overexpressing the maize 15 kDa zein protein
- Author
-
Dinkins, R. D., Srinivasa Reddy, M. S., Meurer, C. A., Yan, B., Trick, H., Thibaud-Nissen, F., Finer, J., Parrott, W. A., and Collins, G. B.
14. Identification and characterization of pseudogenes in the rice gene complement
- Author
-
Thibaud-Nissen Françoise, Ouyang Shu, and Buell C Robin
- Subjects
Biotechnology; TP248.13-248.65; Genetics; QH426-470
- Abstract
Background: The Osa1 Genome Annotation of rice (Oryza sativa L. ssp. japonica cv. Nipponbare) is the product of a semi-automated pipeline that does not explicitly predict pseudogenes. As such, it is likely to mis-annotate pseudogenes as functional genes. A total of 22,033 gene models within the Osa1 Release 5 were investigated as potential pseudogenes as these genes exhibit at least one feature potentially indicative of pseudogenes: lack of transcript support, short coding region, long untranslated region, or, for genes residing within a segmentally duplicated region, lack of a paralog or significantly shorter corresponding paralog. Results: A total of 1,439 pseudogenes, identified among genes with pseudogene features, were characterized by similarity to fully-supported gene models and the presence of frameshifts or premature translational stop codons. Significant difference in the length of duplicated genes within segmentally-duplicated regions was the optimal indicator of pseudogenization. Among the 816 pseudogenes for which a probable origin could be determined, 75% originated from gene duplication events while 25% were the result of retrotransposition events. A total of 12% of the pseudogenes were expressed. Finally, F-box proteins, BTB/POZ proteins, terpene synthases, chalcone synthases and cytochrome P450 protein families were found to harbor large numbers of pseudogenes. Conclusion: These pseudogenes still have a detectable open reading frame and are thus distinct from pseudogenes detected within intergenic regions which typically lack definable open reading frames. Families containing the highest number of pseudogenes are fast-evolving families involved in ubiquitination and secondary metabolism.
- Published
- 2009
15. EuCAP, a Eukaryotic Community Annotation Package, and its application to the rice genome
- Author
-
Hamilton John P, Campbell Matthew, Thibaud-Nissen Françoise, Zhu Wei, and Buell C
- Subjects
Biotechnology; TP248.13-248.65; Genetics; QH426-470
- Abstract
Background: Despite the improvements of tools for automated annotation of genome sequences, manual curation at the structural and functional level can provide an increased level of refinement to genome annotation. The Institute for Genomic Research Rice Genome Annotation (hereafter named the Osa1 Genome Annotation) is the product of an automated pipeline and, for this reason, will benefit from the input of biologists with expertise in rice and/or particular gene families. Leveraging knowledge from a dispersed community of scientists is a demonstrated way of improving a genome annotation. This requires tools that facilitate 1) the submission of gene annotation to an annotation project, 2) the review of the submitted models by project annotators, and 3) the incorporation of the submitted models in the ongoing annotation effort. Results: We have developed the Eukaryotic Community Annotation Package (EuCAP), an annotation tool, and have applied it to the rice genome. The primary level of curation by community annotators (CA) has been the annotation of gene families. Annotation can be submitted by email or through the EuCAP Web Tool. The CA models are aligned to the rice pseudomolecules and the coordinates of these alignments, along with functional annotation, are stored in the MySQL EuCAP Gene Model database. Web pages displaying the alignments of the CA models to the Osa1 Genome models are automatically generated from the EuCAP Gene Model database. The alignments are reviewed by the project annotators (PAs) in the context of experimental evidence. Upon approval by the PAs, the CA models, along with the corresponding functional annotations, are integrated into the Osa1 Genome Annotation. The CA annotations, grouped by family, are displayed on the Community Annotation pages of the project website http://rice.tigr.org, as well as in the Community Annotation track of the Genome Browser. Conclusion: We have applied EuCAP to rice. As of July 2007, the structural and/or functional annotation of 1,094 genes representing 57 families has been deposited and integrated into the current gene set. All of the EuCAP components are open-source, thereby allowing the implementation of EuCAP for the annotation of other genomes. EuCAP is available at http://sourceforge.net/projects/eucap/.
- Published
- 2007
16. Microarrays for global expression constructed with a low redundancy set of 27,500 sequenced cDNAs representing an array of developmental stages and physiological conditions of the soybean plant
- Author
-
Retzel Ernest, Schmidt Christina, Shoop Elizabeth, Strömvik Martina V, Sidarous Mark, Thibaud-Nissen Françoise, Zabala Gracia, Philip Reena, Gonzalez Delkin, Clough Steven J, Shealy Robin, Khanna Anupama, Vodkin Lila O, Erpelding John, Shoemaker Randy C, Rodriguez-Huete Alicia M, Polacco Joseph C, Coryell Virginia, Keim Paul, Gong George, Liu Lei, Pardinas Jose, and Schweitzer Peter
- Subjects
Biotechnology; TP248.13-248.65; Genetics; QH426-470
- Abstract
Background: Microarrays are an important tool with which to examine coordinated gene expression. Soybean (Glycine max) is one of the most economically valuable crop species in the world food supply. In order to accelerate both gene discovery as well as hypothesis-driven research in soybean, global expression resources needed to be developed. The applications of microarray for determining patterns of expression in different tissues or during conditional treatments by dual labeling of the mRNAs are unlimited. In addition, discovery of the molecular basis of traits through examination of naturally occurring variation in hundreds of mutant lines could be enhanced by the construction and use of soybean cDNA microarrays. Results: We report the construction and analysis of a low redundancy 'unigene' set of 27,513 clones that represent a variety of soybean cDNA libraries made from a wide array of source tissue and organ systems, developmental stages, and stress or pathogen-challenged plants. The set was assembled from the 5' sequence data of the cDNA clones using cluster analysis programs. The selected clones were then physically reracked and sequenced at the 3' end. In order to increase gene discovery from immature cotyledon libraries that contain abundant mRNAs representing storage protein gene families, we utilized a high density filter normalization approach to preferentially select more weakly expressed cDNAs. All 27,513 cDNA inserts were amplified by polymerase chain reaction. The amplified products, along with some repetitively spotted control or 'choice' clones, were used to produce three 9,728-element microarrays that have been used to examine tissue specific gene expression and global expression in mutant isolines. Conclusions: Global expression studies will be greatly aided by the availability of the sequence-validated and low redundancy cDNA sets described in this report. These cDNAs and ESTs represent a wide array of developmental stages and physiological conditions of the soybean plant. We also demonstrate that the quality of the data from the soybean cDNA microarrays is sufficiently reliable to examine isogenic lines that differ with respect to a mutant phenotype and thereby to define a small list of candidate genes potentially encoding or modulated by the mutant phenotype.
- Published
- 2004
17. InterPro: the protein sequence classification resource in 2025.
- Author
-
Blum M, Andreeva A, Florentino LC, Chuguransky SR, Grego T, Hobbs E, Pinto BL, Orr A, Paysan-Lafosse T, Ponamareva I, Salazar GA, Bordin N, Bork P, Bridge A, Colwell L, Gough J, Haft DH, Letunic I, Llinares-López F, Marchler-Bauer A, Meng-Papaxanthos L, Mi H, Natale DA, Orengo CA, Pandurangan AP, Piovesan D, Rivoire C, Sigrist CJA, Thanki N, Thibaud-Nissen F, Thomas PD, Tosatto SCE, Wu CH, and Bateman A
- Abstract
InterPro (https://www.ebi.ac.uk/interpro) is a freely accessible resource for the classification of protein sequences into families. It integrates predictive models, known as signatures, from multiple member databases to classify sequences into families and predict the presence of domains and significant sites. The InterPro database provides annotations for over 200 million sequences, ensuring extensive coverage of UniProtKB, the standard repository of protein sequences, and includes mappings to several other major resources, such as Gene Ontology (GO), Protein Data Bank in Europe (PDBe) and the AlphaFold Protein Structure Database. In this publication, we report on the status of InterPro (version 101.0), detailing new developments in the database, associated web interface and software. Notable updates include the increased integration of structures predicted by AlphaFold and the enhanced description of protein families using artificial intelligence. Over the past two years, more than 5000 new InterPro entries have been created. The InterPro website now offers access to 85 000 protein families and domains from its member databases and serves as a long-term archive for retired databases. InterPro data, software and tools are freely available. (© The Author(s) 2024. Published by Oxford University Press on behalf of Nucleic Acids Research.)
- Published
- 2024
- Full Text
- View/download PDF
18. NCBI RefSeq: reference sequence standards through 25 years of curation and annotation.
- Author
-
Goldfarb T, Kodali VK, Pujar S, Brover V, Robbertse B, Farrell CM, Oh DH, Astashyn A, Ermolaeva O, Haddad D, Hlavina W, Hoffman J, Jackson JD, Joardar VS, Kristensen D, Masterson P, McGarvey KM, McVeigh R, Mozes E, Murphy MR, Schafer SS, Souvorov A, Spurrier B, Strope PK, Sun H, Vatsan AR, Wallin C, Webb D, Brister JR, Hatcher E, Kimchi A, Klimke W, Marchler-Bauer A, Pruitt KD, Thibaud-Nissen F, and Murphy TD
- Abstract
Reference sequences and annotations serve as the foundation for many lines of research today, from organism and sequence identification to providing a core description of the genes, transcripts and proteins found in an organism's genome. Interpretation of data including transcriptomics, proteomics, sequence variation and comparative analyses based on reference gene annotations informs our understanding of gene function and possible disease mechanisms, leading to new biomedical discoveries. The Reference Sequence (RefSeq) resource created at the National Center for Biotechnology Information (NCBI) leverages both automatic processes and expert curation to create a robust set of reference sequences of genomic, transcript and protein data spanning the tree of life. RefSeq continues to refine its annotation and quality control processes and utilize better quality genomes resulting from advances in sequencing technologies as well as RNA-Seq data to produce high-quality annotated genomes, ortholog predictions across more organisms and other products that are easily accessible through multiple NCBI resources. This report summarizes the current status of the eukaryotic, prokaryotic and viral RefSeq resources, with a focus on eukaryotic annotation, the increase in taxonomic representation and the effect it will have on comparative genomics. The RefSeq resource is publicly accessible at https://www.ncbi.nlm.nih.gov/refseq., (Published by Oxford University Press on behalf of Nucleic Acids Research 2024.)
- Published
- 2024
- Full Text
- View/download PDF
19. Database resources of the National Center for Biotechnology Information in 2025.
- Author
-
Sayers EW, Beck J, Bolton EE, Brister JR, Chan J, Connor R, Feldgarden M, Fine AM, Funk K, Hoffman J, Kannan S, Kelly C, Klimke W, Kim S, Lathrop S, Marchler-Bauer A, Murphy TD, O'Sullivan C, Schmieder E, Skripchenko Y, Stine A, Thibaud-Nissen F, Wang J, Ye J, Zellers E, Schneider VA, and Pruitt KD
- Abstract
The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence repository and the PubMed® repository of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 31 distinct repositories and knowledgebases. The E-utilities serve as the programming interface for most of these. Resources receiving significant updates in the past year include PubMed, PubMed Central, Bookshelf, the NIH Comparative Genomics Resource, BLAST, Sequence Read Archive, Taxonomy, iCn3D, Conserved Domain Database, Pathogen Detection, antimicrobial resistance resources and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov., (Published by Oxford University Press on behalf of Nucleic Acids Research 2024.)
- Published
- 2024
- Full Text
- View/download PDF
20. Development and extensive sequencing of a broadly-consented Genome in a Bottle matched tumor-normal pair.
- Author
-
McDaniel JH, Patel V, Olson ND, He HJ, He Z, Cole KD, Schmitt A, Sikkink K, Sedlazeck FJ, Doddapaneni H, Jhangiani SN, Muzny DM, Gingras MC, Mehta H, Paulin LF, Hastie AR, Yu HC, Weigman V, Rojas A, Kennedy K, Remington J, Gonzalez I, Sudkamp M, Wiseman K, Lajoie BR, Levy S, Jain M, Akeson S, Narzisi G, Steinsnyder Z, Reeves C, Shelton J, Kingan SB, Lambert C, Bayabyan P, Wenger AM, McLaughlin IJ, Adamson A, Kingsley C, Wescott M, Kim Y, Paten B, Park J, Violich I, Miga KH, Gardner J, McNulty B, Rosen G, McCoy R, Brundu F, Sayyari E, Scheffler K, Truong S, Catreux S, Hannah LC, Lipson D, Benjamin H, Iremadze N, Soifer I, Eacker S, Wood M, Cross E, Husar G, Gross S, Vernich M, Kolmogorov M, Ahmad T, Keskus A, Bryant A, Thibaud-Nissen F, Trow J, Proszynski J, Hirschberg JW, Ryon K, Mason CE, Wagner J, Xiao C, Liss AS, and Zook JM
- Abstract
The Genome in a Bottle Consortium (GIAB), hosted by the National Institute of Standards and Technology (NIST), is developing new matched tumor-normal samples, the first to be explicitly consented for public dissemination of genomic data and cell lines. Here, we describe a comprehensive genomic dataset from the first individual, HG008, including DNA from an adherent, epithelial-like pancreatic ductal adenocarcinoma (PDAC) tumor cell line and matched normal cells from duodenal and pancreatic tissues. Data for the tumor-normal matched samples comes from thirteen distinct state-of-the-art whole genome measurement technologies, including high depth short and long-read bulk whole genome sequencing (WGS), single cell WGS, and Hi-C, and karyotyping. These data will be used by the GIAB Consortium to develop matched tumor-normal benchmarks for somatic variant detection. We expect these data to facilitate innovation for whole genome measurement technologies, de novo assembly of tumor and normal genomes, and bioinformatic tools to identify small and structural somatic mutations. This first-of-its-kind broadly consented open-access resource will facilitate further understanding of sequencing methods used for cancer biology., Competing Interests: Competing interests A.S. and K.S. are employees of Arima Genomics. L.F.P. from BCM, was sponsored by Genentech Inc until September 2023. F.J.S from BCM, received research support from Illumina, ONT and Pacbio. A.R.H and H-C.Y. are employees of Bionano Genomics and own stock shares and options of Bionano Genomics, Inc. V.W., K.K., J.R., and I.G. are employees of BioSkryb Genomics. M.S., K.B., B.R.L. and S.L. are employees of Element Biosciences. S.B.K., C.L., P.B., A.M.W., I.J.M., A.A., C.K., M.W., and Y.K. are employees and shareholders of PacBio, Inc. D.L., H.B., N.I., and I.S. are employees and shareholders of Ultima Genomics. S.E. and M.W. are employees of Phase Genomics. E.C., G.H., S.G., and M.V. are employees of KromaTiD, Inc, E.C. is also a shareholder. F.B., E.S., K.S., S.T. and S.C. are employees of Illumina, Inc. All other authors have no competing interests.
- Published
- 2024
- Full Text
- View/download PDF
21. Complete sequencing of ape genomes.
- Author
-
Yoo D, Rhie A, Hebbar P, Antonacci F, Logsdon GA, Solar SJ, Antipov D, Pickett BD, Safonova Y, Montinaro F, Luo Y, Malukiewicz J, Storer JM, Lin J, Sequeira AN, Mangan RJ, Hickey G, Anez GM, Balachandran P, Bankevich A, Beck CR, Biddanda A, Borchers M, Bouffard GG, Brannan E, Brooks SY, Carbone L, Carrel L, Chan AP, Crawford J, Diekhans M, Engelbrecht E, Feschotte C, Formenti G, Garcia GH, de Gennaro L, Gilbert D, Green RE, Guarracino A, Gupta I, Haddad D, Han J, Harris RS, Hartley GA, Harvey WT, Hiller M, Hoekzema K, Houck ML, Jeong H, Kamali K, Kellis M, Kille B, Lee C, Lee Y, Lees W, Lewis AP, Li Q, Loftus M, Loh YHE, Loucks H, Ma J, Mao Y, Martinez JFI, Masterson P, McCoy RC, McGrath B, McKinney S, Meyer BS, Miga KH, Mohanty SK, Munson KM, Pal K, Pennell M, Pevzner PA, Porubsky D, Potapova T, Ringeling FR, Roha JL, Ryder OA, Sacco S, Saha S, Sasaki T, Schatz MC, Schork NJ, Shanks C, Smeds L, Son DR, Steiner C, Sweeten AP, Tassia MG, Thibaud-Nissen F, Torres-González E, Trivedi M, Wei W, Wertz J, Yang M, Zhang P, Zhang S, Zhang Y, Zhang Z, Zhao SA, Zhu Y, Jarvis ED, Gerton JL, Rivas-González I, Paten B, Szpiech ZA, Huber CD, Lenz TL, Konkel MK, Yi SV, Canzar S, Watson CT, Sudmant PH, Molloy E, Garrison E, Lowe CB, Ventura M, O'Neill RJ, Koren S, Makova KD, Phillippy AM, and Eichler EE
- Abstract
We present haplotype-resolved reference genomes and comparative analyses of six ape species, namely: chimpanzee, bonobo, gorilla, Bornean orangutan, Sumatran orangutan, and siamang. We achieve chromosome-level contiguity with unparalleled sequence accuracy (<1 error in 500,000 base pairs), completely sequencing 215 gapless chromosomes telomere-to-telomere. We resolve challenging regions, such as the major histocompatibility complex and immunoglobulin loci, providing more in-depth evolutionary insights. Comparative analyses, including human, allow us to investigate the evolution and diversity of regions previously uncharacterized or incompletely studied without bias from mapping to the human reference. This includes newly minted gene families within lineage-specific segmental duplications, centromeric DNA, acrocentric chromosomes, and subterminal heterochromatin. This resource should serve as a definitive baseline for all future evolutionary studies of humans and our closest living ape relatives., Competing Interests: COMPETING INTERESTS E.E.E. is a scientific advisory board (SAB) member of Variant Bio, Inc. C.T.W. is a co-founder/CSO of Clareo Biosciences, Inc. W.L. is a co-founder/CIO of Clareo Biosciences, Inc. The other authors declare no competing interests.
- Published
- 2024
- Full Text
- View/download PDF
22. The complete sequence and comparative analysis of ape sex chromosomes.
- Author
-
Makova KD, Pickett BD, Harris RS, Hartley GA, Cechova M, Pal K, Nurk S, Yoo D, Li Q, Hebbar P, McGrath BC, Antonacci F, Aubel M, Biddanda A, Borchers M, Bornberg-Bauer E, Bouffard GG, Brooks SY, Carbone L, Carrel L, Carroll A, Chang PC, Chin CS, Cook DE, Craig SJC, de Gennaro L, Diekhans M, Dutra A, Garcia GH, Grady PGS, Green RE, Haddad D, Hallast P, Harvey WT, Hickey G, Hillis DA, Hoyt SJ, Jeong H, Kamali K, Pond SLK, LaPolice TM, Lee C, Lewis AP, Loh YE, Masterson P, McGarvey KM, McCoy RC, Medvedev P, Miga KH, Munson KM, Pak E, Paten B, Pinto BJ, Potapova T, Rhie A, Rocha JL, Ryabov F, Ryder OA, Sacco S, Shafin K, Shepelev VA, Slon V, Solar SJ, Storer JM, Sudmant PH, Sweetalana, Sweeten A, Tassia MG, Thibaud-Nissen F, Ventura M, Wilson MA, Young AC, Zeng H, Zhang X, Szpiech ZA, Huber CD, Gerton JL, Yi SV, Schatz MC, Alexandrov IA, Koren S, O'Neill RJ, Eichler EE, and Phillippy AM
- Subjects
- Animals, Female, Male, Gorilla gorilla genetics, Hylobatidae genetics, Pan paniscus genetics, Pan troglodytes genetics, Phylogeny, Pongo abelii genetics, Pongo pygmaeus genetics, Telomere genetics, Evolution, Molecular, DNA Copy Number Variations genetics, Humans, Endangered Species, Reference Standards, Hominidae genetics, Hominidae classification, X Chromosome genetics, Y Chromosome genetics
- Abstract
Apes possess two sex chromosomes-the male-specific Y chromosome and the X chromosome, which is present in both males and females. The Y chromosome is crucial for male reproduction, with deletions being linked to infertility1. The X chromosome is vital for reproduction and cognition2. Variation in mating patterns and brain function among apes suggests corresponding differences in their sex chromosomes. However, owing to their repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the methodology developed for the telomere-to-telomere (T2T) human genome, we produced gapless assemblies of the X and Y chromosomes for five great apes (bonobo (Pan paniscus), chimpanzee (Pan troglodytes), western lowland gorilla (Gorilla gorilla gorilla), Bornean orangutan (Pongo pygmaeus) and Sumatran orangutan (Pongo abelii)) and a lesser ape (the siamang gibbon (Symphalangus syndactylus)), and untangled the intricacies of their evolution. Compared with the X chromosomes, the ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements-owing to the accumulation of lineage-specific ampliconic regions, palindromes, transposable elements and satellites. Many Y chromosome genes expand in multi-copy families and some evolve under purifying selection. Thus, the Y chromosome exhibits dynamic evolution, whereas the X chromosome is more stable. Mapping short-read sequencing data to these assemblies revealed diversity and selection patterns on sex chromosomes of more than 100 individual great apes. These reference assemblies are expected to inform human evolution and conservation genetics of non-human apes, all of which are endangered species., (© 2024. The Author(s).)
- Published
- 2024
- Full Text
- View/download PDF
23. A chromosome-scale fishing cat reference genome for the evaluation of potential germline risk variants.
- Author
-
Carroll RA, Rice ES, Murphy WJ, Lyons LA, Thibaud-Nissen F, Coghill LM, Swanson WF, Terio KA, Boyd T, and Warren WC
- Subjects
- Cats, Animals, Humans, Genome genetics, Genomics, Germ Cells pathology, Urinary Bladder Neoplasms pathology, Carcinoma, Transitional Cell pathology
- Abstract
The fishing cat, Prionailurus viverrinus, faces a population decline, increasing the importance of maintaining healthy zoo populations. Unfortunately, zoo-managed individuals currently face a high prevalence of transitional cell carcinoma (TCC), a form of bladder cancer. To investigate the genetics of inherited diseases among captive fishing cats, we present a chromosome-scale assembly, generate the pedigree of the zoo-managed population, reaffirm the close genetic relationship with the Asian leopard cat (Prionailurus bengalensis), and identify 7.4 million single nucleotide variants (SNVs) and 23,432 structural variants (SVs) from whole genome sequencing (WGS) data of healthy and TCC cats. Only BRCA2 was found to have a high recurrent number of missense mutations in fishing cats diagnosed with TCC when compared to inherited human cancer risk variants. These new fishing cat genomic resources will aid conservation efforts to improve their genetic fitness and enhance the comparative study of feline genomes., (© 2024. The Author(s).)
- Published
- 2024
- Full Text
- View/download PDF
24. RefSeq and the prokaryotic genome annotation pipeline in the age of metagenomes.
- Author
-
Haft DH, Badretdin A, Coulouris G, DiCuccio M, Durkin AS, Jovenitti E, Li W, Mersha M, O'Neill KR, Virothaisakun J, and Thibaud-Nissen F
- Subjects
- Genome, Archaeal genetics, Genome, Bacterial genetics, Internet, Molecular Sequence Annotation, Proteins genetics, Archaea genetics, Bacteria genetics, Databases, Nucleic Acid standards, Databases, Nucleic Acid trends, Metagenome
- Abstract
The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains over 315 000 bacterial and archaeal genomes and 236 million proteins with up-to-date and consistent annotation. In the past 3 years, we have expanded the diversity of the RefSeq collection by including the best quality metagenome-assembled genomes (MAGs) submitted to INSDC (DDBJ, ENA and GenBank), while maintaining its quality by adding validation checks. Assemblies are now more stringently evaluated for contamination and for completeness of annotation prior to acceptance into RefSeq. MAGs now account for over 17 000 assemblies in RefSeq, split over 165 orders and 362 families. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP), which is used to annotate nearly all RefSeq assemblies, include better detection of protein-coding genes. Nearly 83% of RefSeq proteins are now named by a curated Protein Family Model, a 4.7% increase over the past three years. In addition to literature citations, Enzyme Commission numbers, and gene symbols, Gene Ontology terms are now assigned to 48% of RefSeq proteins, allowing for easier multi-genome comparison. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/. PGAP is available as a stand-alone tool able to produce GenBank-ready files at https://github.com/ncbi/pgap., (Published by Oxford University Press on behalf of Nucleic Acids Research 2023.)
- Published
- 2024
- Full Text
- View/download PDF
25. Chromosome-level genome assembly of chub mackerel (Scomber japonicus) from the Indo-Pacific Ocean.
- Author
-
Lee YH, Abueg L, Kim JK, Kim YW, Fedrigo O, Balacco J, Formenti G, Howe K, Tracey A, Wood J, Thibaud-Nissen F, Nam BH, No ES, Kim HR, Lee C, Jarvis ED, and Kim H
- Subjects
- Animals, Chromosomes, Pacific Ocean, Cyprinidae genetics, Genome, Perciformes genetics
- Abstract
Chub mackerels (Scomber japonicus) are a migratory marine fish widely distributed in the Indo-Pacific Ocean. They are globally consumed for their high Omega-3 content, but their population is declining due to global warming. Here, we generated the first chromosome-level genome assembly of chub mackerel (fScoJap1) using the Vertebrate Genomes Project assembly pipeline with PacBio HiFi genomic sequencing and Arima Hi-C chromosome contact data. The final assembly is 828.68 Mb with 24 chromosomes, nearly all containing telomeric repeats at their ends. We annotated 31,656 genes and discovered that approximately 2.19% of the genome contained DNA transposon elements repressed within duplicated genes. Analyzing 5-methylcytosine (5mC) modifications using HiFi reads, we observed open/close chromatin patterns at gene promoters, including the FADS2 gene involved in Omega-3 production. This chromosome-level reference genome provides unprecedented opportunities for advancing our knowledge of chub mackerels in biology, industry, and conservation., (© 2023. The Author(s).)
- Published
- 2023
- Full Text
- View/download PDF
26. The Complete Sequence and Comparative Analysis of Ape Sex Chromosomes.
- Author
-
Makova KD, Pickett BD, Harris RS, Hartley GA, Cechova M, Pal K, Nurk S, Yoo D, Li Q, Hebbar P, McGrath BC, Antonacci F, Aubel M, Biddanda A, Borchers M, Bomberg E, Bouffard GG, Brooks SY, Carbone L, Carrel L, Carroll A, Chang PC, Chin CS, Cook DE, Craig SJC, de Gennaro L, Diekhans M, Dutra A, Garcia GH, Grady PGS, Green RE, Haddad D, Hallast P, Harvey WT, Hickey G, Hillis DA, Hoyt SJ, Jeong H, Kamali K, Kosakovsky Pond SL, LaPolice TM, Lee C, Lewis AP, Loh YE, Masterson P, McCoy RC, Medvedev P, Miga KH, Munson KM, Pak E, Paten B, Pinto BJ, Potapova T, Rhie A, Rocha JL, Ryabov F, Ryder OA, Sacco S, Shafin K, Shepelev VA, Slon V, Solar SJ, Storer JM, Sudmant PH, Sweetalana, Sweeten A, Tassia MG, Thibaud-Nissen F, Ventura M, Wilson MA, Young AC, Zeng H, Zhang X, Szpiech ZA, Huber CD, Gerton JL, Yi SV, Schatz MC, Alexandrov IA, Koren S, O'Neill RJ, Eichler E, and Phillippy AM
- Abstract
Apes possess two sex chromosomes-the male-specific Y and the X shared by males and females. The Y chromosome is crucial for male reproduction, with deletions linked to infertility. The X chromosome carries genes vital for reproduction and cognition. Variation in mating patterns and brain function among great apes suggests corresponding differences in their sex chromosome structure and evolution. However, due to their highly repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the state-of-the-art experimental and computational methods developed for the telomere-to-telomere (T2T) human genome, we produced gapless, complete assemblies of the X and Y chromosomes for five great apes (chimpanzee, bonobo, gorilla, Bornean and Sumatran orangutans) and a lesser ape, the siamang gibbon. These assemblies completely resolved ampliconic, palindromic, and satellite sequences, including the entire centromeres, allowing us to untangle the intricacies of ape sex chromosome evolution. We found that, compared to the X, ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements. This divergence on the Y arises from the accumulation of lineage-specific ampliconic regions and palindromes (which are shared more broadly among species on the X) and from the abundance of transposable elements and satellites (which have a lower representation on the X). Our analysis of Y chromosome genes revealed lineage-specific expansions of multi-copy gene families and signatures of purifying selection. In summary, the Y exhibits dynamic evolution, while the X is more stable. Finally, mapping short-read sequencing data from >100 great ape individuals revealed the patterns of diversity and selection on their sex chromosomes, demonstrating the utility of these reference assemblies for studies of great ape evolution. 
These complete sex chromosome assemblies are expected to further inform conservation genetics of nonhuman apes, all of which are endangered species., Competing Interests: Competing Interests EEE is a scientific advisory board (SAB) member of Variant Bio, Inc. RJO is a scientific advisory board (SAB) member of Colossal Biosciences, Inc. CL is a scientific advisory board (SAB) member of Nabsys, Inc. and Genome Insight, Inc.
- Published
- 2023
- Full Text
- View/download PDF
27. The admixed brushtail possum genome reveals invasion history in New Zealand and novel imprinted genes.
- Author
-
Bond DM, Ortega-Recalde O, Laird MK, Hayakawa T, Richardson KS, Reese FCB, Kyle B, McIsaac-Williams BE, Robertson BC, van Heezik Y, Adams AL, Chang WS, Haase B, Mountcastle J, Driller M, Collins J, Howe K, Go Y, Thibaud-Nissen F, Lister NC, Waters PD, Fedrigo O, Jarvis ED, Gemmell NJ, Alexander A, and Hore TA
- Subjects
- Animals, Australia, New Zealand epidemiology, Marsupialia
- Abstract
Combining genome assembly with population and functional genomics can provide valuable insights to development and evolution, as well as tools for species management. Here, we present a chromosome-level genome assembly of the common brushtail possum (Trichosurus vulpecula), a model marsupial threatened in parts of their native range in Australia, but also a major introduced pest in New Zealand. Functional genomics reveals post-natal activation of chemosensory and metabolic genes, reflecting unique adaptations to altricial birth and delayed weaning, a hallmark of marsupial development. Nuclear and mitochondrial analyses trace New Zealand possums to distinct Australian subspecies, which have subsequently hybridised. This admixture allowed phasing of parental alleles genome-wide, ultimately revealing at least four genes with imprinted, parent-specific expression not yet detected in other species (MLH1, EPM2AIP1, UBP1 and GPX7). We find that reprogramming of possum germline imprints, and the wider epigenome, is similar to eutherian mammals except onset occurs after birth. Together, this work is useful for genetic-based control and conservation of possums, and contributes to understanding of the evolution of novel mammalian epigenetic traits., (© 2023. Springer Nature Limited.)
- Published
- 2023
- Full Text
- View/download PDF
28. The complete sequence of a human Y chromosome.
- Author
-
Rhie A, Nurk S, Cechova M, Hoyt SJ, Taylor DJ, Altemose N, Hook PW, Koren S, Rautiainen M, Alexandrov IA, Allen J, Asri M, Bzikadze AV, Chen NC, Chin CS, Diekhans M, Flicek P, Formenti G, Fungtammasan A, Garcia Giron C, Garrison E, Gershman A, Gerton JL, Grady PGS, Guarracino A, Haggerty L, Halabian R, Hansen NF, Harris R, Hartley GA, Harvey WT, Haukness M, Heinz J, Hourlier T, Hubley RM, Hunt SE, Hwang S, Jain M, Kesharwani RK, Lewis AP, Li H, Logsdon GA, Lucas JK, Makalowski W, Markovic C, Martin FJ, Mc Cartney AM, McCoy RC, McDaniel J, McNulty BM, Medvedev P, Mikheenko A, Munson KM, Murphy TD, Olsen HE, Olson ND, Paulin LF, Porubsky D, Potapova T, Ryabov F, Salzberg SL, Sauria MEG, Sedlazeck FJ, Shafin K, Shepelev VA, Shumate A, Storer JM, Surapaneni L, Taravella Oill AM, Thibaud-Nissen F, Timp W, Tomaszkiewicz M, Vollger MR, Walenz BP, Watwood AC, Weissensteiner MH, Wenger AM, Wilson MA, Zarate S, Zhu Y, Zook JM, Eichler EE, O'Neill RJ, Schatz MC, Miga KH, Makova KD, and Phillippy AM
- Subjects
- Humans, Base Sequence, DNA, Satellite genetics, Genetic Variation genetics, Genetics, Population, Heterochromatin genetics, Multigene Family genetics, Reference Standards, Segmental Duplications, Genomic genetics, Tandem Repeat Sequences genetics, Telomere genetics, Chromosomes, Human, Y genetics, Genomics methods, Genomics standards, Sequence Analysis, DNA standards
- Abstract
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications1-3. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished4,5. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome4 and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes., (© 2023. This is a U.S. Government work and not under copyright protection in the US; foreign copyright protection may apply.)
- Published
- 2023
- Full Text
- View/download PDF
29. A draft human pangenome reference.
- Author
-
Liao WW, Asri M, Ebler J, Doerr D, Haukness M, Hickey G, Lu S, Lucas JK, Monlong J, Abel HJ, Buonaiuto S, Chang XH, Cheng H, Chu J, Colonna V, Eizenga JM, Feng X, Fischer C, Fulton RS, Garg S, Groza C, Guarracino A, Harvey WT, Heumos S, Howe K, Jain M, Lu TY, Markello C, Martin FJ, Mitchell MW, Munson KM, Mwaniki MN, Novak AM, Olsen HE, Pesout T, Porubsky D, Prins P, Sibbesen JA, Sirén J, Tomlinson C, Villani F, Vollger MR, Antonacci-Fulton LL, Baid G, Baker CA, Belyaeva A, Billis K, Carroll A, Chang PC, Cody S, Cook DE, Cook-Deegan RM, Cornejo OE, Diekhans M, Ebert P, Fairley S, Fedrigo O, Felsenfeld AL, Formenti G, Frankish A, Gao Y, Garrison NA, Giron CG, Green RE, Haggerty L, Hoekzema K, Hourlier T, Ji HP, Kenny EE, Koenig BA, Kolesnikov A, Korbel JO, Kordosky J, Koren S, Lee H, Lewis AP, Magalhães H, Marco-Sola S, Marijon P, McCartney A, McDaniel J, Mountcastle J, Nattestad M, Nurk S, Olson ND, Popejoy AB, Puiu D, Rautiainen M, Regier AA, Rhie A, Sacco S, Sanders AD, Schneider VA, Schultz BI, Shafin K, Smith MW, Sofia HJ, Abou Tayoun AN, Thibaud-Nissen F, Tricomi FF, Wagner J, Walenz B, Wood JMD, Zimin AV, Bourque G, Chaisson MJP, Flicek P, Phillippy AM, Zook JM, Eichler EE, Haussler D, Wang T, Jarvis ED, Miga KH, Garrison E, Marschall T, Hall IM, Li H, and Paten B
- Subjects
- Humans, Diploidy, Haplotypes genetics, Sequence Analysis, DNA, Reference Standards, Cohort Studies, Alleles, Genetic Variation, Genome, Human genetics, Genomics standards
- Abstract
Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals1. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample., (© 2023. The Author(s).)
- Published
- 2023
- Full Text
- View/download PDF
30. Divergent sensory and immune gene evolution in sea turtles with contrasting demographic and life histories.
- Author
-
Bentley BP, Carrasco-Valenzuela T, Ramos EKS, Pawar H, Souza Arantes L, Alexander A, Banerjee SM, Masterson P, Kuhlwilm M, Pippel M, Mountcastle J, Haase B, Uliano-Silva M, Formenti G, Howe K, Chow W, Tracey A, Sims Y, Pelan S, Wood J, Yetsko K, Perrault JR, Stewart K, Benson SR, Levy Y, Todd EV, Shaffer HB, Scott P, Henen BT, Murphy RW, Mohr DW, Scott AF, Duffy DJ, Gemmell NJ, Suh A, Winkler S, Thibaud-Nissen F, Nery MF, Marques-Bonet T, Antunes A, Tikochinski Y, Dutton PH, Fedrigo O, Myers EW, Jarvis ED, Mazzoni CJ, and Komoroske LM
- Subjects
- Animals, Ecosystem, Population Dynamics, Turtles
- Abstract
Sea turtles represent an ancient lineage of marine vertebrates that evolved from terrestrial ancestors over 100 Mya. The genomic basis of the unique physiological and ecological traits enabling these species to thrive in diverse marine habitats remains largely unknown. Additionally, many populations have drastically declined due to anthropogenic activities over the past two centuries, and their recovery is a high global conservation priority. We generated and analyzed high-quality reference genomes for the leatherback (Dermochelys coriacea) and green (Chelonia mydas) turtles, representing the two extant sea turtle families. These genomes are highly syntenic and homologous, but localized regions of noncollinearity were associated with higher copy numbers of immune, zinc-finger, and olfactory receptor (OR) genes in green turtles, with ORs related to waterborne odorants greatly expanded in green turtles. Our findings suggest that divergent evolution of these key gene families may underlie immunological and sensory adaptations assisting navigation, occupancy of neritic versus pelagic environments, and diet specialization. Reduced collinearity was especially prevalent in microchromosomes, with greater gene content, heterozygosity, and genetic distances between species, supporting their critical role in vertebrate evolutionary adaptation. Finally, diversity and demographic histories starkly contrasted between species, indicating that leatherback turtles have had a low yet stable effective population size, exhibit extremely low diversity compared with other reptiles, and harbor a higher genetic load compared with green turtles, reinforcing concern over their persistence under future climate scenarios. These genomes provide invaluable resources for advancing our understanding of evolution and conservation best practices in an imperiled vertebrate lineage.
- Published
- 2023
- Full Text
- View/download PDF
31. A chromosome-level reference genome and pangenome for barn swallow population genomics.
- Author
-
Secomandi S, Gallo GR, Sozzoni M, Iannucci A, Galati E, Abueg L, Balacco J, Caprioli M, Chow W, Ciofi C, Collins J, Fedrigo O, Ferretti L, Fungtammasan A, Haase B, Howe K, Kwak W, Lombardo G, Masterson P, Messina G, Møller AP, Mountcastle J, Mousseau TA, Ferrer Obiol J, Olivieri A, Rhie A, Rubolini D, Saclier M, Stanyon R, Stucki D, Thibaud-Nissen F, Torrance J, Torroni A, Weber K, Ambrosini R, Bonisoli-Alquati A, Jarvis ED, Gianfranceschi L, and Formenti G
- Subjects
- Animals, Metagenomics, Genome genetics, Genomics, Chromosomes, Swallows genetics
- Abstract
Insights into the evolution of non-model organisms are limited by the lack of reference genomes of high accuracy, completeness, and contiguity. Here, we present a chromosome-level, karyotype-validated reference genome and pangenome for the barn swallow (Hirundo rustica). We complement these resources with a reference-free multialignment of the reference genome with other bird genomes and with the most comprehensive catalog of genetic markers for the barn swallow. We identify potentially conserved and accelerated genes using the multialignment and estimate genome-wide linkage disequilibrium using the catalog. We use the pangenome to infer core and accessory genes and to detect variants using it as a reference. Overall, these resources will foster population genomics studies in the barn swallow, enable detection of candidate genes in comparative genomics studies, and help reduce bias toward a single reference genome., Competing Interests: Declaration of interests D.S. and K.W. are full-time employees at Pacific Biosciences, a company commercializing single-molecule sequencing technologies., (Copyright © 2023 The Author(s). Published by Elsevier Inc. All rights reserved.)
- Published
- 2023
- Full Text
- View/download PDF
32. The Assembled Genome of the Stroke-Prone Spontaneously Hypertensive Rat.
- Author
-
Kalbfleisch TS, Hussien AbouEl Ela NA, Li K, Brashear WA, Kochan KJ, Hillhouse AE, Zhu Y, Dhande IS, Kline EJ, Hudson EA, Murphy TD, Thibaud-Nissen F, Smith ML, and Doris PA
- Subjects
- Humans, Rats, Animals, Rats, Inbred SHR, Stroke genetics
- Abstract
Background: We report the creation and evaluation of a de novo assembly of the genome of the spontaneously hypertensive rat, the most widely used model of human cardiovascular disease. Methods: The genome is assembled from long read sequencing (PacBio HiFi and continuous long read data [CLR]) and scaffolded with long-range structural information obtained from Bionano optical maps and proximity ligation sequencing proximity analysis of the genome. The genome assembly was polished with Illumina short reads. Completeness of the assembly was investigated using Benchmarking Universal Single Copy Orthologs analysis. The genome assembly was also evaluated with the rat reference gene set, using NCBI automated protocols. We also generated orthogonal single molecule transcript sequence reads (Iso-Seq) from 8 tissues and used them to validate the coding assembly, to annotate the assembly with RNA transcripts representing unique full length transcript isoforms for each gene and to determine whether divergences between RefSeq sequences and the assembly were attributable to assembly errors or polymorphisms. Results: The assembly analysis indicates that this assembly is comparable in contiguity and completeness to the current rat reference assembly, while the use of HiFi sequencing yields an assembly that is more correct at the single base level. Synteny analysis was performed to uncover the extent of synteny and the presence and distribution of chromosomal rearrangements between the reference and this assembly. Conclusion: The resulting genome assembly is reference quality and captures significant structural variation.
- Published
- 2023
- Full Text
- View/download PDF
33. Semi-automated assembly of high-quality diploid human reference genomes.
- Author
-
Jarvis ED, Formenti G, Rhie A, Guarracino A, Yang C, Wood J, Tracey A, Thibaud-Nissen F, Vollger MR, Porubsky D, Cheng H, Asri M, Logsdon GA, Carnevali P, Chaisson MJP, Chin CS, Cody S, Collins J, Ebert P, Escalona M, Fedrigo O, Fulton RS, Fulton LL, Garg S, Gerton JL, Ghurye J, Granat A, Green RE, Harvey W, Hasenfeld P, Hastie A, Haukness M, Jaeger EB, Jain M, Kirsche M, Kolmogorov M, Korbel JO, Koren S, Korlach J, Lee J, Li D, Lindsay T, Lucas J, Luo F, Marschall T, Mitchell MW, McDaniel J, Nie F, Olsen HE, Olson ND, Pesout T, Potapova T, Puiu D, Regier A, Ruan J, Salzberg SL, Sanders AD, Schatz MC, Schmitt A, Schneider VA, Selvaraj S, Shafin K, Shumate A, Stitziel NO, Stober C, Torrance J, Wagner J, Wang J, Wenger A, Xiao C, Zimin AV, Zhang G, Wang T, Li H, Garrison E, Haussler D, Hall I, Zook JM, Eichler EE, Phillippy AM, Paten B, Howe K, and Miga KH
- Subjects
- Humans, Haplotypes genetics, High-Throughput Nucleotide Sequencing methods, High-Throughput Nucleotide Sequencing standards, Sequence Analysis, DNA methods, Sequence Analysis, DNA standards, Reference Standards, Chromosomes, Human genetics, Genetic Variation genetics, Chromosome Mapping standards, Diploidy, Genome, Human genetics, Genomics methods, Genomics standards
- Abstract
The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society (refs. 1,2). However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals (refs. 3,4). Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome (ref. 5). To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity (ref. 6). Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements. (© 2022. The Author(s).)
- Published
- 2022
- Full Text
- View/download PDF
34. A roadmap for the functional annotation of protein families: a community perspective.
- Author
-
de Crécy-Lagard V, Amorin de Hegedus R, Arighi C, Babor J, Bateman A, Blaby I, Blaby-Haas C, Bridge AJ, Burley SK, Cleveland S, Colwell LJ, Conesa A, Dallago C, Danchin A, de Waard A, Deutschbauer A, Dias R, Ding Y, Fang G, Friedberg I, Gerlt J, Goldford J, Gorelik M, Gyori BM, Henry C, Hutinet G, Jaroch M, Karp PD, Kondratova L, Lu Z, Marchler-Bauer A, Martin MJ, McWhite C, Moghe GD, Monaghan P, Morgat A, Mungall CJ, Natale DA, Nelson WC, O'Donoghue S, Orengo C, O'Toole KH, Radivojac P, Reed C, Roberts RJ, Rodionov D, Rodionova IA, Rudolf JD, Saleh L, Sheynkman G, Thibaud-Nissen F, Thomas PD, Uetz P, Vallenet D, Carter EW, Weigele PR, Wood V, Wood-Charlson EM, and Xu J
- Subjects
- Base Sequence, Computational Biology, Genome, Molecular Sequence Annotation, Genomics, Proteins
- Abstract
Over the last 25 years, biology has entered the genomic era and is becoming a science of 'big data'. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by more than 500 000 genomes sequenced to date. By different estimates, only half the predicted sequenced proteins carry an accurate functional annotation, and this percentage varies drastically between different organismal lineages. Such a large gap in knowledge hampers all aspects of biological enterprise and, thereby, is standing in the way of genomic biology reaching its full potential. A brainstorming meeting to address this issue funded by the National Science Foundation was held during 3-4 February 2022. Bringing together data scientists, biocurators, computational biologists and experimentalists within the same venue allowed for a comprehensive assessment of the current state of functional annotations of protein families. Further, major issues that were obstructing the field were identified and discussed, which ultimately allowed for the proposal of solutions on how to move forward. (© The Author(s) 2022. Published by Oxford University Press.)
- Published
- 2022
- Full Text
- View/download PDF
35. Merfin: improved variant filtering, assembly evaluation and polishing via k-mer validation.
- Author
-
Formenti G, Rhie A, Walenz BP, Thibaud-Nissen F, Shafin K, Koren S, Myers EW, Jarvis ED, and Phillippy AM
- Subjects
- Genome, Genomics, Humans, Sequence Analysis, DNA, High-Throughput Nucleotide Sequencing, Nanopores
- Abstract
Variant calling has been widely used for genotyping and for improving the consensus accuracy of long-read assemblies. Variant calls are commonly hard-filtered with user-defined cutoffs. However, it is impossible to define a single set of optimal cutoffs, as the calls heavily depend on the quality of the reads, the variant caller of choice and the quality of the unpolished assembly. Here, we introduce Merfin, a k-mer based variant-filtering algorithm for improved accuracy in genotyping and genome assembly polishing. Merfin evaluates each variant based on the expected k-mer multiplicity in the reads, independently of the quality of the read alignment and variant caller's internal score. Merfin increased the precision of genotyped calls in several benchmarks, improved consensus accuracy and reduced frameshift errors when applied to human and nonhuman assemblies built from Pacific Biosciences HiFi and continuous long reads or Oxford Nanopore reads, including the first complete human genome. Moreover, we introduce assembly quality and completeness metrics that account for the expected genomic copy numbers. (© 2022. This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply.)
- Published
- 2022
- Full Text
- View/download PDF
36. The complete sequence of a human genome.
- Author
-
Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, Mikheenko A, Vollger MR, Altemose N, Uralsky L, Gershman A, Aganezov S, Hoyt SJ, Diekhans M, Logsdon GA, Alonge M, Antonarakis SE, Borchers M, Bouffard GG, Brooks SY, Caldas GV, Chen NC, Cheng H, Chin CS, Chow W, de Lima LG, Dishuck PC, Durbin R, Dvorkina T, Fiddes IT, Formenti G, Fulton RS, Fungtammasan A, Garrison E, Grady PGS, Graves-Lindsay TA, Hall IM, Hansen NF, Hartley GA, Haukness M, Howe K, Hunkapiller MW, Jain C, Jain M, Jarvis ED, Kerpedjiev P, Kirsche M, Kolmogorov M, Korlach J, Kremitzki M, Li H, Maduro VV, Marschall T, McCartney AM, McDaniel J, Miller DE, Mullikin JC, Myers EW, Olson ND, Paten B, Peluso P, Pevzner PA, Porubsky D, Potapova T, Rogaev EI, Rosenfeld JA, Salzberg SL, Schneider VA, Sedlazeck FJ, Shafin K, Shew CJ, Shumate A, Sims Y, Smit AFA, Soto DC, Sović I, Storer JM, Streets A, Sullivan BA, Thibaud-Nissen F, Torrance J, Wagner J, Walenz BP, Wenger A, Wood JMD, Xiao C, Yan SM, Young AC, Zarate S, Surti U, McCoy RC, Dennis MY, Alexandrov IA, Gerton JL, O'Neill RJ, Timp W, Zook JM, Schatz MC, Eichler EE, Miga KH, and Phillippy AM
- Subjects
- Cell Line, Chromosomes, Artificial, Bacterial genetics, Chromosomes, Human genetics, Humans, Reference Values, Genome, Human, Human Genome Project, Sequence Analysis, DNA standards
- Abstract
Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.
- Published
- 2022
- Full Text
- View/download PDF
37. A joint NCBI and EMBL-EBI transcript set for clinical genomics and research.
- Author
-
Morales J, Pujar S, Loveland JE, Astashyn A, Bennett R, Berry A, Cox E, Davidson C, Ermolaeva O, Farrell CM, Fatima R, Gil L, Goldfarb T, Gonzalez JM, Haddad D, Hardy M, Hunt T, Jackson J, Joardar VS, Kay M, Kodali VK, McGarvey KM, McMahon A, Mudge JM, Murphy DN, Murphy MR, Rajput B, Rangwala SH, Riddick LD, Thibaud-Nissen F, Threadgold G, Vatsan AR, Wallin C, Webb D, Flicek P, Birney E, Pruitt KD, Frankish A, Cunningham F, and Murphy TD
- Subjects
- Genome, Humans, Information Dissemination, Molecular Sequence Annotation, National Library of Medicine (U.S.), United States, Computational Biology, Databases, Genetic, Genomics
- Abstract
Comprehensive genome annotation is essential to understand the impact of clinically relevant variants. However, the absence of a standard for clinical reporting and browser display complicates the process of consistent interpretation and reporting. To address these challenges, Ensembl/GENCODE (ref. 1) and RefSeq (ref. 2) launched a joint initiative, the Matched Annotation from NCBI and EMBL-EBI (MANE) collaboration, to converge on human gene and transcript annotation and to jointly define a high-value set of transcripts and corresponding proteins. Here, we describe the MANE transcript sets for use as universal standards for variant reporting and browser display. The MANE Select set identifies a representative transcript for each human protein-coding gene, whereas the MANE Plus Clinical set provides additional transcripts at loci where the Select transcripts alone are not sufficient to report all currently known clinical variants. Each MANE transcript represents an exact match between the exonic sequences of an Ensembl/GENCODE transcript and its counterpart in RefSeq such that the identifiers can be used synonymously. We have now released MANE Select transcripts for 97% of human protein-coding genes, including all American College of Medical Genetics and Genomics Secondary Findings list v3.0 (ref. 3) genes. MANE transcripts are accessible from major genome browsers and key resources. Widespread adoption of these transcript sets will increase the consistency of reporting, facilitate the exchange of data regardless of the annotation source and help to streamline clinical interpretation. (© 2022. This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply.)
- Published
- 2022
- Full Text
- View/download PDF
38. Standards recommendations for the Earth BioGenome Project.
- Author
-
Lawniczak MKN, Durbin R, Flicek P, Lindblad-Toh K, Wei X, Archibald JM, Baker WJ, Belov K, Blaxter ML, Marques Bonet T, Childers AK, Coddington JA, Crandall KA, Crawford AJ, Davey RP, Di Palma F, Fang Q, Haerty W, Hall N, Hoff KJ, Howe K, Jarvis ED, Johnson WE, Johnson RN, Kersey PJ, Liu X, Lopez JV, Myers EW, Pettersson OV, Phillippy AM, Poelchau MF, Pruitt KD, Rhie A, Castilla-Rubio JC, Sahu SK, Salmon NA, Soltis PS, Swarbreck D, Thibaud-Nissen F, Wang S, Wegrzyn JL, Zhang G, Zhang H, Lewin HA, and Richards S
- Subjects
- Animals, Biodiversity, Genomics methods, Humans, Reference Standards, Reference Values, Sequence Analysis, DNA methods, Sequence Analysis, DNA standards, Base Sequence genetics, Eukaryota genetics, Genomics standards
- Abstract
A global international initiative, such as the Earth BioGenome Project (EBP), requires both agreement and coordination on standards to ensure that the collective effort generates rapid progress toward its goals. To this end, the EBP initiated five technical standards committees comprising volunteer members from the global genomics scientific community: Sample Collection and Processing, Sequencing and Assembly, Annotation, Analysis, and IT and Informatics. The current versions of the resulting standards documents are available on the EBP website, with the recognition that opportunities, technologies, and challenges may improve or change in the future, requiring flexibility for the EBP to meet its goals. Here, we describe some highlights from the proposed standards, and areas where additional challenges will need to be met. Competing Interests: Competing interest statement: P.F. is a member of the scientific advisory boards of Fabric Genomics, Inc. and Eagle Genomics, Ltd. (Copyright © 2022 the Author(s). Published by PNAS.)
- Published
- 2022
- Full Text
- View/download PDF
39. Database resources of the national center for biotechnology information.
- Author
-
Sayers EW, Bolton EE, Brister JR, Canese K, Chan J, Comeau DC, Connor R, Funk K, Kelly C, Kim S, Madej T, Marchler-Bauer A, Lanczycki C, Lathrop S, Lu Z, Thibaud-Nissen F, Murphy T, Phan L, Skripchenko Y, Tse T, Wang J, Williams R, Trawick BW, Pruitt KD, and Sherry ST
- Subjects
- Databases, Chemical, Databases, Nucleic Acid, Databases, Protein, Humans, Internet, National Library of Medicine (U.S.), PubMed, United States, Biotechnology trends, Databases, Genetic trends
- Abstract
The National Center for Biotechnology Information (NCBI) produces a variety of online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for the most of these databases. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, RefSeq, SRA, Virus, dbSNP, dbVar, ClinicalTrials.gov, MMDB, iCn3D and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov. (Published by Oxford University Press on behalf of Nucleic Acids Research 2021.)
- Published
- 2022
- Full Text
- View/download PDF
40. Population genomics of the critically endangered kākāpō.
- Author
-
Dussex N, van der Valk T, Morales HE, Wheat CW, Díez-Del-Molino D, von Seth J, Foster Y, Kutschera VE, Guschanski K, Rhie A, Phillippy AM, Korlach J, Howe K, Chow W, Pelan S, Mendes Damas JD, Lewin HA, Hastie AR, Formenti G, Fedrigo O, Guhlin J, Harrop TWR, Le Lec MF, Dearden PK, Haggerty L, Martin FJ, Kodali V, Thibaud-Nissen F, Iorns D, Knapp M, Gemmell NJ, Robertson F, Moorhouse R, Digby A, Eason D, Vercoe D, Howard J, Jarvis ED, Robertson BC, and Dalén L
- Abstract
The kākāpō is a flightless parrot endemic to New Zealand. Once common in the archipelago, only 201 individuals remain today, most of them descending from an isolated island population. We report the first genome-wide analyses of the species, including a high-quality genome assembly for kākāpō, one of the first chromosome-level reference genomes sequenced by the Vertebrate Genomes Project (VGP). We also sequenced and analyzed 35 modern genomes from the sole surviving island population and 14 genomes from the extinct mainland population. While theory suggests that such a small population is likely to have accumulated deleterious mutations through genetic drift, our analyses on the impact of the long-term small population size in kākāpō indicate that present-day island kākāpō have a reduced number of harmful mutations compared to mainland individuals. We hypothesize that this reduced mutational load is due to the island population having been subjected to a combination of genetic drift and purging of deleterious mutations, through increased inbreeding and purifying selection, since its isolation from the mainland ∼10,000 years ago. Our results provide evidence that small populations can survive even when isolated for hundreds of generations. This work provides key insights into kākāpō breeding and recovery and more generally into the application of genetic tools in conservation efforts for endangered species. Competing Interests: The authors declare no competing interests. (© 2021 The Author(s).)
- Published
- 2021
- Full Text
- View/download PDF
41. Author Correction: Improved reference genome of the arboviral vector Aedes albopictus.
- Author
-
Palatini U, Masri RA, Cosme LV, Koren S, Thibaud-Nissen F, Biedler JK, Krsticevic F, Johnston JS, Halbach R, Crawford JE, Antoshechkin I, Failloux AB, Pischedda E, Marconcini M, Ghurye J, Rhie A, Sharma A, Karagodin DA, Jenrette J, Gamez S, Miesen P, Masterson P, Caccone A, Sharakhova MV, Tu Z, Papathanos PA, Van Rij RP, Akbari OS, Powell J, Phillippy AM, and Bonizzoni M
- Published
- 2021
- Full Text
- View/download PDF
42. A high-quality bonobo genome refines the analysis of hominid evolution.
- Author
-
Mao Y, Catacchio CR, Hillier LW, Porubsky D, Li R, Sulovari A, Fernandes JD, Montinaro F, Gordon DS, Storer JM, Haukness M, Fiddes IT, Murali SC, Dishuck PC, Hsieh P, Harvey WT, Audano PA, Mercuri L, Piccolo I, Antonacci F, Munson KM, Lewis AP, Baker C, Underwood JG, Hoekzema K, Huang TH, Sorensen M, Walker JA, Hoffman J, Thibaud-Nissen F, Salama SR, Pang AWC, Lee J, Hastie AR, Paten B, Batzer MA, Diekhans M, Ventura M, and Eichler EE
- Subjects
- Animals, Eukaryotic Initiation Factor-4A genetics, Female, Genes, Gorilla gorilla genetics, Molecular Sequence Annotation standards, Pan troglodytes genetics, Pongo genetics, Segmental Duplications, Genomic, Sequence Analysis, DNA, Evolution, Molecular, Genome genetics, Genomics, Pan paniscus genetics, Phylogeny
- Abstract
The divergence of chimpanzee and bonobo provides one of the few examples of recent hominid speciation (refs. 1,2). Here we describe a fully annotated, high-quality bonobo genome assembly, which was constructed without guidance from reference genomes by applying a multiplatform genomics approach. We generate a bonobo genome assembly in which more than 98% of genes are completely annotated and 99% of the gaps are closed, including the resolution of about half of the segmental duplications and almost all of the full-length mobile elements. We compare the bonobo genome to those of other great apes (refs. 1,3-5) and identify more than 5,569 fixed structural variants that specifically distinguish the bonobo and chimpanzee lineages. We focus on genes that have been lost, changed in structure or expanded in the last few million years of bonobo evolution. We produce a high-resolution map of incomplete lineage sorting and estimate that around 5.1% of the human genome is genetically closer to chimpanzee or bonobo and that more than 36.5% of the genome shows incomplete lineage sorting if we consider a deeper phylogeny including gorilla and orangutan. We also show that 26% of the segments of incomplete lineage sorting between human and chimpanzee or human and bonobo are non-randomly distributed and that genes within these clustered segments show significant excess of amino acid replacement compared to the rest of the genome.
- Published
- 2021
- Full Text
- View/download PDF
43. Towards complete and error-free genome assemblies of all vertebrate species.
- Author
-
Rhie A, McCarthy SA, Fedrigo O, Damas J, Formenti G, Koren S, Uliano-Silva M, Chow W, Fungtammasan A, Kim J, Lee C, Ko BJ, Chaisson M, Gedman GL, Cantin LJ, Thibaud-Nissen F, Haggerty L, Bista I, Smith M, Haase B, Mountcastle J, Winkler S, Paez S, Howard J, Vernes SC, Lama TM, Grutzner F, Warren WC, Balakrishnan CN, Burt D, George JM, Biegler MT, Iorns D, Digby A, Eason D, Robertson B, Edwards T, Wilkinson M, Turner G, Meyer A, Kautt AF, Franchini P, Detrich HW 3rd, Svardal H, Wagner M, Naylor GJP, Pippel M, Malinsky M, Mooney M, Simbirsky M, Hannigan BT, Pesout T, Houck M, Misuraca A, Kingan SB, Hall R, Kronenberg Z, Sović I, Dunn C, Ning Z, Hastie A, Lee J, Selvaraj S, Green RE, Putnam NH, Gut I, Ghurye J, Garrison E, Sims Y, Collins J, Pelan S, Torrance J, Tracey A, Wood J, Dagnew RE, Guan D, London SE, Clayton DF, Mello CV, Friedrich SR, Lovell PV, Osipova E, Al-Ajli FO, Secomandi S, Kim H, Theofanopoulou C, Hiller M, Zhou Y, Harris RS, Makova KD, Medvedev P, Hoffman J, Masterson P, Clark K, Martin F, Howe K, Flicek P, Walenz BP, Kwak W, Clawson H, Diekhans M, Nassar L, Paten B, Kraus RHS, Crawford AJ, Gilbert MTP, Zhang G, Venkatesh B, Murphy RW, Koepfli KP, Shapiro B, Johnson WE, Di Palma F, Marques-Bonet T, Teeling EC, Warnow T, Graves JM, Ryder OA, Haussler D, O'Brien SJ, Korlach J, Lewin HA, Howe K, Myers EW, Durbin R, Phillippy AM, and Jarvis ED
- Subjects
- Animals, Birds, Gene Library, Genome Size, Genome, Mitochondrial, Haplotypes, High-Throughput Nucleotide Sequencing, Molecular Sequence Annotation, Sequence Alignment, Sequence Analysis, DNA, Sex Chromosomes genetics, Genome, Genomics methods, Vertebrates genetics
- Abstract
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species (refs. 1-4). To address this issue, the international Genome 10K (G10K) consortium (refs. 5,6) has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.
- Published
- 2021
- Full Text
- View/download PDF
44. RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation.
- Author
-
Li W, O'Neill KR, Haft DH, DiCuccio M, Chetvernin V, Badretdin A, Coulouris G, Chitsaz F, Derbyshire MK, Durkin AS, Gonzales NR, Gwadz M, Lanczycki CJ, Song JS, Thanki N, Wang J, Yamashita RA, Yang M, Zheng C, Marchler-Bauer A, and Thibaud-Nissen F
- Subjects
- Data Curation methods, Data Mining methods, Genomics methods, Internet, Proteins classification, User-Computer Interface, Computational Biology methods, Databases, Genetic, Genome, Archaeal genetics, Genome, Bacterial genetics, Molecular Sequence Annotation methods, Proteins genetics
- Abstract
The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains nearly 200 000 bacterial and archaeal genomes and 150 million proteins with up-to-date annotation. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP) since 2018 have resulted in a substantial reduction in spurious annotation. The hierarchical collection of protein family models (PFMs) used by PGAP as evidence for structural and functional annotation was expanded to over 35 000 protein profile hidden Markov models (HMMs), 12 300 BlastRules and 36 000 curated CDD architectures. As a result, >122 million or 79% of RefSeq proteins are now named based on a match to a curated PFM. Gene symbols, Enzyme Commission numbers or supporting publication attributes are available on over 40% of the PFMs and are inherited by the proteins and features they name, facilitating multi-genome analyses and connections to the literature. In adherence with the principles of FAIR (findable, accessible, interoperable, reusable), the PFMs are available in the Protein Family Models Entrez database to any user. Finally, the reference and representative genome set, a taxonomically diverse subset of RefSeq prokaryotic genomes, is now recalculated regularly and available for download and homology searches with BLAST. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/. (Published by Oxford University Press on behalf of Nucleic Acids Research 2020.)
- Published
- 2021
- Full Text
- View/download PDF
45. Telomere-to-telomere assembly of a complete human X chromosome.
- Author
-
Miga KH, Koren S, Rhie A, Vollger MR, Gershman A, Bzikadze A, Brooks S, Howe E, Porubsky D, Logsdon GA, Schneider VA, Potapova T, Wood J, Chow W, Armstrong J, Fredrickson J, Pak E, Tigyi K, Kremitzki M, Markovic C, Maduro V, Dutra A, Bouffard GG, Chang AM, Hansen NF, Wilfert AB, Thibaud-Nissen F, Schmitt AD, Belton JM, Selvaraj S, Dennis MY, Soto DC, Sahasrabudhe R, Kaya G, Quick J, Loman NJ, Holmes N, Loose M, Surti U, Risques RA, Graves Lindsay TA, Fulton R, Hall I, Paten B, Howe K, Timp W, Young A, Mullikin JC, Pevzner PA, Gerton JL, Sullivan BA, Eichler EE, and Phillippy AM
- Subjects
- Centromere genetics, CpG Islands genetics, DNA Methylation, DNA, Satellite genetics, Female, Humans, Hydatidiform Mole genetics, Male, Pregnancy, Reproducibility of Results, Testis metabolism, Chromosomes, Human, X genetics, Genome, Human genetics, Telomere genetics
- Abstract
After two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has been finished end to end, and hundreds of unresolved gaps persist (refs. 1,2). Here we present a human genome assembly that surpasses the continuity of GRCh38 (ref. 2), along with a gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome (ref. 3), we reconstructed the centromeric satellite DNA array (approximately 3.1 Mb) and closed the 29 remaining gaps in the current reference, including new sequences from the human pseudoautosomal regions and from cancer-testis ampliconic gene families (CT-X and GAGE). These sequences will be integrated into future human reference genome releases. In addition, the complete chromosome X, combined with the ultra-long nanopore data, allowed us to map methylation patterns across complex tandem repeats and satellite arrays. Our results demonstrate that finishing the entire human genome is now within reach, and the data presented here will facilitate ongoing efforts to complete the other human chromosomes.
- Published
- 2020
- Full Text
- View/download PDF
46. Improved reference genome of the arboviral vector Aedes albopictus.
- Author
-
Palatini U, Masri RA, Cosme LV, Koren S, Thibaud-Nissen F, Biedler JK, Krsticevic F, Johnston JS, Halbach R, Crawford JE, Antoshechkin I, Failloux AB, Pischedda E, Marconcini M, Ghurye J, Rhie A, Sharma A, Karagodin DA, Jenrette J, Gamez S, Miesen P, Masterson P, Caccone A, Sharakhova MV, Tu Z, Papathanos PA, Van Rij RP, Akbari OS, Powell J, Phillippy AM, and Bonizzoni M
- Subjects
- Aedes immunology, Aedes virology, Animals, Chromosome Mapping, Chromosomes, Genome Size, Immunity, Insect Vectors, Mosquito Vectors immunology, Mosquito Vectors virology, RNA, Small Interfering genetics, Transcriptome, Aedes genetics, Arboviruses genetics, Genome, Mosquito Vectors genetics
- Abstract
Background: The Asian tiger mosquito Aedes albopictus is globally expanding and has become the main vector for human arboviruses in Europe. With limited antiviral drugs and vaccines available, vector control is the primary approach to prevent mosquito-borne diseases. A reliable and accurate DNA sequence of the Ae. albopictus genome is essential to develop new approaches that involve genetic manipulation of mosquitoes. Results: We use long-read sequencing methods and modern scaffolding techniques (PacBio, 10X, and Hi-C) to produce AalbF2, a dramatically improved assembly of the Ae. albopictus genome. AalbF2 reveals widespread viral insertions, novel microRNAs and piRNA clusters, the sex-determining locus, and new immunity genes, and enables genome-wide studies of geographically diverse Ae. albopictus populations and analyses of the developmental and stage-dependent network of expression data. Additionally, we build the first physical map for this species with 75% of the assembled genome anchored to the chromosomes. Conclusion: The AalbF2 genome assembly represents the most up-to-date collective knowledge of the Ae. albopictus genome. These resources represent a foundation to improve understanding of the adaptation potential and the epidemiological relevance of this species and foster the development of innovative control measures.
- Published
- 2020
- Full Text
- View/download PDF
47. Haplotype-resolved genomes provide insights into structural variation and gene content in Angus and Brahman cattle.
- Author
-
Low WY, Tearle R, Liu R, Koren S, Rhie A, Bickhart DM, Rosen BD, Kronenberg ZN, Kingan SB, Tseng E, Thibaud-Nissen F, Martin FJ, Billis K, Ghurye J, Hastie AR, Lee J, Pang AWC, Heaton MP, Phillippy AM, Hiendleder S, Smith TPL, and Williams JL
- Subjects
- Alleles, Allelic Imbalance, Animals, Base Sequence, Chromosomes, Mammalian genetics, Female, Genetic Loci, INDEL Mutation genetics, Male, Molecular Sequence Annotation, Polymorphism, Single Nucleotide genetics, RNA, Messenger genetics, RNA, Messenger metabolism, Repetitive Sequences, Nucleic Acid genetics, Cattle genetics, Genetic Variation, Genome, Haplotypes genetics
- Abstract
Inbred animals were historically chosen for genome analysis to circumvent assembly issues caused by haplotype variation but this resulted in a composite of the two genomes. Here we report a haplotype-aware scaffolding and polishing pipeline which was used to create haplotype-resolved, chromosome-level genome assemblies of Angus (taurine) and Brahman (indicine) cattle subspecies from contigs generated by the trio binning method. These assemblies reveal structural and copy number variants that differentiate the subspecies and that variant detection is sensitive to the specific reference genome chosen. Six genes with immune related functions have additional copies in the indicine compared with taurine lineage and an indicus-specific extra copy of fatty acid desaturase is under positive selection. The haplotyped genomes also enable transcripts to be phased to detect allele-specific expression. This work exemplifies the value of haplotype-resolved genomes to better explore evolutionary and functional variations.
- Published
- 2020
- Full Text
- View/download PDF
48. De novo assembly of the cattle reference genome with single-molecule sequencing.
- Author
-
Rosen BD, Bickhart DM, Schnabel RD, Koren S, Elsik CG, Tseng E, Rowan TN, Low WY, Zimin A, Couldrey C, Hall R, Li W, Rhie A, Ghurye J, McKay SD, Thibaud-Nissen F, Hoffman J, Murdoch BM, Snelling WM, McDaneld TG, Hammond JA, Schwartz JC, Nandolo W, Hagen DE, Dreischer C, Schultheiss SJ, Schroeder SG, Phillippy AM, Cole JB, Van Tassell CP, Liu G, Smith TPL, and Medrano JF
- Subjects
- Animals, Breeding methods, Genomics methods, RNA-Seq methods, RNA-Seq standards, Reference Standards, Sequence Analysis, DNA methods, Sequence Analysis, DNA standards, Breeding standards, Cattle genetics, Genome, Genomics standards, Polymorphism, Genetic
- Abstract
Background: Major advances in selection progress for cattle have been made following the introduction of genomic tools over the past 10-12 years. These tools depend upon the Bos taurus reference genome (UMD3.1.1), which was created using now-outdated technologies and is hindered by a variety of deficiencies and inaccuracies. Results: We present the new reference genome for cattle, ARS-UCD1.2, based on the same animal as the original to facilitate transfer and interpretation of results obtained from the earlier version, but applying a combination of modern technologies in a de novo assembly to increase continuity, accuracy, and completeness. The assembly includes 2.7 Gb and is >250× more continuous than the original assembly, with contig N50 >25 Mb and L50 of 32. We also greatly expanded supporting RNA-based data for annotation that identifies 30,396 total genes (21,039 protein coding). The new reference assembly is accessible in annotated form for public use. Conclusions: We demonstrate that improved continuity of assembled sequence warrants the adoption of ARS-UCD1.2 as the new cattle reference genome and that increased assembly accuracy will benefit future research on this species. (© The Author(s) 2020. Published by Oxford University Press.)
- Published
- 2020
- Full Text
- View/download PDF
49. Database resources of the National Center for Biotechnology Information.
- Author
-
Sayers EW, Beck J, Brister JR, Bolton EE, Canese K, Comeau DC, Funk K, Ketter A, Kim S, Kimchi A, Kitts PA, Kuznetsov A, Lathrop S, Lu Z, McGarvey K, Madden TL, Murphy TD, O'Leary N, Phan L, Schneider VA, Thibaud-Nissen F, Trawick BW, Pruitt KD, and Ostell J
- Subjects
- Databases, Nucleic Acid, Genomics methods, Humans, PubMed, United States, Web Browser, Computational Biology methods, Computational Biology organization & administration, Databases, Genetic, National Library of Medicine (U.S.)
- Abstract
The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Custom implementations of the BLAST program provide sequence-based searching of many specialized datasets. New resources released in the past year include a new PubMed interface, a sequence database search and a gene orthologs page. Additional resources that were updated in the past year include PMC, Bookshelf, My Bibliography, Assembly, RefSeq, viral genomes, the prokaryotic genome annotation pipeline, Genome Workbench, dbSNP, BLAST, Primer-BLAST, IgBLAST and PubChem. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. (Published by Oxford University Press on behalf of Nucleic Acids Research 2019.)
- Published
- 2020
- Full Text
- View/download PDF
50. A guinea fowl genome assembly provides new evidence on evolution following domestication and selection in galliformes.
- Author
-
Vignal A, Boitard S, Thébault N, Dayo GK, Yapi-Gnaore V, Youssao Abdou Karim I, Berthouly-Salazar C, Pálinkás-Bodzsár N, Guémené D, Thibaud-Nissen F, Warren WC, Tixier-Boichard M, and Rognon X
- Subjects
- Africa, Animals, Computational Biology, Europe, Sequence Analysis, DNA, Whole Genome Sequencing, Domestication, Evolution, Molecular, Galliformes classification, Galliformes genetics, Genome, Selection, Genetic
- Abstract
The helmeted guinea fowl Numida meleagris belongs to the order Galliformes. Its natural range includes a large part of sub-Saharan Africa, from Senegal to Eritrea and from Chad to South Africa. Archaeozoological and artistic evidence suggests domestication of this species may have occurred about 2,000 years BP in Mali and Sudan primarily as a food resource, although villagers also benefit from its capacity to give loud alarm calls in case of danger and from its ability to consume parasites such as ticks and to hunt snakes, thus suggesting its domestication may have resulted from a commensal association process. Today, it is still farmed in Africa, mainly as a traditional village poultry, and is also bred more intensively in other countries, mainly France and Italy. The lack of available molecular genetic markers has limited the genetic studies conducted to date on guinea fowl. We present here a first-generation whole-genome sequence draft assembly used as a reference for a study by a Pool-seq approach of wild and domestic populations from Europe and Africa. We show that the domestic populations share a higher genetic similarity with each other than they do with wild populations living in the same geographical area. Several genomic regions showing selection signatures putatively related to domestication or importation to Europe were detected, containing candidate genes, most notably EDNRB2, possibly explaining losses in plumage coloration phenotypes in domesticated populations. (© 2019 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.)
- Published
- 2019
- Full Text
- View/download PDF