Using networks to analyze and visualize the distribution of overlapping genes in virus genomes.

Authors :
Muñoz-Baena, Laura
Poon, Art F. Y.
Source :
PLoS Pathogens. 2/24/2022, p1-19. 19p.
Publication Year :
2022

Abstract

Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may increase the information content of compact genomes or influence the creation of new genes. Here we report a global comparative study of overlapping open reading frames (OvRFs) of 12,609 virus reference genomes in the NCBI database. We retrieved metadata associated with all annotated open reading frames (ORFs) in each genome record to calculate the number, length, and frameshift of OvRFs. Our results show that while the number of OvRFs increases with genome length, they tend to be shorter in longer genomes. The majority of overlaps involve +2 frameshifts, predominantly found in dsDNA viruses. Antisense overlaps in which one of the ORFs was encoded in the same frame on the opposite strand (−0) tend to be longer. Next, we develop a new graph-based representation of the distribution of overlaps among the ORFs of genomes in a given virus family. In the absence of an unambiguous partition of ORFs by homology at this taxonomic level, we used an alignment-free k-mer based approach to cluster protein coding sequences by similarity. We connect these clusters with two types of directed edges to indicate (1) that constituent ORFs are adjacent in one or more genomes, and (2) that these ORFs overlap. These adjacency graphs not only provide a natural visualization scheme, but also a novel statistical framework for analyzing the effects of gene- and genome-level attributes on the frequencies of overlaps.

Author summary: Gene overlap occurs when the same part of a genome encodes two or more genes. This phenomenon is found in all biological domains of life, but it is particularly common in viruses, where it may play a role in making viral genomes more compact. To understand the prevalence of overlapping genes in viruses, we analyzed over 12,000 genomes of every known type of virus for which this genetic information is available. For instance, although overlaps are more abundant in viruses with larger genomes, they are also significantly shorter. Overlaps in which one of the genes is read in the opposite direction (−0 overlaps) tend to be longer, which may be an emergent property of the universal genetic code. We developed a new computational method to analyze and visualize the distribution of overlaps among genomes belonging to a group (family) of viruses as a network. This approach enabled us to identify distinct patterns in the organization of genomes within virus families; for example, gene overlap in the coronavirus family tends to involve non-essential genes outside of the "core" of the network of genes. [ABSTRACT FROM AUTHOR]
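The calculation of overlap number, length, and frameshift described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The `ORF` class, the function names, and the coordinate convention (1-based, inclusive, leftmost and rightmost genome positions) are all hypothetical. The antisense labelling in particular is an assumption: here −0 means that codon boundaries on the two strands coincide, which may differ from the convention used in the paper.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class ORF:
    """Hypothetical record for one annotated open reading frame."""
    name: str
    start: int   # leftmost genome coordinate, 1-based inclusive
    end: int     # rightmost genome coordinate, 1-based inclusive
    strand: int  # +1 for the forward strand, -1 for the reverse strand


def overlap_length(a: ORF, b: ORF) -> int:
    """Number of nucleotides shared by two ORFs (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def frameshift(a: ORF, b: ORF) -> str:
    """Label the relative reading frame of two ORFs.

    Same strand: '+0', '+1' or '+2', the offset of the downstream ORF's
    first codon position relative to the upstream ORF's frame.
    Opposite strands: '-0', '-1' or '-2' (ASSUMED convention, see lead-in).
    """
    if a.strand == b.strand:
        if a.strand == 1:
            # Forward ORFs begin translation at their leftmost coordinate.
            shift = abs(a.start - b.start) % 3
        else:
            # Reverse ORFs begin translation at their rightmost coordinate.
            shift = abs(a.end - b.end) % 3
        return f"+{shift}"
    fwd, rev = (a, b) if a.strand == 1 else (b, a)
    # A reverse-strand codon spans [rev.end - 2, rev.end]; the shift is 0
    # when that span lines up with a forward-strand codon boundary.
    shift = (rev.end + 1 - fwd.start) % 3
    return f"-{shift}"


def find_overlaps(orfs):
    """Return (name_a, name_b, length, frameshift) for each overlapping pair."""
    results = []
    for a, b in combinations(orfs, 2):
        length = overlap_length(a, b)
        if length > 0:
            results.append((a.name, b.name, length, frameshift(a, b)))
    return results


# Toy genome with made-up coordinates, for illustration only.
example_orfs = [
    ORF("A", 1, 300, 1),
    ORF("B", 290, 500, 1),
    ORF("C", 450, 600, -1),
    ORF("D", 700, 900, 1),
]
example_overlaps = find_overlaps(example_orfs)
```

For the toy genome above, `example_overlaps` holds two pairs: A and B share 11 nucleotides with a +1 shift, and B and C share 51 nucleotides and are labelled −2 under the assumed antisense convention. Counting the entries per genome gives the number of overlaps, and the same pair list could supply the overlap edges of an adjacency graph.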

Details

Language :
English
ISSN :
1553-7366
Database :
Academic Search Index
Journal :
PLoS Pathogens
Publication Type :
Academic Journal
Accession number :
155430944
Full Text :
https://doi.org/10.1371/journal.ppat.1010331