Latent Taxonomic Signatures: Alignment Free Approach Reveals Semantic Properties of Species Proteomes

Authors :
Ena Melvan
Antonio Starcevic
Toni Cvrljak
Janko Diminic
Jurica Zucko
Paul F. Long
Source :
SSRN Electronic Journal.
Publication Year :
2021
Publisher :
Elsevier BV, 2021.

Abstract

Alignment-based methods allow only one-to-one comparisons, promote a gene-centered viewpoint, and lack the broad insight needed for complex biological systems. In reality, each gene or protein is part of a conglomerate in which more than one sequence contributes to the functional network and evolutionary trajectory of the cell. Conserving these network interactions is arguably more important to evolutionary success than conserving the sequence integrity of an individual protein. Using an alignment-free language model, we encoded sets of randomly selected proteins from each species into distributed vector representations of the respective species. These representations captured transitive relations between otherwise unrelated proteins, resulting from conserved interactions within a proteome. This allowed us to discover Latent Taxonomic Signatures (LTSs), species-specific differences in the frequency of occurrence of short amino acid chains, reflecting constraints imposed on protein evolution by proteome context. Even orphan proteins exhibited LTSs, allowing us to establish taxonomic relatedness in the total absence of alignment-based homology. The alignment-free approach presented here suggests that the difference between species is more than just numbers and sequences: actual semantic properties could be as important as protein family kinship when proteins evolve as parts of a system.
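
To make the idea of a frequency signature of short amino acid chains concrete, the following is a minimal sketch of an alignment-free comparison. It is not the authors' method: the abstract describes a language model producing distributed vector representations, whereas this sketch swaps in plain k-mer relative frequencies compared by cosine similarity. The function names, the choice of k = 3, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_profile(proteins, k=3):
    """Relative frequency of every overlapping k-mer across a set of sequences.

    Hypothetical helper: a simple stand-in for the learned species
    representation described in the abstract.
    """
    counts = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse k-mer frequency profiles."""
    dot = sum(v * q.get(kmer, 0.0) for kmer, v in p.items())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


# Toy sequences invented for illustration; they are not real proteins.
species_a = ["MKVLAAGIVK", "MKVLSAGIVR"]
species_b = ["MKVLAAGLVK"]
species_c = ["WWPPHHCCWW"]

profile_a = kmer_profile(species_a)
profile_b = kmer_profile(species_b)
profile_c = kmer_profile(species_c)

sim_ab = cosine_similarity(profile_a, profile_b)
sim_ac = cosine_similarity(profile_a, profile_c)
```

In this toy setup, `species_a` and `species_b` share several 3-mers while `species_c` shares none with `species_a`, so `sim_ab` is larger than `sim_ac` without any sequence alignment being performed. A raw frequency profile like this does not capture the transitive relations between unrelated proteins that the abstract attributes to the learned representations.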

Details

ISSN :
1556-5068
Database :
OpenAIRE
Journal :
SSRN Electronic Journal
Accession number :
edsair.doi...........6f0dfb08c6ddda5e06be074f5a95b9a0
Full Text :
https://doi.org/10.2139/ssrn.3877552