Molecular signatures that characterize a clade are an important tool to create or corroborate phylogenetic groupings (reviewed in Telford and Copley 2011). Examples of such molecular synapomorphies include the presence of Hox/ParaHox genes in the ParaHoxozoa (Placozoa, Cnidaria and Bilateria) (Ryan et al. 2010), an indel in the elongation factor 1-alpha gene as a character of opisthokonts (Steenkamp, Wright and Baldauf 2006), or the NAD5 mitochondrial gene that is exclusive to protostomes (Papillon et al. 2004). The monophyly of deuterostomes (the clade that comprises chordates, echinoderms, and hemichordates) has been consistently recovered in phylogenetic and phylogenomic studies (Hejnol et al. 2009; Paps, Baguna and Riutort 2009). However, it remains contentious whether Xenoturbella and/or the acoelomorphs (acoels and nemertodermatids) are members of the deuterostomes or basal bilaterians (Ruiz-Trillo et al. 1999; Ruiz-Trillo et al. 2002; Bourlat et al. 2003; Bourlat et al. 2006; Philippe et al. 2007; Dunn et al. 2008; Hejnol et al. 2009; Paps, Baguna and Riutort 2009; Mwinyi et al. 2010; Philippe et al. 2011). Thus, the identification of diagnostic molecular synapomorphies for deuterostomes is important both to corroborate previous molecular analyses and to independently test the putative deuterostome affiliation of acoels and Xenoturbella. Here we show that the bi-functional enzyme UDP-GlcNAc 2-epimerase/N-acetylmannosamine kinase (GNE) is exclusive to deuterostomes, acoels and Xenoturbella, being absent from all sequenced protostomes and non-bilaterian taxa. Our data show that GNE is encoded in the genomes of all sequenced deuterostomes except for the urochordates Ciona savignyi, C. intestinalis and Oikopleura dioica, most likely as a result of secondary gene loss (D’Aniello et al. 2008; Churcher and Taylor 2009).
Moreover, a small fragment of the gene is present in the expressed sequence tags (ESTs) of Xenoturbella bocki, and we have amplified GNE from the acoel Symsagittifera roscoffensis (GenBank JF826132). We searched publicly available EST data as well as unpublished transcriptome data from nemertodermatids and did not recover any hits. However, since a complete genome of a nemertodermatid is not available, we cannot rule out the presence of GNE in this group. Interestingly, the GNE genes encoded by chordates, echinoderms and hemichordates all share the same 9 introns (both in position and phase), while the GNE of the acoel S. roscoffensis does not share any intron with deuterostomes (Figure S1). Unfortunately, we could not elucidate the intron-exon structure of the GNE encoded by X. bocki, since only a small cDNA fragment is available. GNE is known to play an important role in the biosynthesis of sialic acids, which are monosaccharides that act in a wide range of biological and pathological events, such as cellular adhesion, recognition determinants, tumorigenesis and stem cells (Effertz, Hinderlich and Reutter 1999; Tanner 2005; Weidemann et al. 2010). In mammals, the metabolic precursor of sialic acids is N-acetylneuraminic acid (Neu5Ac), which derives from UDP-N-acetylglucosamine (UDP-GlcNAc). The first two steps of this pathway are catalyzed by the bi-functional enzyme UDP-GlcNAc 2-epimerase/N-acetylmannosamine kinase (GNE) (Figure 1). The bi-functional activity of GNE comes from two different protein domains: the UDP-N-acetylglucosamine 2-epimerase domain (PF02350) (hereafter the “epimerase-2 domain”) and a kinase domain known as ROK (Repressor, ORF, Kinase; PF00480) (Tanner 2005). The epimerase-2 domain converts UDP-GlcNAc to N-acetylmannosamine (ManNAc), which is subsequently phosphorylated to ManNAc-6-P by the ROK domain (Figure 1).
Interestingly, while in prokaryotes the epimerase and kinase functions are carried out by two separate enzymes, in vertebrates those two domains have been fused, allowing an allosteric site to appear, and thus conferring the potential for new functions to arise.

Figure 1. A) Schematic representation of the biosynthesis pathway of CMP-NeuNAc in both bacteria and mammals (adapted from Tanner 2005). Steps 1 and 2 are performed by GNE in deuterostomes. Step 3 is exclusive to Bacteria, while the additional step 4 has so-far ...

Thus, to gain further insights into the evolutionary origin of GNE, we performed both exhaustive searches across the public databases and phylogenetic analyses of the two domains independently. Our data show that both domains have a patchy distribution across eukaryotes. The epimerase-2 domain is encoded in only a few eukaryotic genomes but is, in contrast, ubiquitous among Archaea and Eubacteria (Figure 2). The phylogenetic analyses show two major clades: one comprising the bilaterian-specific GNE genes within a mostly prokaryotic clade (clade A in Figure 2), the other comprising most other (non-metazoan) eukaryotes, also branching within a prokaryotic clade (clade B in Figure 2). The two clades are separated with high nodal support (Bootstrap Value (BV) = 100% and Bayesian Posterior Probability (PP) = 1.00), and each has specific indel characters (see Figure S2).
The general topology of the epimerase-2 domain is probably due to either i) an extreme case of hidden paralogy (i.e., this protein domain was present at the origin of eukaryotes and lost in all lineages except in deuterostomes, Xenoturbella, Acoela and a few other eukaryotes), ii) domain convergence, or iii) the consequence of several independent Lateral Gene Transfer (LGT) events to the different eukaryotic lineages, one being an LGT event to the last common bilaterian ancestor (with subsequent loss in protostomes and in urochordates), or to the last common ancestor of deuterostomes (with loss in urochordates), if xenacoelomorphs (Xenoturbella + acoelomorphs) are indeed deuterostomes as recently suggested (Philippe et al. 2011). Interestingly, the sequence from Micromonas sp. falls as sister group to bilaterians. However, the Micromonas gene, in contrast to the deuterostome sequences, has no introns. Moreover, neither M. pusilla (the other congeneric species with a completely sequenced genome) nor any other sequenced chlorophyte (except Ostreococcus lucimarinus, whose epimerase gene is only distantly related; see Figure 2) encodes this domain. Thus, the Micromonas epimerase-2 sequence, as well as the Ricinus homolog, which branches within another independent bacterial clade (Figure 2), most probably derive from independent LGT events (see Supplementary Material).

Figure 2. Maximum likelihood (ML) phylogenetic tree of the epimerase-2 domain as obtained by RAxML (WAG + Γ + I). The tree is rooted using the midpoint-rooted tree option. Nodal support was obtained by RAxML 100-bootstrap replicates (BV) and Bayesian Posterior ...

The ROK domain also bears a complex evolutionary history (Figure 3). The phylogenetic analysis shows most eukaryotic sequences in a single monophyletic group that also includes Eubacteria. The bilaterian GNE genes are monophyletic and unrelated to the other eukaryotic sequences (also supported by indel characters, see Figure S2B).
Again, this topology can be explained either by LGT or by hidden paralogy.

Figure 3. Maximum likelihood (ML) phylogenetic tree of the ROK domain as obtained by RAxML (WAG + Γ + I). The tree is rooted using the eukaryotes as out-group. RAxML 100-bootstrap replicates (BV) and Bayesian Posterior Probabilities (PP) values are shown ...

Our data show that GNE (the derived gene fusion of the epimerase-2 and ROK domains) is exclusive to most deuterostomes, acoelomorphs and Xenoturbella. How this presence/absence pattern should be interpreted remains unclear. One could easily propose the presence of the GNE gene as a molecular synapomorphy of deuterostomes (Figure 1B). This would corroborate the recent, although weakly supported, proposal that xenacoelomorphs are deuterostomes (Philippe et al. 2011). However, if Xenacoelomorpha, or just the Acoelomorpha, are not deuterostomes but basal bilaterians, as most phylogenetic analyses suggest (Ruiz-Trillo et al. 1999; Ruiz-Trillo et al. 2002; Hejnol et al. 2009; Paps, Baguna and Riutort 2009; Mwinyi et al. 2010), then GNE was secondarily lost in the last common protostome ancestor. This is not an implausible scenario, especially considering that gene loss must already be hypothesized for urochordates (Figure 1B). The analysis of intron composition shows that the GNE genes encoded by acoels and deuterostomes independently evolved their own introns (Figure S1), which lends some support to the view that acoels are not deuterostomes. However, independent intron evolution within acoels could easily be argued as well, as has been described, for example, in the urochordate Oikopleura dioica (Edvardsen et al. 2004). To sum up, although rare genomic changes can be important phylogenetic markers, they should be used with caution, since gene loss has been shown to play an important role in eukaryotic evolution (see, for example, Sebe-Pedros et al. 2011 and Zmasek and Godzik 2011). Additional data, especially from phylogenomic analyses, should be taken into account.
Finally, the two domains that make up the GNE gene have complex evolutionary histories, most likely involving LGT events or extreme hidden paralogy.
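The two competing readings of the GNE presence/absence pattern discussed above can be made concrete by counting the secondary losses each one implies. The following is only an illustrative sketch, not the analysis performed in this study: it assumes a Dollo-style model (a single gain at the most recent common ancestor of all GNE-bearing taxa, with no regain), and the two topologies, the taxon labels, and the function names are simplified, hypothetical stand-ins for the hypotheses in the text.

```python
# Illustrative sketch only: Dollo-style loss counting on toy topologies.
# Trees are nested tuples; leaves are taxon names (hypothetical, simplified labels).

def leaves(tree):
    """Return the list of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]

def gain_clade(tree, present):
    """Return the smallest subtree that contains every taxon in `present`."""
    if not isinstance(tree, str):
        for child in tree:
            if present <= set(leaves(child)):
                return gain_clade(child, present)
    return tree

def count_losses(tree, present):
    """Count maximal subtrees in which no taxon carries the gene."""
    if not set(leaves(tree)) & present:
        return 1  # the whole subtree lacks the gene: one loss on its stem
    if isinstance(tree, str):
        return 0  # a leaf that carries the gene
    return sum(count_losses(child, present) for child in tree)

def dollo_losses(tree, present):
    """Minimum losses assuming a single gain at the ancestor of all carriers."""
    return count_losses(gain_clade(tree, present), present)

# Taxa in which GNE was found, following the text (urochordates and protostomes lack it).
PRESENT = {"Xenacoelomorpha", "Ambulacraria", "Vertebrata"}

# Assumed topology 1: xenacoelomorphs inside deuterostomes.
TREE_DEUT = ("NonBilateria",
             ("Protostomia",
              (("Xenacoelomorpha", "Ambulacraria"),
               ("Urochordata", "Vertebrata"))))

# Assumed topology 2: xenacoelomorphs as basal bilaterians.
TREE_BASAL = ("NonBilateria",
              ("Xenacoelomorpha",
               ("Protostomia",
                ("Ambulacraria", ("Urochordata", "Vertebrata")))))

if __name__ == "__main__":
    print("losses, xenacoelomorphs as deuterostomes:", dollo_losses(TREE_DEUT, PRESENT))
    print("losses, xenacoelomorphs as basal bilaterians:", dollo_losses(TREE_BASAL, PRESENT))
```

Under these assumptions the first topology requires one loss (urochordates) and the second requires two (protostomes and urochordates). A difference of a single loss event is small, which is consistent with the caution urged above about reading the GNE distribution as decisive on its own.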