The primordial tRNA acceptor stem code from theoretical minimal RNA ring clusters

Background Theoretical minimal RNA rings code by design over the shortest length once for each of the 20 amino acids, a start and a stop codon, and form stem-loop hairpins. This defines at most 25 RNA rings of 22 nucleotides. As a group, RNA rings mimick numerous prebiotic and early life biomolecular properties: tRNAs, deamination gradients and replication origins, emergence of codon preferences for the natural circular code, and contents of early protein coding genes. These properties result from the RNA ring’s in silico design, based mainly on coding nonredundancy among overlapping translation frames, as the genetic code’s codon-amino acid assignments determine. RNA rings resemble ancestral tRNAs, defining RNA ring anticodons and corresponding cognate amino acids. Surprisingly, all examined RNA ring properties coevolve with genetic code integration ranks of RNA ring cognates, as if RNA rings mimick prebiotic and early life evolution. Methods Distances between RNA rings were calculated using different evolutionary models. Associations between these distances and genetic code evolutionary hypotheses detect evolutionary models best describing RNA ring diversification. Results Here pseudo-phylogenetic analyses of RNA rings produce clusters corresponding to the primordial code in tRNA acceptor stems, more so when substitution matrices from neutrally evolving pseudogenes are used rather than from functional protein coding genes reflecting selection for conserving amino acid properties. Conclusions Results indicate RNA rings with recent cognates evolved from those with early cognates. Hence RNA rings, as designed by the genetic code’s structure, simulate tRNA stem evolution and prebiotic history along neutral chemistry-driven mutation regimes.


Background
Several early life hypotheses assume that the first genes combined functions of structural RNAs (tRNAs, rRNAs) and protein coding genes (tRNAs and CDs [1,2]; rRNAs and tRNAs [3][4][5][6][7][8]; rRNAs, tRNAs and CDs [9]). One of the earliest attempts to reconstruct in silico likely ancestral protein coding genes used two main principles, economy and diversity. These rational constraints mean that the sequence should code over the shortest possible length once for each genetic code signal, one start and stop codon, and each of the 20 biogenic amino acids. A third constraint, forming a stem-loop hairpin was added to delay environmental degradation [10,11].
This defines at most 25 circular RNAs, each 22nucleotide long, called theoretical minimal RNA rings. Surprisingly, these sequences mainly defined by coding constraints, resemble the loops of ancestral tRNAs, defining for each RNA ring an anticodon and its corresponding cognate amino acid [12]. They might also have ribozyme-like properties [13]. Hence, RNAs designed mainly according to nonredundancy among overlapping translation frames [14] recover tRNA loop sequences.
The observation that RNA rings, designed according to coding properties, coincide with tRNAs, strengthens the hypothesis of protein coding functions for tRNAs [15,16]. This could explain biases in tRNAs (and rRNAs) for nucleotide triplets corresponding to circular code codons which are overrepresented in protein coding genes (tRNAs [17,18]; rRNAs [19][20][21]). Recent analyses detect further RNA ring properties presumably characterizing prebiotic and early life sequences.

Protein coding gene properties and RNA rings
The RNA rings have several properties that resemble those expected for actual protein coding sequences. For example, the mean position of amino acids in modern proteins, overall, resembles the order of integration of amino acids in the genetic code. Analyses showed that recent amino acids are, on average, closer to the start codon, and ancient amino acids are on average closer to the stop codon [22]. Similarly, peptides translated from RNA rings also recapitulate evolutionary orders of genetic code codon-amino acid assignments [23].
A codon property, called codon directional asymmetry (CDA), groups codons into palindromic (codon structure XYX, CDA = 0), and 5′-and 3′-extremity dominant (YXX/XXY, CDA < 0/CDA > 0). CDA > 0 associates with amino acids that are cognates of class II tRNA synthetases, CDA < 0 associates with cognates of class I tRNA synthetases [24]. Class II tRNA synthetases are believed more ancestral [25][26][27][28]. Indeed, significantly more RNA rings have more codons with CDA > 0 than with CDA < 0, as the independent hypothesis that class II tRNAs are ancestral expects [29]. Similarly, a set of 20 codons that, as a group, have mathematical properties enabling detection of the ribosomal translation frame, the near universal natural circular code [30], are overrepresented among RNA ring codons [31]. In addition, RNA ring sequences are overrepresented in ancient protein coding genes, and in particular RNA rings with ancient cognates [32].

Replication origins and RNA rings
Natural genomes are frequently characterized by gradients in nucleotide biases, proportional to times spent single stranded during replication. This is typically due to hydrolytic deaminations, resulting in the observed nucleotide bias gradients across genomes due to replication [33][34][35][36][37] and/or to transcription [38,39]. Some evidences suggest that deamination gradients do not always result from chemical changes, but from coding constraints that locate genes with high deamination risks at positions where these risks are low (i.e., close to replication/transcription origin(s)) and the genes with nucleotide contents that imply lower deamination risks at locations with higher deamination risks, i.e., at genomic locations that are distant from replication/transcription origins [36,40].
Surprisingly, deamination gradients occur in theoretical minimal RNA rings. These overall start at the 5′ extremity of the RNA ring's anticodon [41], as predicted by homology with ancestral tRNAs [12]. This is in line with likely homologies between mitochondrial tRNAs, their anticodon loops and loops of mitochondrial light strand replication origins (OL) [42,43]. OL loops are the binding sites for the mitochondrial gamma DNA polymerase [44]. This polymerase and its active sites are homologues of active sites recognizing and binding the cognate tRNA anticodon loop and acceptor stem of a bacterial tRNA synthetase, the presumed ancestor of mitochondrial vertebrate gamma DNA polymerases [45][46][47]. Hence, deamination gradients in RNA rings start at the anticodon, the likely polymerase binding site, and reflect known homologies between tRNA synthetases and mitochondrial gamma DNA polymerase.
These evidences for RNA rings functioning as replication origins match their similarity with ancestral tRNAs [12], and the plausible origin of tRNAs from replication origin stem-loop hairpins [48].

tRNA evolution
Previous sections show that the simple design of RNA rings implies several properties expected for ancestral multifunctional RNAs, including coding for a peptide, functioning as replication origin, and as plausible proto-tRNAs [12]. Further analyses based on RNA ring secondary structures strengthen the proto-tRNA hypothesis.
RNA secondary structures can be clustered into two main groups, presumed ancestral tRNA-like secondary structures, and presumed rRNA-like secondary structures. In rRNA-like RNAs, percentages of unpaired nucleotides within stems (forming 'bulges') among unpaired nucleotides, is greater than in tRNA-like RNAs. These are likely targets for enzymatic degradation and hence reflect greater regulation of RNAs forming rRNAlike secondary structures [49,50].
The concept that rRNA-like RNAs are derived from tRNA-like RNAs was tested on tRNAs from all three kingdoms of life and from giant viruses. Cloverleaves formed by tRNAs with each of the possible cognate amino acids were ranked on the tRNA-rRNA secondary structure axis. These estimates derived from secondary structure properties were compared with the genetic code integration orders of the cognate amino acids from various hypotheses on these orders [51]. The working hypothesis expects that tRNAs with relatively more rRNA-like secondary structures have relatively recent cognate amino acids. Results overall fit this prediction, mainly in prokaryotes and viruses, confirming the evolutionary direction of the tRNA-rRNA secondary structure axis [52]. Results from similar analyses of secondary structures formed by RNA rings also follow this pattern, using cognate amino acids of predicted RNA ring anticodons. This pattern is strongest when RNA rings are spliced so as to maximize their similarity with ancestral tRNAs [53].

RNA ring evolution?
Genetic code inclusion orders of RNA ring cognate amino acids are determined according to the RNA ring's anticodon, which is predicted from homology with ancestral tRNA loops [12]. This inclusion order associates with each RNA ring property examined: the tRNA-rRNA axis of secondary structure of RNA rings [53], deamination gradient strengths [41], abundances of RNA ring pieces in ancient genes [32], overrepresentation of codons belonging to the natural circular code [31], overrepresentation of codons with CDA > 0 in RNA rings [29], and tendencies of amino acid sequences translated from RNA rings to recapitulate the amino acid order of integration in the genetic code [23]. Moreover, the evolutionary orders of amino acid integration hypotheses that match best with RNA ring-derived properties tend to be hypotheses derived from tRNA properties, mainly the primitive code in tRNA stems [54], tRNAs as ancestral coding genes [1,2], and the diversity of isoacceptor tRNAs [55]. This is expected if RNA rings are proto-tRNAs.
These analyses show progressive emergence of the various properties from RNA rings with early cognates, to those with cognates having late genetic code integration ranks. This pattern is not trivial and we have no explanation for it: RNA rings result from rational in silico design, and presumably did not evolve one from the other, and/or from a common ancestor, as any evolutionary scenario would imply. Hence, we do not understand how RNA rings mimick evolutionary trends in so many properties, without any biological-historical context. This issue is addressed here, using pseudo-phylogenetic analyses of RNA rings. In other words, do clusters of RNA rings, based on sequence similarities, associate with evolutionary hypotheses on the order of genetic code integration of their predicted cognate amino acid? This would mean that RNA rings concentrate information on the evolution of the biomolecular translation machinery. This information comes from the genetic code's codon-amino acid assignments, which are the major information used in the design of RNA rings. This could be either because the genetic code structure embeds information related to the historical processes that formed it, and/or that its structure determines the biomolecular prebiotic and/or early life evolution.

Distances among RNA ring sequences
The design of RNA rings is based on coding constraints. Their similarity with tRNAs is an unintended result from this design. This tRNA similarity defines the RNA ring's anticodon and its cognate amino acid. The cognate  Table 2 Distances between two RNA ring sequences, RNA ring 13 and RNA ring 25, both with predicted anticodons matching amino acid glycine (underlined). D1 is the distance between these sequences, summing the number of positions where nucleotides differ. D2 indicates a distance that accounts for mutation bias for transitions (mutations within purines (A < ->G) and within pyrimidines (C < ->T)), a distance that is more realistic in an evolutionary context. Transitions among the two sequences are indicated in italics, and are counted as distance 0.5, resulting in D2 = 8.5 amino acid genetic code integration order defines RNA ring evolutionary ranks. Hence, in order to understand how RNA ring comparisons reflect prebiotic and early life evolution, we focus on comparisons among RNA rings, aligning them according to their coding properties, with the stop codon at their 3′ extremity ( Table 1). The 25 RNA rings are compared using a simple distance between sequences, considering each combination of two RNA rings, aligned after splicing them at the 5′ extremity of their stop codon. Identical nucleotides at a given position have distance "0". Non-identical ones have distance "1". These position-specific distances are summed over the complete RNA ring length, producing pairwise distances among RNA rings. These theoretically range from "0" to "22", because RNA rings are 22 nucleotides long. Table 2 compares RNA ring 13 and RNA ring 25. Table 3 presents the matrix of distances among all pairs of RNA rings.
The distance matrix in Table 3 is analyzed for associations with genetic code integration orders of RNA ring cognate amino acids [51], as these are determined by the RNA ring anticodon, defined by homology with ancestral tRNAs (Table 1). Analyses consider separately distances to each focal RNA ring with the remaining 24 RNA rings. The frequency distribution of these Pearson correlation coefficients shows a clear high correlation outlier ( Fig. 1): the association between distances to RNA ring 3 (anticodon corresponding to cognate Ser) and the presence of a primordial code in the tRNA acceptor stem of some tRNAs with presumed ancient cognate amino acids. These consist of prokaryote (Archaea and Bacteria) tRNAs for cognates A, D, G, V, in which the 5′ acceptor stems have at positions 3-5 nucleotide triplets coding for the amino acid that is the tRNA's cognate amino acid [54]. The average distance between RNA ring 3 and RNA rings with predicted cognates A, D, G or V, as compared to the average distance to the remaining RNA rings is statistically significant (6 ± 3.39 vs 14.21 ± 3.07, P = 0.00003, two tailed t-test). After Bonferroni correction considering the 1225 correlations Table 3 D1 distances (closest to minimum, bold) among RNA rings (Table 1). Two last columns: amino acid matching predicted RNA ring anticodon/sum of D1; number N of genetic code integration hypotheses with D2 < D1/* = P < .05 two-tailed sign test  17 18 16 14 15 19 18 18 17 15 16 15 14 18 18  calculated, this corresponds to P = 0.03763, which is still statistically significant at P < 0.05. Note that the genetic code integration hypothesis that associates best with distances among RNA rings is one among the three tRNA-derived hypotheses. This is in line with the RNA ring proto-tRNA hypothesis.

Simple evolutionary distances among RNA rings
The previous section shows that RNA ring clusters match a primitive code in tRNA stems. This raises the question whether distances among RNA rings could be interpreted in an evolutionary sense, considering that RNA rings are rational constructs, and a priori did not evolve one from the other by point mutations. Nucleotide sequence evolution is characterized by a strong bias for transitions, meaning purine-to-purine and pyrimidine-to-pyrimidine mutations ((A < ->G and C < ->T, respectively), as opposed to transversions, purine-to-pyrimidine and pyrimidine-topurine point mutations (the eight remaining point mutation types, A < ->C, A < ->T, C < ->G and G < ->T). Transitions are more frequent, reflecting on average shorter evolutionary distances between sequences. Hence, when differences between two RNA rings could be interpreted as due to transitions, distances are set to "0.5", while when these are due to transversions, distances remain as previously "1". Table 4 presents D2, the evolutionary distance matrix among RNA rings.
The hypothesis that RNA rings evolved from each other predicts that associations with genetic code inclusion hypotheses for RNA ring cognate amino acids should be stronger for D2 than D1. This is not the case: only 50.37% of all comparisons between associations of genetic code inclusion hypotheses with D1 vs D2 indicate stronger associations with D2. For three genetic code inclusion hypotheses, significant majorities of associations were stronger for D2 than D1 (more than 17 cases among 25 comparisons for each hypothesis): Fox's proteinoid hypothesis [56], yields from Bar Nun's shock wave experiment of [57], and the tRNA stem primitive code [54]. Notably, the difference between the outliers and the bulk of the distribution of r values from Fig. 1 becomes more extreme, as the strongest correlation (with the tRNA stem primitive code hypothesis) increases from r = 0.733 to r = 748. The average distance between RNA ring 3 (cognate amino acid S) and RNA rings predicted cognates A, D, G or V, as compared to the average distance to the remaining RNA rings is statistically significant (5.1 ± 2.86 vs 12.0 ± 2.42, P = 0.00002, two tailed t-test). After Bonferroni correction considering the 1225 correlations calculated, this corresponds to P = 0.02050, which is still statistically significant at P < 0.05.
For specific RNA rings, D1 yields significantly more stronger associations than D2 for 7 RNA rings (7,8,10, 13,14,15 and 19, cognates D, R, L, G, I, Q and A), and vice versa for 9 RNA rings (1,2,11,12,16,17,20,22,24, cognates F, M, P, E, L, K, W, H and Y), according to two tailed sign tests. RNA rings with recent cognate amino acids have more genetic code integration order hypotheses for which D2 produces stronger associations than D1 than presumably more ancient RNA rings in 91.8% of the genetic code integration hypotheses examined. Effects are strongest for the RNY hypothesis [58], the Jiménez-Montaño hypothesis [59], and the tRNA stem primitive code [54]. Note that associations within tRNA structures were very early observed as reflecting the evolution of the genetic code [60]. Hence, variation among RNA rings can be seen as evolutionary, meaning resulting from point mutations that transform one RNA ring into another, mainly for recent RNA rings.

RNA ring evolution with observed nucleotide substitutions
Distance D2 is a simplified approximation of known nucleotide substitution patterns. We used observed substitution rates in natural pseudogenes [61] (Table 5) to estimate D3. D3 should produce better associations than D2 because D3 should reflect observed spontaneous physicochemical substitution rates, as indicated by their proportionality with differences in nucleotide dipole moments [62,63], because pseudogenes are presumably not functional protein coding genes. A major difference between D1 and D2, vs D3, is that D1 and D2 matrices have a diagonal with distance "0 ". This is not the case for D3, because identity rates, X-> X, differ from "1″. D3 (Table 6) associates positively with ranks of amino acid insertion in the genetic code in 58.1% of the tests. Associations were statistically significant at P < 0.05 (two tailed tests) for 158 cases, among which 89.9% were positive correlations. The strongest association is between D3 from RNA ring 13, and the primitive acceptor stem code (r = 0.63, one tailed P = 0.00037). RNA ring 13 is the barycenter of the sequence space formed by the 25 RNA rings, and its predicted anticodon matches glycine. Hence, it is a likely candidate for the most primitive RNA ring. The heatmap in Fig. 2 shows that D3 recovers the classification of tRNAs according to the primitive acceptor stem code. The correlation analysis in Fig. 3 confirms this, and shows the association between D3 and percentages of tRNAs with a primitive code in their acceptor stem is statistically significant (r = − 0.5958, two tailed P = 0.00167).
D3 in Table 6 includes distances for the diagonal (distances of a sequence to itself after replication), while analyses with D1 and D2 did not include this distance, which is "0". In order to enable comparing strengths of associations of genetic code integration hypotheses with distances in Tables 4 and 6, correlations were recalculated with "0" as the distance in diagonal of Table 4. The bias for positive correlations for D2 remained very low (50.37%). Correlations were more positive for D3 than D2 in 86.9% of the comparisons. Hence, observed substitution rates reflect better RNA ring evolution than more or less arbitrary distances.

Spontaneous physico-chemical mutations vs after natural selection
Analyses in the previous section use estimates of sequence distances derived from observed nucleotide substitutions in pseudogenes, which seem to reflect mainly neutral evolution, apparently driven by nucleotide physicochemical properties. Table 5 shows also substitution rates estimated for protein coding genes, which integrate effects of selection on substitution rates. Selection overall weeds out substitutions that tend to cause replacements between amino acids with very different properties (example glycine<− > tryptophan), disproportionally conserving nucleotide substitutions that replace amino acids by other amino acids with very similar properties (example leucine<− > isoleucine). We calculated based on these selection substitution rates D4 and a further distance matrix (not shown). Associations with genetic code integration ranks of amino acids were more positive for D3 than for D4 in 78.2% of the cases. This suggests that RNA ring evolution would have occurred without effects of selection for conserving protein coding properties of RNA rings. It potentially means that RNA ring evolution occurred under prebiotic conditions devoid of natural selection ruled by physicochemical factors, though selection for properties other than conservation of protein coding properties, such as secondary structure formation, could have driven RNA ring evolution.
A further alternative hypothesis is that similarities in codon usages of RNA rings produced a clustering compatible with the tRNA acceptor stem primitive code. We calculated D5, a matrix of similarities between RNA rings according to codon usages. Associations between genetic code integration hypotheses and D5 are much weaker than with D3, excluding that results are due to confounding effects by codon usages.

Discussion
Comparisons between strengths of associations between genetic code inclusion order hypotheses and D1 vs D2 distances among RNA rings imply that ancient RNA rings arose spontaneously, perhaps by template-free polymerization [50,64], and that recent RNA rings evolved from these earlier RNA rings. RNA ring properties examined in earlier analyses [23,29,31,32,41,52,53] coevolve with genetic code inclusion orders of their predicted cognate amino acid. This would mean that RNA rings, despite their in silico design along rational constraints, mimick the evolution of prebiotic and early life biomolecules. Indeed, observations reported here show that RNA ring clusters and distances among them match the evolution of the genetic code's amino acid inclusions. Patterns are strengthened when considering substitution rates as observed in genes, especially pseudogenes as compared to functional protein coding genes, suggesting that RNA ring evolution was driven by physicochemical propensities for nucleotide substitutions, without effects of natural selection against substitutions causing drastic effects at amino acid replacement level. Apparently, RNA ring comparisons match, at least in part, what could be expected if RNA rings with recent cognates evolved by point mutations from those with ancient cognates. These patterns suggest that the hypothetical evolutionary diversification of RNA rings occurred without coding constraints, either because RNA rings were not translated, or because their evolution was disconnected from the function of the coded peptides. Fig. 2 Heatmap of pairwise D3 distances among 25 RNA rings, ordered by increasing rank of mean percentage of tRNAs (with the same cognate amino acid as the RNA ring) with primitive code in their acceptor stem (averaged across kingdoms as per data in [54]). The tRNAs considered in [54] with primitive acceptor stem code have cognates A, D, G, V. The heatmap shows these clusters (blueish colors) with RNA rings 3, 4, 8 and 9 (cognates S, Pyl, R and Sec) Previously published analyses show how RNA ring properties mimick properties of modern genes (part above horizontal line in the scheme in Fig. 4: left of vertical line, protein coding genes; right of vertical line, structural RNAs involved in translation and replication). These observations assess the status of the RNA ring system as a system able to "compute "(or simulate/mimick) prebiotic and early life evolution of major biomolecules, by dealing with ulterior evolution, that presumably occurred downstream of RNA rings. RNAs are probably easier to use for solving RNA-related problems [65] than other problems [66], including origins of life.
This study, and most of our future endeavours, explore evolution upstream of RNA rings, corresponding to the area below the horizontal line in Fig. 4. This consists of processes that produced the RNA rings, from the presumably stereochemical determination of codon-amino acid assignments [67], and biases for short pentamers with nonredundant coding across frames [68] which could have accreted into RNA rings. The aim is to understand what makes RNA rings such useful, and perhaps efficient, simulators of prebiotic evolution, independently of the possibility that RNA rings are the actual primordial sequences.

Conclusions
Distances among theoretical minimal RNA rings converge most with the genetic code integration hypothesis of amino acids derived from the primitive code in tRNA acceptor stems. This convergence is greatest when distances are calculated using the substitution model derived from nuclear pseudogenes. Other substitution models, such as models integrating effects of natural selection on amino acid replacements, produce weaker patterns. Hence, theoretical RNA rings evolved along physicochemical constraints affecting nucleotide substitutions, apparently devoid of effects on their coding properties on amino acid sequences, in line with a pre-translational origin of diversification of RNA rings that would at a later stage become the population of primordial coding and decoding RNAs. The RNA ring system appears as a useful synthetic simulator of prebiotic evolution [69]. This and future Fig. 3 Distance D3 of RNA rings to RNA ring 7 for cognate Asp as a function of the percentage of tRNAs with a primitive code in their acceptor stem [54] (averaged across kingdoms) with the same cognate as the RNA ring. Datapoints for cognates with tRNAs considered in [54] as belonging to those with an acceptor stem primitive code (A,D,G and V) are filled, others are empty circles. The trend line indicates that D3 decreases with the presence of a primitive code in tRNA acceptor stems (r = − 0.596, two tailed P = 0.00167) analyses focus on decomposing the processes and properties of RNA rings that make RNA rings such computational tools.

Methods
The 25 theoretical minimal RNA rings are aligned considering their coding frame as presented in Table 1. This alignment does not enable insertions and/or deletions. Each RNA ring has a candidate cognate amino acid defined by its presumed anticodon [12], as it was predicted from similarities with ancestral tRNAs [1,2]. All pairwise distances among RNA ring nucleotide sequences are calculated according to 5 models, D1-D5. The model for D1 considers that for identical nucleotides in an alignment distance "0", and distance "1" when the presumed homologous nucleotides in Table 1 differ. D1 is the sum of the distances calculated along the complete alignment length, which is 22 nucleotide long. The model for D2 considers that when nucleotides that differ in the alignment would result from transitions, the distance should be considered as "0.5". This is the case when the nucleotides are both purines (A and G), because these two nucleotides are relatively similar. Similarly, when both different nucleotides in the alignment are pyrimidines (C and T/U), the distance should be "0.5". For pairs of nucleotides that would result from transversions (A < ->C, A < ->T/U, C < ->G and G < ->T/ U), the distance is "1" as for D1. Table 2 presents the alignment between two specific RNA rings (13 and 25) and the calculations of D1 and D2 between this pair of RNA rings.
The model for D3 develops the principle that distances between non-identical nucleotides differ according to which nucleotide pair is considered. It is based on frequencies of observed nucleotide substitutions in pseudogenes, presumably neutrally evolving sequences. Model D4 uses also observed substitution frequencies for estimating distances between specific nucleotide pairs, but as these were observed for protein coding genes, meaning that substitution frequencies are affected by natural selection due to constraints on protein function. Empirical data on substitution frequencies for D3 and D4 are from Gojobori et al. [61]. Note that in models D3 and D4, identical nucleotides do not get distance "0", and the distance varies according to which nucleotide is conserved, reflecting the mutability of that nucleotide.
The model for D5 compares codon usages of RNA rings and tests whether patterns of evolution could be confounded by similarities between RNA rings in codon usages, independently from the exact sequence alignment in Table 1. In this context, codon usage of each RNA ring gets value "0 "for a codon if it is not used in that RNA ring, and value "1 "if it is used in that ring. These data were used to calculate pairwise similarities among all combinations of two RNA rings, using Pearson's correlation coefficient r. These correlation coefficients are similarities, with the maximal similarity at r = 1. In order to obtain a distance, we used 1-r as distance For each distance matrix D1 to D5, further analyses consider separately each row of the matrix, which corresponds to distances from a given focal RNA ring to each of the remaining "target" RNA rings. Correlations between each of these 25 sets of distances and the genetic code integration order of the cognate amino acid of the target RNA ring are calculated. This is done for each of the genetic code integration hypotheses listed by Trifonov [51]. Overall, results confirm that distances increase with the genetic code integration rank, and this trend is strongest for D3, as compared to all other distances D1, D2, D4 and D5. Specifically, the genetic code evolutionary hypothesis that produces the strongest associations between the various distances (D1-5 confounded) is the evolutionary hypothesis derived from the primitive code in tRNA acceptor stems [54].
Abbreviations A: One letter abbreviation for amino acid alanine; C: One letter abbreviation for amino acid cysteine; CD: Coding sequence; CDA: Codon directional asymmetry; D: One letter abbreviation for amino acid aspartic acid; D1: Hamming distance between RNA rings; D2: Hamming evolutionary distance between RNA rings distinguishing transitions and transversions; D3: Evolutionary distances among RNA rings, using frequencies of substitutions as observed in pseudogenes; D4: Evolutionary distances among RNA rings, using frequencies of substitutions as observed in protein coding genes; D5: Distance between RNA rings based on differences in codon usages; E: One letter abbreviation for amino acid glutamic acid; F: One letter abbreviation for amino acid phenylalanine; G: One letter abbreviation for amino acid glycine; GNN: Codons with G at position 1; H: One letter abbreviation for amino acid histidine; I: One letter abbreviation for amino acid isoleucine; K: One letter abbreviation for amino acid lysine; L: One letter abbreviation for amino acid leucine; M: One letter abbreviation for amino acid methionine; N: One letter abbreviation for amino acid asparagine; P: One letter abbreviation for amino acid proline; Pyl: Abbreviation for amino acid pyrrolysine; Q: One letter abbreviation for amino acid glutamine; R: One letter abbreviation for amino acid arginine; RNY: Codons a purine (R), any nucleotide (N) and a pyrimidine (Y) at positions 1, 2, and 3, respectively.; S: One letter abbreviation for amino acid serine; Sec: Abbreviation for amino acid selenocysteine; T: One letter abbreviation for amino acid threonine; V: One letter abbreviation for amino acid valine; W: One letter abbreviation for amino acid tryptophan; Y: One letter abbreviation for amino acid tyrosine