Global abundance of short tandem repeats is non-random in rodents and primates



Although short tandem repeats (STRs), also known as microsatellites, are highly abundant across vertebrate genomes and have significant biological implications, their relevance to speciation remains largely elusive and is attributed for the most part to random coincidence. Here we collected data on the whole-genome abundance of mono-, di-, and trinucleotide STRs in nine species encompassing rodents and primates: rat, mouse, olive baboon, gelada, macaque, gorilla, chimpanzee, bonobo, and human. The collected data were used to analyze hierarchical clustering of the STR abundances in the selected species.


We found massive differential STR abundances between the rodent and primate orders. In addition, while numerous STRs had random abundance across the nine selected species, the global abundance conformed to three consistent <clusters>, as follows: <rat, mouse>, <gelada, macaque, olive baboon>, and <gorilla, chimpanzee, bonobo, human>, which coincided with the phylogenetic distances of the selected species (p < 4E-05). Exceptionally, in the trinucleotide STR compartment, human was significantly distant from all other species.


Based on hierarchical clustering, we propose that the global abundance of STRs is non-random in rodents and primates, and probably had a determining impact on the speciation of the two orders. We also report the STRs and STR lengths that predominantly conformed to the phylogeny of the selected species, exemplified by (t)10, (ct)6, and (taa)4. Phylogenetic and experimental platforms are warranted to further examine the observed patterns and the biological mechanisms associated with those STRs.


Speciation is the evolutionary process by which populations evolve to become distinct species. Several models and theories have been proposed for this highly complicated process, including gene regulatory networks, community ecology, and mating preferences (for a review see [1]). Natural selection may be considered a major outcome associated with, and linking, the above propositions. With an exceptionally high degree of polymorphism and plasticity, short tandem repeats (STRs) (also known as microsatellites/simple sequence repeats) may be a spectacular source of variation required for speciation and evolution [2,3,4,5,6]. The impact of STRs on speciation is supported by their various functional implications in gene expression, alternative splicing, and translation [4, 7,8,9,10,11,12,13].

STRs are a source of rapid and continuous morphological evolution[14], for example, in the evolution of facial length in mammals[15]. These highly evolving genetic elements may also be ideal responsive elements to fluctuating selective pressures. A role in evolutionary selection and adaptation is consistent with deep evolutionary conservation of some STRs, as “tuning knobs”, including several in genes with neurological and neurodevelopmental function[16].

While a limited number of studies indicate that purifying selection and drift can shape the structure of STRs at the inter- and intra-species levels [17,18,19,20,21,22], the global abundance of STRs at the crossroads of speciation remains largely unknown.

Mononucleotide and dinucleotide STRs are the most common categories of STRs in vertebrate genomes [23, 24]. In addition to their association with frameshifts in coding sequences, with pathological [25] and possibly evolutionary consequences, recent evidence indicates surprising functions for mononucleotide STRs, such as their proposed role in translation initiation site selection [12, 26]. Several groups have found evidence for the involvement of a number of dinucleotide STRs in gene regulation, speciation, and evolution [4, 23, 27,28,29,30]. Trinucleotide STRs are frequently linked to human neurological disorders, most of which are specific to this species [31, 32].

Here, we analyzed the global hierarchical clustering of all types of mono-, di-, and trinucleotide STRs in nine mammalian species, encompassing primates and rodents. Those species belong to the superordinal group of Euarchontoglires [33], and form three distinct and unambiguous phylogenetic <clusters>. The aim of this analysis was to examine whether or not the global abundance of STRs in the selected species conforms to the phylogenetic <clusters> of those species.

Materials and methods

Species and whole-genome sequences

The UCSC Genome Browser was used to download and analyze the latest genome assemblies of nine species, as follows (genome sizes in base pairs are indicated following each species): rat (Rattus norvegicus): 2,647,915,728, mouse (Mus musculus): 2,728,222,451, gelada (Theropithecus gelada): 2,889,630,685, olive baboon (Papio anubis): 2,869,821,163, macaque (Macaca mulatta): 2,946,843,737, gorilla (Gorilla gorilla gorilla): 3,063,362,754, chimpanzee (Pan troglodytes): 3,050,398,082, bonobo (Pan paniscus): 3,203,531,224, and human (Homo sapiens): 3,099,706,404. Those species encompassed rodents (rat and mouse), Old World monkeys (gelada, olive baboon, and macaque), and great apes (gorilla, bonobo, chimpanzee, and human).

Extraction of STRs from genomic sequences

The whole-genome abundance of mononucleotide STRs of ≥ 10 repeats, dinucleotide STRs of ≥ 6 repeats, and trinucleotide STRs of ≥ 4 repeats was studied in the nine selected species. To that end, we designed a software package in Java. All possible mononucleotide motifs (A, C, T, and G), all possible dinucleotide motifs (AC, AG, AT, CA, CG, CT, GA, GC, GT, TA, TC, and TG), and all possible trinucleotide motifs (AAC, AAT, AAG, ACA, ACC, ACT, ACG, ATA, ATC, ATT, ATG, AGA, AGC, AGT, AGG, CAA, CAC, CAT, CAG, CCA, CCT, CCG, CTA, CTC, CTT, CTG, CGA, CGC, CGT, CGG, TAA, TAC, TAT, TAG, TCA, TCC, TCT, TCG, TTA, TTC, TTG, TGA, TGC, TGT, TGG, GAA, GAC, GAT, GAG, GCA, GCC, GCT, GCG, GTA, GTC, GTT, GTG, GGA, GGC, and GGT) were analyzed.
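The motif lists above amount to every core of length 1 to 3, minus the cores made of a single repeated base (AA, CCC, and so on), which would duplicate the mononucleotide category. A minimal Python sketch of that enumeration follows; it is an illustration rather than the authors' Java package, and the names `candidate_cores` and `MIN_REPEATS` are hypothetical.

```python
from itertools import product

BASES = "ACTG"  # base order of the trinucleotide list in the text

def candidate_cores(core_len):
    """All cores of a given length, excluding single-base cores for lengths above 1."""
    cores = ("".join(p) for p in product(BASES, repeat=core_len))
    return [c for c in cores if core_len == 1 or len(set(c)) > 1]

# Minimum repeat counts taken from the text: mono >= 10, di >= 6, tri >= 4.
MIN_REPEATS = {1: 10, 2: 6, 3: 4}

for n in (1, 2, 3):
    print(n, len(candidate_cores(n)), MIN_REPEATS[n])
```

Under these assumptions the enumeration gives 4 mononucleotide, 12 dinucleotide, and 60 trinucleotide cores, matching the lengths of the lists in the text.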

The program considered only perfect (pure) STRs. The algorithm started at the first nucleotide of each genome and iterated a series of steps while walking along the genome, nucleotide by nucleotide. In the first step, it examined a window of 2*N nucleotides, where 2 reflects the definition of a tandem repeat (two identical contiguous sequences) and N is the length of the STR core. If the first half of the window was not equal to the second half, the algorithm moved one nucleotide forward. If the halves were equal, the algorithm checked the following nucleotides, and this continued until all contiguous copies identical to the core were found. The resulting sequence, of length M*N, was recorded as a new STR with a core of length N and M repeats. All steps were then repeated from the end of the previous STR to find new STRs. We ran the algorithm for each value of N from 1 to 3 in each genome to detect mono-, di-, and trinucleotide STRs.
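The scan described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' Java implementation: the function name `find_perfect_strs` is hypothetical, cores containing ambiguous bases are skipped, single-base cores are skipped when N > 1, and when a run falls below the repeat threshold the scan simply advances by one nucleotide (the text does not say how short runs are handled).

```python
def find_perfect_strs(seq, core_len, min_repeats):
    """Return (start, core, repeats) tuples for perfect tandem repeats of one core length."""
    seq = seq.upper()
    n = core_len
    hits = []
    i = 0
    while i + 2 * n <= len(seq):
        core = seq[i:i + n]
        # Assumption: skip ambiguous bases and, for N > 1, cores made of one repeated base.
        if set(core) - set("ACGT") or (n > 1 and len(set(core)) == 1):
            i += 1
            continue
        # Window of 2*N: compare the first half with the second half.
        if seq[i + n:i + 2 * n] != core:
            i += 1
            continue
        # Extend while the next N nucleotides are another copy of the core.
        repeats = 2
        while seq[i + repeats * n:i + (repeats + 1) * n] == core:
            repeats += 1
        if repeats >= min_repeats:
            hits.append((i, core, repeats))
            i += repeats * n  # resume from the end of the recorded STR
        else:
            i += 1  # assumption: short runs are not recorded, so step forward by one
    return hits

demo = "GG" + "T" * 12 + "C" + "CT" * 6 + "G" + "TAA" * 4 + "C"
for n, threshold in ((1, 10), (2, 6), (3, 4)):
    print(n, find_perfect_strs(demo, n, threshold))
```

Counting the tuples per (core, repeats) pair for each chromosome would give abundance tables of the kind analyzed in the following sections.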

Whole-genome STR data aggregation, abundance, and hierarchical cluster analysis across species

Whole-genome chromosome-by-chromosome data were aggregated and analyzed in the nine species. STR abundances across the selected species were obtained and depicted by boxplot diagrams and hierarchical clustering, using the boxplot and hclust functions [34] in R, respectively. Boxplots illustrate abundance differences among segments across the selected species, and hierarchical clustering plots demonstrate the level of similarity and difference across the obtained abundances. The input data to these functions were numerical arrays. Each array consisted of a number of columns, each column corresponding to the STR abundance in a different chromosome. It should be noted that the focus of our analysis was to evaluate the global abundance of STRs across those species, regardless of the homologous regions.
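To make the clustering step concrete, the following pure-Python sketch performs agglomerative clustering on labelled abundance vectors. It is not the R analysis used in the study: the linkage (complete linkage, the default of R's hclust) and the Euclidean distance are assumptions, because the text does not state them, and the labels and numbers in `toy_profiles` are arbitrary placeholders rather than measured abundances.

```python
import math

def complete_linkage_clusters(profiles, k):
    """Merge labelled vectors until k clusters remain (complete linkage, Euclidean distance)."""
    clusters = [[name] for name in profiles]

    def linkage(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(math.dist(profiles[x], profiles[y]) for x in a for y in b)

    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)

# Arbitrary placeholder vectors (hypothetical labels, not data from the study).
toy_profiles = {
    "a1": [10.0, 2.0], "a2": [11.0, 2.5],
    "b1": [30.0, 9.0], "b2": [31.0, 8.0], "b3": [29.5, 9.5],
    "c1": [50.0, 20.0], "c2": [51.0, 21.0],
}
print(complete_linkage_clusters(toy_profiles, 3))
```

In the study's setting, each label would be a species and each vector its STR abundances; comparing the resulting groups with the known phylogenetic groups is the check the paper describes.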

Statistical analysis

The STR abundances across the nine selected species were compared by repeated-measures analysis, using one- and two-way ANOVA tests. These analyses were confirmed by nonparametric tests.
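As a reminder of what an ANOVA F statistic measures, the sketch below computes the plain one-way F ratio (between-group mean square over within-group mean square). It is a simplification: the study used repeated-measures designs, which this code does not model, and the function name `one_way_anova_f` and the example numbers are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Arbitrary example values, not data from the study.
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

A large F indicates that the groups differ more from one another than observations differ within a group; converting F to a P value requires the F distribution, which is left to a statistics package.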


Global abundance of mono-, di-, and trinucleotide STRs coincides with the phylogenetic distance of the nine selected species

Whole-genome data were collected on the abundance of mononucleotide STRs across the nine species (Table 1). We found massive expansion of the mononucleotide STR compartment in all primate species versus rat and mouse. Hierarchical clustering yielded three <clusters>, as follows: <rat, mouse>, <gelada, olive baboon, macaque>, and <gorilla, chimpanzee, bonobo, human>, namely <rodents>, <Old World monkeys>, and <great apes>, which coincided with the phylogenetic distance of the nine selected species (P = 6.3E-09) (Fig. 1).

Table 1 Mononucleotide STR abundance across the nine selected species
Fig. 1 Whole-genome mononucleotide STR abundance in the nine selected species. A global increase was observed in the primate species versus rodents (left graph). The overall hierarchical clustering yielded three <clusters>, which conformed to <rodents>, <Old World monkeys>, and <great apes> (right graph).

The whole-genome dinucleotide STR abundance, aggregated from the chromosome-by-chromosome analysis (Table 2), was decreased in primates versus rodents. Similar to the mononucleotide STR compartment, the dinucleotide STR compartment conformed to the genetic distance among the three <clusters> of species (P = 7.1E-08) (Fig. 2).

Table 2 Dinucleotide STR abundance across the nine selected species
Fig. 2 Whole-genome dinucleotide STR abundance in the nine selected species. A global decrease was observed in all primate species versus mouse and rat (left graph). The global pattern conformed to the three <clusters> across the nine species and their phylogenetic distance (right graph).

There was global shrinkage of the trinucleotide STR compartment in primates versus rodents (P = 3.8E-05) (Table 3; Fig. 3). Remarkably, human stood out among all other species in the trinucleotide STR compartment.

Table 3 Trinucleotide STR abundance across the nine selected species
Fig. 3 Whole-genome trinucleotide STR abundance in the nine selected species. While a global decrease was observed in primates versus rodents (left graph), human stood out in this category in comparison to all other species (right graph).

Differential abundance patterns of various STRs and STR lengths across rodents and primates

Numerous STRs and STR lengths across the mono-, di-, and trinucleotide STR categories conformed to the phylogenetic distances of the nine selected species. One example is the T/A mononucleotides of 10, 11, and 12 repeats, which were the most abundant STRs across all nine species (Fig. 4). As further examples, (ct)6 and (taa)4 conformed to the phylogeny of the studied species in the di- and trinucleotide STR categories, respectively.

Fig. 4 Examples of STRs and STR lengths, the abundance of which coincided with the phylogeny of the nine selected species. Three STRs are depicted as examples for each of the mono-, di-, and trinucleotide categories. Data from all studied STRs are available at:

On the other hand, numerous STRs did not follow perfect phylogenetic patterns, such as (C)10, (at)8, and (ttg)4 (Fig. 5). Hierarchical clusters of all studied STRs across the three categories are available at:

Fig. 5 Examples of STRs and STR lengths, the abundance of which appeared to be predominantly random across the nine selected species. Three STRs are depicted as examples for each of the mono-, di-, and trinucleotide categories. Data from all studied STRs are available at:


While the mechanisms underlying speciation are extremely complicated and largely based on theories and models, the impact of genetics seems to be significant with respect to adaptation, gene flow, and natural selection. In fact, natural selection may be a central converging point of the evolutionary propositions for speciation. However, the various mechanisms involved in speciation have different impacts on natural selection, and it is the net effect that may ultimately result in the emergence of a new species.

STRs are among the most abundant genetic elements in various animal genomes, yet it is largely unknown whether, at the crossroads of speciation, they evolved as a result of purifying selection, genetic drift, and/or in a directional manner.

Here, we selected multiple species across rodents and primates, and investigated the clustering patterns of all possible types and lengths of mononucleotide, dinucleotide, and trinucleotide STRs on the whole-genome scale in those species. Hierarchical clustering yielded clusters that predominantly conformed to the phylogenetic distances of the selected species. Hierarchical clustering is an unsupervised method for grouping data: it relies only on the pairwise similarity of the observations and does not use predefined labels. Cutting the resulting tree into a larger number of clusters yields smaller, more homogeneous groups.

Our findings may be of significance in a number of aspects. Firstly, there were significant differential abundances separating rodents from primates, for example, a massive decrease in the abundance of dinucleotide and trinucleotide STRs, and a massive increase in the abundance of mononucleotide STRs, in primates versus rodents. Secondly, the three major <clusters> obtained from global hierarchical cluster analysis matched the phylogeny of the three <clusters> of species, i.e., <rodents>, <Old World monkeys>, and <great apes>. It is possible that there are mathematical channels/thresholds required for the abundance of STRs in various orders. This is in line with the hypothesis that STRs function as scaffolds for biological computers [35]. In addition, our data indicate that various STRs and STR lengths behave differently with respect to their genome-wide abundance. Not all the studied STRs conformed to the phylogenetic distances of the nine selected species. We hypothesize that those which did had a link with the speciation of those species, whereas those which did not apparently followed random patterns for the most part. The potential effect of STRs in non-genic regions is largely unknown. However, when located in genic regions, various STRs and repeat lengths can potentially recruit transcription factors (TFs), which differ in qualitative and quantitative terms [36]. Those various TF sets may differentially regulate expression of the relevant genes during the process of evolution. For example, T-blocks of 10, 12, and 14 repeats recruit various combinations of FOXD3, HNF-3, and Hb (Fig. 6). Interestingly, (T)10 and (T)12 were among the mononucleotide STRs that conformed to the phylogenetic distance of the nine species (Fig. 4), whereas (T)14 did not. The concept of various TF sets holds for other STRs as well. For example, (ct)6 conforms to the phylogenetic clusters and recruits a number of TFs, whereas (ct)7, which does not conform to those clusters, recruits a quantitatively different set of those TFs (Fig. 7).

Fig. 6 Potential recruitment of qualitatively and quantitatively different TFs to various lengths of (T)-repeats. (T)10 (A) and (T)12 (B) conformed to the phylogenetic <clusters>, whereas (T)14 (C) did not. Differential recruitment of TFs may differentially regulate the relevant genes in evolutionary processes.

Fig. 7 Potential differential TF recruitment to (ct)6 (A) and (ct)7 (B). Those two lengths result in alternative quantitative binding of three TFs. (ct)6 conformed to the phylogenetic <clusters>, whereas (ct)7 did not.

Mononucleotide STRs impact various processes, such as gene expression, translation alterations, and frameshifts of various proteins, which may have evolutionary and pathological consequences[12, 25]. They can overlap with G4 structures, many of which associate with evolutionary consequences[37].

In a number of instances, dinucleotide STRs located in protein-coding gene core promoters have been subject to contraction in the process of human and non-human primate evolution [38]. A number of those STRs are identical in formula in primates versus non-primates, and the genes linked to those STRs are involved in characteristics that distinguish primates from other mammals, such as craniofacial development, neurogenesis, and spine morphogenesis. Structural variants are enriched near genes that diverged in expression across great apes [39], and genes with STRs in their regulatory regions are more divergent in expression than genes with fixed or no STRs [40]. STR variants are likely to have epistatic interactions, which can have significant consequences in complex traits, in human as well as model organisms [6, 41].

Trinucleotide STRs have predominantly been studied in human because of their link with several neurological disorders [42,43,44,45]. We found an exceptional global hierarchical distance between human and all other species in that compartment. In view of the fact that most of the phenotypes attributed to trinucleotide STRs are human-specific in nature, it is conceivable that their evolution in human is also significantly distant from that in all other species studied.

The observed abundances were independent of the genome sizes of the selected species. For example, in the instances of di- and trinucleotide STRs, we observed higher abundances in rodents versus primates despite the smaller genome sizes of the former. These findings are in line with previous reports of a lack of relationship between genome size and abundance of STRs [46, 47].
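One simple way to look at abundance independently of genome size is to express counts per megabase. The sketch below does this with the genome sizes listed in the Materials and methods; the function name `strs_per_megabase` is hypothetical, the count passed in is a placeholder rather than a value from the study, and the study itself does not state that it used this normalization.

```python
# Genome sizes in base pairs, as listed in the Materials and methods.
GENOME_SIZE_BP = {
    "rat": 2_647_915_728,
    "mouse": 2_728_222_451,
    "gelada": 2_889_630_685,
    "olive baboon": 2_869_821_163,
    "macaque": 2_946_843_737,
    "gorilla": 3_063_362_754,
    "chimpanzee": 3_050_398_082,
    "bonobo": 3_203_531_224,
    "human": 3_099_706_404,
}

def strs_per_megabase(count, species):
    """Convert a whole-genome STR count into STRs per million base pairs."""
    return count / (GENOME_SIZE_BP[species] / 1_000_000)

# Placeholder count (not a measured value): the same count in a smaller genome
# corresponds to a higher density.
placeholder_count = 1_000_000
for species in ("rat", "human"):
    print(species, round(strs_per_megabase(placeholder_count, species), 2))
```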

It should be noted that this is a pilot study based on hierarchical clustering, and future studies are warranted to further examine our hypothesis, using phylogenetic platforms and additional orders and species. Functional studies are also warranted to examine the biological impact of the relevant STRs.


We propose that the global abundance of STRs is non-random across rodents and primates. We also report the STRs and STR lengths that predominantly conformed to the phylogenetic distances of those species, such as (t)10, (ct)6, and (taa)4. Phylogenetic platforms and additional species encompassing other orders are warranted to further examine this proposition.


This research was a pilot study based on hierarchical clustering of the collected data in a number of mammalian species. Phylogenetic platforms and additional orders of species are warranted to further examine our hypothesis.

Data Availability

Raw data are available at: and



STR: Short tandem repeat

TF: Transcription factor


  1. Gavrilets S. Models of speciation: where are we now? J Hered. 2014;105(S1):743–55.

  2. Mohammadparast S, Bayat H, Biglarian A, Ohadi M. Exceptional expansion and conservation of a CT-repeat complex in the core promoter of PAXBP1 in primates. Am J Primatol. 2014;76(8):747–56.

  3. Bushehri A, Barez MRM, Mansouri SK, Biglarian A, Ohadi M. Genome-wide identification of human-and primate-specific core promoter short tandem repeats. Gene. 2016;587(1):83–90.

  4. Nikkhah M, Rezazadeh M, Khorshid HRK, Biglarian A, Ohadi M. An exceptionally long CA-repeat in the core promoter of SCGB2B2 links with the evolution of apes and Old World monkeys. Gene. 2016;576(1):109–14.

  5. Reinar WB, Lalun VO, Reitan T, Jakobsen KS, Butenko MA. Length variation in short tandem repeats affects gene expression in natural populations of Arabidopsis thaliana. Plant Cell. 2021;33(7):2221–34.

  6. Press MO, Carlson KD, Queitsch C. The overdue promise of short tandem repeat variation for heritability. Trends Genet. 2014;30(11):504–12.

  7. Jakubosky D, D’Antonio M, Bonder MJ, Smail C, Donovan MKR, Greenwald WWY, Matsui H, D’Antonio-Chronowska A, Stegle O, Smith EN. Properties of structural variants and short tandem repeats associated with gene expression and complex traits. Nat Commun. 2020;11(1):1–15.

  8. Valipour E, Kowsari A, Bayat H, Banan M, Kazeminasab S, Mohammadparast S, Ohadi M. Polymorphic core promoter GA-repeats alter gene expression of the early embryonic developmental genes. Gene. 2013;531(2):175–9.

  9. Ranathunge C, Wheeler GL, Chimahusky ME, Perkins AD, Pramod S, Welch ME. Transcribed microsatellite allele lengths are often correlated with gene expression in natural sunflower populations. Molecular Ecology 2020.

  10. Press MO, Hall AN, Morton EA, Queitsch C. Substitutions are boring: Some arguments about parallel mutations and high mutation rates. Trends Genet. 2019;35(4):253–64.

  11. Fotsing SF, Margoliash J, Wang C, Saini S, Yanicky R, Shleizer-Burko S, Goren A, Gymrek M. The impact of short tandem repeat variation on gene expression. Nat Genet. 2019;51(11):1652–9.

  12. Arabfard M, Kavousi K, Delbari A, Ohadi M. Link between short tandem repeats and translation initiation site selection. Hum Genomics. 2018;12(1):47.

  13. Yap K, Mukhina S, Zhang G, Tan JSC, Ong HS, Makeyev EV. A short tandem repeat-enriched RNA assembles a nuclear compartment to control alternative splicing and promote cell survival. Mol Cell. 2018;72(3):525–40.

  14. Fondon JW, Garner HR. Molecular origins of rapid and continuous morphological evolution. Proc Natl Acad Sci U S A. 2004;101(52):18058–63.

  15. Wren JD, Forgacs E, Fondon Iii JW, Pertsemlidis A, Cheng SY, Gallardo T, Williams RS, Shohet RV, Minna JD, Garner HR. Repeat polymorphisms within gene regions: phenotypic and evolutionary implications. Am J Hum Genet. 2000;67(2):345–56.

  16. King DG. Evolution of simple sequence repeats as mutable sites. Tandem Repeat Polymorphisms 2012:10–25.

  17. Srivastava S, Avvaru AK, Sowpati DT, Mishra RK. Patterns of microsatellite distribution across eukaryotic genomes. BMC Genomics. 2019;20(1):153.

  18. Pavlova A, Gan HM, Lee YP, Austin CM, Gilligan DM, Lintermans M, Sunnucks P. Purifying selection and genetic drift shaped Pleistocene evolution of the mitochondrial genome in an endangered Australian freshwater fish. Heredity. 2017;118(5):466–76.

  19. Jorde PE, Søvik G, Westgaard JI, Albretsen J, André C, Hvingel C, Johansen T, Sandvik AD, Kingsley M, Jørstad KE. Genetically distinct populations of northern shrimp, Pandalus borealis, in the North Atlantic: adaptation to different temperatures as an isolation factor. Mol Ecol. 2015;24(8):1742–57.

  20. Legrand D, Chenel T, Campagne C, Lachaise D, Cariou ML. Inter-island divergence within Drosophila mauritiana, a species of the D. simulans complex: Past history and/or speciation in progress? Mol Ecol. 2011;20(13):2787–804.

  21. Sun G, McGarvey ST, Bayoumi R, Mulligan CJ, Barrantes R, Raskin S, Zhong Y, Akey J, Chakraborty R, Deka R. Global genetic variation at nine short tandem repeat loci and implications on forensic genetics. Eur J Hum Genet. 2003;11(1):39–49.

  22. Abe H, Gemmell NJ. Evolutionary footprints of short tandem repeats in avian promoters. Sci Rep. 2016;6(1):1–11.

  23. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W: Initial sequencing and analysis of the human genome. 2001.

  24. Fan H, Chu J-Y. A brief review of short tandem repeat mutation. Genom Proteom Bioinform. 2007;5(1):7–14.

  25. Mo HY, Lee JH, Kim MS, Yoo NJ, Lee SH. Frameshift Mutations and Loss of Expression of CLCA4 Gene are Frequent in Colorectal Cancers With Microsatellite Instability. Appl Immunohistochem Mol Morphology. 2020;28(7):489.

  26. Maddi AMA, Kavousi K, Arabfard M, Ohadi H, Ohadi M. Tandem repeats ubiquitously flank and contribute to translation initiation sites. BMC Genomic Data. 2022;23(1):59.

  27. Corney BPA, Widnall CL, Rees DJ, Davies JS, Crunelli V, Carter DA. Regulatory architecture of the neuronal Cacng2/Tarpγ2 gene promoter: multiple repressive domains, a polymorphic regulatory short tandem repeat, and bidirectional organization with co-regulated lncRNAs. J Mol Neurosci. 2019;67(2):282–94.

  28. Emamalizadeh B, Movafagh A, Darvish H, Kazeminasab S, Andarva M, Namdar-Aligoodarzi P, Ohadi M. The human RIT2 core promoter short tandem repeat predominant allele is species-specific in length: a selective advantage for human evolution? Mol Genet Genomics. 2017;292(3):611–7.

  29. Haasl RJ, Johnson RC, Payseur BA. The effects of microsatellite selection on linked sequence diversity. Genome Biol Evol. 2014;6(7):1843–61.

  30. Yim J-J, Adams AA, Kim JH, Holland SM. Evolution of an intronic microsatellite polymorphism in Toll-like receptor 2 among primates. Immunogenetics. 2006;58(9):740–5.

  31. Annear DJ, Vandeweyer G, Elinck E, Sanchis-Juan A, French CE, Raymond L, Kooy RF. Abundancy of polymorphic CGG repeats in the human genome suggest a broad involvement in neurological disease. Sci Rep. 2021;11(1):1–11.

  32. Tang H, Kirkness EF, Lippert C, Biggs WH, Fabani M, Guzman E, Ramakrishnan S, Lavrenko V, Kakaradov B, Hou C. Profiling of short-tandem-repeat disease alleles in 12,632 human whole genomes. Am J Hum Genet. 2017;101(5):700–15.

  33. Kumar V, Hallström BM, Janke A. Coalescent-based genome analyses resolve the early branches of the euarchontoglires. PLoS ONE. 2013;8(4):e60019.

  34. Murtagh F, Legendre P. Ward’s hierarchical agglomerative clustering method: which algorithms implement Ward’s criterion? J Classif. 2014;31(3):274–95.

  35. Herbert A: Simple Repeats as Building Blocks for Genetic Computers. Trends in Genetics 2020.

  36. Farré D, Roset R, Huerta M, Adsuara JE, Roselló L, Albà MM, Messeguer X. Identification of patterns in biological sequences at the ALGGEN server: PROMO and MALGEN. Nucleic Acids Res. 2003;31(13):3651–3.

  37. Sawaya S, Bagshaw A, Buschiazzo E, Kumar P, Chowdhury S, Black MA, Gemmell N. Microsatellite tandem repeats are abundant in human promoters and are associated with regulatory elements. PLoS ONE. 2013;8(2):e54710.

  38. Ohadi M, Valipour E, Ghadimi-Haddadan S, Namdar‐Aligoodarzi P, Bagheri A, Kowsari A, Rezazadeh M, Darvish H, Kazeminasab S. Core promoter short tandem repeats as evolutionary switch codes for primate speciation. Am J Primatol. 2015;77(1):34–43.

  39. Kronenberg ZN, Fiddes IT, Gordon D, Murali S, Cantsilieris S, Meyerson OS, Underwood JG, Nelson BJ, Chaisson MJP, Dougherty ML. High-resolution comparative analysis of great ape genomes. Science 2018, 360(6393).

  40. Sonay TB, Carvalho T, Robinson MD, Greminger MP, Krützen M, Comas D, Highnam G, Mittelman D, Sharp A, Marques-Bonet T. Tandem repeat variation in human and great ape populations and its impact on gene expression divergence. Genome Res. 2015;25(11):1591–9.

  41. Bagshaw ATM, Horwood LJ, Fergusson DM, Gemmell NJ, Kennedy MA. Microsatellite polymorphisms associated with human behavioural and psychological phenotypes including a gene-environment interaction. BMC Med Genet. 2017;18(1):1–12.

  42. Sundblom J, Niemelä V, Ghazarian M, Strand A-S, Bergdahl IA, Jansson J-H, Söderberg S, Stattin E-L. High frequency of intermediary alleles in the HTT gene in Northern Sweden-The Swedish Huntingtin Alleles and Phenotype (SHAPE) study. Sci Rep. 2020;10(1):1–7.

  43. Baker EK, Arpone M, Kraan C, Bui M, Rogers C, Field M, Bretherton L, Ling L, Ure A, Cohen J. FMR1 mRNA from full mutation alleles is associated with ABC-C FX scores in males with fragile X syndrome. Sci Rep. 2020;10(1):1–8.

  44. Zhou X, Wang C, Ding D, Chen Z, Peng Y, Peng H, Hou X, Wang P, Ye W, Li T. Analysis of (CAG) n expansion in ATXN1, ATXN2 and ATXN3 in Chinese patients with multiple system atrophy. Sci Rep. 2018;8(1):1–5.

  45. Zhang Q, Yang M, Sørensen KK, Madsen CS, Boesen JT, An Y, Peng SH, Wei Y, Wang Q, Jensen KJ. A brain-targeting lipidated peptide for neutralizing RNA-mediated toxicity in Polyglutamine Diseases. Sci Rep. 2017;7(1):1–13.

  46. Neff BD, Gross MR. Microsatellite evolution in vertebrates: inference from AC dinucleotide repeats. Evolution. 2001;55(9):1717–33.

  47. Park JY, An Y-R, An C-M, Kang J-H, Kim EM, Kim H, Cho S, Kim J. Evolutionary constraints over microsatellite abundance in larger mammals as a potential mechanism against carcinogenic burden. Sci Rep. 2016;6(1):1–5.



Not applicable.


Not applicable.

Author information

Authors and Affiliations



MA performed and coordinated the bioinformatics analyses. MS performed the biostatistics analysis. YHN, IA, and AMAM contributed to data collection. KK contributed to coordination. MO conceived and supervised the project, and wrote the manuscript with input from all authors.

Corresponding author

Correspondence to Mina Ohadi.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

Authors have no conflict of interest to declare.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Arabfard, M., Salesi, M., Nourian, Y.H. et al. Global abundance of short tandem repeats is non-random in rodents and primates. BMC Genom Data 23, 77 (2022).


  • Global
  • Short tandem repeat
  • Abundance
  • Non-random
  • Rodent
  • Primate
  • Hierarchical clustering