Genetic diversity and structure of Chinese grass shrimp, Palaemonetes sinensis, inferred from transcriptome-derived microsatellite markers

Background The Chinese grass shrimp, Palaemonetes sinensis, is an economically important freshwater shrimp in China. Studying its genetic diversity and structure can contribute to the exploration of germplasm resources and support P. sinensis aquaculture. Microsatellite markers are widely used in studies of genetic background because they are considered important molecular markers for the analysis of genetic diversity and structure. Hence, the aim of this study was to evaluate the genetic diversity and structure of wild P. sinensis populations in China using polymorphic microsatellite markers derived from the transcriptome. Results Sixteen polymorphic microsatellite markers were developed for P. sinensis from the transcriptome and used to analyze differences in genetic diversity and structure among wild P. sinensis populations in China. A total of 319 individual shrimps from seven populations were genotyped; the number of alleles per locus ranged from two to thirteen. Compared with the other populations analyzed, the LD and SJ populations showed lower genetic diversity. Both the genetic distance (D) and Wright's fixation index (FST) between pairs of populations also indicated that the LD and SJ populations differed from the other five populations. A UPGMA tree showed three main clusters comprising SJ, LD and the other populations, which was confirmed by STRUCTURE analysis. Conclusion This is the first study in which polymorphic microsatellite markers from the transcriptome were used to analyze the genetic diversity and structure of different wild P. sinensis populations. All the polymorphic microsatellite markers are believed to be useful for evaluating the extent of genetic diversity and population structure in P. sinensis.
Compared with the other five populations, the LD and SJ populations exhibited lower genetic diversity, and their genetic structure differed from that of the other five populations. Therefore, they need to be protected against further declines in genetic diversity. The other five populations, LP, LA, LSL, LSY and LSH, all belong to the Liaohe River Drainage and have relatively high genetic diversity; hence they can be considered hot spots for in-situ conservation of P. sinensis as well as sources of desirable alleles for breeding.


Background
Palaemonetes sinensis (Sollaud, 1911), also known as the Chinese grass shrimp, is a small freshwater shrimp of the family Palaemonidae that is distributed in China, Myanmar, Vietnam, Japan, southeastern Siberia and Sakhalin and has both ecological and ornamental value [1][2][3]. In China, P. sinensis is commonly found in Liaoning, Jilin, Heilongjiang, Hebei, Jiangsu, Fujian and Yunnan provinces [1]. P. sinensis is an attractive shrimp because of its striking appearance, appealing flavor [4] and use as bait for sport fishing [3]. Owing to its delicious meat and high nutritional value, P. sinensis is very popular in both domestic and foreign markets [3][4][5][6]. However, as a consequence of environmental pollution and overharvesting, wild P. sinensis populations have gradually diminished. As such, there has been interest in P. sinensis aquaculture, which could alleviate fishing pressure on wild populations by meeting consumer demand with farmed shrimp and through stock enhancement. Nevertheless, only a handful of studies have investigated P. sinensis morphology, and more work is needed to understand the biology of this species [3,6]. At present, molecular studies of P. sinensis are rarely reported; microsatellite markers have been used only in other species of the family Palaemonidae, including Macrobrachium rosenbergii [7] and Macrobrachium nipponense [8]. In the past decade, microsatellite markers have been widely used in research on genetic background [9] and are considered very important molecular markers for the analysis of genetic diversity and genetic structure and for the construction of genetic linkage maps. With the development of sequencing technology, vast quantities of transcriptome data have been obtained, which has enabled the identification of microsatellite markers. Ma et al. [10] identified 129 polymorphic microsatellite markers from transcriptomic analysis and analyzed their relationship to growth performance in the mud crab (Scylla paramamosain). Research on related species of snapping shrimp with highly duplicated genomes showed that microsatellites obtained from the transcriptome were more likely to work effectively than those developed from the genome by traditional methods [11]. However, compared with other aspects of the massive transcriptome datasets, the roles of transcriptome-derived microsatellite markers have not been investigated thoroughly. In prior work, a total of 17,019 microsatellite markers were obtained from the transcriptome of P. sinensis [4]. Moreover, no studies have investigated the genetic diversity of P. sinensis using molecular markers, although such work could aid in the conservation and improvement of this species. Given the broad distribution of its wild populations, it remains unresolved whether they are genetically similar or differ according to their locations or drainage basins. The objectives of the current study were to validate the variation of transcriptome-derived microsatellite markers and to use them to analyze the genetic background of different wild P. sinensis populations. It is hoped that this work will contribute to molecular genetic analyses of P. sinensis populations and eventually serve as a basis for the improvement of P. sinensis aquaculture and the sustainable conservation of the species in China.

Polymorphisms of microsatellite loci
Seven wild populations of P. sinensis, totaling 319 individual shrimps (Fig. 1, Table 1), were screened at 16 microsatellite loci, all of which were polymorphic in all populations under the 0.95 allele frequency criterion. The characteristics of the 16 microsatellite loci are summarized in Table 2. The most polymorphic locus was c1747_g1_i1, with the highest NA (13), NE (3.628), He (0.724) and Ar (7.478) values, and the least polymorphic locus was c2591_g1_i1, with the lowest NA (2), NE (1.006), He (0.006) and Ar (1.213) values.
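The per-locus statistics reported here can be illustrated with a minimal sketch, assuming the textbook definitions (effective number of alleles as the reciprocal of the summed squared allele frequencies, and expected heterozygosity as one minus that sum). The function name and the genotypes are hypothetical, and POPGENE may apply sample-size corrections that are not shown.

```python
from collections import Counter

def locus_diversity(genotypes):
    """Diversity statistics for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    Uses uncorrected textbook formulas (an assumption, not POPGENE's exact output).
    """
    alleles = [allele for pair in genotypes for allele in pair]
    counts = Counter(alleles)
    total = len(alleles)
    homozygosity = sum((c / total) ** 2 for c in counts.values())
    return {
        "NA": len(counts),                      # number of distinct alleles
        "NE": 1.0 / homozygosity,               # effective number of alleles
        "Ho": sum(a != b for a, b in genotypes) / len(genotypes),
        "He": 1.0 - homozygosity,               # expected heterozygosity
    }

# Hypothetical allele sizes (bp) for four individuals at one locus.
example_genotypes = [(150, 150), (150, 154), (154, 158), (150, 158)]
stats = locus_diversity(example_genotypes)
```

Under these definitions He = 1 - 1/NE, which agrees with the values reported for c1747_g1_i1 (1 - 1/3.628 is approximately 0.724).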

Hardy-Weinberg equilibrium test and linkage disequilibrium
Nine of the sixteen loci showed a highly significant departure from Hardy-Weinberg equilibrium (HWE) (P < 0.01), whereas the other seven loci showed no significant departure (Table 2). All populations except LSY showed a highly significant deviation from HWE (P < 0.01) (Table 3). However, the locus-by-locus HWE results for each population (Additional file 1: Table S1) suggest that these extreme results are likely a consequence of pooling multiple populations in the analysis. In over 420 pairwise comparisons for linkage disequilibrium (Ld) among the 16 loci across all P. sinensis populations, there were seven significant comparisons in LA and LSL, six in LP, LSH and SJ, and five in LSY (Table 3). In general, no locus pair was consistently significant across populations, indicating that there was no linkage among these loci and that their inclusion would not affect the estimates of genetic variability [12].

Genetic diversity among populations
The genetic diversity parameters for the seven P. sinensis populations are shown in Table 3. The LSL population had the highest NA, He and Ar values, while the LSH population had the highest NE and Ho values. In the LD, LP, LA, LSL and SJ populations, FIS coefficients ranged from 0.093 to 0.322, indicating significant heterozygote deficiencies (95% confidence interval). In the other two populations, FIS values were not significantly different from zero.
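The link between a positive FIS and a heterozygote deficiency can be seen from the simple relation FIS = 1 - Ho/He. The sketch below uses that simplified form only; the Weir and Cockerham estimator used in this study weights loci and samples differently, and the input values are hypothetical.

```python
def inbreeding_coefficient(h_obs, h_exp):
    """Simplified FIS = 1 - Ho/He (not the Weir and Cockerham estimator)."""
    if h_exp <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - h_obs / h_exp

# Hypothetical heterozygosities: fewer heterozygotes observed than expected.
fis_deficit = inbreeding_coefficient(0.40, 0.50)
# Hypothetical case where observed and expected heterozygosity agree.
fis_neutral = inbreeding_coefficient(0.50, 0.50)
```

A value above zero corresponds to fewer observed heterozygotes than expected, and a value of zero to agreement between the two.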
Estimates of effective population size (Ne) and the mean ratio of the number of alleles to the range in allele size (M ratio) for each population are also listed in Table 3. The Ne values for LP and LSH were the highest (infinite), whereas the lowest Ne value was 106.1, in LA. M ratios ranged from 0.826 to 0.908 across populations, which suggests that no population experienced a reduction in effective size, since such a reduction is inferred only when M < 0.68 [13]. The bottleneck analysis, performed using a Wilcoxon signed-rank test under the TPM with 90% single-step mutations, showed that all populations exhibited a normal L-shaped distribution and may not have experienced a recent bottleneck (P > 0.05).
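As a rough illustration of the M ratio, the sketch below computes, for each locus, the number of distinct alleles divided by the number of repeat states spanned by the allele size range, and then averages across loci. The allele sizes and repeat lengths are hypothetical, and the handling of the range (span in repeat units plus one) is an assumption about the convention rather than a description of Arlequin's code.

```python
def m_ratio(allele_sizes, repeat_length):
    """M = k / r for one locus.

    k: number of distinct alleles observed.
    r: number of possible repeat states between the smallest and largest allele.
    """
    sizes = sorted(set(allele_sizes))
    k = len(sizes)
    r = (sizes[-1] - sizes[0]) // repeat_length + 1
    return k / r

def mean_m_ratio(loci):
    """loci: list of (allele_sizes, repeat_length) pairs, one per locus."""
    ratios = [m_ratio(sizes, repeat) for sizes, repeat in loci]
    return sum(ratios) / len(ratios)

# Hypothetical data: a dinucleotide locus with gaps and a trinucleotide locus without.
hypothetical_loci = [
    ([150, 152, 154, 160], 2),   # 4 alleles over 6 possible states
    ([200, 203, 206], 3),        # 3 alleles over 3 possible states
]
mean_m = mean_m_ratio(hypothetical_loci)
possible_reduction = mean_m < 0.68   # threshold cited in the text [13]
```

Gaps in the allele size range lower M, which is why a low mean value is read as a sign of past allele loss.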

Genetic divergence and distance between populations
The pairwise Wright's fixation index (FST) and Cavalli-Sforza and Edwards' genetic distance (D) [14] values are shown in Table 4 and reveal significant differences among all populations. The analysis of molecular variance (AMOVA) showed that variation within and among populations accounted for 82.76% and 17.24% of the total genetic variation, respectively. In addition, the variation among populations was significant (P < 0.01) (Table 5).
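For orientation, the Cavalli-Sforza and Edwards chord distance between two populations can be sketched as follows, assuming the usual multilocus form in which the per-locus chord length is averaged over loci and scaled by 2/pi. The allele frequencies are hypothetical, and the INA null-allele correction applied in FreeNA is not included.

```python
import math

def chord_distance(pop_x, pop_y):
    """Cavalli-Sforza and Edwards chord distance averaged over loci.

    pop_x, pop_y: lists with one dict per locus mapping allele -> frequency.
    No null-allele (INA) correction is applied in this sketch.
    """
    total = 0.0
    for freq_x, freq_y in zip(pop_x, pop_y):
        shared = freq_x.keys() & freq_y.keys()
        overlap = sum(math.sqrt(freq_x[a] * freq_y[a]) for a in shared)
        total += math.sqrt(max(0.0, 2.0 * (1.0 - overlap)))
    return 2.0 * total / (math.pi * len(pop_x))

# Hypothetical single-locus frequencies.
pop_a = [{150: 0.5, 154: 0.5}]
pop_b = [{150: 0.5, 154: 0.5}]      # identical to pop_a
pop_c = [{158: 0.5, 162: 0.5}]      # shares no alleles with pop_a

d_same = chord_distance(pop_a, pop_b)
d_disjoint = chord_distance(pop_a, pop_c)
```

The distance is zero for identical frequency profiles and reaches its maximum of 2*sqrt(2)/pi (about 0.90) when two populations share no alleles.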
Overall, the FST values among the seven populations reflected their geographic relationships. First, there was very great genetic differentiation between SJ (the only population from the Huaihe Drainage Basin in Shandong Province) and the other six populations (all in Liaoning Province) (FST > 0.25). Second, there was great genetic differentiation between LD (Related River Drainage Basin) and all populations from the Liaohe River Drainage Basin (LP, LA, LSL, LSY and LSH) (0.15 < FST < 0.25). Finally, within the Liaohe River Drainage Basin, there was moderate genetic differentiation among LP, LA, LSL, LSY and LSH (0.05 < FST < 0.15), although the LSL and LP populations and the LSL and LSY populations showed little genetic differentiation (FST < 0.05) [15]. A comparison of FST values revealed significant differences among the populations (P < 0.01).
The Mantel test showed that geographical distances and pairwise genetic distances (D values) were significantly correlated (r = 0.803, p = 0.001). The D values between populations displayed a pattern similar to that of the FST values. For example, most D values between SJ and the other six populations, including SJ vs LA, SJ vs LSL, SJ vs LSY and SJ vs LSH, were larger than 0.20, suggesting that these were genetically disparate populations [16]. In addition, the D values between LD and the populations of the Liaohe Drainage Basin were higher than those among the latter populations.
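The FST ranges used above to describe differentiation can be written as a small lookup. This is only a sketch of the qualitative scale cited in the text [15]; the function name is hypothetical, and the assignment of values falling exactly on a boundary is an assumption.

```python
def classify_fst(fst):
    """Map an FST value to the qualitative differentiation classes used in the text."""
    if fst < 0.05:
        return "little"
    if fst < 0.15:
        return "moderate"
    if fst < 0.25:
        return "great"
    return "very great"

# Hypothetical pairwise FST values, one per class.
labels = [classify_fst(value) for value in (0.03, 0.10, 0.20, 0.30)]
```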
The UPGMA dendrogram constructed from the D values is shown in Fig. 2. The seven populations formed three major clusters: LD, SJ and the other five populations. Among the five populations, LP and LA, as well as LSL and LSY, were more similar to each other.
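How a UPGMA tree is derived from a distance matrix can be sketched in a few lines: the closest pair of clusters is merged repeatedly, and distances to the merged cluster are the size-weighted average of the distances to its members. The code below is an illustrative implementation with hypothetical population labels and distances; it is not the procedure coded in POPTREE2 and it omits branch lengths and bootstrapping.

```python
from itertools import combinations

def upgma(names, matrix):
    """Return a nested-tuple UPGMA topology.

    names: list of labels; matrix[i][j]: distance between names[i] and names[j].
    """
    trees = {i: names[i] for i in range(len(names))}
    sizes = {i: 1 for i in trees}
    dist = {(i, j): matrix[i][j] for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(trees) > 1:
        a, b = min(dist, key=dist.get)           # closest pair of clusters
        new_dist = {}
        for c in trees:
            if c in (a, b):
                continue
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            # Size-weighted average distance to the merged cluster.
            new_dist[(c, next_id)] = (sizes[a] * d_ac + sizes[b] * d_bc) / (sizes[a] + sizes[b])
        dist = {pair: d for pair, d in dist.items() if a not in pair and b not in pair}
        dist.update(new_dist)
        trees[next_id] = (trees.pop(a), trees.pop(b))
        sizes[next_id] = sizes.pop(a) + sizes.pop(b)
        next_id += 1
    return next(iter(trees.values()))

# Hypothetical labels and distances (not values from Table 4).
labels = ["P1", "P2", "P3", "P4"]
distances = [
    [0.0, 0.1, 0.4, 0.5],
    [0.1, 0.0, 0.4, 0.5],
    [0.4, 0.4, 0.0, 0.2],
    [0.5, 0.5, 0.2, 0.0],
]
tree = upgma(labels, distances)
```

With these hypothetical distances, P1 and P2 join first, then P3 and P4, and the two pairs are joined last.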
Gene flow between locations ranged from 0.608 (LA into LD) to 3.545 (SJ into LP). Most long-term gene flow between populations was symmetric (the 95% credible intervals overlapped in pairwise comparisons), except between SJ and LP, LSY and LSL, and SJ and LSY (Additional file 2: Table S2). In terms of cumulative gene flow, most populations received more migrants than they supplied, whereas SJ and LSL showed the opposite pattern (Additional file 3: Table S3).
The log probability Ln P(X|K) for different numbers of genetic clusters K, calculated from the STRUCTURE analysis of 319 P. sinensis individuals, was highest at K = 2, followed by K = 5. As shown in Fig. 3, at K = 2 the individuals from the LD and SJ populations were grouped together to some extent and were clearly different from the individuals of the other five populations. At K = 5, individuals from the LD and SJ populations were clearly different from each other, and the other five populations were similar to some extent, with LP and LA, as well as LSL, LSY and LSH, being more similar to each other.

Analysis of microsatellite polymorphisms
One of the main objectives of this study was to detect polymorphisms in transcriptome-derived microsatellite markers and to further understand the genetic diversity and structure of wild populations from different locations. Compared with genome-derived microsatellites, transcriptome-derived microsatellites have several advantages, including high efficiency, strong transferability and association with potential genes [10]. Although transcriptome-derived microsatellites are predicted to be less polymorphic than those derived from genomic DNA because they are more likely to be under stringent evolutionary constraint, some studies have reported different results [17,18]. In this study, 16 polymorphic microsatellites were identified, with NA values ranging from 2 to 13 (mean = 5.8), which is similar to results for transcriptome-derived microsatellite markers in other decapods [10]. In addition, compared with genome-derived microsatellites in related species, such as M. rosenbergii [7] and M. nipponense [8], these 16 microsatellites showed lower polymorphism. Since no genetic markers have been reported for P. sinensis so far, these polymorphic microsatellite markers and further mining of the transcriptomic data may be helpful for future research on P. sinensis and related species.
[19]. This may be due to the poor swimming ability of P. sinensis. In addition, owing to time and space constraints in sampling, some individuals may have descended from a small number of spawning shrimps, which would affect genetic diversity through an increase in genetic identity [20].
Most of the populations analyzed in this study showed a departure from HWE. This result is similar to those of several other studies in which microsatellites derived from transcribed sequence data departed significantly from HWE [21][22][23]. This could be due to selection on polymorphisms in untranslated gene regions, where these microsatellites typically reside, or to non-neutral dynamics of the genes to which they are physically linked [23]. However, the bottleneck test and the M ratios indicated that none of the seven P. sinensis populations had experienced a bottleneck or a recent decline in size. Therefore, in future breeding projects, the level of inbreeding, brood shrimp population size and genetic diversity must be considered, since domestication is likely to reduce genetic variation through genetic drift, selection and inbreeding [24].

Relationships among different populations
In the present study, specimens were collected from three geographically isolated drainage basins. The LD population belongs to the Related River Drainage in the East Liaoning Peninsula; the SJ population belongs to the Huaihe River Drainage in north China; and the other five populations (LP, LA, LSL, LSY and LSH) come from the Liaohe River Basin in central Liaoning Province. Specifically, LSL and LSY come from a closed lake and a semi-closed reservoir, respectively; LA and LSH come from inland tributaries; and LP is located close to the mouth of the Liaohe River.
AMOVA revealed that genetic differentiation among populations of P. sinensis, which accounted for 17.24% of the total genetic variation, was much lower than that within populations. Pairwise D and FST values were consistent with this result. According to Thorp [16], the pairwise D values between most of the populations of P. sinensis indicated that they were closely related. Likewise, excluding the SJ and LD populations, the pairwise FST values among the other five populations were low to moderate [15], which suggests that these populations might share the same ancestors. Even so, the significant correlation between geographical and genetic distances, together with the gene flow values, indicated that the five populations from the Liaohe River Basin were separated by habitat fragmentation. The divergence between SJ and the other six populations, as well as that between LD and the other five populations, can be attributed to long-term geographic separation. These outcomes were consistent with the Bayesian analysis of genetic structure, which also revealed that the SJ and LD populations differed markedly from the other populations.

Conclusions
In this study, 16 polymorphic transcriptome-derived microsatellites were screened and used to assess genetic diversity and structure among wild P. sinensis populations in China. All the polymorphic microsatellite markers are believed to be useful for evaluating the extent of genetic diversity and population structure in P. sinensis. Compared with the other five populations, the LD and SJ populations exhibited lower genetic diversity. AMOVA revealed that genetic variation among populations was 17.24%, much lower than that within populations. D and FST values between pairs of populations indicated that the LD and SJ populations differed from the other five populations, a result also supported by the UPGMA tree and the STRUCTURE analysis. Therefore, the LD and SJ populations need to be protected against further declines in genetic diversity. The LP, LA, LSL, LSY and LSH populations, all from the Liaohe River Drainage, had relatively high genetic diversity and hence can be considered hot spots for in-situ conservation of P. sinensis as well as sources of desirable alleles for breeding. In future, further development of transcriptome-derived microsatellite markers will be necessary for more detailed investigation of the genetic variation, genetic structure and marker-assisted selection (MAS) of P. sinensis. Overall, this study provides a theoretical basis for the protection, rational use and genetic breeding of germplasm resources.
Table 1). Genomic DNA was extracted from the muscle of each shrimp using a TIANamp Marine Animals DNA Kit (TIANGEN) according to the manufacturer's protocol.

Microsatellite selection and genotyping
More than 50 microsatellite loci were selected from the P. sinensis transcriptome data (GenBank No. SRR5759507) [5], and all new primers were designed using Primer Premier 3.0 (http://bioinfo.ut.ee/primer3-0.4.0/). Primers were tested under varying PCR conditions, and the PCR products were evaluated on agarose gels. PCR reactions were performed on an ABI 2720 thermocycler (Applied Biosystems, USA  at 72°C, followed by a 4°C hold. All individual genotypes were scored after the PCR products were resolved on an Applied Biosystems 3730XL Genetic Analyzer (Applied Biosystems, USA), and product sizes were analyzed with GeneMarker version 2.2.0 (Applied Biosystems, USA).

Statistical analysis
Sixteen polymorphic loci were used to detect genetic variation among P. sinensis populations (Table 2). The number of alleles (NA), the effective number of alleles (NE), observed heterozygosity (Ho) and expected heterozygosity (He) at each locus in each population were calculated using POPGENE 1.32 [25]. The null allele frequency (r) at each locus was estimated using FreeNA [26]; loci with estimated null allele frequencies above 0.2 were considered potentially problematic for calculations. Allelic richness (Ar) and Wright's FIS values with 95% confidence intervals were calculated according to Weir and Cockerham using FSTAT version 2.9.3.2 [27]. Tests for Hardy-Weinberg equilibrium (HWE) and linkage disequilibrium (Ld) were conducted using Arlequin 3.5.2.2 [28] with a Markov chain of 1,000,000 steps and 100,000 dememorisation steps. The effective population size (Ne) of each population was estimated using the gametic disequilibrium method implemented in LDNe 1.31 [29], with a lowest allele frequency of 0.02 and confidence intervals estimated by the parametric method (which were highly similar to those estimated by the jackknife method). The mean ratio of the number of alleles to the range in allele size (M ratio), calculated using Arlequin 3.5.2.2 [28], was used to assess recent changes in Ne. Tests for recent bottlenecks were performed using Bottleneck version 1.2.02 [30] under a two-phase model (TPM) with 90% single-step mutations. This method tests for departures from mutation-drift equilibrium based on heterozygosity excess or deficiency. A Wilcoxon signed-rank test was used to determine whether a statistically significant number of loci displayed heterozygosity excess compared with expectations based on the observed number of alleles.
Pairwise Wright's fixation index (FST) values and analysis of molecular variance (AMOVA) were calculated using Arlequin 3.5.2.2 [28]. Cavalli-Sforza and Edwards' genetic distance (D) [14] was calculated using FreeNA [26] with the INA correction method described by Chapuis and Estoup, and a UPGMA dendrogram based on these genetic distances was then constructed using POPTREE2 [31]. Significance was tested with 1000 bootstrap replicates. A Mantel test (10,000 permutations) was performed in Arlequin 3.5.2.2 [28] to estimate the correlation between the matrices of genetic and geographical distances.
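The logic of a Mantel test can be sketched as a permutation procedure: the Pearson correlation between the two distance matrices is computed, the population labels of one matrix are shuffled many times, and the P value is the share of shuffles giving a correlation at least as large as the observed one. The code below is a minimal illustration with hypothetical distances; the function names, the seed and the one-sided P value are assumptions and do not reproduce Arlequin's implementation.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def mantel(genetic, geographic, permutations=999, seed=0):
    """One-sided Mantel test on two symmetric distance matrices."""
    n = len(genetic)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    geo_vec = [geographic[i][j] for i, j in pairs]
    observed = pearson([genetic[i][j] for i, j in pairs], geo_vec)
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)                       # relabel populations in one matrix
        permuted = [genetic[order[i]][order[j]] for i, j in pairs]
        if pearson(permuted, geo_vec) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# Hypothetical example: four sites on a line, genetic distance proportional to geography.
geo = [[abs(i - j) for j in range(4)] for i in range(4)]
gen = [[0.1 * abs(i - j) for j in range(4)] for i in range(4)]
r_obs, p_value = mantel(gen, geo)
```

Rows and columns are permuted together so that the dependence among distances involving the same population is preserved under the null hypothesis.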
Analysis of genetic structure among populations was performed with STRUCTURE v2.3.3 [34] using Bayesian methods. An admixture model was assumed, with a burn-in of 50,000 followed by 100,000 Markov chain Monte Carlo (MCMC) repetitions and 10 iterations for each K (K = 2-7). The ΔK value was calculated with the online software STRUCTURE HARVESTER [35] based on the rate of change in the log probability of the data between successive K values [36]. Plots of the clustering results were obtained using DISTRUCT [37].
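The ΔK statistic can be sketched as the absolute second-order difference of the mean Ln P(X|K) across replicate runs, divided by the standard deviation of Ln P(X|K) at that K. The code below is an illustration under that assumption with hypothetical log-probability values; it is not the STRUCTURE HARVESTER source code.

```python
from statistics import mean, stdev

def delta_k(lnp_runs):
    """Compute delta K from replicate STRUCTURE runs.

    lnp_runs: dict mapping K -> list of Ln P(X|K) values, one per replicate run.
    """
    means = {k: mean(values) for k, values in lnp_runs.items()}
    result = {}
    for k in sorted(lnp_runs):
        if k - 1 not in means or k + 1 not in means:
            continue                      # undefined at the smallest and largest K tested
        spread = stdev(lnp_runs[k])
        if spread == 0:
            continue                      # avoid division by zero
        second_difference = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_difference / spread
    return result

# Hypothetical Ln P(X|K) values from three replicate runs per K.
hypothetical_runs = {
    1: [-1000, -1002, -998],
    2: [-900, -901, -899],
    3: [-890, -895, -885],
    4: [-885, -880, -890],
}
dk = delta_k(hypothetical_runs)
best_k = max(dk, key=dk.get)
```

Because the statistic needs the neighbouring K on both sides, it cannot be evaluated at the ends of the tested range.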