Transferability, development of simple sequence repeat (SSR) markers and application to the analysis of genetic diversity and population structure of the African fan palm (Borassus aethiopum Mart.) in Benin

Background In Sub-Saharan Africa, Borassus aethiopum Mart. (African fan palm) is an important non-timber forest product-providing palm that faces multiple anthropogenic threats to its genetic diversity. However, this species is so far under-studied, which prevents its sustainable development as a resource. The present work is a first attempt at characterizing the genetic diversity and population structure of B. aethiopum across nine collection sites spanning the three climatic regions of Benin, West Africa, through the use of microsatellite markers. Results During a first phase we relied on the reported transferability of primers developed in other palm species. We find that, in disagreement with previously published results, only 22.5% of the markers tested enable amplification of B. aethiopum DNA and polymorphism detection is very low. In a second phase, we generated a B. aethiopum-specific genomic dataset through high-throughput sequencing and used it for the de novo detection of microsatellite loci. Among the primer pairs targeting these, 11 detected polymorphisms and were further used for analyzing genetic diversity. Across the nine sites, expected heterozygosity (He) ranges from 0.263 to 0.451 with an overall average of 0.354, showing a low genetic diversity. Analysis of molecular variance (AMOVA) shows that within-site variation accounts for 53% of the genetic variation. Accordingly, the low number of migrants and positive values of the fixation index (F) in sites from both the Central (Sudano-Guinean) and the Southern (Guinean) climatic regions suggest limited gene flow between sites. The global correlation between genetic and geographic distances is weak; however, our clustering analyses indicate that B. aethiopum palms from Savè (Center) are genetically more similar to those from the North than to samples from other Central sites. Conclusions In the light of our results, we discuss the use of inter-species transfer vs. de novo development of microsatellite markers in genetic diversity analyses targeting under-studied species, and suggest future applications for our molecular resources. We propose that, while prominent short-range pollen and seed dispersal in Benin explain most of our results, gene flux between the Central and Northern regions, as a result of animal and/or human migrations, might underlie the Savè discrepancy. Supplementary Information The online version contains supplementary material available at 10.1186/s12863-020-00955-y.

literature [35,36], such phenomena may lead to restricted gene ow and ultimately, to loss of genetic diversity among B. aethiopum populations. A sustainable management policy for B. aethiopum populations is therefore urgently needed and acquiring information on the genetic diversity of the species and population structure is a major step towards de ning sustainable management actions. At the time of writing the present article, only a few chloroplast sequences are publicly available for B. aethiopum through NCBI (https://www.ncbi.nlm.nih.gov/search/all/?term=borassus%20aethiopum). By contrast, abundant molecular resources, including genome assemblies or drafts, are available for model palm species such as Elaeis guineensis Jacq. [37], Phoenix dactylifera L. [38][39][40] and Cocos nucifera L. [41,42]. In each of these three palm species, large numbers of SSR markers have been identi ed and for a fraction of them, cross-species and cross-genera transferability tests among species belonging to the Palmaceae family have been performed [43][44][45][46][47][48][49]. In several instances [44][45][46][47]49] these tests included samples from Borassus abellifer, the Asian relative of B. aethiopum.
The primary objective of the present study is to generate the rst set of genetic data on Borassus aethiopum, as a rst step towards improving the management of this species through a better knowledge of its diversity. In order to achieve this, we rst describe attempts to use SSR markers identi ed in these other palm species. Then, we describe the low-coverage sequencing of the B. aethiopum genome with the aim of developing the rst set of speci c SSR markers targeting this species. Finally, we used the novel SSR markers to assess the genetic diversity and population structure of B. aethiopum samples collected across the three different climatic regions of Benin, a country that was most readily accessible to us for sample collection, as an important rst step towards more comprehensive studies spanning the West African sub-region.

Results
Assessment of palm SSR marker transferability to Borassus aethiopum and evaluation of their capacity for characterizing genetic diversity Of the 80 microsatellite markers selected from the three model palm species Elaeis guineensis, Phoenix dactylifera and Cocos nucifera and tested for ampli cation on B. aethiopum DNA, 18 (22.5%) generate ampli cation products (Table 1). No ampli cation is observed using the 11 C. nucifera markers, whereas 7 (15.9%) and 11 (44%) of the P. dactylifera and E. guineensis markers, respectively, show a successful ampli cation. None of the ampli cation products generated with P. dactylifera primers display genetic polymorphism in our B. aethiopum test panel. Among E. guineensis-derived SSR markers however, two, namely ESSR566 and ESSR652, display polymorphism. However, it must be noted that depending on the DNA sample the ESSR566 primer pair generates a variable number of amplicons with distinct sizes, which may be an indication that more than one locus is targeted. Overall, during this phase of the study we detect polymorphism in our B. aethiopum test panel with only 2 (11.1% of successfully ampli ed markers, 2.5% of total) of the palm SSR primer pairs assayed. Only one of these markers, namely ESSR652, enables unambiguous detection of microsatellite locus polymorphism in B. aethiopum, and might therefore be used for studying genetic diversity in this species.
De novo identi cation of microsatellite sequences in the B. aethiopum genome and assessment of potential SSR markers In order to enable a more precise evaluation of genetic diversity in B. aethiopum, we developed speci c B. aethiopum markers from de novo sequencing data. A total of 23,281,354 raw reads with an average length of 250 bp have been generated from one MiSeq run. Raw  aethiopum SSR markers (representing 20.4% of primer pairs associated with successful ampli cation and 55.0% of those detecting polymorphic products in our study) are both polymorphic and unambiguously mono-locus in our ampli cation test panel and may therefore be used for further analyses.

Microsatellite-based characterization of genetic variation of B. aethiopum in Benin
The newly identi ed set of 11 B. aethiopum-speci c SSR markers has been used for the characterization of genetic diversity in our full panel of 180 individual samples from nine locations distributed across Benin (Fig. 1). Map generated from publicly available resources of the Institut Géographique National du Bénin (IGN; https://geobenin.bj/) and the "Major Rivers of the World" dataset from the World Bank Data Catalog (https://datacatalog.worldbank.org/dataset/major-rivers-world; Creative Commons Attribution 4.0 International license), using the ArcGIS software by ESRI (www.esri.com).
As shown in Table 3, among our sample set the number of alleles per microsatellite locus ranges from 2 for marker Mbo41 to 6 for markers Mbo34, Mbo35, and Mbo50, with an average value of 4.27, whereas expected heterozygosity (He) values range from 0.031 (marker Mbo56) to 0.571 (marker Mbo35). Using these markers, the analysis of genetic diversity (   Nei's genetic distance among locations (Table 5) ranges from 0.073, as observed between Togbin and Hounviatouin (Guineo-Congolian region), to 0.577 between Togbin (Guineo-Congolian region) and Trois Rivières (Sudanian region). Overall, genetic distances between B. aethiopum sampling locations are lowest within the same region, with the lowest genetic distances among the sites of Pendjari, Pingou, and Trois Rivières which are all located in the Northern part of the country. One interesting exception is the Central (Guineo-Sudanian) region of Benin, where we nd that the most genetically distant location from Savè is the one from the Agoua forest reserve (0.339). Surprisingly, Savè displays its highest genetic identity value when compared to the other two collection sites located within protected areas, namely Pendjari (0.870) and Trois Rivières (0.882) which are both located in the Sudanian region. This is an unexpected nding considering the geographic distances involved. A similar structure of genetic distances emerges from the analysis of pairwise location genetic differentiation (Fst) ( Table 6), suggesting genetic differentiation according to geographic distances between collection sites, with the notable exception of the lower genetic differentiation between samples from Savè and those from either one of the forest reserves in the Northern region, namely Pendjari and Trois Rivières. In order to assess the strength of the relationship between genetic and geographic distances, we plotted them as a linear regression and performed the Mantel permutation test. As shown in Fig. 2, the positive correlation between both variables is weak, but signi cant (R 2 = 0.1139, P = 0.040).

Fig. 2 Correlation between pairwise Fst vs. pairwise geographical distance
The results of the non-hierarchical AMOVA (Table 7) show that within-site variation underlies the major part (53%) of total variance, whereas among-sites and among-regions variations explain genetic variance to a similar extent (23 and 24%, respectively). Accordingly, the average Number of migrants between collection sites (Nm = 1.019) is low, indicating very limited gene ow.
Hierarchical analyses performed with K=2 and K=3, respectively, yield an identical proportion of genetic variation at the within-individual level (62% of total ; Table 7). Analysis using K=3 allows for a balanced representation of variation between the among-regions and among-sites scales (16% of total variance for each), whereas among-regions variation is not as well accounted for under K=2 (7% of total variance, vs. 24% for among-sites variation). The Principal Coordinates Analysis (PCoA) of 180 B. aethiopum samples (Fig. 3A) shows that the rst axis (accounting for 24% of total variation out of a sum of 33.90 for axes 1 and 2) roughly separates individual samples in two main groups, a result that is in agreement with the analysis of genetic distances. The sampling locations-based PCoA (Fig. 3B) con rms the genetic separation along the rst axis (accounting for 44.08% of total variation over a total of 61.06% for the sum for axes 1 and 2) between sites from the Guineo-Congolian (Southern) region, plus the sites of Agoua and Biguina (Center) vs. sites from the Sudanian (Northern) region, plus the site of Savè (Center). Although the distinction is not as clearly marked, the second axis (accounting for 16.98% of total variation) further allows to distinguish two subgroups within the rst group, corresponding to sites belonging to the Southern region and to those from the Central one, respectively. Malanville, and Trois Rivières. Since there is a possibility that the ΔK method used for estimating K leads to over-or under-estimated values [50], clustering with higher values of K have also been tested. As is apparent in Fig. 4B, for values of K=4 and above standard deviations increase considerably, therefore we present results for both K=2 and K=3 ( Fig. 4C; see also Additional Figure 4 for the summary of the complete analyses with K=1 to K=10). As previously observed with the location-based PCoA, under K=3 further clustering emerges within the rst group, involving samples from Togbin and Hounviatouin (South) and those from Biguina and Agoua (Center), respectively. The Unweighted pair-group method with arithmetic mean (UPGMA) tree constructed from our data (Fig.  5) distinguishes two main groups matching the ones de ned through the Bayesian analysis with K = 2, and which are supported by bootstrap values above 50. Within each of these groups, subgroups corresponding to those observed with K=3 clustering and that globally match geo-climatic regions (Savè excepted) can further be de ned. However, in this case most bootstrap values attached to these secondary branches are not signi cant. Bootstrap values supporting each branch are indicated on the nodes.

Discussion
In owering plant, the e ciency of cross-species transfer of SSR markers is highly variable among taxa, especially when important differences in genome complexity exist between the marker source and the target [51]. Nevertheless, this method has been used successfully for accelerating the analysis of genetic diversity in many plant species, including palms [11,[52][53][54]. In the present study, we nd that the transferability rate of microsatellite markers developed in other palms genera to Borassus aethiopum, i.e. their ability to successfully amplify genomic DNA from the latter species, is very low. Indeed, among the 80 primer pairs designed on either Elaeis guineensis, Phoenix dactylifera or Cocos nucifera, we observe that only 18 (22.5%) produce amplicons from B. aethiopum. This percentage is very low when compared to both the inter-species and inter-genera transferability rates that have been found in similar studies targeting other palm species: from 17 to 93% in a panel of 32 palm species [49], 75% from E. oleifera to E. guineensis [54], 86% between the wooly jelly palm (Butia eriospatha Mart.) and related species Butia catarinensis [55] and up to 100% in the licuri palm (Syagrus coronate Mart) [56]. When considering other plant families, our transferability rate is also markedly lower than both the average rate of 50% found by Peakall et al. [57] within the Glycine genus and among Legumes genera, and the overall rate of 35.2% calculated by Rossetto [58] for within-family transferability among Gymnosperms and Angiosperms. The low transferability rate in our study might be explained in part by the fact that we used markers originating from genomic sequences. Indeed, as pointed out by Fan et al. [1], such markers have a lower transferability rate when compared to Expressed Sequence Tags (ESTs)-derived microsatellites due to the higher inter-species sequence variability within non-coding vs. coding sequences. Similarly, it is plausible that differences in genome size and complexity among palm species and genera account for our di culty to identify palm SSR markers that successfully amplify in B. aethiopum. As a matter of fact, the size of the B. aethiopum genome, as determined by ow cytometry (1C = 7.73 Gb; Jaume Pellicer, unpublished data), is 3.2 to 11.5 times larger than those of the microsatellite source species used in the present study: P. dactylifera genome is estimated to be 671 Mb [39] whereas the E. guineensis genome is 1.8-1.9 Gb [37,59] and C. nucifera genome is 2.42 Gb [42]. It is possible that these differences in genome sizes among related diploid plant species rely on differences in transposable element (TE) content, which in turn might have induced structural alterations throughout the genome through indels, copy number variations and recombinations [60,61]. The illustration of such a mechanism working at the intra-genus level has been provided by cultivated rice species Oryza sativa L. and its wild relative Oryza australiensis [60]. Ultimately, TE-induced structural variations may have a negative effect on the cross-species ampli cation ability of some of the SSR primers. Indeed, in a recent study Xiao et al. [49] showed that over 70% of the conserved microsatellite loci between E. guineensis and P. dactylifera are located within genic regions of the genome with low TE content, and which are therefore less likely to be submitted to TE-dependent structural variations. More generally, gaining a better understanding of genome structures within the Borassus genus could also help reconcile our results with previous published reports of successful transfer of SSR markers developed from other palm sources to Borassus abellifer (see references cited in Table 8, Methods section). Indeed, since the genome size of B. abellifer (7.58 Gb; Jaume Pellicer, unpublished data) is only marginally smaller than that of B. aethiopum, signi cant differences in genome composition may be underlying the lack of SSR transferability between both species.
In any case, from the low number of successfully transferred microsatellite markers we could only identify one displaying polymorphism in our B. aethiopum test panel, making it impossible to rely on for analysis of genetic diversity. Still, the fact that so little microsatellite polymorphism (2 out of 18 amplifying primer pairs: 11.1%) could be detected in this subset of 20 palms sampled across different locations throughout Benin is somewhat surprising and its reasons remain to be elucidated. In addition to possibly being a symptom of habitat fragmentation, this low diversity might also result from the extremely long juvenile phase that has been attributed to this palm species. Indeed, oral maturity has been reported to occur 30 to 50 years after germination [62]. The manner of seed and pollen dispersal, which have so far not been studied extensively in B. aethiopum, might also play a role. Indeed, in pollenmediated gene ow species, the distance the pollen travel is of importance in the occurrence of crossing between populations [63,64].
Regarding the development of novel SSR markers, our results are similar to other studies based on the use of high-throughput sequencing techniques in species where very little information is available [22,65]. We identi ed 57 microsatellite loci, from which we selected 11 markers displaying polymorphism that were used to assess the genetic structure of B. aethiopum sampled from different sites in Benin. We nd low genetic diversity, with an average He value (0.354) that is substantially below those reported for B. abellifer (0.417) [45] and for other non-timber forest products such as Khaya senegalensis (0.53) [66] and Phyllanthus sp. (0.607 and 0.582 for Phyllanthus emblica and Phyllanthus indo scheri respectively [67]. The positive F value that we observed in the majority (6 out of 9) of locations in the present study indicates an overall de ciency of heterozygotes across sites. This deviation from the Hardy-Weinberg equilibrium (HWE) might re ect low gene ow through pollen and seed dissemination, leading to crosses between related individuals, as supported by the low average number of migrants between sites. Accordingly, our data reveal limited genetic distances among collection sites, with values that are lower than those reported for others palm species. Indeed for B. abellifer, genetic distances ranged from 0.716 to 0.957 [68] and among natural E. guineensis accessions an average of 0.769 was observed [69]. Both our Fst values and AMOVA analysis point to intra-site differentiation as being the main source of genetic variation.
As illustrated by the global agreement between our PCoA and Bayesian analyses, Beninese B. aethiopum samples cluster into two main groups that are mostly dependent on geo-climatic regions and geographic distances between collection sites, although the correlation between genetic and geographic distance is poorly signi cant. There might be further genetic separation between Southern B. aethiopum samples and those from the Central sites of Agoua and Biguina, resulting in the splitting of one group into two subgroups. However, with our current dataset it is not possible to achieve this level of discrimination in our analyses. Additional sampling campaigns from intermediate locations in the Central and Northern regions will be necessary in order to make progress on the subject. within the Central region may reside in their physical separation by the Ouémé river, which further forms a natural corridor between Savè and the sites of Trois Rivières and Malanville in the North-East (see Fig. 1) [71]. We postulate that seed dispersal by humans and/or animals along this corridor might have played a major role in the observed pattern of genetic diversity and explain the singularity observed in Savè. As a matter of fact, members of the Bariba ethnic group, who live in the Eastern part of the country up to Malanville, share strong historical ties with the Shabè people from Savè, and exchanges between both groups are frequent [72]. The same corridor is also used annually for transhumance by the Fulani people [73], for whom B. aethiopum is an important plant: the role of their mobility in the dispersal of the plant, similar to what has been proposed for Caesalpina bonduc [74], is therefore plausible. Regarding the impact of animal migrations, Salako et al. [31,32] detected the presence of B. aethiopum seeds in elephant dungs and hypothesized that elephants may have played important role in the seed dissemination for this species through fruit consumption and long-distance herd migrations. In support to this assumption, Savè is part of a continuous forest corridor connecting with the Northern region that was likely used by elephants in their migrations. Up until 1982, the seasonal occurrence of the animal has been reported in the Wari-Maro forest of Central Benin [75].
The speci c microsatellite markers developed in this study from the partial genomic sequencing of B. aethiopum appear to be e cient to assess the genetic diversity and population structure of this species. Additionally, and provided that genome divergence is not too extensive to allow marker transferability, our SSR markers may also been used in a palm species that belongs to the same genus and that is reported to share parts of its distribution area, namely Borassus akeassii B.O.G., which has long been confused with B. aethiopum due to its similar morphology [76]. High-throughput sequencing techniques are an effective way of developing new microsatellite markers in plant species without signi cant molecular data. The increasing technical performances and nancial affordability of these technologies make it feasible to overcome the di culties arising in case studies such as ours, where marker transfer was proved to be limited or ineffective.

Conclusions
To our knowledge, the data presented in the present article constitute the rst sizeable molecular resource available for Borassus aethiopum, which we have made available to the scienti c community at large in order to facilitate the implementation of an increasing number of studies on this palm species. Using 11 newly identi ed SSR markers, we have also performed the rst analysis of the genetic diversity of B. aethiopum in an African country, which we see as a rst step towards the elaboration of an evidencebased strategy for sustainable resource management and preservation in Benin. Our results support the hypothesis that pollen and seed dispersal mainly occur within sites, leading to crosses among related individuals. The exception to this general rule in the region of Savè (Center) seems to indicate long-range transfer of genes as a result to animal and/or human movements towards and from forest reserves of the North. Further research into the characteristics of these migrations and their impact on gene ow among B. aethiopum populations is required in order to con rm this assumption. As a complement to the present work, the acquisition of agro-morphological data is currently under way, in a bid to elucidate the reproductive development and breeding system of the species. As a longer-term perspective, we also plan to extend our analysis of B. aethiopum diversity to the West African sub-region, and leverage the data acquired to improve knowledge of other species within the Borassus genus, and of palms diversity as a whole.

Plant material sampling and DNA extraction
Samples of Borassus aethiopum were collected from wild populations in nine distinct sites (three located in protected forest areas, six in farmlands) that were distant from each other by at least 50 km and which spanned the three main climatic regions encountered in Benin (Fig. 1). According to White [77], Benin covers three contrasted climatic regions which are the Sudanian region in the North, the Sudano-Guinean region in the Center and the Guineo-Congolian region in the South. Along a South-North gradient, the rainfall regime switches from bimodal to unimodal, the climate becomes globally drier [29] and the density of B. aethiopum distribution increases [31]. At each location, young leaves from 10 male and 10 female adult trees separated by at least 100 m were collected and stored in plastic bags containing silica gel until further processing. The complete list of samples and their characteristics is available in Additional le 2.
Genomic DNA was extracted from 250 mg of leaves ground to powder under liquid nitrogen using the Chemagic DNA Plant Kit (Perkin Elmer, Germany), according to the manufacturer's instructions on a KingFisher Flex™ (Thermo Fisher Scienti c, USA) automated DNA puri cation workstation. Final DNA concentration was assessed uorometrically with the GENios Plus reader (TECAN) using bis-benzimide H 33258 (Sigma-Aldrich) as a uorochrome.
Transferability of palms microsatellite markers: selection and ampli cation A total of 80 SSR markers from previous studies were selected for assessment of their transferability to B. aethiopum: 44 developed for Phoenix dactylifera [78]; 25 developed for Elaeis guineensis [44,79]; and 11 developed for Cocos nucifera [80]. The respective sequences and origins of these primer sets are displayed in Table 8. Separation and vizualization of ampli cation products were performed as described previously. Finally, the primer pairs enabling successful and unambiguous ampli cation of polymorphic bands were used for the analysis of genetic diversity among the complete set of 180 B. aethiopum individuals under the same PCR conditions.

Data analysis
Ampli cation products were scored using the GeneMapper software version 3.7 (Applied Biosystems) and only unambiguous ampli cation products were considered for data analysis. Genetic diversity parameters were calculated for each locus and each sampling location using the GenAlEx software version 6.502 [87]. Expected heterozygosity (He) was calculated using the formula: where p i is the frequency of each allele. The xation index (F) was calculated as: where Ho is observed heterozygosity and He is expected heterozygosity [88].
F-statistics analysis assessing genetic differentiation (Fst), genetic identity, number of migrants (Nm) [89] and non-hierarchical analysis of molecular variance (AMOVA) for estimating genetic differentiation within and among locations were performed with the same software. Allelic richness was calculated using the SPAGeDi software version 1.5 (http://ebe.ulb.ac.be/ebe/SPAGeDi.html; [90]). Consecutively to K determination (see below), successive hierarchical AMOVA analyses were carried out with K= 2 and K=3. The Mantel permutation test was used for assessing the correlation between genetic and geographic distances between sampling sites [91,92]. Two Principal Coordinates Analyses (PCoA) enabling the visualization of genetic variation distribution across individuals and sampling sites, respectively, were performed using GenAlEx.
The STRUCTURE software version 2.3.4 [93] was used for the determination of the most probable number of clusters for population structure (K value). Using the admixture model, eight simulations were performed for each inferred K value, with a running length composed of 300,000 burn-in periods and 50,000 Markov chain Monte Carlo (MCMC) replicates. The output from this analysis was then used as input in the Structure HARVESTER online program version 0.6.94 (http://taylor0.biology.ucla.edu/structureHarvester/) to determine the optimal value of K using the ΔK method of Evanno et al. [94] and allowing for different estimates of K in accordance with Janes et al [50]. Based on the resulting values of K, a clustering analysis of the studied sampling sites was performed and graphical output was generated using CLUMPAK's main pipeline (http://clumpak.tau.ac.il; [95]). In order to further assess genetic clustering, a UPGMA tree based on Fst values using 1,000 bootstrap replications was constructed using the POPTREE2 software [96]. Availability of data and materials Data generated from genome sequencing ( ltered reads) were deposited into GenBank under BioProject ID PRJNA576413. Capillary electrophoresis pro les are available upon reasonable request to the Corresponding Author. All other data generated or analyzed during this study are included in this published article (and its supplementary information les).

Competing interests
The authors declare that they have no competing interests