Dissection of the genetic basis of oil content in Chinese peanut cultivars by association mapping

Background Peanut is one of the primary sources of vegetable oil worldwide, and enhancing oil content is a main objective of peanut breeding programs. Linked markers for oil content are required for genomics-assisted breeding (GAB), and association mapping is one of the promising approaches for discovering associated markers. Results An association mapping panel consisting of 292 peanut varieties widely distributed in China was phenotyped for oil content and genotyped with 583 polymorphic SSR markers. These markers amplified 3663 alleles with an average of 6.28 alleles per locus. The results of structure, phylogenetic relationship, and PCA analyses indicated two subgroups differentiated mainly by geographic region. Genome-wide association analysis using genetic and phenotypic data identified 12 associated markers, including one highly stable association (AGGS1014_2) with up to 9.94% phenotypic variance explained (PVE) across multiple environments. Interestingly, the frequency of the favorable alleles for the 12 associated markers differed among geographic regions. Two associated markers, AGGS1014_2 and AHGS0798, with 6.90-9.94% PVE were verified to enhance oil content in an independent RIL population. The combined genotypes of AGGS1014_2 and AHGS0798 appeared to have experienced selection during the breeding program. Conclusion This study provided insights into the genetic basis of oil content in peanut and verified that two SSR markers were highly associated with oil content. Our results could facilitate marker-assisted selection in breeding for high oil content.

The peanut panel consisted of cultivars from 17 provinces of China. Most accessions (93.2%) came from nine provinces (HEB, SD, HN, SC, HUB, JS, Fujian, Guangdong, and Guangxi). The proportions of the two subgroups differed markedly among these provinces (Fig. 2a). In Northern China (HEB, SD, and HN provinces), the proportion of the G1 subgroup ranged from 77.42% to 85.71%.
Similarly, the proportion of G1 ranged from 66.10% to 91.67% among accessions from the Yangtze River region (SC, HUB, and JS provinces). In contrast, the proportion of the G1 subgroup was below 11.11% in Southern China (Fujian, Guangdong, and Guangxi provinces). This suggests that genetic diversity was closely linked to geographic distribution. The phylogenetic tree showed that the provinces could be clearly clustered into two clades (Fig. 2b). The provinces in Southern China (Fujian, Guangdong, and Guangxi) clustered together, and the provinces from Northern China and the Yangtze River region were grouped into another clade.

Phenotypic variation for oil content among peanut accessions
The oil content of the 292 Chinese peanut accessions was analyzed from seeds harvested from four environments (Table 2). Based on the phenotypic data of the 292 accessions across the four environments, the broad-sense heritability of oil content was estimated to be 0.76.
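The text reports the heritability estimate but not the estimator. As a hedged illustration only, a form commonly used for multi-environment trials is shown below; whether the authors used exactly this expression is an assumption.

```latex
% Assumed estimator, not taken from the source text.
% \sigma_g^2    : genotypic variance
% \sigma_{ge}^2 : genotype-by-environment interaction variance
% \sigma_e^2    : residual variance
% n             : number of environments (four in this study)
% r             : number of replications per environment (not stated in this section)
H^2 = \frac{\sigma_g^2}{\sigma_g^2 + \dfrac{\sigma_{ge}^2}{n} + \dfrac{\sigma_e^2}{n r}}
```

Under this form, a value of 0.76 would mean that about three quarters of the variance among accession means is attributable to genotype.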
Since genetic diversity was present among peanuts from different Chinese regions, we further evaluated whether phenotypic differences existed among the geographic regions. The oil content of accessions from Northern China was significantly higher than that of accessions from Southern China in all the field trials. Similarly, the accessions from the Yangtze River region had higher oil content than the accessions from Southern China in 2016WH, 2017NC, and 2017WH. The phenotypic difference between Northern China and the Yangtze River region was not statistically significant in three environments. We also compared cultivated peanuts released at different times (Additional file 2: Fig. S2). In general, there was no obvious difference in oil content between cultivars released at different times.

Association analysis for oil content
Based on phenotypic data of the 292 peanut accessions from four environments, a mixed linear model (MLM) with the K+Q matrices was used for association mapping of SSR markers with oil content. The marker-trait analysis identified two associated loci in the 2015WH environment, eight in 2016WH, three in 2017NC, and five in 2017WH. Twelve significantly associated loci at P < 0.00186 explained 4.54-9.94% of the phenotypic variance across the four environments (Table 3 and Additional file 2: Fig. S3). Among them, AGGS1014_2, with up to 9.94% PVE, was repeatedly detected in multiple environments (2016WH, 2017NC, and 2017WH). The number of alleles at these associated loci ranged from two (pPGPseq8D9 and AGGS1014_2) to six (TC11B4_2). Accessions carrying different alleles at these loci differed significantly in oil content averaged over the four environments (Fig. 4a). The most favorable alleles associated with oil content were pPGPseq8D9-131bp, TC9F10_2-256bp, TC11B4_2-298bp, AHGS1679-293bp, AGGS1149-192bp, AGGS1081-201bp, AGGS1014_2-215bp, AHGS2053-256bp, AHS0127-188bp, AHGS1431-260bp, AHGS0798-174bp, and AHGS1388-304bp (Additional file 1: Table S4). In general, the allelic effects of associated loci were higher in accessions from Northern China and the Yangtze River region than in accessions from Southern China (Fig. 4b). Similarly, the frequencies of the most favorable alleles also showed geographic differences.
For ten associated loci (pPGPseq8D9, TC11B4_2, AHGS1679, AGGS1149, AGGS1014_2, AHGS2053, AHGS0127, AHGS1431, AHGS0798, and AHGS1388), the favorable allele frequency was highest in Northern China, second-highest in the Yangtze River region, and lowest in Southern China (Fig. 4c). However, the favorable allele frequencies were highest in Southern China for another two associated loci (TC9F10_2 and AHGS1431).
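The regional comparison of favorable allele frequencies can be sketched as a simple tally. This is a minimal illustration, not the authors' pipeline: the function name, the region labels, and the toy genotypes are all hypothetical, and it assumes each accession is an inbred line scored with a single allele (fragment size in bp) per locus.

```python
from collections import defaultdict


def favorable_allele_frequency(accessions, favorable_allele):
    """Return {region: fraction of scored accessions carrying favorable_allele}.

    accessions: iterable of (region, allele) pairs, where allele is the
    fragment size in bp, or None when the genotype call is missing.
    """
    carriers = defaultdict(int)
    scored = defaultdict(int)
    for region, allele in accessions:
        if allele is None:
            continue  # missing calls are excluded from the denominator
        scored[region] += 1
        if allele == favorable_allele:
            carriers[region] += 1
    return {region: carriers[region] / scored[region] for region in scored}


# Made-up genotypes at one hypothetical locus, for illustration only.
toy = [
    ("North", 215), ("North", 215), ("North", 209), ("North", 215),
    ("Yangtze", 215), ("Yangtze", 209),
    ("South", 209), ("South", 209), ("South", None), ("South", 215),
]
freqs = favorable_allele_frequency(toy, 215)
```

Excluding missing calls from the denominator is a design choice; counting them as non-carriers would bias frequencies downward in regions with more missing data.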

Evaluation of RIL population and confirmation of associated markers
To estimate the potential value of the associated loci in peanut breeding, a RIL population derived from two additional accessions (Zhonghua 10 and ICG12625) was employed as a test population. Oil content of the RIL population ranged from 47.45% to 60.88% in Env1, 45.30% to 58.96% in Env2, 42.89% to 55.07% in Env3, and 45.98% to 58.37% in Env4. The oil content of the female parent was 51.88 ± 1.41%, whereas that of the male parent was 53.32 ± 1.47%. Three associated markers (AGGS1014_2, AHGS0798, and AHGS1431) showed polymorphism in the RIL population. A significant difference in oil content between lines homozygous for the P1 and P2 alleles at the AHGS1431 locus was observed in Env1 (Additional file 1: Table S5). At the AGGS1014_2 locus, lines homozygous for the P2 allele had significantly higher oil content than lines homozygous for the P1 allele in two environments, i.e., Env2 and Env4 (Fig. 5a and Additional file 1: Table S5). For marker AHGS0798, oil content of lines homozygous for the P2 allele was significantly higher than that of lines homozygous for the P1 allele in two environments (Fig. 5a and Additional file 1: Table S5).
When AGGS1014_2 and AHGS0798 were combined, oil content of lines homozygous for the P2 alleles at both loci was significantly higher than that of lines homozygous for the P1 alleles across environments (Fig. 5c).
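The comparison between the two homozygous genotype classes can be sketched as a two-sample test. The text does not state which test was used, so Welch's t statistic is substituted here as a stand-in; the function name and the oil-content values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and approximate degrees of freedom.

    A positive t means group_a has the higher mean.
    """
    na, nb = len(group_a), len(group_b)
    # Squared standard error contributed by each group.
    va = variance(group_a) / na
    vb = variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Made-up oil content (%) for lines homozygous for each parental allele.
p1_lines = [50.0, 51.0, 52.0]
p2_lines = [53.0, 54.0, 55.0]
t_stat, dof = welch_t(p2_lines, p1_lines)
```

The statistic would then be compared against a t distribution with `dof` degrees of freedom to obtain a P value; that step is left out to keep the sketch dependency-free.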
In the 292 peanut accessions, the alleles at the AGGS1014_2 (X) locus and the AHGS0798 (Y) locus formed six and PIC (0.72) was observed in the peanut 'reference set' of ICRISAT [21], which may be due to the diverse genotypes included from 48 countries representing global diversity, including wild accessions.
From all these comparisons, the Chinese cultivated accessions in the present study showed molecular diversity comparable to that of other collections consisting of cultivated genotypes, indicating that this population is suitable for association mapping. Several studies in other crops also reported that the genetic diversity of cultivated species was consistently lower than that of the corresponding wild species [26-29]. It is essential to introduce diverse and wild genetic resources into Chinese cultivars to broaden the genetic base of founder parents, thereby enhancing genetic diversity and achieving higher genetic gains. The information available through genotyping and multilocation phenotyping will further facilitate identification of potential founder parents for the ongoing breeding program.
Population structure is an important component of association mapping analysis, and accounting for it helps reduce false positives among associated markers. The STRUCTURE analysis identified two subpopulations among the 292 accessions (Fig. 1b), a result also supported by the dendrogram and the PCA (Fig. 1c and 1d). The peanut germplasm collections in previous studies could be divided into 2 to 4 subpopulations, which were always associated with the types of botanical varieties [21,24,30-32]. In the present study, the landraces in the peanut panel could be clearly divided into subsp. hypogaea (G1) and subsp. fastigiata (G2) (Fig. 1c). However, most peanut cultivars and breeding lines in this population harbored mixed morphological features resulting from reciprocal crosses between different botanical varieties. Thus, it is hard to distinguish the two subgroups clearly on botanical grounds. Most accessions in the G1 group were from provinces in Northern China and the Yangtze River region. More than half of the accessions in the G2 group were from provinces in Southern China (Fig. 2a and Additional file 1: Table S1). Compared with those from Southern China, the varieties from Northern China were more closely related to the varieties from the Yangtze River region (Fig. 2b). This indicates that the geographic origins of accessions had a significant effect on population structure. A similar phenomenon was observed in many other crops [26,33-35]. Differences in climate conditions and the corresponding cropping systems among Northern China, Southern China, and the Yangtze River region might be responsible for the genetic differentiation of the peanut population in China, enabling peanut varieties to adapt to various ecological environments.
Oil content is an important trait in peanut breeding and shows polygenic inheritance. In the present study, association analysis was performed to evaluate the phenotypic effects of multiple alleles in diverse genetic backgrounds across multiple environments. A total of 42 alleles at twelve associated loci, which explained 4.54-9.94% of the phenotypic variance, were identified for oil content (Table 3). Interestingly, the favorable alleles with relatively higher effects were more abundant in the varieties from Northern China and the Yangtze River region than in the varieties from Southern China (Fig. 4b and 5c). Correspondingly, the oil content of varieties showed a clear geographic difference. The accessions from Northern China and the Yangtze River region had significantly higher phenotypic values than the accessions from Southern China (Fig. 3). This suggests that oil content and its underlying loci may have undergone selection during geographic differentiation in China.
However, more experimental evidence, such as phenotyping across multiple ecological regions, is needed to verify this hypothesis.
In agreement with previous results, chromosomes A03, A04, A08, B06, B07, and B08 were also found to harbor QTLs for oil content [11,12,14]. For instance, the associated marker AHGS0798 on chromosome B06 (124.9 Mb) is close to qOCB06.1 (121.9 Mb-124 Mb), detected in the RIL population derived from Xuhua 13 and Zhonghua 6 [14]. Another two markers (TC1A02 and AHGS0393), which were closely linked to QTLs for oil content in earlier studies [11,12], were located at 127.5 Mb and 139.3 Mb on chromosome B06. These results suggest that AHGS0798, with a PVE of 7.28%, is a reliable marker associated with oil content. In addition, three associated markers (AGGS1014_2, pPGPseq8D9, and TC11B4_2) in the present study did not co-localize with previously reported QTLs, suggesting that they are newly identified loci controlling oil content. Among them, AGGS1014_2 was repeatedly detected in three environments, with a minimum P value of 2.50E-06 and a PVE of 9.94% (Table 3). To evaluate the potential value of these loci in peanut breeding, two associated markers (AHGS0798 and AGGS1014_2) were verified in the RIL population derived from Zhonghua 10 and ICG12625. The favorable allele at a single locus could increase oil content by ~0.34% to ~1.50% (AGGS1014_2) or ~0.61% to ~0.88% (AHGS0798) in the four environmental trials.
Combining the favorable alleles at the two loci could increase oil content by ~1.11% to ~2.06% (Fig. 5 and Additional file 1: Table S5). This indicates that using associated markers to accumulate favorable alleles would be an effective way to increase oil content in peanut breeding. In the present peanut panel, AGGS1014_2-215 bp/AHGS0798-174 bp is the one of six combined genotypes of the two associated markers that showed the highest phenotypic effect. In the varieties released before
In conclusion, this study provided insight into the close relationship of geographic region with population structure and oil content in China, and characterized allelic variation for oil content in collections of Chinese cultivars. Two associated markers (AGGS1014_2 and AHGS0798) in the present study were verified to be valuable in MAS for oil improvement in peanut.

Plant materials and field planting
A total of 222 cultivars, 55 breeding lines, and 15 landraces from 17 different provinces in China were selected to constitute the peanut panel (Additional file 1: Table S1). A RIL population was developed from a cross between Zhonghua 10 and ICG12625 using the single seed descent method and was later used to validate the associated markers.

DNA isolation and genotyping
The genomic DNA of 292 peanut accessions was extracted from fresh leaves following the modified cetyltrimethylammonium bromide method. The quality and quantity of DNA were checked using 1% agarose gel and NanoDrop (Thermo Fisher Scientific, USA), respectively.
SSR alleles were numerically coded according to fragment size.

Genotypic data analysis
The allele number, major allele frequency, genetic diversity, and polymorphism information content (PIC) were calculated using PowerMarker V3.25 software [52]. The number of subgroups in this peanut panel was estimated using STRUCTURE software V2.1, based on the model-based Bayesian clustering method [53]. To determine the optimum number of subgroups (K), five independent runs were performed for each K value from 1 to 10. For each run, a burn-in length of 50000 followed by 10000 iterations was used with the admixture and correlated allele frequency models. The optimal K value was determined by the posterior probability [LnP(D)] and ΔK [54].
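Two of the quantities named above can be sketched in a few lines. The PIC function uses the standard Botstein-style definition, which is assumed to match what the software reports. The ΔK function uses the run-mean form of the second difference of LnP(D) divided by the between-run standard deviation; this is one common way of computing it and may differ in detail from the implementation the authors used. Function names and LnP(D) values are hypothetical.

```python
from statistics import mean, stdev


def pic(freqs):
    """Polymorphism information content from allele frequencies at one locus."""
    homozygosity = sum(p * p for p in freqs)
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1 - homozygosity - cross


def delta_k(lnp_by_k):
    """Return {K: delta K} from {K: [LnP(D) of each independent run]}.

    Delta K is only defined where both K-1 and K+1 were also run.
    """
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in lnp_by_k or k + 1 not in lnp_by_k:
            continue
        spread = stdev(lnp_by_k[k])
        if spread == 0:
            continue  # ratio undefined when all runs agree exactly
        second_diff = abs(
            mean(lnp_by_k[k + 1]) - 2 * mean(lnp_by_k[k]) + mean(lnp_by_k[k - 1])
        )
        result[k] = second_diff / spread
    return result


# Made-up LnP(D) values, three runs per K, for illustration only.
toy_lnp = {
    1: [-1000, -1002, -998],
    2: [-900, -901, -899],
    3: [-890, -895, -885],
    4: [-885, -880, -890],
}
dk = delta_k(toy_lnp)
best_k = max(dk, key=dk.get)
pic_two_equal_alleles = pic([0.5, 0.5])
```

In the toy data the likelihood rises sharply from K=1 to K=2 and then flattens, so ΔK peaks at K=2, which is the pattern one would expect for a panel with two subgroups.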

Evaluation of oil content and phenotypic data analysis
The percentages of oil and H2O in seeds were measured using nuclear magnetic resonance (PQ001, Niumag, China). Mature seeds (~10 g) with less than 10% moisture content were analyzed for each of the three sub-samples per entry. Oil content (%) was calculated on a dry-weight basis using the formula [oil% / (100 − H2O%)] × 100 [11].
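The dry-weight conversion above is simple enough to express directly. The sketch below applies the stated formula to each sub-sample; averaging the three sub-samples into one value per entry is an assumption, as are the function names and the example readings.

```python
from statistics import mean


def oil_content_dry_basis(oil_pct, water_pct):
    """Convert an as-measured oil percentage to a dry-weight basis.

    Implements [oil% / (100 - H2O%)] * 100.
    """
    if not 0 <= water_pct < 100:
        raise ValueError("water_pct must be in [0, 100)")
    return oil_pct / (100.0 - water_pct) * 100.0


def entry_oil_content(sub_samples):
    """Average dry-basis oil content over (oil%, H2O%) sub-sample readings.

    Averaging the sub-samples is an assumption; the text only says that
    three sub-samples per entry were analyzed.
    """
    return mean(oil_content_dry_basis(oil, water) for oil, water in sub_samples)


# Made-up readings: 48% oil at 4% moisture is 50% oil on a dry basis.
single = oil_content_dry_basis(48.0, 4.0)
entry = entry_oil_content([(48.0, 4.0), (47.5, 5.0), (49.0, 2.0)])
```

Because the denominator removes the water fraction, the dry-basis value is never lower than the as-measured value, which makes readings taken at different moisture levels comparable.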
The authors declare that the research was conducted in the absence of any commercial or financial relationship that could be construed as a potential conflict of interest.