Genetic dissection of eating and cooking qualities in different subpopulations of cultivated rice (Oryza sativa L.) through association mapping

Background Eating and cooking qualities (ECQs) of rice (Oryza sativa L.) determine consumer acceptance and the economic value of rice varieties. The starch physicochemical properties, i.e. amylose content, gel consistency, gelatinization temperature and pasting viscosity are important indices for evaluating rice ECQs. Genetic factors are required for development of rice varieties with excellent ECQs and association mapping is one of the promising approaches for discovering such associated genetic factors. Results A genome-wide association mapping was performed on a set of 253 non-glutinous rice accessions consisting of 83 indica and 170 japonica cultivated rice varieties through phenotyping for 11 ECQ traits in two consecutive years and genotyping with 210 polymorphic SSR and candidate-gene markers. These markers amplified 747 alleles with an average of 3.57 alleles per locus. The structure, phylogenetic relationship, and principal component analysis indicated a strong population differentiation between indica and japonica accessions and association mapping was thus undertaken within indica and japonica subpopulations. All traits showed a large phenotypic variation and highly significant phenotypic correlations were present between most of traits. A total of 33 and 30 loci were located for 11 ECQs in indica and japonica subpopulations respectively. Most of associated loci were overlapped with starch synthesis-related genes (SSRGs), and the Wx locus gathered 14 associated loci with the largest effects on amylose content, gel consistency and pasting viscosities. Eight subpopulation specific markers, RM588, Wx-(CT)n, SSI and SBE1 for indica subpopulation and RM550, Wxmp, SSIIa and SBE4 for japonica subpopulation, were identified, suggesting alleles of SSRGs showed the subspecific tendency. Nevertheless, allelic variation in SSIIa showed no tendency towards subspecies. One associated maker RM550 detected in japonica subpopulation for amylose content and pasting viscosity was verified a potential novel and stably expressed locus and could be selected for further fine mapping. Conclusion This study illustrated the potential for dissecting genetic factors of complex traits in domesticated rice subspecies and provided highly associated markers to facilitate marker-assisted selection for breeding high-quality indica or japonica rice varieties.


Background
Rice (Oryza sativa L.) is the most important food crop in the world and feeds more than half of the world's population. Grain physicochemical characteristics of the milled rice play critical roles in determining eating and cooking qualities (ECQs) of rice. Amylose content (AC), Gel consistency (GC), Gelatinization temperature (GT) and pasting properties are widely recognized as the important indices in determining ECQs [1].
The genetic basis of ECQs have been extensively studied in rice. Amylose content is the most important chemical property affecting ECQs, mainly controlled by the Wx gene encoding granule-bound starch synthase I (GBSSI), responsible for amylose synthesis in endosperm. Wx a and Wx b are the most common functional alleles for AC regulation in non-glutinous rice [2,3]. In comparison with Wx a , Wx b has G-T mutation at the splicing site of the first intron that reduces the splicing efficiency of precursor mRNA and the amount of GBSSI, thus leading to the reduction of AC in the target material. Moreover, single nucleotide polymorphisms (SNPs) in exons 4 (Wx op , Wx mq and Wx mp ) and 6 (Wx in ) cause amino acid substitutions, resulting in lower AC values and pasting viscosity than their background materials [4][5][6]. In addition, microsatellite variations of (CT) n in noncoding region of the Wx gene was found to be in significant correlation with AC [7]. Further, the findings of dull mutants uncovered the complex genetic regulation of amylose synthesis [8,9]. GC is used as an indicator to measure cold paste viscosity of rice flour, especially reflecting the textural property in cooked rice [10]. GC is highly negatively associated with AC, i.e., a rice variety with high AC tends to show low GC or hard gel after cooling. Map-based cloning of the qGC-6 locus revealed that Wx is the major gene responsible for GC [11], and the same conclusion was drawn through association and linkage mapping [12,13]. Rapid Visco Analyzer (RVA) plays an essential role in estimating ECQs and processing qualities of rice, which can distinguish eating quality of rice varieties with similar AC by the variation trend of pasting viscosity profiles. RVA parameters are mainly controlled by the Wx gene, but several minor-effect genes or QTL other than Wx were found to be associated with RVA parameters [14,15]. GT, indicated by the alkali spreading value (ASV), predicts rice cooking quality. The gene ALK, which is also known as SSIIa, has been reported to regulate rice GT [16]. The detection of QTL and association loci indicted that SSIIa has major effects on GT, and minor effects on GC and/or most of RVA parameters [13,14]. Two functional SNPs in exon 8 of SSIIa, GC/TT and G/A, are responsible for SSIIa activity and lead to different GT of rice flour and chain length distribution of amylopectin [1]. The GC/ TT polymorphism can differentiate rice varieties with high and intermediate GT from low GT rice varieties. The G/A SNP is crucial for SSIIa activity; the allele A causes SSIIa enzyme inactive regardless of GC/TT polymorphism, though A allele is rare in natural rice population [1,17]. Besides Wx and SSIIa, the genetic effects of other starch synthesis related genes (SSRGs) such as SSI, PUL, SSIIIa, SBE1 and SBE3 on rice ECQs have been widely reported [18][19][20]. Additionally, dozens of QTL for each trait of ECQs located on all of rice chromosomes using bi-parental populations could be retrieved in the Q-TARO QTL database and the Gramene QTL database. However, the traditional bi-parental population has showed several limitations, especially in finding the minor QTL. The genetic basis of the variation in ECQs is not yet fully understood among the non-glutinous rice varieties.
Association mapping is a powerful method to correlate genetic loci and phenotypic performance based on a natural population. Association mapping, especially candidate-gene association mapping, has been commonly used to detect QTL for grain qualities in rice. Tian et al. [13] found that the starch synthesis-related genes cooperated with each other to form a regulatory network controlling ECQs. Yan et al. [19] tested 118 waxy rice varieties using 17 SSRGs with 43 genic markers and found that 10 SSRGs are involved in regulating the RVA parameters. However, candidate-gene association mapping may easily miss some unknown loci, while genome-wide association mapping is a more effective strategy to comprehensively identify causal genetic variations in the whole genome [21].
In the present study, we used genome-wide association mapping in combination with SSRGs candidate-gene markers to explore the loci for 11 ECQ traits in different subpopulations of indica and japonica rice varieties and compared the subspecies tendency of the association loci. The purpose is to detect specific QTL of ECQs within rice subspecies and to elucidate the relationships of SSRGs and ECQs based on the similar genetic background.

Population structure
The analysis of model-based population structure provided evidence for a significant population structure. The log-likelihood values increased with the increase in the model parameter K (Fig. 1a). ΔK, as the diagnostic criterion, was thus used to determine a suitable value for K. The highest ΔK value was obtained at K = 2 (Fig. 1b), indicating the whole population composed of 253 rice cultivars could be divided into two subpopulations. The neighbor-joining tree constructed based on the Nei's genetic distances showed that the entire population could be clearly divided into two groups belonging to the indica and japonica rice accessions (Fig. 1c), that is in consistent with the results from the structure analysis. The indica subpopulation consisted of 81 indica accessions, while the japonica subpopulation included all of the 170 japonica accessions as well as 2 indica accessions from South Korea. Through screening minor allele frequency of all markers, dozens of markers were found highly fixed in one subpopulation (Fig. 1d), which may strongly affect the indica-japonica differentiation. Furthermore, PCA plots of the first two component clearly differentiated the whole population into two subpopulations (Fig. 1e), which corresponded with the results of population structure and neighbor-joining tree. The population-differentiation statistic (F ST ) between the indica and japonica subpopulations was estimated at 0.75, indicating an extremely strong population differentiation.
We further evaluated genetic structure in the two subpopulations. By K value analysis, the indica and japonica subpopulations could be divided into four and two subgroups, respectively ( Fig. 2a-d). The neighbor-joining tree of indica and japonica subpopulations showed a consistent divergent subgroups with population structure analysis ( Fig. 2e and f). Among the four subgroups of indica, F ST was estimated at 0.215 suggesting a moderate level of differentiation within indica subpopulation. The genetic differentiation within japonica subpopulation was significantly less (F ST = 0.057) than that in indica subpopulation, implying the fixation of alleles and inbreeding.
Whole-genome patterns of linkage disequilibrium (LD) LD analysis was performed for the indica and japonica subpopulations cultivars based on 210 markers (Additional file 2: Fig. S2) since a strong population differentiation was found in the whole set of rice accessions. In indica subpopulation, out of 21,310 pairs, 21, 251 pairs (99.72%) showed LD with a D' average of 0.36. The loci number with LD supported by D' > 0.5 was 5204 and occupied 24.42%. In japonica subpopulation, 21,893 (99.90%) among 21,915 pairs showed LD, the loci with LD supported by D' > 0.5 occupied 33.34% and the D' average of LD pairs was 0.43. This indicated that LD level was higher in japonica subpopulation than in indica subpopulation, and the japonica subpopulation might have undergone a stronger manual selection than indica subpopulation.

Phenotypic variations and correlations
The mean value, coefficient of variation (CV) and differences for 11 ECQ traits in the whole population and two subpopulations across 2 years are depicted in Table 1 (2016), respectively. There was no significant difference for AC and GC between years, but the dramatic difference was found between two subpopulations. The mean values of ASV were 4.6 (3.2), 4.0 (2.8) and 4.9 (3.4) in three panels in 2015 (2016), respectively. The differences for ASV were significant between years and between subpopulations. The indica subpopulation had larger AC, and smaller GC and ASV in comparison with the japonica subpopulation. For the 8 parameters of RVA profile, the differences of most of parameters in three panels were significant between years, except for the PV, CV, SBV and Ptemp in indica subpopulation. There were also significant differences for most of RVA parameters except for PeT between subpopulations. Most of the RVA parameters had higher average value in indica subpopulation than in japonica subpopulation, while BDV and PeT showed the contrary values.
The phenotypic pair-wise correlations between each pair of the 11 ECQ traits in three panels and 2 years are given in Additional file 1: Table S2. The correlations of the same trait were almost all significant in three panels between both years indicating environmental stability of these traits. The correlation coefficients were generally larger in the whole population and indica subpopulation than in the japonica subpopulation. In both year, except for ASV, AC was negatively correlated with GC, PV and BDV and was positively correlated with HPV, CPV, SBV, CSV, PeT and Ptemp in more than two panels. GC had significant correlations with most of RVA parameters except for PV and ASV in both years. Significant correlations for ASV were mainly identified with PV, BDV, PeT and Ptemp in whole population and the japonica subpopulation in both years. Each pair of most of RVA parameters showed significant correlations in more than two panels, but there were no or weak correlations for PV with CSV and PeT in three panels in both years.

Association analysis
The strong population structure between the two subspecies of cultivated rice makes it inappropriate to look for association loci in the entire population. We thus conducted association analysis separately for indica and japonica subpopulations based on the Q + K mix linear model.
For indica subpopulation, 33 associated loci influencing nine traits were totally identified with P < 2.4 × 10 − 4 , of which 16 loci were detected in both years ( Table 2). There were seven strong association signals with P < 10 − 7 . For AC and GC, the same four loci were identified on the short arm of Chr.6 which were markers of Wx gene or tightly linked with Wx gene. Four loci for AC were detected in both years with the phenotypic variance explained of 7.67 to 17.24%. In four loci for GC, only RM588 was found in both years explaining 6.85 and 9.12% of the phenotypic variance respectively, while the other three loci were detected only in 2016 with a minor phenotypic variance explained of 2.71 to 4.97%. Two markers (SSIIa-S1 and SSIIa-S2) from the OsSSIIa gene affecting ASV were found, explaining the phenotypic variance of approximately 17.0% in 2015. Eighteen loci for five viscosity parameters (PV, BDV, CPV, SBV and CSV) were identified across 2 years of which 10 were detected in both years, with phenotypic variation explained by each locus varying from 4.09 to 10.45%. Five loci for PeT were detected of which four were found in 2015 and one in 2016, explaining the phenotypic variance of 7.96 to 17.82%.
For japonica subpopulation, a total of 30 markers were identified as the association signals with P < 2.4 × 10 − 4 , of which 8 loci were detected in both years (Table 3). For AC, two loci (Wx-S3 and RM550) were detected in both years and the Wx-S3 loci had the largest effect with the phenotypic variance explained of 21.86 and 20.87% in 2015 and 2016, respectively. Four loci for ASV were detected only in 2015 contributing minor effects with the phenotypic variance explained of 3.55 to 5.56%, of which two markers were derived from the OsSSIIa gene. Sixteen loci for six viscosity parameters (PV, HPV, BDV, CPV, SBV and CSV) were totally identified across 2 years. Only the Wx-S3 locus controlling HPV, CPV, SBV and CSV was detected in both years, while other loci were found in 1 year. The detected loci in each viscosity parameter varied from two for BDV, CPV and CSV to four for HPV. For PeT, three loci were identified, of which only the Wx-S3 locus was found in both years. Five loci for Ptemp were detected across 2 years. The locus SSIIIb-S1 was identified only in 2015 which explained 7.51% of the phenotypic variance. The locus RM276 found in both years explained 6.85 and 9.12% of the phenotypic variance in 2015 and 2016, respectively. Three additional loci (SSIIa-S1, SSIIa-S2 and RM5688) were detected only in 2016 accounted for 3.48, 6.17 and 4.52% of the phenotypic variance, respectively. Some association signals for the investigated traits were located nearby the known SSRGs (Wx, SSIIa, SSI, Table 1 Year wise comparative phenotypic performance in terms of eating and cooking qualities in the whole population and two subpopulations of cultivated rice  Figure 3 exhibits the location of these association signals on rice chromosomes separately from the indica and japonica subpopulations. The most association signals derived from two subpopulations across 2 years were assembled in Wx gene. Notably, we regarded those loci that were detected with the physical distance less than 2.0 Mb as the same QTL, and these loci in different subpopulations were regarded as alleles.

Discussion
There is an ancient and well-established divergence between two major subspecies of Oryza sativa, indica and japonica [22]. Chen et al. [23] developed a set of SNPs and uncovered the well-known subspecific differentiation and geographic differentiations within indica and japonica germplasms. For rice eating quality characteristics, a wide range of varietal differences have been recognized in worldwide rice germplasms including indica and japonica rice cultivars [24]. These rice germplasm provide abundant gene resources for the discovery of genes that control important traits. In the present study, a set of 253 non-glutinous germplasm of rice cultivars including 83 indica and 170 japonica accessions were investigated for allelic differences of SSRGs and phenotypic variation of ECQs. The indica cultivars contain both homozygous alleles of Wx a (in a large amount of accessions) and Wx b (in a small amount of accessions) with a wide range of AC from 12.5 to 28.5%, while all the japonica cultivars have the homozygous Wx b allele with AC ranging from 8.0 to 19.0%. This indicates that a large variation in AC existed with interspecies and intraspecies of rice cultivars and AC would be controlled by genetic factors other than Wx locus [25,26]. Genome-wide association mapping coupled with candidate genes was utilized for identifying new genetic factors and verifying effects of the known genes on investigated traits. Given that the linkage disequilibrium could be caused by the admixture of subpopulations and might result in a high number of false positives [27], population structure has to be estimated in the association analysis. Cultivated rice has undergone a longterm domestication and evolution, artificial selection and different ecological conditions contributed to the differentiation of cultivated rice, especially the subspecific differentiation [28,29]. As expected, a strong population  differentiation was detected between two subspecies of indica and japonica rice cultivars in this study. For the japonica subpopulation, two indica accessions from South Korea that harbored Wx b allele were grouped with all the japonica accessions, indicating the similar genetic background with japonica rice. The different loci or alleles associated with the same ECQ trait in the two subpopulations provided access to compare genetic factors between indica and japonica background. Similar to previous reports, the Wx gene was responsible for most of the ECQ traits including AC, GC, PV, HPV, BDV, CPV, SBV, CSV and PeT either in indica or japonica subpopulation [15]. For AC, Wx-S1 (Wx a /Wx b allele) and Wx-S2 (CT variation) were not the most significant associated sites identified in indica subpopulation, but RM588, the Wx adjacent marker, was the major locus explaining the largest phenotypic variation. RM170 was the most upstream associated locus for AC identified in indica subpopulation and was recognized as a Wx allele because of their close positions, suggesting that the difference of the upstream regulatory region of Wx gene could partly explain AC variation among indica rice accessions. In contrast, two association loci (Wx-S3 and RM550) for AC detected in japonica subpopulation were different with those detected in indica subpopulation. The Wx-S3 loci was mapped for AC in japonica subpopulation for both year. Wx mp was an allele mutated from G to A in exon 4 of Wx b resulting in the dramatically decrease of amylose content [30]. Due to lower amylose content, the carrying rice materials presented higher stickiness and lower hardness than the rice varieties with Wx b . RM550 on rice chromosome 2 was the other associated loci for AC identified in japonica subpopulation across both years. A QTL, qAC2, flanked by RM3730 and RM6933 on rice chromosome 2 has been previously detected and fine-mapped in populations derived from crosses between Koshihikari and Akihikari [26,31]. By comparing their genomic region, RM550 and qAC2 were not the same QTL. The allelic differences of Wx-S3 and RM550 could give the main explanation of AC variation among the japonica rice accessions.
Four associated loci for GC were detected in indica subpopulation and RM588 was the locus explaining the largest phenotypic variation of GC, which were the same loci as identified for AC in indica subpopulation. This was in agreement with previous findings that GC was mainly controlled by the Wx alleles [11]. But no locus for GC was detected in japonica subpopulation, probably because of lower phenotypic variation among these rice accessions.
For ASV, two association loci SSIIa-S1 and SSIIa-S2 detected in indica subpopulation showed a high proportion of phenotypic variation, while four loci SSIIa-S1, SSIIa-S2, RM5331 and RM1375 identified in japonica population had a minor proportion of phenotypic variation. The loci RM5331 and RM1375 were linked with the soluble starch syntheses genes SSIIa and SSIIb respectively. This result indicated that SSIIa alleles were randomly distributed in both indica and japonica subpopulations, even though the ASV average in japonica subpopulation was higher than that in indica subpopulation. From the previous reports, allelic variation in SSIIa was identified by exploring the interspecies genetic variation between indica and japonica, due to the intermediate length enrichment of amylopectin chains [2,3]. In the present study, allelic variations of SSIIa without the tendency of subspecies might be due to gene introgression between indica and japonica varieties during longterm domestication and artificial selection [32].
Most of loci for RVA viscosity parameters detected in both subpopulations were SSRGs or adjacent with SSRGs. Of these, Wx alleles and its neighboring markers were mapped by most of RVA traits from both subpopulations, which was consistent with previous reports [33][34][35][36]. Other loci were separately located on different chromosome regions. For example, the loci for BDV, PeT and Ptemp associated with SSIIa, the loci for HPV, CPV and PeT associated with SBE4, the locus for Ptemp associated with SSIIIb and the locus for HPV associated with AGPLar were found in japonica subpopulation. The loci for SBV and PeT associated with SSI and the locus for PV associated with SBE1 were identified in indica subpopulation. Notably, in addition to AC, RM550 was also associated with SBV and CSV in japonica subpopulation, suggesting that it was an important locus for improving eating qualities in rice materials with japonica background. Other loci for RVA parameters associated with SSR markers were detected in individual subpopulation and only in 1 year, which distributed on Chr. 1, 4, 7, 8, 9 and 10. The accuracy of these loci needs to be verified in future, even though most of them have been identified in natural or linkage populations based on different genetic background [15,[37][38][39].

Conclusions
We described a strategy to search for specific genetic factors of rice ECQs in rice subpopulations of indica and japonica cultivars. Wx alleles had the largest effect on AC, GC and most pasting properties. The novel associated locus RM550 on chromosome 2 identified in japonica subpopulation displayed the minor effects on rice AC and RVA pasting, which was irrespective of the effect of Wx. The associated alleles of SSRGs or their adjacent markers, RM588, Wx-(CT) n , SSI and SBE1 for indica subpopulation and RM550, Wx mp , SSIIa and SBE4 for japonica, were specific genetic factors for rice subspecies, which can provide efficiency and precision in marker-assisted selection for indica and japonica breeding. The availability of these markers will facilitate the development of new rice varieties with the desired ECQ traits. After being air-dried and stored at room temperature for 2.5 months, all rice samples were dehusked to brown rice with a rice huller (SY88-TH, Sanyong, South Korea) and subsequently polished using a grain polisher (Kett, Tokyo, Japan). The polished rice was ground to powder in a mill (Foss 1093 Cyclotec Sample Mill, Sweden) and sieved through a 100-mesh sieve to obtain rice flour samples for phenotypic measurement.

Phenotypic evaluation
The AC was determined using the iodine staining method described in the European Standard EN ISO 6647-2-2015. The AC was calculated using a standard curve made from four rice samples with known AC. ASV was measured visually by soaking the milled rice grains in 1.7% KOH for 23 h at 30°C [40]. The GC was determined following the method of Cagampang et al. [41] with minor modifications. RVA profile was determined using a Cereal Visco Analyzer (TechMaster, Perten, Sweden). The heating profile was setup according to the method of AACC61-01 and 61-02. The output RVA parameters include peak viscosity (PV), hot paste viscosity (HPV), cool paste viscosity (CPV), pasting temperature (Ptemp) and peak time (PeT). Three secondary parameters were calculated from the output parameters: breakdown viscosity (BDV = PV -HPV), setback viscosity (SBV = CPV -PV) and consistency viscosity (CSV = CPV -HPV). Each trait was measured in three replicates.

Molecular marker genotyping
Young leaves at the tillering stage were collected from one seedling of each cultivar and genomic DNA was extracted with the CTAB extraction buffer. For genotyping, a total of 210 molecular markers, comprising 157 SSR (simple sequence repeat), 19 STS (sequence tagged site) and 34 genic markers derived from 15 SSRGs were used (Additional file 1: Table S3).

Genetic diversity, phylogenetic analysis and population structure
The summary statistics including the number of alleles per locus, major allele frequency, gene diversity, PIC values, and classical Fst values were determined using Power Marker version 3.25 [42]. Nei's distance [43] was calculated and used for the unrooted phylogeny reconstruction using neighbor-joining method as implemented in Power Marker with the tree viewed using MEGA 4.0 [44].
Analysis of population structure among rice accessions was performed using the software package STRUCTURE V2.2 [45,46]. The optimum number of populations (K) were selected after five independent runs of a burn-in length of 50,000 followed by 10,000 iterations for each value of K (testing from K = 2 to K = 10). A model-based clustering method was applied for identifying subgroups with distinctive allele frequencies and setting each individual into different K cluster, where K is chosen in advance but can be varied for independent runs of the algorithm. The most likely number of cluster (K) was selected by comparing with the logarithmized probabilities of data (Pr [X|K]) [47].

Association analysis
To compare phenotypes of two subpopulations identified by STRUCTURE V2.1, ANOVA was employed with the SAS program version 8 (SAS Institute Inc., Cary, NC), followed by multiple means comparisons. To control for false positives, the Q + K mixed linear model was used for the genome-wide association mapping with TASSEL V5.0 [48]. The Q matrix was obtained from the analysis results of STRUCTURE V2.1. The K matrix was generated from the analysis results using the kinship matrix function in TASSEL V5.0. The P value determines whether a marker (QTL) is significantly associated with the trait and the -LogP value higher than 4 was set as a threshold for strong associations. The R 2 (the fraction of the total variation explained by the marker) for each locus was estimated using TASSEL 5.0.
Additional file 1: Table S1. The allele number, gene diversity and polymorphism information content of each molecular marker genotyping in 253 accessions. Table S2. Pairwise correlation values among 11 eating and cooking quality traits in the whole population and two subpopulations in 2015 and 2016. Table S3. The polymorphic molecular markers and sequences for tagging starch synthesis-related genes.