Identification of putative markers linked to grain plumpness in rice (Oryza sativa L.) via association mapping

Background Poor grain plumpness (GP) is one of the main constraints to reaching the yield potential of hybrid rice. Results In this study, the GP of 177 rice varieties was investigated in three locations across 2 years. By combining the genotype data of 261 simple sequence repeat (SSR) markers, association mapping was conducted to identify the marker-GP association loci. Among 31 marker-GP association loci detected in two or more environments and determined using general linear model (GLM) analyses, seven association loci were also detected using mixed linear model (MLM) analyses. The seven common loci detected by the two analytical methods were located on chromosomes 2, 3 (2), 7, 8 and 12 (2) and explained 7.24~22.28% of the variance. Of these 7 association loci, five markers linked to GP were newly detected: RM5340 on Chr2, RM5480 and RM148 on Chr3, RM1235 on Chr8, and RM5479 on Chr12. Conclusions Five marker-GP association loci were newly detected using both the GLM and MLM analytical methods. Elite allele RM505-170 bp had the highest average phenotypic effect on increasing the GP, and the typical carrier variety was ‘Maozitou’. Based on the distribution of the elite alleles among the carrier varieties, the top 10 parental combinations for improving the GP in rice via cross-breeding were predicted. Electronic supplementary material The online version of this article (10.1186/s12863-017-0559-6) contains supplementary material, which is available to authorized users.


Background
Rice (Oryza sativa L.) is the main staple food for more than 50% of the world's population [1]. As the amount of arable land area decreases, higher rice yields will be needed to meet the needs of the increasing world population [2]. The grain yield of rice per unit area of land is determined by the panicle number, grain number per panicle and grain weight. When the panicle number per unit area of land and grain number per panicle are optimized, improving the grain weight plays a key role in further increasing the yield in rice breeding programmes [3]. The grain weight is closely related to the grain size and grain plumpness (GP) [4], and the grain size is determined by the grain length, grain width and grain thickness.
To our knowledge, 15 quantitative trait loci (QTLs) related to GP have been mapped to date, and they are distributed on chromosomes 1 (3), 2 (1), 5 (2), 6 (3), 7 (3), 8 (1), 11 (1) and 12 (1) [18][19][20]. However, no genes for GP have been cloned. Other studies have found that GP is closely related to sucrose synthesis and transport. Two rice sucrose synthase genes-SUS3 on Chr7 and SUS4 on Chr3-may be involved in carbon allocation in filling grains [21]. The cell-wall invertase gene-OsCIN1 on Chr2-plays an important role in providing a carbon source to develop filial tissues during the early course of grain filling in the caryopsis [22]. The gene grain incomplete filling 1 (GIF1) on Chr4 regulates sucrose transport and uploading during the grain-filling stage, and the overexpression of GIF1 can increase grain filling and final grain weight [23].
Most QTLs for GP have been detected based on linkage mapping using bi-parent-derived populations, and only two alleles at a given locus have been studied [24]. Association mapping, which is a new approach, has greater power to detect more alleles or alleles with weak effects [25]. In this paper, we reported elite alleles for GP detected by association mapping using a population composed of 177 rice varieties and 261 simple sequence repeat (SSR) markers and their carrier varieties.

Geographical distribution of varieties used and field planting
The 177 rice varieties used in this study represent a subset of our previous reports [24,26]. Among them, 148 were from China, and 29 were from Vietnam (Additional file 1: Table S1). The varieties were distributed from 17.00°N to 41.81°N. The 177 varieties were grown from May to November 2013 and 2014 at three locations: Jiangpu Experimental Farm (JEF; 118.62°E, 32.07°N), Nanjing Agricultural University, Jiangsu province, and Xinyang Farm (XF; 114.12°E, 32.10°N) and Yuanyang Farm (YF; 113.96°E, 35.05°N), Henan Academy of Agricultural Sciences, Henan Province, China. JEF and XF are located at almost the same latitude, but their longitudes differ by 4.5°. XF and YF are located at almost the same longitude, but their latitudes differ by 3°. Seedlings aged approximately 30 days were transplanted to the paddy field by hand each year at each location. Each plot consisted of five rows with eight plants per row, and the spacing was 17 cm × 20 cm. The field trial was arranged using a completely randomized block design with two replications at each location.

Phenotyping
The main stem panicles of the 10 plants in the middle three rows of each plot were harvested, threshed and dried under natural sunshine to 13% moisture. All dried spikelets were placed on a translucent lamp box, and the empty grains (unfertilized spikelets) were selected by hand. Then, the full grains were separated from the remaining mixed filled grains (full plus shrunken) using a salt solution with a specific gravity of 1.1. The full grains and shrunken grains were then dried at 105°C for 24 h to constant weight. The measurements of the full grains and mixed filled grains for each plot were replicated three times. The GP was calculated using the following formula: where A c represents the average thousand-grain weight of the mixed filled grains of variety c, and B c represents the average thousand-grain weight of the full grains of variety c [27].

Genotyping
The SSR molecular marker genotype data published in [26] were used in this study, except for RM433 on chromosome 8, which showed no polymorphism among the 177 accessions. The base pair start positions on the chromosomes for each SSR marker are presented (Additional file 2: Table S2) for calculating the physical distance between markers on the chromosomes.

Data analysis
The phenotypic data were statistically analysed using Microsoft Excel 2010. The broad-sense heritability was computed using the formula [28] where σ 2 g is the genetic variance, σ 2 e is the error variance, and n is the number of replications.
Two methods were used to detect the population genetic architecture of the 177 accessions. The first was the Bayesian cluster analysis approach, which was implemented using STRUCTURE version 2.2 [29]. The second was the neighbour-joining method, which was carried out using MEGA version 5.0 based on Nei's genetic distance [30]. The computations followed the same approach as those described in [31]. The coefficient of genetic differentiation (F ST ) [32] was calculated to measure the fixation of different alleles in different subpopulations using Arlequin version 3.0 [33]. The number of alleles per locus, gene diversity and polymorphism information content (PIC) were determined using PowerMarker version 3.25. The r 2 value [34] calculated via TASSEL version 2.1 [35] was used as the preferred measure of linkage disequilibrium (LD).
Two models, the general linear model (GLM) and the mixed linear model (MLM), were used to analyse the associations between GP and SSR markers with TASSEL version 2.1. In the GLM, only the Q matrix was used as a covariate, while in the MLM, both the Q matrix and kinship matrix were used as covariates [36]. The kinship matrix was calculated via SPAGeDi to estimate the genetic relatedness among individuals [37]. A false discovery rate (FDR) of 0.01 was used as a threshold for significant associations [38]. Based on the identified association locus, the 'null allele' (non-amplified allele) was used to determine the phenotypic effects of other alleles [39]. Alleles with frequencies of less than 5% in the population were regarded as rare alleles and treated as missing data. The following formula was used to calculate the positive (negative) average allele effect (AAE) of each locus: where ∑a i is the positive (negative) allelic phenotypic effects of locus i, and n i is the number of positive (negative) alleles within locus i.

Phenotypic variation and genetic diversity in the population studied
Among the six environments, the mean GP values were higher than 90%, and the coefficient of variation ranged from 3.35% to 4.22%. The broad-sense heritability for the GP was greater than 90% in each environment (Table 1). No significant differences were detected over 2 years at any location, indicating that the GP is influenced mainly by genetic factors. A two-way analysis of variance (ANOVA) showed that the differences in GP among the 177 varieties were significant at the α = 0.01 probability level, indicating that a large amount of genetic variation existed in the entire population. Highly significant correlations (α = 0.01) were observed for the GP trait between the 2 years at each location. The coefficients of correlation between the 2 years were 0.814 at YF, 0.975 at XF and 0.974 at JEF. The coefficients of correlation between the pairs of locations were 0.432 (r JEF-XF , JEF and XF), 0.312 (r YF-XF , YF and XF) and 0.367 (r YF-JEF , YF and JEF). Thus, the variation tendency of the GP was consistent across years and locations.
Of the 1948 alleles amplified by 261 SSR marker loci in the 177 varieties, 35.14% were rare alleles with frequencies less than 5%. The average number of alleles per SSR locus was 7.46 and ranged from 2 to 20. The average gene diversity was 0.6734 and ranged from 0.0223 (RM140 on Chr1) to 0.9152 (RM7545 on Chr10). The average PIC value was 0.6395 and ranged from 0.0221 (RM140 on Chr1) to 0.9091 (RM7545 on Chr10) (Additional file 2: Table S2).

Genetic architecture of the original population
Although the 177 accessions represent a subset of the 540 accessions used in our previous reports [26], the present population was still divided into seven subpopulations using ΔK as the diagnostic criterion (Additional file 3: Figure S1a and 1b). This may be caused by a broad geographical distribution (17.00°N to 41.81°N), although the number of points was reduced. Based on the criterion of Q > 0.900, 17 varieties were assigned to the admixed group and were not analysed further. The other 160 varieties were reanalysed using the STRUCTURE software package, and they were clearly differentiated into seven subpopulations with Q > 0.900 for each variety ( Fig. 1c; Additional file 1: Table S1). The neighbour-joining tree constructed based on Nei's genetic distance [30] supported the finding that the original population was composed of seven subpopulations (i.e., SP1 to SP7; Fig. 1d). The numbers of varieties included in SP1-SP7 were 29, 28, 25, 12, 20, 28 and 18, respectively. The varieties in SP3 and SP4 are mainly from Vietnam, whereas the varieties in the other five subpopulations are from China (Additional file 1: Table S1).

Pairwise F ST and Nei's genetic distance among subpopulations
The average F ST value of the seven subpopulations (160 varieties) was 0.6587. The F ST value between SP3 and SP4 was the lowest (0.5978), while that between SP1 and SP4 was the highest (0.7451). Nei's genetic distance between SP3 and SP1 was the longest (0.7624), whereas that between SP5 and SP2 was the shortest (0.5032) ( Table 2). In addition, when the F ST value between SP3 and SP4 was lowest, Nei's genetic distance between SP3 and SP4 was shorter, whereas when the F ST between SP1 and SP4 was highest, Nei's genetic distance between SP1 and SP4 was longer ( Table 2). These findings reveal that the pairwise F ST can reflect the genetic distance between subpopulations. Table 3 shows the levels of LD estimated for the entire population and the seven subpopulations. For the entire  Figure S3). This result further validated the significant LD of the SSR markers in the seven subpopulations and demonstrated that the LD decay velocity varied among these subpopulations.

Significant SSR marker-GP association loci detected in the population studied
In total, 31 association loci between the SSR marker and GP with P-values less than 0.01 were detected by both the GLM and MLM analyses in two or more environments (Additional file 6: Table S3). The GLM analysis revealed 31 marker loci associated with GP (P < 0.01), and the identified markers were located on all of the chromosomes except for chromosome 11 Table S3). The MLM analysis

Elite alleles for GP
The seven common marker-GP association loci from both the GLM and MLM analyses were considered to be robust loci associated with GP (Additional file 6: Table S3). Based on these seven markers, 15 elite alleles were mined in two or more environments (

Top parental combinations predicted for GP improvement
Based on the data in Additional file 6: Table S3, the alleles at seven significant marker-GP association loci in   Table S3). The top 10 parental combinations were predicted (Table 5) for improving the GP in rice via cross-breeding based on the data presented in Additional file 3: Table S3. For instance, 'Yuedao 5' had six elite alleles, and 'Ligengqing' had six elite alleles. Seven elite alleles could be pyramided into one plant using the combination 'Yuedao 5 × Ligengqing' , and as a result, the GP should theoretically be improved by 13.07% (Additional file 7: Table S4; Table 5). Figure 2 shows the unhulled rice grains and brown rice grains of the varieties corresponding to the predicted combinations to improve GP.

Discussion
GP in rice affects not only yield but also milling quality (recovery of head rice), especially in hybrid rice where heavy panicles often increase the yield [17,40]. Mining elite alleles for GP is beneficial for improving this trait.
In the present study, we used 177 accessions, representing a subset of the population (540 accessions) reported in [26], and 261 SSR markers to implement the discovery.
To avoid spurious associations in association mapping [41], we first evaluated the present population genetic architecture using two different analysis methods (STRUCTURE and Nei's genetic distance) and detected 7 subpopulations (Fig. 1), which was the same as those detected in [26]. We inferred that the population genetic structure was mainly affected by geographical location (ecotypes) and nearly unaffected by accession reducing. We also found that the LD decay distances for SP1-SP7 were 4.48 Mb, 5.48 Mb, 4.26 Mb, 3.53 Mb, 5.53 Mb, 5.58 Mb and 8.67 Mb, respectively. The fast decays in SP3 and SP4 could have resulted from rapid artificial hybridization in Vietnam, which should accelerate the recombination of the chromosomes and, thereby, weaken the LD. We calculated the average standardized individual allele sizes of the seven subpopulations according to the methods described by [42] and observed that the average allele sizes in SP3 and SP4 were significantly higher than those in the other subpopulations (Fig. 3). This fact further confirmed that directional evolution for the allele size has occurred in rice [24,31]. We also observed that the SSR alleles tended to decrease in size from the low-latitude subpopulations (SP3 and SP4, 17-23゜N) to the high-latitude subpopulations (the other five subpopulations, 30-39゜N); this behaviour may be explained by the emergence of mutations or changes in the mutation rate causing a change in the allele size in rice [43,44]. No significant differences were found between SP3 and SP4, possibly because of the short geographical distance between the two subpopulations. The same phenomenon was detected among the remaining five subpopulations (Table 6). Moreover, the high proportion of the rare alleles (35.14%) might be related to the geography of rice migration. New alleles appeared and certain original alleles disappeared with the changes in the cultivation environment, resulting in the emergence of varieties with rare alleles.
Furthermore, the number of detected marker-GP association loci decreased when the same population was cultivated towards the west and north. As shown in Additional file 6: Table S3, in JEF (32.07°N, 118.62°E), 24 and 21 marker-GP association loci were detected in 2013 and 2014, respectively, by the GLM analyses, and four (RM5480, RM1235, RM511 and RM5479) and three (RM5480, RM511 and RM5479) association markers were detected in 2013 and 2014, respectively, by the MLM analyses. In XF (32.10°N, 114.12°E), ten and six marker-GP association loci were detected in 2013 and 2014, respectively, by the GLM analyses, and two (RM148 and RM505) and three (RM148, RM505 and RM1235) association markers were detected in 2013 and 2014, respectively, by the MLM analyses. In YF (35.05°N, 113.96°E), four and seven SSR marker-GP association loci were detected in 2013 and 2014, respectively, by the GLM analyses, and only one association marker (RM5340) was detected by the MLM analyses in the both years. Furthermore, among the seven association loci detected by the two analytical methods (GLM and MLM), no identical association marker loci were found among the three locations except for RM1235 on Chr8. Thus, there are many gene loci underlying GP, and different genes exhibit different characteristics in different environments.
Based on previous studies, GIF1, as an important gene cloned associated with rice grain filling, encodes a cellwall invertase required for carbon partitioning during early grain filling. GIF1 is a potential domestication gene; thus, a domestication-selected gene can be used for further crop improvement. In our study, we found no markers associated with GP near the region of GIF1.
By comparing with other studies, we found that five of the seven SSR markers detected in this study were novel; the other two SSR markers were located near the chromosome regions harbouring grain filling and related QTLs or genes reported in previous studies (Additional file 8: Table S5).Among the seven SSR markers, RM505 on Chr7 was in the region of qGR-7-8 for grain-filling rate [20], and RM511 on Chr12 was the same as the SSR markers for grain-filling rate detected by [45], implying that GP was affected by the grain milk filling rate.
Although the PVE of the seven association loci ranged from 7.24% to 22.28%, the positive AAE values were weak (1.26% on RM1235 to 1.68% on RM505). This result might be explained by the coefficient of variation in GP in the population was not large enough (CV < 5%, Table 1) because the experimental materials were cultivars in different areas. However, because GP is a function of grain weight, a small improvement will contribute considerably to the grain yield of rice.
The elite alleles mined in this study may be used to improve the GP of hybrid rice. Among the top 10 parental combinations predicted, the combinations 'Yuedao 5 × Ligengqing' , 'Yuedao 22 × Ligengqing' , 'Yazihuang × Ligengqing' and 'Yazihuang × Huangsanshi' could theoretically improve the GP by 13.07% ( Table 5). Six of the ten combinations have 'Yuedao 5' as a parent, indicating that 'Yuedao 5' may be an excellent parent for GP improvement.

Conclusions
Seven marker loci were detected for GP, of which five were novel loci. Ten parental combinations were predicted for improving the GP in rice via cross-breeding.