Genetic characterization and genome-wide association mapping for stem rust resistance in spring bread wheat

Background Emerging wheat stem rust races have become a major threat to global wheat production. Finding additional loci responsible for resistance to these races and incorporating them into currently cultivated varieties is the most economic and environmentally sound strategy to combat this problem. Thus, this study was aimed at characterizing the genetic diversity and identifying the genetic loci conferring resistance to the stem rust of wheat. To accomplish this, 245 elite lines introduced from the International Center for Agricultural Research in the Dry Areas (ICARDA) were evaluated under natural stem rust pressure in the field at the Debre Zeit Agricultural Research Center, Ethiopia. The single nucleotide polymorphisms (SNP) marker data was retrieved from a 15 K SNP wheat array. A mixed linear model was used to investigate the association between SNP markers and the best linear unbiased prediction (BLUP) values of the stem rust coefficient of infection (CI). Results Phenotypic analysis revealed that 46% of the lines had a coefficient of infection (CI) in a range of 0 to 19. Genome-wide average values of 0.38, 0.20, and 0.71 were recorded for Nei’s gene diversity, polymorphism information content, and major allele frequency, respectively. A total of 46 marker-trait associations (MTAs) encompassed within eleven quantitative trait loci (QTL) were detected on chromosomes 1B, 3A, 3B, 4A, 4B, and 5A for CI. Two major QTLs with –log10 (p) ≥ 4 (EWYP1B.1 and EWYP1B.2) were discovered on chromosome 1B. Conclusions This study identified several novel markers associated with stem rust resistance in wheat with the potential to facilitate durable rust resistance development through marker-assisted selection. It is recommended that the resistant wheat genotypes identified in this study be used in the national wheat breeding programs to improve stem rust resistance. Supplementary Information The online version contains supplementary material available at 10.1186/s12863-022-01030-4.

East Africa [9,10]. The race can infect 90% of the wheat varieties grown worldwide [11] and yield losses can reach up to 100% in susceptible cultivars under conducive environmental conditions [12]. Races other than Ug99 were also reported in different parts of Western Europe. In 2013, a stem rust epidemic arose in Germany and spread to Denmark, Sweden, and the UK [13]. In 2016/2017, Italy chronicled two epidemics of wheat stem rust caused by race TTRTF, which destroyed tens of thousands of hectares of cultivated wheat [14]. All these reports indicate that the disease is re-emerging as a threat to wheat production globally.
Ethiopia is considered to be a hotspot for the development and evolution of new Pgt races [15]. Many new variants of Pgt, which were first identified in this country, have spread to different parts of the world. TTKSK, TKTTF, TRTTF, JRCQC, and TTTTF are the current major wheat stem races that are threatening wheat productivity in Ethiopia [16]. In 2013/2014, severe stem rust epidemics were caused by Pgt race TKTTF (not a member of Ug99 lineage), resulting in almost total yield loss on widely grown wheat cultivars. Since then, this race has spread widely and has been found in 10 different countries, including Western Europe [17].
To overcome this problem, host plant resistance developed through molecular marker technology is the most sustainable, cost-effective, and environmentally friendly approach for controlling rust diseases [7,18]. Accordingly, many molecular markers linked with Pgt resistance were discovered throughout the wheat genome during the past couple of decades using genome-wide association mapping (GWAS). GWAS has been the most effective tool to detect several quantitative trait loci (QTLs), with moderate to minor effects against Pgt disease [19]. However, factors such as population structure and kinship similarity should be controlled properly to avoid false-positive QTLs. To overcome this, several models, including the mixed linear model (MLM) have been implemented. Since the first report in 2007 [20], various GWAS studies were carried out successfully and high numbers of QTLs have shown Pgt resistance in wheat [21][22][23][24]. So far, more than 80 genes conferring resistances to Pgt have been cataloged in common wheat and wheat relatives [24]. However, only a few genes are effective against all pathogen strains. Of these, Sr2, Sr13, Sr22, Sr25, Sr26, Sr35, Sr39, and Sr40 were reported to be the most effective against Ug99 [18].
The frequent co-evolution of host and pathogen remains a big challenge in the durability of the released resistant cultivars [25]. The narrow genetic diversity of cultivated wheat cultivars [22,26] and the impact of climate change [12] are the major cause of this problem. Thus, additional sources of resistant QTLs, followed by marker-assisted gene pyramiding, are required to produce durable resistant varieties. Therefore, this study aimed to characterize the genetic diversity and to identify novel QTLs associated with resistance to stem rust of wheat through GWAS.

Phenotypic variation and heritability
The performance of genotypes towards stem rust resistance varied greatly. For instance, the disease severity score was ranged between 10 and 80%. The majority (46.7%) had a disease severity (DS) score of 15-30% whilst 8.5% had a DS score of 0% (Fig. 1, Additional file 1). The best linear unbiased estimates (BLUP) values of DS and coefficient of infection (CI) were calculated from adjusted means of each accession across two years, and are summarized in Fig. 1.
The data of disease severity (DS) and infection response (IR) were combined to define the disease response as the coefficient of infection (CI) and 71% of lines had less than 30 (Fig. 1C). Of these, the top twenty resistance lines (presented in Table 1) ranged with the average CI values of 4.5 for pedigree SERI.1B//KAUZ/HEVO/3/AMAD/4/ CHAM-6/FLORKWA-2 to 12 for pedigree SERI.1B// KAUZ/HEVO/3/AMAD/4/WEAVER/JACANA. Additional genotypes scored between 6 to 80 of CI and are presented in Additional file 1. On the other hand, all local controls (i.e. Digelu, Kubssa, Hidasse, Honqolo, and Ogolcho) were susceptible, with average CI ranging from 60 for HIDASSE to 80 for OGOLCHO and HONQOLO.
The ANOVA analysis revealed highly significant variation among genotypes (P < 0.001) and genotype x year interactions (P < 0.001) for all parameters. Heritability (H) values for DS were 79% and IR 72%, suggesting that all parameters had a strong genetic basis. In addition, the disease distribution of the breeding lines was high between seasons, with average correlations of 0.76, 0.85, and 0.78 for DS, IT, and CI, respectively ( Table 2).

Population structure and genetic diversity analysis
The population structure of the panel was inferred through the Bayesian clustering model, principal component analysis (PCA), and neighbor-joining (NJ) tree. The Bayesian clustering model applied on STRU CTU RE software and subsequent application of STRU CTU RE HAR-VESTER showed a delta K peak value of two ( Fig. 2A). As a result, accessions were classified into two sub-populations composed of 106 and 139 lines in sub-populations 1 and 2, respectively (Fig. 2C). The scree plot of PCA showed that weak kinship existed among the lines. For the first 10 principal components (PCs), variances of SNP markers were changed from 7.5% (PC1) to 2% (PC10) and between 0 and 2% after PC10 (Fig. 2B).
Phylogenetic tree analysis of the genetic relationship between the populations was carried out based on the distance-based neighbor-joining tree on TASSEL software v5.2.35 followed by web-based visualization software iTOL. The resulting dendrogram shows three phylogenetic groups color-coded with a STRU CTU RE probability distribution. This is not consistent with the STRU CTU RE result (which was two groups). Since the genotypes are elite lines, passed by complex breeding history, such inconsistency is expected. However, the majority of lines were still grouped in the same group as the STRU CTU RE result and some lines were grouped in the mixed group. For instance, 78 (56%) of the lines in the first group were composed of a sub-population 1, whereas 61 lines (44%) were categorized in subpopulation 2. The second group was composed mainly of subpopulation 2, which consists of 49 (70%) lines; whereas 21 (30%) lines were classified in subpopulation 1. The third group was composed of 58 (76%) lines from subpopulation 2, and 21 (30%) lines from subpopulation 1 (Fig. 3).

Genetic data and linkage disequilibrium
Once sub-optimal quality markers had been filtered out, 9523 SNP markers were retained from 245 lines. The distribution of SNPs across the A, B, and D sub-genomes was 50, 39, and 11%, respectively. The maximum number of SNP markers was recorded on chromosome 2B (930) and the minimum number was on chromosome 4D (48) (Fig. 4). The mean genome-wide heterozygosity, genomewide polymorphic information content (PIC), and gene diversity were 0.006, 0.2, and 0.38, respectively. The PIC scores of SNPs varied, with only 1% being highly informative (> 0.5), while 75 and 24% of markers had moderate (0.25-0.5) and least (< 0.2) PIC scores, respectively.
Linkage disequilibrium decay based on SNP markers of each chromosome was calculated as the Pearson correlation coefficient (r 2 ) between marker pairs as a function of genetic distance (cM). The LOESS curve intercepted the line of critical value at 6 cM in A genome, 8 cM in B genome, and 5 cM in D genome, indicating that all markers within these ranges were considered as a single locus (Fig. 5).

Marker-trait associations
A mixed linear model (MLM) was implemented for MTA, including the population structure and kinship similarity matrix (Q + K) and the BLUPs estimated values of CI of genotypes and quality checked SNP markers. The model appropriately discovered valuable MTAs with neither inflation (false-positive/type I error) nor overcorrection (false-negative/type II error) problems as depicted from the Q-Q plot (Fig. 8).

Discussion
Stem rust has been increasing in severity and incidence and now poses a serious threat to global wheat production [8]. To overcome this threat, efforts are ongoing  worldwide to monitor rust diseases, identify rust pathotypes, and evaluate wheat germplasm for rust resistance [36]. As part of the global effort, this study was designed to quantify the existing allelic variation of breeding lines and to search for sources of resistant QTL for Pgt resistance. Consequently, 245 elite bread wheat lines were evaluated in the field condition to identify QTLs for adult plant resistance to wheat stem rust. A significant variation was observed between breeding lines for adult plants' resistance to stem rust. This study detected several MTAs included in 11 different QTLs with different effects that could potentially play an important role in future marker-assisted pyramiding against the disease.

Field evaluation of wheat germplasm for resistance to stem rust
Disease response characterization under high disease pressure in field conditions remains the best stem rust management strategy in breeding for developing stem rust-resistant cultivars [37]. Ethiopia is considered to be a hotspot for the development of Pgt race diversity and frequent disease epidemics. Studies carried out in Ethiopia showed that most previously identified races such as TTKSK, TKTTF, TTTTF, TRTTF, RRTTF, and others were virulent on most varieties grown in the country [38]. Accordingly, many field evaluation studies for Pgt response have been carried out in different wheat-growing regions of the country [22,39,40]. Most elite breeding lines skewed towards moderate resistance, although some differences were observed between individual genotypes. The two parameters (i.e. DS and IR) showed moderate to high heritability with significant variation among lines and genotype x year interactions, indicating that most of the existing variation was due to genetic bases. CI has been used as the most efficient trait to discover QTLs of stem rust resistance in wheat via GWAS analysis [23,41].

Population structure and genetic diversity
Systematic characterization of population structure and genetic diversity provides a foundation for efficient exploitation of genetic resources and can enhance breeding for durable stem rust resistance in wheat. For the population structure study, three different approaches were applied. Although there is considerable overlap between the three techniques for population analysis, the overall conclusion suggests that there is no clear and substantial separation between individual genotypes. This could be owing to the panel's complicated evolutionary and breeding history. It is suggested that more researches need to be done to better understand the relationships between genotypes from different groups. The mean PIC and gene diversity of the genome was 0.25 and 0.3, respectively. Several studies have previously reported various rates of Nei's gene diversity and PIC in different wheat populations [22,34,[42][43][44].

Linkage disequilibrium and MTAs
The LD of the genome and sub-genomes of the current panel was estimated using SNP markers. The fastest LD decay was observed on the D sub-genome, which agreed with the previous report [45]. The LOESS curve intercepted the line of critical value at 6 cM in A genome, at 8 cM in B genome, and 5 cM in the D genome, indicating that all markers within these ranges are considered as a single locus (Fig. 5). Since many significant SNP markers (36 MTAs) were identified in the present study, the LD pattern in chromosome 1B was analyzed independently and detected five LD blocks (Fig. 6). Similar large LD blocks in 1B chromosomes have been reported previously [46].
On chromosomes 3A, 3B, 4A, 4B, and 5B, the additional six QTLs containing 10 MTAs were discovered. The marker Tdurum_contig777_260 (IWB73429) designated as EWYP3A QTL was adjacent to the all-stage resistance gene Sr27 which is transferred from Secale cereale and Sr35 gene which is transferred from Triticum monococcum [32]. Because EWYP3A is so close to the Sr27 gene area, Sr27 is most likely the underlying gene for this region. On the short arm of chromosome 3B, Sr2 came from Triticum dicoccum and Sr12 originated from Triticum turgidum ssp. were cataloged previously [32]. On this chromosomal, we discovered the EWYP3B.1 QTL, which consists of three markers (Tdurum_contig12899_342, Excalibur_c20277_483, and Tdurum_contig12008_803). The nearest Sr gene to these markers was the Sr2 gene [30,31]. This Sr2 gene has been extensively used in breeding as a source of durable and broad-spectrum adult plant resistance. Individual genotypes carrying the favorable allele of these SNP markers have shown an apparent difference in the CI score (Additional file 1). On chromosome 5B, the SNP marker RAC875_c30011_426 (IWB56412) explained 5.7% of the total phenotypic variation. This marker is found near the chromosome region previously discovered the Sr56 gene that confers the APR to wheat stem rust [32,33]. To the best of our knowledge, the other three QTLs, EWYP3B.2, EWYP4A, and EWYP4B, which were found on chromosomes 3B, 4A, and 4B, respectively, have never been reported and could potentially be novel QTL sources for stem rust resistance breeding programs.

Conclusions
This study characterized the genetic diversity of elite ICARDA breeding lines and performed GWAS based on the evaluation of field stem rust. As a result, substantial genetic variability and field disease response to Pgt was observed among the lines. The study detected several potentially novel loci associated with Pgt resistance. These markers could provide useful genetic information to unlock the genetic basis of resistance to Pgt in wheat. Furthermore, the result will accelerate the introgression of identified resistance QTLs in the wheat breeding program through marker-assisted introgression. The identified resistant lines could also be used as crossing parents in stem rust-resistant breeding programs.

Plant materials, field stem rust trials, and disease pathotyping
A set of 245 elite breeding lines was obtained from the International Center for Agricultural Research in the Dry Areas (ICARDA) shuttle breeding program. Field screening was conducted in Ethiopia for two consecutive cropping seasons (2018 and 2019) at the Debre Zeit Agricultural Research Center (DARC). DARC is located at 08° 44′ N latitude and 38° 58′ E longitude and 1900 m.a.s.l with 19 °C annual average temperature and 851 mm rainfall. The experiment was conducted using an augmented design, including five local cultivars (Digelu, Kubssa, Hidasse, Honqolo, and Ogolcho) as checks. Each line was planted in a 1 m long single row and the distance between rows was 30 cm. The border of each block was surrounded by susceptible local spreader wheat varieties to promote natural stem rust infection.
Stem rust phenotyping was conducted based on disease severity (DS) and infection response (IR) under natural disease pressure [50]. Both parameters were recorded three times for each line in each year. The highest recorded value was then taken for the GWAS analysis after calculating the coefficient of infection (CI) from the two parameters (i.e. DS and IR).
The CI was calculated by multiplying the DS by a constant value of IR recorded according to Yu et al. (2011).

Statistical analysis of phenotypic data
Analysis of variance (ANOVA) was performed for DS, IR, and CI using the nlme package in the R 4.0.2 environment (Pinheiro et al., 2020) fitting the value of DS, IR, and CI as a function of lines, years, and a combination of lines and years. To determine the consistency of DS, IR, and CI, Pearson correlation coefficients between seasons were calculated.
Broad-sense heritability (H 2 ) was calculated using the following formula: Where σ 2 G is the genotypic variance, σ 2 E is the environmental variance, σ 2 GXE is the genotype by environment interaction variance, σ 2 error is the residual error variance and n is the number of years.
To reduce false-positive associations, best linear unbiased predictors (BLUPs) for CI were calculated using a mixed model in lme4 package implemented in R environment [51] according to the following model where y is the response variable:

Population structure and genetic diversity
The optimal sub-populations of the panel were estimated based on three different approaches. The Bayesian model-based population structure was estimated from 100 unlinked SNP markers located at least 10 cM apart across the genome using STRU CTU RE 2.3.1 software [52,53]. To execute this, three independent runs were performed for each hypothetical K value run from 2 to 15 with the length of the burn-in period of 10,000 steps followed by 100,000 Monte Carlo Markov Chain (MCMC). The results obtained from this procedure were used in a web-based informatics tool namely, "Structure Harvester" [54] to define the optimal K value, based on ∆K method Evanno, 2005 [55]. Each genotype was assigned to one subpopulation based on its membership probability. The second approach used to determine the optimal subpopulation was based on a marker-based kinship matrix (K matrix) on a scaled identity-by-state method using the whole set of SNP markers from TAS-SEL 5 software [56]. Finally, the principal components analysis (PCA) of genetic relatedness was performed with the same software and added to the regression model as a covariant.
Genetic diversity was estimated based on polymorphic information content (PIC), heterozygosity, and Nei's gene diversity using the whole set of SNP markers from Pow-erMarker 3.25 software [57]. Phylogenetic analysis based on distance-based neighbor-joining method was calculated with TASSEL 5 software and visualized through web-based program iTOL (v 4.3.2) [58].
Genotyping, linkage disequilibrium, and genome-wide association analysis DNA extraction of lines was carried out on one-week-old seedlings following the protocol described by Allen et al. (2006) [59] using Cetyeltrimethylammonium bromide y = lmer Trait ∼ 1|Genetype + (1|Year) (CTAB). Genotyping was performed by Illumina iSelect 15 K single nucleotide polymorphism (SNP) wheat array and called by GenomeStudio V2011.1 software. The resulting 13,006 SNPs were further screened using those only minor allelic frequency (MAF) > 5%, and missing data percentage of < 10%. Five lines were excluded as a result of this screening. Finally, 9523 quality SNP markers were generated from 245 lines that were used for further analysis.
The resulting SNP data were subjected to linkage disequilibrium (LD) analysis as squared allelic frequency correlations (R 2 ) between each pair implemented in TASSEL v5.2 and GAPIT (Genomic Association and Prediction Integrated Tool) R package [60]. The critical R 2 value (where the LD is due to the physical linkage) was determined by taking the 95% of R 2 data of unlinked markers as the threshold, according to Breseghello and Sorrells (2006) [61].
Marker-trait association analysis (MTAs) between the BLUP value of CI and SNPs markers were analyzed using a mixed linear model (MLM) in TASSEL 5.2 software. Using the formula: y = Xα + Qδ + Kμ + e; where y = phenotypic values, X is SNP marker genotypes, α is a vector containing fixed effects as a result of the genotype, Q is population structure as PCA, δ is a vector containing fixed effects resulting from population structure, K is the relative kinship matrix, μ is a vector of random additive genetic effects and e is a vector of residuals. Marker trait associations were declared significant at a threshold value of -log 10 (p) ≥ 3 (corresponding p value ≤ 0.001) [62].