Modelling of population structure through contemporary groups in genetic evaluation

Background Forest trees can occupy extensive geography and environmentally highly variable areas which result in high genetic variability in the direction of pressure from natural selection. At the same time, the majority of conifer species are wind-pollinated from both short and long distances, resulting in wide-spread gene flow, which can lead to maladaptation to local conditions. Quantitative analyses of provenance/progeny tests correct for genetic differences between populations to ensure unbiased genetic parameters are obtained. Commonly, the provenance effect is fitted as a fixed term or can be implemented as a contemporary group in the pedigree. Results The use of a provenance effect, either as a fixed term or as the same contemporary groups in both maternal and paternal sides of the pedigree, resulted in fairly similar precision of genetic parameters in our case. However, when we developed a phantom contemporary group for the paternal side of the pedigree that considered a different genetic quality of pollen compared with the maternal contribution from trees in the local environment, the model fit and accuracy of breeding values increased. Conclusion Consideration of the mating dynamics and the vector of gene flow are important factors in modelling contemporary genetic groups, particularly when implementing pedigrees within a mixed model framework to obtain unbiased estimates of genetic parameters. This approach is especially important in traits involved in local adaptation.


Background
Forest tree species usually occupy large geographical areas and are exposed to a wide range of environmental conditions. These large geographical ranges predetermine large genetic changes along the environmental gradients, such as latitude or altitude, through natural selection. The discovery of factors driving natural selection is essential for understanding a species' adaptability and for the development of population management strategies suitable for changing climate conditions [1]. Common garden experiments, which include genetically broad material representing a large proportion of a species' natural distribution, are a useful tool to dissect genetic divergence *Correspondence: Jaroslav.Klapste@scionresearch.com 1 Scion (New Zealand Forest Research Institute Ltd.), 49 Sala Street, 3010 Rotorua, New Zealand Full list of author information is available at the end of the article and phenotypic plasticity, both of which contribute to the phenotypic response to the changing environment [2]. The conclusions from such experiments are essential for effective implementation of seed transfer [3] or the extrapolation of population redistribution following predicted changes in climate conditions [4,5]. In extreme cases, the results of climate change models show the change in the species distribution and thus local species composition [6].
Forest tree breeding populations are mostly in the initial stage of domestication, and thus the history of evolutionary processes involved in local adaptation such as migration, genetic drift, mutations and selection greatly affect the results of any initial genetic evaluations. The geographical differences in these processes should be included in these evaluations through the inclusion of a provenance effect, fitted either as a fixed or random effect [7]. Ugarte et al. [8] tested differences between models using genetic groups as fixed vs. random term in mixed linear models, and found an advantage from the later when males are assigned to genetic groups non-randomly to reduce prediction error variance. Since geographical structure in populations of forest tree species is present [9], the non-random contribution of males should be considered. Westell et al. [10] proposed using random genetic groups as a means to model the average effect of all phantom parents belonging to a particular genetic group in the evaluation of animal models, which is the favoured approach in genetic evaluations. Hadfield et al. [11] addressed several issues connected with using animal models in quantitative evolutionary genetics to infer natural selection in breeding values, namely their spatial structure and temporal changes. In particular, the assumption that the models must capture all factors contributing to the explanation of a phenotype's variance to be considered close to the true model and obtain robust, unbiased estimates is often under question.
Douglas-fir (Pseudotsuga menziesii (Mirb.) Franco) is divided into two varieties: coastal Douglas-fir (Pseudotsuga menziesii var. menziesii) and interior Douglas-fir (Pseudotsuga menziesii var. glauca) [12]. A previous genetic marker-based analysis performed using allozymes found high population genetic differentiation (G ST ∼ 0.24) in Douglas-fir compared with other conifer species [13], and 51% of that was attributed to the difference between the inland and coastal varieties [14]. Another Douglas-fir provenance study, including populations from British Columbia, Washington and Oregon, did not discover any strong patterns in productivity along geographical or climatic gradients. This could be explained by high sensitivity to microsite conditions, which was not considered in global patterns, or by the sampling strategy. However, the poorest performers on the maritime sites tested were originally from submaritime areas, indicating differences in ecotypic differentiation [3]. Earlier work by St Clair et al. [15] investigated provenance/progeny tests including provenances from Washington and Oregon, and constructed composite traits which strongly aligned to the east-west cline following temperature and elevation, and north-south cline following latitude and summer drought.
The aim of our study was to investigate the fitting of the provenance effect either as a fixed effect or as a contemporary group in the pedigree within a mixed model framework to reflect the best model (data) fit. In addition, the modelling of contemporary groups in the pedigree was included to reflect differences in the genetic composition of maternal and paternal contributions in species with long-distance pollen flow.

Model fit and heritability
Four scenarios using a univariate mixed linear model were tested for each trait in each environment. Two basic models used provenance, as fixed (ABLUP-F) or random effect (ABLUP-R). Both had the poorest model fit in terms of Akaike's Information Criterion (AIC). ABLUP-R mostly resulted in a better model fit compared to ABLUP-F with the exception of traits that showed statistically significant Q ST produced by ABLUP-R model (Tables 1 and 2). Lastly, two models that used genetic groups implemented directly in the pedigree (ABLUP-GC1 and ABLUP-GC2) showed different model fit. While model ABLUP-GC2 showed similar or slightly improved AIC compared to ABLUP-F (Tables 1 and 3), ABLUP-GC1 showed the best model fit across all tested scenarios (Table 4). Similar to model fit, ABLUP-F and ABLUP-GC2 resulted in the identical heritability estimates ranging from 0.12 (MAL1) to 0.51 (VEL1) at Gowan Hill and from 0.04 (MAL1) to 0.84 (VEL1) at Kaingaroa. The model ABLUP-R resulted in slightly lower heritability estimates compared to ABLUP-F and reached values from 0.10 (BR2) to 0.47 (VEL1) at Gowan Hill and from 0.02 (MAL1) to 0.77 (VEL1) at Kaingaroa. The decrease in heritability estimates in ABLUP-R model was attributed to the fact that provenance effects were included in random terms and thus included in the denominator of heritability estimation. This model allowed for partitioning of genetic variance attributed to within provenance variance and within individuals between provenances and estimation of Q ST . The Q ST values ranged from 0.032 (STR2) to 0.25 (DBH1) at Gowan Hill and from 0.00 (VEL1) to 0.27 (DBH1) at Kaingaroa ( Table 2). The model ABLUP-GC2 resulted in heritability estimates similar to model ABLUP-F ( Table 3). The best fit model (ABLUP-GC1) resulted in higher additive genetic as well as residual variance estimates, especially for traits showing high genetic differentiation (statistically significant Q ST ). However, the level of heritability remained similar to other scenarios and reached values from 0.11 (BR2) to 0.46 (VEL1) at Gowan Hill and from 0.06 (MAL1 and AC2) to 0.74 (VEL1) at Kaingaroa ( Table 4).

Accuracy of breeding value estimates
The models ABLUP-F and ABLUP-R resulted in similar accuracy of breeding value estimates reaching values from 0.31 (AC2) to 0.65 (STR2) at Gowan Hill and from 0.14 (VEL1) to 0.56 (DBH1) at Kaingaroa (Tables 1 and 2). The model ABLUP-GC2 resulted in the lowest accuracy of breeding value estimates which ranged from 0.30 (MAL1) to 0.66 (STR2) at Gowan Hill and from 0.17 (NR2) to 0.58 (DBH1) at Kaingaroa ( Table 3). The best fit model (ABLUP-GC1), resulted in the highest accuracy of breeding value estimates ranging from 0.32 (AC2)  [16]. The needle retention trait, indirectly inferring resistance to disease, was therefore scored, evaluated and showed pattern found for productivity at an early age (Fig. 2). However, opposite pattern in productivity at a later age ( Fig. 1 -bottom right) would assume that decreased productivity of most sensitive provenances at an early age ( Fig. 1 -bottom left) is diminished with age.

Genetic correlations
A bivariate mixed linear model was implemented for the estimation of pair-wise genetic correlations using ABLUP-F (the default model) and ABLUP-CG1 (the model which showed best AIC for all traits). Both models resulted in similar genetic correlations with slightly lower estimates in the ABLUP-CG1 model.

Population structure and its relevance in forest trees
Local adaptation is the result of evolutionary forces such as migration, random drift, natural selection and mutation. Contemporary groups can efficiently model genetic differences between sets of individuals coming from different environments undergoing different directions of local adaptation. However, the mobility of seed and pollen in wind-pollinated species such as conifers is different [17], and both parents do not necessarily come from the same genetic group. While seed dispersion is realised mostly within 60 meters, pollen dispersal is usually within the range of several hundreds of metres [18], however, this can reach as much as 500 -750 km in conifer species [19,20]. Moreover, while a seed donor is considered adapted to the local environment due to survival and successfully reaching sexual maturity (see Box 2 in [21] for reference), the pollen donor has not necessarily interacted with the local environment due to the possibility of long-distance pollen transfer. The investigation of evolutionary responses across populations sampled along different environmental gradients and planted under a common environment is critical for understanding population genetic divergence and will guide the selection of material suitable for future climatic conditions [2]. However, using appropriately informed models is necessary to obtain an unbiased estimation of genetic parameters and make correct inferences about evolutionary responses [11], which is especially important in traits responsible for local adaptation. Most of the conifer domestication programmes are in their initial phases, where samples are collected across a wide range of environments and planted under common environmental conditions [22,23]. Under such conditions, population genetic divergence provides important information about the strength of local adaptation and its modelling is critical to obtain accurate genetic parameters about the evolutionary capacity to respond to future climate conditions [11]. The current intensity of climate change is placing pressure on forest tree populations to improve resilience to increasingly variable environments during their long lifespans. Widely distributed conifer species are able to promote local adaptation due to intensive gene flow. However, it might not be enough to cope with rapid climate change [24]. Climate and growth models have estimated that maladaptation to future climate results not only in reduced productivity and increased mortality [25] but also in increased sensitivity to pathogens [26]. Therefore, active matching of forest populations to future climate conditions [4,5,24,27], as well the monitoring and assessing host-pathogen interaction is required for optimal resilience [28]. In extreme cases, this type of active management can lead to a change in species composition [6].

Modelling of population structure in genetic evaluation
The population structure is usually fitted as either fixed or random term in the genetic evaluation using mixed linear models [8]. The modelling of population structure through contemporary groups implemented directly in pedigree improved model fit and increased the accuracy of breeding values in the current study. However, assigning both maternal and paternal contribution to the same contemporary group resulted in the same model fit as using provenance as a fixed term. Consequently, defining only a different contemporary group for the paternal side of the pedigree, reflecting a different genetic background, resulted in improvement of both model fit and the accuracy of breeding values. In addition, models considering provenance as a fixed term or including both parents in the same contemporary group resulted in slightly higher heritability, than different contemporary groups in the both sides of the pedigree. These results indicate that the incorrect fit of the population structure would result in a somewhat overestimated genetic parameters. The different genetic groups for maternal and paternal contributions is expected, this is supported by the research results of mating dynamics in seed orchards. Evidence was shown on the presence of pollen contamination in seed crop in the range of 10% [29] to 90% [30] with indicating a relatively stable seasonal variability [29,31]. The use of genetic groups provides flexibility in modelling of population structure, which corresponds to reallife seed and pollen mobility [18][19][20] and should be preferred over the fitting of provenance as a fixed term.
However, in the case of insect-pollinated species, pollen mobility is directed by the pollinator and is usually limited only to the local population. Gonzaga et al. [32] investigated mating patterns in Eucalyptus seed orchard and found average pollen dispersal of 94 metres and pollen contamination of 14%.
Similarly, Rao et al. [33] reported mean pollen contamination of 17.6% in Eucalyptus breeding arboretum. Under such conditions, modelling of both the maternal and paternal sides of pedigree by the same contemporary group is a sensible solution, due to the fact  Genetic correlation between traits (their standard errors in parentheses) measured at Kaingaroa, estimated using ABLUP-GC1 model (above diagonal) and ABLUP-F model (below diagonal) that both parents survived and reached sexual maturity under the same environmental conditions and thus passed the same kind of natural selection (see Box 2 in [21] for reference). The development of genetic markers through next-generation sequencing platforms such as "Genotyping-by-sequencing" [34] can help to improve the inference about population structure. Although, a thorough understanding of gene flow on a landscape scale is anyhow challenging even with the help of genetic markers [35].

Benefits of modelling of population structure for traits potentially involved in local adaptation
Our study investigated investigated the level of genetic differentiation between populations for all investigated traits through Q ST parameters and found them as statistically significant based on their standard errors for productivity traits (DBH) and foliar disease resistance (to SNC) measured by needle retention (NR2). Similar levels of between-population differentiation were found for other potentially adaptive traits in conifers such as bud set, bud flush and cold injury [36]. Previous studies of genetic differentiation in coastal Douglas-fir found very low F ST values ranging between 0.006 and 0.071 [14,37,38]. Additionally, Wilhelmi et al. [39] found differences in needle retention between provenances and concluded that provenances originating from areas of higher foliar disease pressure are more resistant. It can be therefore assumed that these traits are contributing to local adaptation. The considerable genetic variability between populations (large Q ST ) indicates a high level of local natural selection and thus adaptive potential in traits related to productivity [40]. The exploration of provenance performance along latitudes showed that the most productive provenances tested in New Zealand originate from latitude around 38 • N at Gowan Hill and around 40 • N at Kaingaroa (Fig. 1). The advantage of more northern provenances at Kaingaroa (Latitude 38 • 17' S) can be attributed to the reduced needle retention of the southern provenances (Fig. 2) due to poorer resistance to SNC [16]. Douglas-fir at Kaingaroa is exposed to SNC due to the favourable climate for the disease in the North Island, such as mild mean daily winter temperatures and high spring moisture [41]. Therefore, it is critical to include needle retention as selection criteria in the North Island of New Zealand to improve resistance to SNC in Douglas-fir [42]. Correlation analysis discovered strong positive genetic correlations between NR2 and DBH at Kaingaroa. This relationship was more pronounced at later age, which indicates that reduced productivity depends on the first Genetic correlations between sites using ABLUP-F model and ABLUP-CG1 model occurrence and frequency of foliar diseases. The correlations were mostly similar between implemented methods. However, some correlation estimates were stronger at Kaingaroa when contemporary groups were implemented in the pedigree compared with a model using provenance as a fixed term. Nevertheless, the Mantel test found high agreement between correlation matrices from tested models, reaching 0.99 in Gowan Hill and 0.97 in Kaingaroa. Appropriate modelling of population structure is critical to obtain reliable estimates of genetic correlations which can be implemented in evolutionary response to selection [11]. Additionally, reliable estimates of genetic correlations are required in the evolutionary developmental biology field of research to infer organismal modularity [43] and phenotypic integration [44]. Our test of modularity identified a module consisting of productivity (DBH2) and disease resistance (NR2) traits, both identified as traits potentially involved in the process of local adaptation in environment where both traits were expressed at the Kaingaroa site. This represents traits varying in the same way independently from other traits [45]. Interestingly, this module was identified only in the model that used different contemporary groups in the maternal and paternal sides of the pedigree. Therefore, the appropriate modelling of population structure is especially critical for traits involved in local adaptation showing a decent level of genetic divergence at the population level -Q ST ). However, it is worth to mention that the resulting effect of population structure modelling depends on the extent of genetic variation included in the tested sample. Ideally, it should include a sample representing the whole natural distribution of the species. In our case, the sample represented populations from only part of the natural distribution (Oregon and California).
In contrast to the testing of provenance effect as a fixed term in mixed models, the implementation of contemporary groups in the pedigree allows greater flexibility in the modelling of differences in evolutionary forces, such as migration and natural selection between seed and pollen. This is important, especially in wind-pollinated species showing long-distance gene flow across heterogeneous environments, which could potentially cause maladaptation to local environmental conditions. Our study found a positive impact on model fit and accuracy of breeding values from biologically relevant modelling of population structure through contemporary groups implemented in the pedigree. Additionally, the proposed model resulted in changes in genetic correlations between the investigated traits. Such changes affect inference, not only the evolutionary response to selection, but also organismal modularity and phenotypic integration.

Conclusions
The appropriate modelling of population structure in genetic evaluations is critical for unbiased estimates of genetic parameters. Ignorance or inappropriate modelling of population structure can produce an inferior fit of the models and lower accuracy of genetic parameters. Our study found a positive impact on model fit and accuracy of breeding values from biologically sensible modelling of population structure through contemporary groups implemented in the pedigree. The appropriate modelling of population structure was found especially critical for traits involved in local adaptation. This approach is suitable especially for wind-pollinated species with extensive long-distance pollen flow. Additionally, the proposed model resulted in changes in genetic correlations between the investigated traits. Such changes can affect inference, not only about the evolutionary response to selection but about organismal modularity and phenotypic integration.

Methods
Coastal Douglas-fir was introduced in New Zealand during the 1950s through the establishment of a provenance test covering genetic material from Washington, Oregon, and limited representation from California [46]. The results of the early evaluation showed superior growth performance of provenances from Oregon and California, therefore, the selection of new breeding material was focused on these geographical areas. A collection of seed from trees at the original wild stands was imported to New Zealand, and a new provenance/progeny test was established in 1996 [47]. The material used to plant at two New Zealand environments (Kaingaroa, latitude 38 • 17' S, and Gowan Hill, latitude 45 • 52' S) in 1996 were collected from populations in two US regions (California and Oregon), ranging in latitude from 36 • to 48 • N along the western coast of the USA. Each experiment includes 30 replications of 7 sets. Each set contains 34 openpollinated families and 2 controls. The provenances were represented equally within each of the set. A detailed description of the material is provided in a previous study [42]. Tree diameter at breast height (DBH) was measured at ages 11 (DBH1 [mm]) and 21 years (DBH2 [mm]). Trees were also assessed for straightness (STR1 and STR2) and malformation (MAL1 and MAL2) at the same ages. Straightness was scored on a scale of 1 to 9 [48] where one represented a crooked stem and nine a straight stem. Similarly, malformation was scored on scale of 1 to 9 where: 1 -tree with multiple leaders, 2 -tree with two leaders, 3 -main stem shifted for more than half of its diameter, 4 -main stem shifted for less than half of its diameter, 5 -tree with multiple ramicorns instead of main leader, 6 -tree has three and more distinctive ramicorns, 7 -tree has two distinctive ramicorns, 8 -tree has one distinctive ramicorn, 9 -tree is clear of any stem shift, multiple leaders or ramicorns. Acoustic wave velocity (VEL1 [km/s]), as an indirect measure of wood stiffness, was measured by HITMAN ST300 (Fibre-gen, Christchurch, New Zealand) at age 11 years. Needle retention describes the proportion of needles retained (NR2 -measured only at Kaingaroa and scored on a scale of 1 -bad to 6 -good, reflecting the damage of needles of different ages) was measured at age 21 years. Branching pattern (BR2) was scored on 9 degrees scale where: 1 -one whorl per year with one set of branches, 2 -one whorl per year with two sets of branches, 3 -one whorl per year with multiple sets of branches, 4 -short internode with multiple sets of branches per whorl, 6 -9 multinodal trees with increasing score as frequency of nodes increases. Acceptability (AC2) was scored as a binary trait at age 21 years. All class variables (STR, MAL, BR, NR) were transformed into the normal score [49]. Variance components and heritabilities for the investigated traits were estimated using a mixed linear model implemented in the ASReml-R statistical package [50] as follows: where y is the vector of measurements, β is the vector of fixed effects such as intercept and control, g is the vector of random individual tree additive genetic effects following var(g)∼N(0,Aσ 2 g ), where A is the average numerator relationship matrix [51] and σ 2 g is additive genetic variance, r is the vector of random replication effects following var(r)∼N(0,Iσ 2 r ), where σ 2 r is replication variance and I is identity matrix, r(s) is the vector of random set nested within replication effects following var(r(s))∼N(0,Iσ 2 r(s) ), where σ 2 r(s) is set nested within replication variance, e is the vector of random residuals following var(e)∼N(0,Iσ 2 e ), where σ 2 e is residual variance, and X and Z are the incidence matrices assigning the effects from fixed and random vectors to measurements in vector y. Additionally, the provenance term was investigated through three alternative scenarios: 1) used as a fixed term (ABLUP-F), and considered as the default model; 2) implemented as contemporary genetic groups directly in the pedigree and modeling paternal contribution coming from an independent genetic group (phantom group) common across all provenances (ABLUP-CG1); 3) implemented as contemporary genetic groups directly in the pedigree, modeling both maternal and paternal origin from the same genetic group (ABLUP-CG2) [52]. In addition, the model with provenance as a random term was used to test for genetic divergence in quantitative traits (Q ST ) (ABLUP-R) [53]. Genetic correlations between traits within site and between traits across sites were estimated using bivariate mixed linear model implemented in the ASReml-R statistical package [50] as follows: population differentiation due to genetic structure; G ST : Population genetic differentiation (an extension of F ST for multi-allelic markers); GxE: Genotype by environment interaction; MAL1: Stem malformation scored at age of 11 years; MAL2: Stem malformation scored at age of 21 years; N: North; NR2: Needle retention scored at age of 21 years; PEV: Prediction error variance; Q ST : Genetic differentiation among populations expressed by quantitative trait; S: South; SNC: Swiss needle cast; STR1: Stem straightness scored at age of 11 years; STR2: Stem straightness scored at age of 21 years; USA: The United States of America; VEL1: Acoustic wave velocity measured at age of 11 years