Models and partition of variance for quantitative trait loci with epistasis and linkage disequilibrium

Wang, Tao; Zeng, Zhao-Bang

doi:10.1186/1471-2156-7-9

Research article
Open access
Published: 10 February 2006

BMC Genetics volume 7, Article number: 9 (2006)

Abstract

Background

A genetic model for quantitative trait loci (QTL) provides a basis for interpreting the genetic architecture of quantitative traits in a study population, such as the additive, dominance and epistatic effects of QTL and the partition of genetic variance. The standard quantitative genetics model is based on the least squares partition of genetic effects and of genetic variance in an equilibrium population. However, over the years many specialized QTL models have also been proposed for applications in specific populations. How are these models related? How should a QTL model and the genetic variance be analyzed and partitioned when both epistasis and linkage disequilibrium are considered?

Results

Starting from the classical description of Cockerham genetic model, we first represent the model in a multiple regression setting by using indicator variables to describe the segregation of QTL alleles. In this setting, the definition of additive, dominance and epistatic effects of QTL and the basis for the partition of genetic variance are elaborated. We then build the connection between this general genetic model and a few specialized models (a haploid model, a diploid F₂ model and a general two-allele model), and derive the genetic effects and partition of genetic variance for multiple QTL with epistasis and linkage disequilibrium for these specialized models.

Conclusion

In this paper, we study extensively the composition and properties of the genetic model parameters, such as genetic effects and the partition of genetic variance, when both epistasis and linkage disequilibrium are considered. This is the first time that both epistasis and linkage disequilibrium are considered in modeling multiple QTL. This analysis helps us understand the structure of the genetic parameters and how genetic quantities such as allelic frequencies and linkage disequilibrium bear on the definition of genetic effects. It also helps us understand and properly interpret estimates of genetic effects and variance components in a QTL mapping experiment.

Background

Modeling quantitative trait loci (QTL) started with Yule [1, 2] and Pearson [3] (see [4, 5] for the early history of quantitative genetics). However, it was Fisher [6] who laid the firm foundation for quantitative genetics. Fisher defined gene effects (additive, dominance and epistatic effects) based on the partition of genetic variance. He partitioned the genetic variance into a portion due to additive effects (averaged allelic substitution effects), a portion due to dominance effects (allelic interactions), and a portion due to epistatic effects (non-allelic interactions) of genes. He then studied the correlation between relatives using the model. Cockerham [7] used the orthogonal contrasts to redefine the additive and dominance effects of QTL and, by extending the contrasts to include epistatic effects, he partitioned the epistatic variance of two loci into those due to additive × additive, additive × dominance, dominance × additive and dominance × dominance effects of QTL. Cockerham then generalized the model to multiple loci. This was further generalized by Kempthorne [8, 9] to multiple alleles. This model has been used as the basis for studying quantitative genetics ever since.

However, over the years, many specialized models have also been proposed. Some are just special cases of the general genetic model and some are simplified variants tailored for particular applications or interpretations. With the proliferation of quantitative genetic models, there has also been some confusion in the literature about the definition and interpretation of additive, dominance and epistatic effects of QTL and their relationship to the partition of genetic variance. Also, there has never been a study that considers both epistasis and linkage disequilibrium in the partition of genetic variance for multiple QTL. In this paper, we try to build the connection between the general genetic model and a few other commonly used genetic models to clarify the basis for the interpretation of different genetic models.

We start with an introduction of the genetic model as expressed in [7] in the context of variance components. Then, by introducing an indicator variable for each QTL allele, we represent the model in a multiple regression setting and examine the definition and meaning of the genetic effects (additive, dominance and epistatic effects) of QTL and the partition of the genetic variance, first in an equilibrium population and then in a disequilibrium population. Most previous studies on modeling QTL discuss epistasis only in reference to an equilibrium population. An examination of the properties of a model with both epistasis and linkage disequilibrium is important for QTL analysis in both experimental and natural populations. This is another goal of the paper and is studied in detail here. We discuss a few reduced models used for QTL analysis, such as the backcross model (essentially a haploid model) and the F₂ model. We also give details for a general two-allele model, which may be useful for studying the genetic architecture in a natural population using single nucleotide polymorphisms (SNPs).

Previously, in [10], we compared the F₂ model and the general two-allele model with another commonly used genetic model, called the F_∞ model. By specifying the basis of definition for each model, we compared the properties of these models in the estimation and interpretation of QTL effects, including epistasis, and discussed a few potential problems of using the F_∞ model in a segregating population for QTL analysis. Similarly, we also compared these models with another model proposed by Cheverud [11, 12].

An important result of [10] is that the genetic effects defined in reference to an equilibrium population also apply to a disequilibrium population. The partial regression coefficients that define the genetic effects in a disequilibrium population are equal to the simple regression coefficients in a corresponding equilibrium population, which are the usual basis for defining and interpreting a genetic effect, including an epistatic effect. Hardy-Weinberg and linkage disequilibria only introduce covariances between different genetic effects. With this result, our discussion of epistasis and linkage disequilibrium in this paper focuses on the partition and composition of genetic variances and covariances between different genetic effects in different populations.

Results

The genetic model

A general genetic model for the partition of genetic variance (particularly epistatic variance) in a random mating population was first given by Cockerham [7, 13] and extended to multiple alleles by Kempthorne [8, 9], following the basic genetic model formulated by Fisher [6]. The model for two loci A and B with multiple alleles was expressed as follows

$\begin{matrix} G_{j l}^{i k} = μ + α^{i} + α_{j} + δ_{j}^{i} + β^{k} + β_{l} + γ_{l}^{k} + (α^{i} β^{k}) \\ + (α^{i} β_{l}) + (α_{j} β^{k}) + (α_{j} β_{l}) + (α^{i} γ_{l}^{k}) \\ + (α_{j} γ_{l}^{k}) + (δ_{j}^{i} β^{k}) + (δ_{j}^{i} β_{l}) + (δ_{j}^{i} γ_{l}^{k}) \end{matrix} (1)$

where the genotypic value $G_{j l}^{i k}$ is the expected phenotype of an individual carrying alleles A_i, A_j, B_k, and B_l with phased genotype A_iB_k/A_jB_l formed by the union of a paternal gamete A_iB_k and a maternal gamete A_jB_l. The model partitions the total genotypic value into a number of genetic effects which include additive effects of each allele (α's and β's), dominance effects between two alleles at each locus (δ's and γ's), additive × additive interactions between two alleles at two loci ((αβ)'s), additive × dominance interactions involving three alleles ((αγ)'s and (δβ)'s), and dominance × dominance interaction involving all four alleles ((δγ)'s).

Because model (1) is an ANOVA model, not all of its parameters are estimable. A number of constraint conditions on these parameters are therefore needed. Let pⁱ, q^k denote allelic frequencies for alleles on paternal gametes, and p_j, q_l allelic frequencies for alleles on maternal gametes. It is usually assumed that, for each genetic component expressed as a deviation from the mean, the frequency-weighted sum of the genetic effects over any index is zero. Some examples are

$\begin{array}{l} \sum_{i} p^{i} α^{i} = 0, \sum_{i} p^{i} δ_{j}^{i} = 0, \sum_{j} p_{j} δ_{j}^{i} = 0, \\ \sum_{i} p^{i} (α^{i} β^{k}) = 0, \sum_{k} q^{k} (α^{i} β^{k}) = 0, \\ \sum_{j} p_{j} (δ_{j}^{i} β^{k}) = 0, \sum_{k} q^{k} (δ_{j}^{i} β^{k}) = 0, \dots \dots \end{array} (2)$

Under the assumption of random mating and linkage equilibrium and allowing for different allelic frequencies in paternal and maternal gametes, the mean and genetic effects can be expressed as follows based on the least squares principle:

$\begin{array}{l} μ = \sum_{i, j, k, l} p^{i} p_{j} q^{k} q_{l} G_{j l}^{i k}, \\ α^{i} = G_{..}^{i .} - G_{..}^{..}, \quad β^{k} = G_{..}^{. k} - G_{..}^{..}, \\ α_{j} = G_{j .}^{..} - G_{..}^{..}, \quad β_{l} = G_{. l}^{..} - G_{..}^{..}, \\ δ_{j}^{i} = G_{j .}^{i .} - G_{..}^{i .} - G_{j .}^{..} + G_{..}^{..}, \\ γ_{l}^{k} = G_{. l}^{. k} - G_{..}^{. k} - G_{. l}^{..} + G_{..}^{..}, \\ (α^{i} β^{k}) = G_{..}^{i k} - G_{..}^{i .} - G_{..}^{. k} + G_{..}^{..}, \\ (α^{i} β_{l}) = G_{. l}^{i .} - G_{..}^{i .} - G_{. l}^{..} + G_{..}^{..}, \\ (α_{j} β^{k}) = G_{j .}^{. k} - G_{j .}^{..} - G_{..}^{. k} + G_{..}^{..}, \\ (α_{j} β_{l}) = G_{j l}^{..} - G_{j .}^{..} - G_{. l}^{..} + G_{..}^{..}, \\ (α^{i} γ_{l}^{k}) = G_{. l}^{i k} - G_{..}^{i k} - G_{. l}^{i .} - G_{. l}^{. k} + G_{..}^{i .} + G_{..}^{. k} \\ + G_{. l}^{..} - G_{..}^{..}, \\ (α_{j} γ_{l}^{k}) = G_{j l}^{. k} - G_{j .}^{. k} - G_{. l}^{. k} - G_{j l}^{..} + G_{j .}^{..} + G_{..}^{. k} \\ + G_{. l}^{..} - G_{..}^{..}, \\ (δ_{j}^{i} β^{k}) = G_{j .}^{i k} - G_{..}^{i k} - G_{j .}^{i .} - G_{j .}^{. k} + G_{..}^{i .} + G_{..}^{. k} \\ + G_{j .}^{..} - G_{..}^{..}, \\ (δ_{j}^{i} β_{l}) = G_{j l}^{i .} - G_{j .}^{i .} - G_{. l}^{i .} - G_{j l}^{..} + G_{..}^{i .} + G_{j .}^{..} \\ + G_{. l}^{..} - G_{..}^{..}, \\ (δ_{j}^{i} γ_{l}^{k}) = G_{j l}^{i k} - G_{j .}^{i k} - G_{. l}^{i k} - G_{j l}^{i .} - G_{j l}^{. k} + G_{..}^{i k} \\ + G_{j l}^{..} + G_{j .}^{i .} + G_{. l}^{. k} + G_{j .}^{. k} + G_{. l}^{i .} - G_{..}^{i .} \\ - G_{..}^{. k} - G_{j .}^{..} - G_{. l}^{..} + G_{..}^{..} \end{array} (3)$

where $G_{..}^{..} = \sum_{i, j, k, l} p^{i} p_{j} q^{k} q_{l} G_{j l}^{i k}$, $G_{..}^{i .} = \sum_{j, k, l} p_{j} q^{k} q_{l} G_{j l}^{i k}$, and so on. The total genetic variance is $V_{G} = \sum_{i, j, k, l} p^{i} p_{j} q^{k} q_{l} {(G_{j l}^{i k} - μ)}^{2}$, and has an orthogonal partition under random mating and linkage equilibrium

$\begin{array}{l} V_{G} = V_{A_{1}} + V_{A_{2}} + V_{D_{1}} + V_{D_{2}} + V_{A_{1} A_{2}} + V_{A_{1} D_{2}} \\ + V_{D_{1} A_{2}} + V_{D_{1} D_{2}} \end{array} (4)$

with

$\begin{array}{l} V_{A_{1}} = \sum_{i} p^{i} {(α^{i})}^{2} + \sum_{j} p_{j} {(α_{j})}^{2} \\ V_{D_{1}} = \sum_{i, j} p^{i} p_{j} {(δ_{j}^{i})}^{2} \\ V_{A_{2}} = \sum_{k} q^{k} {(β^{k})}^{2} + \sum_{l} q_{l} {(β_{l})}^{2} \\ V_{D_{2}} = \sum_{k, l} q^{k} q_{l} {(γ_{l}^{k})}^{2} \\ V_{A_{1} A_{2}} = \sum_{i, k} p^{i} q^{k} {(α^{i} β^{k})}^{2} + \sum_{j, l} p_{j} q_{l} {(α_{j} β_{l})}^{2} \\ + \sum_{i, l} p^{i} q_{l} {(α^{i} β_{l})}^{2} + \sum_{j, k} p_{j} q^{k} {(α_{j} β^{k})}^{2} \\ V_{A_{1} D_{2}} = \sum_{i, k, l} p^{i} q^{k} q_{l} {(α^{i} γ_{l}^{k})}^{2} + \sum_{j, k, l} p_{j} q^{k} q_{l} {(α_{j} γ_{l}^{k})}^{2} \\ V_{D_{1} A_{2}} = \sum_{i, j, k} p^{i} p_{j} q^{k} {(δ_{j}^{i} β^{k})}^{2} + \sum_{i, j, l} p^{i} p_{j} q_{l} {(δ_{j}^{i} β_{l})}^{2} \\ V_{D_{1} D_{2}} = \sum_{i, j, k, l} p^{i} p_{j} q^{k} q_{l} {(δ_{j}^{i} γ_{l}^{k})}^{2} \end{array}$
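The orthogonal partition (4) can be checked numerically. The following is a minimal sketch, assuming arbitrary (randomly drawn) genotypic values and hypothetical allele frequencies for a locus A with two alleles and a locus B with three alleles. The helper names `marginal_mean`, `effect` and `component` are illustrative and do not come from the paper. The effects are built from marginal means by inclusion-exclusion, which reproduces the pattern of the least-squares expressions (3).

```python
import itertools
import math
import random

# Factors: 0 = paternal allele at A, 1 = maternal allele at A,
#          2 = paternal allele at B, 3 = maternal allele at B.
# Hypothetical allele frequencies (an assumption); each list sums to 1.
weights = [[0.3, 0.7], [0.4, 0.6], [0.2, 0.5, 0.3], [0.1, 0.6, 0.3]]

random.seed(1)
genotypes = list(itertools.product(*(range(len(w)) for w in weights)))
G = {g: random.gauss(0.0, 1.0) for g in genotypes}  # arbitrary genotypic values


def marginal_mean(fixed):
    """Frequency-weighted mean of G over every factor not listed in `fixed`."""
    total = 0.0
    for g in genotypes:
        if all(g[f] == a for f, a in fixed.items()):
            w = math.prod(weights[f][g[f]] for f in range(4) if f not in fixed)
            total += w * G[g]
    return total


def effect(fixed):
    """Least-squares effect: inclusion-exclusion over marginal means."""
    factors = list(fixed)
    value = 0.0
    for r in range(len(factors) + 1):
        for subset in itertools.combinations(factors, r):
            sign = (-1) ** (len(factors) - r)
            value += sign * marginal_mean({f: fixed[f] for f in subset})
    return value


def component(factors):
    """Frequency-weighted sum of squared effects for one set of factors."""
    total = 0.0
    for levels in itertools.product(*(range(len(weights[f])) for f in factors)):
        w = math.prod(weights[f][a] for f, a in zip(factors, levels))
        total += w * effect(dict(zip(factors, levels))) ** 2
    return total


mu = marginal_mean({})
V_G = sum(math.prod(weights[f][g[f]] for f in range(4)) * (G[g] - mu) ** 2
          for g in genotypes)

# Each variance component collects the factor sets listed in its definition.
groups = {
    "V_A1": [(0,), (1,)],
    "V_D1": [(0, 1)],
    "V_A2": [(2,), (3,)],
    "V_D2": [(2, 3)],
    "V_A1A2": [(0, 2), (0, 3), (1, 2), (1, 3)],
    "V_A1D2": [(0, 2, 3), (1, 2, 3)],
    "V_D1A2": [(0, 1, 2), (0, 1, 3)],
    "V_D1D2": [(0, 1, 2, 3)],
}
partition = {name: sum(component(s) for s in sets) for name, sets in groups.items()}

for name, value in partition.items():
    print(f"{name:7s} {value:.6f}")
print(f"sum     {sum(partition.values()):.6f}   V_G {V_G:.6f}")
```

Under these assumptions the eight components should add up to V_G to within floating-point error, whatever genotypic values are drawn.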

Using indicator variables, we can represent model (1) in another form. Assume that the two loci A and B have alleles A_i, i = 1, 2, ..., n₁; and B_k, k = 1, 2, ..., n₂, respectively. We define the following indicator variables to represent the segregation of alleles in a population.

$\begin{array}{l} z_{M_{i}}^{(1)} = \begin{cases} 1, & \text{for } A_{i} \text{ allele from paternal gamete} \\ 0, & \text{otherwise} \end{cases} \\ z_{F_{j}}^{(1)} = \begin{cases} 1, & \text{for } A_{j} \text{ allele from maternal gamete} \\ 0, & \text{otherwise} \end{cases} \end{array}$

for i, j = 1, 2, ..., n₁ at locus A, and

$\begin{array}{l} z_{M_{k}}^{(2)} = \begin{cases} 1, & \text{for } B_{k} \text{ allele from paternal gamete} \\ 0, & \text{otherwise} \end{cases} \\ z_{F_{l}}^{(2)} = \begin{cases} 1, & \text{for } B_{l} \text{ allele from maternal gamete} \\ 0, & \text{otherwise} \end{cases} \end{array}$

for k, l = 1, 2, ..., n₂ at locus B. In terms of these indicator variables, we have the following.

• Hardy-Weinberg equilibrium (HWE) implies that { $z_{M_{i}}^{(1)}$ , i = 1, 2, ..., n₁} are independent of { $z_{F_{j}}^{(1)}$ , j = 1, 2, ..., n₁}, and { $z_{M_{k}}^{(2)}$ , k = 1, 2, ..., n₂} are independent of { $z_{F_{l}}^{(2)}$ , l = 1, 2, ..., n₂}.

• Linkage equilibrium (LE) implies that { $z_{M_{i}}^{(1)}$ , i = 1, 2, ..., n₁} are independent of { $z_{M_{k}}^{(2)}$ , k = 1, 2, ..., n₂}, and { $z_{F_{j}}^{(1)}$ , j = 1, 2, ..., n₁} are independent of { $z_{F_{l}}^{(2)}$ , l = 1, 2, ..., n₂}.

• There is another type of disequilibrium; i.e., the so-called genotypic disequilibrium [14] for two alleles on different gametes and at different loci. So, the genotypic equilibrium (GE) here means that { $z_{M_{i}}^{(1)}$ , i = 1, 2,..., n₁} are independent of { $z_{F_{l}}^{(2)}$ , l = 1, 2, ..., n₂}, and { $z_{F_{j}}^{(1)}$ , j = 1, 2, ..., n₁} are independent of { $z_{M_{k}}^{(2)}$ , k = 1, 2, ..., n₂}.

It is known that under random mating we have both HWE and GE, which together are called gametic phase equilibrium. Now, let G denote the genotypic value of a progeny drawn randomly from the current population. Based on the Cockerham model, G can be expressed as

$\begin{array}{l} G = μ + \sum_{i = 1}^{n_{1}} α^{i} z_{M_{i}}^{(1)} + \sum_{j = 1}^{n_{1}} α_{j} z_{F_{j}}^{(1)} + \sum_{i, j} δ_{j}^{i} z_{M_{i}}^{(1)} z_{F_{j}}^{(1)} \\ + \sum_{k = 1}^{n_{2}} β^{k} z_{M_{k}}^{(2)} + \sum_{l = 1}^{n_{2}} β_{l} z_{F_{l}}^{(2)} + \sum_{k, l} γ_{l}^{k} z_{M_{k}}^{(2)} z_{F_{l}}^{(2)} \\ + [\sum_{i, k} (α^{i} β^{k}) z_{M_{i}}^{(1)} z_{M_{k}}^{(2)} + \sum_{i, l} (α^{i} β_{l}) z_{M_{i}}^{(1)} z_{F_{l}}^{(2)} \\ + \sum_{j, k} (α_{j} β^{k}) z_{F_{j}}^{(1)} z_{M_{k}}^{(2)} + \sum_{j, l} (α_{j} β_{l}) z_{F_{j}}^{(1)} z_{F_{l}}^{(2)}] \\ + [\sum_{i, k, l} (α^{i} γ_{l}^{k}) z_{M_{i}}^{(1)} z_{M_{k}}^{(2)} z_{F_{l}}^{(2)} + \sum_{j, k, l} (α_{j} γ_{l}^{k}) z_{F_{j}}^{(1)} z_{M_{k}}^{(2)} z_{F_{l}}^{(2)}] \\ + [\sum_{i, j, k} (δ_{j}^{i} β^{k}) z_{M_{i}}^{(1)} z_{F_{j}}^{(1)} z_{M_{k}}^{(2)} + \sum_{i, j, l} (δ_{j}^{i} β_{l}) z_{M_{i}}^{(1)} z_{F_{j}}^{(1)} z_{F_{l}}^{(2)}] \\ + \sum_{i, j, k, l} (δ_{j}^{i} γ_{l}^{k}) z_{M_{i}}^{(1)} z_{F_{j}}^{(1)} z_{M_{k}}^{(2)} z_{F_{l}}^{(2)} (5) \end{array}$

This is simply a different presentation of the Cockerham model with the same constraint conditions applied to the coefficient parameters. For a given individual with genotype A_iB_k/A_jB_l, G takes the same value $G_{j l}^{i k}$ as before. However, this expression helps us understand some details of each component of the genetic effects. We can see this more clearly in the examination of some reduced models later.

In general, the genetic effects can be defined separately for alleles that are paternally and maternally transmitted to account for possible biological differences. As a fully parameterized model for $G_{j l}^{i k}$, model (5) may give $G_{j l}^{i k} \neq G_{j k}^{i l} \neq G_{i l}^{j k} \neq G_{i k}^{j l}$ depending on how genetic effects are defined. If locus A has n₁ alleles, and locus B has n₂ alleles, there are N = $n_{1}^{2} n_{2}^{2}$ possible phased genotypes in total with the partition of the degrees of freedom given in Table 1.

Table 1 Partition of degrees of freedom for two loci with number of alleles n₁ and n₂ (a general case)

If we assume that the union of paternal gamete A_iB_k with maternal gamete A_jB_l has the same mean effect as that of paternal gamete A_jB_l with maternal gamete A_iB_k (i.e., $G_{j l}^{i k} = G_{i k}^{j l}$), that the coupling and repulsion heterozygotes have the same genotypic value (i.e., $G_{j l}^{i k} = G_{j k}^{i l}$), and that paternal and maternal gametes have the same gametic frequency distribution, we do not need to distinguish paternal and maternal effects. In this case, the two loci can be regarded as 2 factors, and each factor has n_i(n_i + 1)/2 (i = 1, 2) levels produced by the allelic combinations of n_i alleles (cf. [7]). The total number of genotypes is N = n₁(n₁ + 1)n₂(n₂ + 1)/4 and the partition of degrees of freedom is shown in Table 2. Since in this case αⁱ = α_i, β^k = β_k, and so on, the model can also be expressed as follows

$\begin{array}{l} G = μ + \sum_{i = 1}^{n_{1}} α_{i} (z_{M_{i}}^{(1)} + z_{F_{i}}^{(1)}) + \sum_{i, j} δ_{j}^{i} z_{M_{i}}^{(1)} z_{F_{j}}^{(1)} \\ + \sum_{k = 1}^{n_{2}} β_{k} (z_{M_{k}}^{(2)} + z_{F_{k}}^{(2)}) + \sum_{k, l} γ_{l}^{k} z_{M_{k}}^{(2)} z_{F_{l}}^{(2)} \\ + \sum_{i, k} (α_{i} β_{k}) (z_{M_{i}}^{(1)} + z_{F_{i}}^{(1)}) (z_{M_{k}}^{(2)} + z_{F_{k}}^{(2)}) \\ + \sum_{i, k, l} (α_{i} γ_{l}^{k}) (z_{M_{i}}^{(1)} + z_{F_{i}}^{(1)}) z_{M_{k}}^{(2)} z_{F_{l}}^{(2)} \\ + \sum_{i, j, k} (δ_{j}^{i} β_{k}) z_{M_{i}}^{(1)} z_{F_{j}}^{(1)} (z_{M_{k}}^{(2)} + z_{F_{k}}^{(2)}) \\ + \sum_{i, j, k, l} (δ_{j}^{i} γ_{l}^{k}) z_{M_{i}}^{(1)} z_{F_{j}}^{(1)} z_{M_{k}}^{(2)} z_{F_{l}}^{(2)} \end{array} (6)$
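The two genotype counts quoted in the text, N = n₁²n₂² when parental origin and phase are distinguished and N = n₁(n₁ + 1)n₂(n₂ + 1)/4 when they are not, can be confirmed by brute-force enumeration. This is a small sketch; the function names are illustrative only.

```python
from itertools import combinations_with_replacement, product


def count_phased(n1, n2):
    """Phased genotypes A_iB_k/A_jB_l with parental origin distinguished."""
    return sum(1 for _ in product(range(n1), range(n1), range(n2), range(n2)))


def count_unordered(n1, n2):
    """Genotypes when origin and phase are ignored: unordered allele pairs per locus."""
    pairs_a = list(combinations_with_replacement(range(n1), 2))
    pairs_b = list(combinations_with_replacement(range(n2), 2))
    return len(pairs_a) * len(pairs_b)


counts = {(n1, n2): (count_phased(n1, n2), count_unordered(n1, n2))
          for n1 in range(1, 5) for n2 in range(1, 5)}

# Compare the enumeration with the closed-form counts given in the text.
for (n1, n2), (phased, unordered) in counts.items():
    assert phased == n1**2 * n2**2
    assert unordered == n1 * (n1 + 1) * n2 * (n2 + 1) // 4

print(counts[(2, 2)])  # two biallelic loci: (16, 9)
```

For two biallelic loci the simplified model therefore works with 9 genotypes instead of 16 phased ones.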

Table 2 Partition of degrees of freedom for two loci with number of alleles n₁ and n₂ (a simplified case without distinguishing the paternal and maternal origins)

For an arbitrary number of loci, the situation becomes more complicated. In addition to the additive and dominance effects at each locus and the two-locus interactions (additive × additive, additive × dominance, dominance × additive, dominance × dominance, a total of 2² terms), there are three-locus interactions (additive × additive × additive, additive × additive × dominance, ..., a total of 2³ terms), four-locus interactions (additive × additive × additive × additive, ..., a total of 2⁴ terms), and so on. Though the extension is straightforward, the total number of terms increases dramatically. We will show some models with multiple loci in later examples by ignoring trigenic and higher order epistasis.
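To make the growth in the number of terms concrete, the sketch below enumerates the classes of effects for m loci, where each locus is either absent from a term or enters it additively (A) or through dominance (D). It counts classes of effects (for example A1 x D2), not individual allelic parameters. The function names are illustrative.

```python
from itertools import product
from math import comb


def effect_classes(m):
    """List every effect class for m loci; each locus is absent, additive (A) or dominance (D)."""
    classes = []
    for choice in product(("", "A", "D"), repeat=m):
        involved = [f"{kind}{locus + 1}" for locus, kind in enumerate(choice) if kind]
        if involved:  # the all-absent combination is the mean, not an effect
            classes.append(" x ".join(involved))
    return classes


def classes_by_order(m):
    """Number of effect classes involving exactly k loci, for k = 1..m."""
    return {k: comb(m, k) * 2**k for k in range(1, m + 1)}


print(effect_classes(2))    # the 8 components of the two-locus model
print(classes_by_order(3))  # {1: 6, 2: 12, 3: 8}
```

Each set of k loci contributes 2^k classes, so the total over all orders is 3^m - 1, which is 8 for two loci and already 80 for four.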

Effects and variance components

Let pⁱ, p_j (i, j = 1, 2, ..., n₁) be allelic frequencies of paternal and maternal gametes at locus A, respectively. Let also q^k, q_l (k, l = 1, 2, ..., n₂) denote allelic frequencies of paternal and maternal gametes at locus B, respectively. In the analysis of variance for the model, it is convenient to use deviations of the indicator variables $z_{M_{i}}^{(1)}$, $z_{F_{j}}^{(1)}$, $z_{M_{k}}^{(2)}$ and $z_{F_{l}}^{(2)}$ from their expected values. That is

$\begin{array}{l} x_{M_{i}}^{(1)} = z_{M_{i}}^{(1)} - E (z_{M_{i}}^{(1)}) = z_{M_{i}}^{(1)} - p^{i} \\ = \begin{cases} 1 - p^{i}, & \text{for } A_{i} \text{ allele from paternal gamete} \\ - p^{i}, & \text{otherwise} \end{cases} \end{array}$

Similarly, define

$\begin{array}{l} x_{F_{j}}^{(1)} = z_{F_{j}}^{(1)} - E (z_{F_{j}}^{(1)}) = z_{F_{j}}^{(1)} - p_{j} \\ x_{M_{k}}^{(2)} = z_{M_{k}}^{(2)} - E (z_{M_{k}}^{(2)}) = z_{M_{k}}^{(2)} - q^{k} \\ x_{F_{l}}^{(2)} = z_{F_{l}}^{(2)} - E (z_{F_{l}}^{(2)}) = z_{F_{l}}^{(2)} - q_{l} \end{array}$

Taking the constraint conditions on the genetic effects into account, we can show that,

$\begin{matrix} \sum_{i = 1}^{n_{1}} α^{i} x_{M_{i}}^{(1)} = \sum_{i = 1}^{n_{1}} α^{i} z_{M_{i}}^{(1)} \\ \sum_{i, j} δ_{j}^{i} x_{M_{i}}^{(1)} x_{F_{j}}^{(1)} = \sum_{i, j} δ_{j}^{i} z_{M_{i}}^{(1)} z_{F_{j}}^{(1)} \\ \sum_{i, k, l} (α^{i} γ_{l}^{k}) x_{M_{i}}^{(1)} x_{M_{k}}^{(2)} x_{F_{l}}^{(2)} = \sum_{i, k, l} (α^{i} γ_{l}^{k}) z_{M_{i}}^{(1)} z_{M_{k}}^{(2)} z_{F_{l}}^{(2)} \\ \sum_{i, j, k, l} (δ_{j}^{i} γ_{l}^{k}) x_{M_{i}}^{(1)} x_{F_{j}}^{(1)} x_{M_{k}}^{(2)} x_{F_{l}}^{(2)} = \sum_{i, j, k, l} (δ_{j}^{i} γ_{l}^{k}) z_{M_{i}}^{(1)} z_{F_{j}}^{(1)} z_{M_{k}}^{(2)} z_{F_{l}}^{(2)} \end{matrix}$

and so on. For example,

$\begin{array}{l} \sum_{i = 1}^{n_{1}} α^{i} x_{M_{i}}^{(1)} = \sum_{i = 1}^{n_{1}} α^{i} (z_{M_{i}}^{(1)} - p^{i}) \\ = \sum_{i = 1}^{n_{1}} α^{i} z_{M_{i}}^{(1)} - \sum_{i = 1}^{n_{1}} α^{i} p^{i} = \sum_{i = 1}^{n_{1}} α^{i} z_{M_{i}}^{(1)} \end{array}$

as $\sum_{i = 1}^{n_{1}} α^{i} p^{i} = 0$ by the constraint condition (2). Therefore, model (5) can be rewritten as

$\begin{array}{l} G = μ + \sum_{i = 1}^{n_{1}} α^{i} x_{M_{i}}^{(1)} + \sum_{j = 1}^{n_{1}} α_{j} x_{F_{j}}^{(1)} + \sum_{i, j} δ_{j}^{i} x_{M_{i}}^{(1)} x_{F_{j}}^{(1)} \\ + \sum_{k = 1}^{n_{2}} β^{k} x_{M_{k}}^{(2)} + \sum_{l = 1}^{n_{2}} β_{l} x_{F_{l}}^{(2)} + \sum_{k, l} γ_{l}^{k} x_{M_{k}}^{(2)} x_{F_{l}}^{(2)} \\ + [\sum_{i, k} (α^{i} β^{k}) x_{M_{i}}^{(1)} x_{M_{k}}^{(2)} + \sum_{i, l} (α^{i} β_{l}) x_{M_{i}}^{(1)} x_{F_{l}}^{(2)} \\ + \sum_{j, k} (α_{j} β^{k}) x_{F_{j}}^{(1)} x_{M_{k}}^{(2)} + \sum_{j, l} (α_{j} β_{l}) x_{F_{j}}^{(1)} x_{F_{l}}^{(2)}] \\ + [\sum_{i, k, l} (α^{i} γ_{l}^{k}) x_{M_{i}}^{(1)} x_{M_{k}}^{(2)} x_{F_{l}}^{(2)} + \sum_{j, k, l} (α_{j} γ_{l}^{k}) x_{F_{j}}^{(1)} x_{M_{k}}^{(2)} x_{F_{l}}^{(2)}] \\ + [\sum_{i, j, k} (δ_{j}^{i} β^{k}) x_{M_{i}}^{(1)} x_{F_{j}}^{(1)} x_{M_{k}}^{(2)} + \sum_{i, j, l} (δ_{j}^{i} β_{l}) x_{M_{i}}^{(1)} x_{F_{j}}^{(1)} x_{F_{l}}^{(2)}] \\ + \sum_{i, j, k, l} (δ_{j}^{i} γ_{l}^{k}) x_{M_{i}}^{(1)} x_{F_{j}}^{(1)} x_{M_{k}}^{(2)} x_{F_{l}}^{(2)} \end{array} (7)$

When we deal with some reduced models in the next section, we will find this form of the model especially helpful because it builds the parameter constraints into the regression variables, which suits genetic interpretation. The form of model (7) also facilitates the demonstration that under Hardy-Weinberg, linkage and genotypic equilibria, the regression coefficients (genetic effects) are Cockerham's least squares effects (3) (Appendix A), and the genotypic variance V_G has the orthogonal partition (4) (Appendix B).
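The equivalence of the indicator form (z variables) and the centered form (x = z - p) can be illustrated numerically. The sketch below is restricted to a single locus with three alleles to keep it short; the genotypic values and allele frequencies are hypothetical, and `model_value` is an illustrative helper. The effects are the one-locus least-squares effects, which satisfy the constraint conditions by construction.

```python
import random

random.seed(0)
n = 3
p_pat = [0.2, 0.3, 0.5]   # hypothetical paternal allele frequencies
p_mat = [0.6, 0.1, 0.3]   # hypothetical maternal allele frequencies
G = [[random.uniform(0.0, 10.0) for _ in range(n)] for _ in range(n)]  # G[i][j]

# Least-squares effects for one locus under Hardy-Weinberg equilibrium.
mu = sum(p_pat[i] * p_mat[j] * G[i][j] for i in range(n) for j in range(n))
row = [sum(p_mat[j] * G[i][j] for j in range(n)) for i in range(n)]  # mean given paternal A_i
col = [sum(p_pat[i] * G[i][j] for i in range(n)) for j in range(n)]  # mean given maternal A_j
alpha_pat = [row[i] - mu for i in range(n)]
alpha_mat = [col[j] - mu for j in range(n)]
delta = [[G[i][j] - row[i] - col[j] + mu for j in range(n)] for i in range(n)]


def model_value(i, j, centered):
    """Evaluate the regression form for genotype A_i/A_j with z or x = z - p regressors."""
    u = [(1.0 if a == i else 0.0) - (p_pat[a] if centered else 0.0) for a in range(n)]
    v = [(1.0 if b == j else 0.0) - (p_mat[b] if centered else 0.0) for b in range(n)]
    additive = (sum(alpha_pat[a] * u[a] for a in range(n))
                + sum(alpha_mat[b] * v[b] for b in range(n)))
    dominance = sum(delta[a][b] * u[a] * v[b] for a in range(n) for b in range(n))
    return mu + additive + dominance


max_gap = max(abs(model_value(i, j, centered) - G[i][j])
              for i in range(n) for j in range(n) for centered in (False, True))
print(f"largest deviation from G: {max_gap:.2e}")
```

Both forms should reproduce every genotypic value up to rounding, because the frequency-weighted sums of the effects vanish.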

Now we discuss the properties of the model in a disequilibrium situation. As stated in [14], there are three types of disequilibria:

• Type 1: between alleles on the same gametes but at different loci

• Type 2: between alleles at the same locus but on different gametes

• Type 3: between alleles on different gametes and at different loci.

If we denote $P_{j l}^{i k}$ as the genotypic frequency of A_iB_k/A_jB_l, $P_{j .}^{i .}$ as the genotypic frequency of A_i/A_j, and so on, following [14], the digenic disequilibria can be written as

$\begin{array}{l} D_{..}^{i k} = Cov (z_{M_{i}}^{(1)}, z_{M_{k}}^{(2)}) = E (x_{M_{i}}^{(1)} x_{M_{k}}^{(2)}) = P_{..}^{i k} - p^{i} q^{k} \\ D_{j l}^{..} = Cov (z_{F_{j}}^{(1)}, z_{F_{l}}^{(2)}) = E (x_{F_{j}}^{(1)} x_{F_{l}}^{(2)}) = P_{j l}^{..} - p_{j} q_{l} \\ D_{j .}^{i .} = E (x_{M_{i}}^{(1)} x_{F_{j}}^{(1)}) = P_{j .}^{i .} - p^{i} p_{j} \\ D_{. l}^{. k} = E (x_{M_{k}}^{(2)} x_{F_{l}}^{(2)}) = P_{. l}^{. k} - q^{k} q_{l} \\ D_{. l}^{i .} = E (x_{M_{i}}^{(1)} x_{F_{l}}^{(2)}) = P_{. l}^{i .} - p^{i} q_{l} \\ D_{j .}^{. k} = E (x_{F_{j}}^{(1)} x_{M_{k}}^{(2)}) = P_{j .}^{. k} - p_{j} q^{k} . \end{array}$

And the trigenic disequilibria

$\begin{matrix} D_{j .}^{i k} = E (x_{M_{i}}^{(1)} x_{F_{j}}^{(1)} x_{M_{k}}^{(2)}) \\ = P_{j .}^{i k} - p^{i} P_{j .}^{. k} - q^{k} P_{j .}^{i .} - p_{j} P_{..}^{i k} + 2 p^{i} p_{j} q^{k} \\ D_{. l}^{i k} = E (x_{M_{i}}^{(1)} x_{M_{k}}^{(2)} x_{F_{l}}^{(2)}) \\ = P_{. l}^{i k} - p^{i} P_{. l}^{. k} - q^{k} P_{. l}^{i .} - q_{l} P_{..}^{i k} + 2 p^{i} q^{k} q_{l} \\ D_{j l}^{i .} = E (x_{M_{i}}^{(1)} x_{F_{j}}^{(1)} x_{F_{l}}^{(2)}) \\ = P_{j l}^{i .} - p^{i} P_{j l}^{..} - p_{j} P_{. l}^{i .} - q_{l} P_{j .}^{i .} + 2 p^{i} p_{j} q_{l} \\ D_{j l}^{. k} = E (x_{F_{j}}^{(1)} x_{M_{k}}^{(2)} x_{F_{l}}^{(2)}) \\ = P_{j l}^{. k} - p_{j} P_{. l}^{. k} - q^{k} P_{j l}^{..} - q_{l} P_{j .}^{. k} + 2 p_{j} q^{k} q_{l} \end{matrix}$

Similarly for the quadrigenic disequilibrium, we may define

$\begin{matrix} D_{j l}^{i k} = E (x_{M_{i}}^{(1)} x_{F_{j}}^{(1)} x_{M_{k}}^{(2)} x_{F_{l}}^{(2)}) \\ = P_{j l}^{i k} - p^{i} P_{j l}^{. k} - p_{j} P_{. l}^{i k} - q^{k} P_{j l}^{i .} - q_{l} P_{j .}^{i k} \\ + p^{i} p_{j} P_{. l}^{. k} + p^{i} q^{k} P_{j l}^{..} + p^{i} q_{l} P_{j .}^{. k} + p_{j} q^{k} P_{. l}^{i .} \\ + p_{j} q_{l} P_{..}^{i k} + q^{k} q_{l} P_{j .}^{i .} - 3 p^{i} p_{j} q^{k} q_{l} . \end{matrix}$

If we express $D_{j l}^{i k}$ as a function of lower-order linkage disequilibria, we have

$\begin{matrix} D_{j l}^{i k} = P_{j l}^{i k} - p^{i} D_{j l}^{. k} - p_{j} D_{. l}^{i k} - q^{k} D_{j l}^{i .} - q_{l} D_{j .}^{i k} \\ - p^{i} p_{j} D_{. l}^{. k} - p^{i} q^{k} D_{j l}^{..} - p^{i} q_{l} D_{j .}^{. k} - p_{j} q^{k} D_{. l}^{i .} \\ - p_{j} q_{l} D_{..}^{i k} - q^{k} q_{l} D_{j .}^{i .} - p^{i} p_{j} q^{k} q_{l} . \end{matrix}$

This definition is the same as that given by [15, 16]. Note that $\sum_{i} z_{M_{i}}^{(1)} = 1$ . Then, we have

$\begin{array}{l} \sum_{i} D_{j l}^{i k} = E [(\sum_{i} x_{M_{i}}^{(1)}) x_{F_{j}}^{(1)} x_{M_{k}}^{(2)} x_{F_{l}}^{(2)}] \\ = E [\sum_{i} (z_{M_{i}}^{(1)} - p^{i}) x_{F_{j}}^{(1)} x_{M_{k}}^{(2)} x_{F_{l}}^{(2)}] = 0 \end{array}$

In general, $D_{j l}^{i k}$ sums to zero over any allele involved, as do $D_{j .}^{i k}$ and the other disequilibrium measures.
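Because every disequilibrium coefficient here is a central moment of the allele indicators, these properties are easy to check by enumeration. The sketch below draws an arbitrary phased genotype distribution (two alleles at A, three at B; all numbers hypothetical), computes a quadrigenic coefficient directly as E[∏(z - p)], and compares it with the expansion in marginal frequencies obtained by multiplying out that product: the joint frequency, minus each allele frequency times the remaining three-allele frequency, plus each pair of allele frequencies times the remaining two-allele frequency, minus three times the product of all four allele frequencies. It also checks the sum-to-zero property. Function names are illustrative.

```python
import itertools
import math
import random

# Factors: 0 = paternal A, 1 = maternal A, 2 = paternal B, 3 = maternal B.
n_levels = (2, 2, 3, 3)
random.seed(2)
genotypes = list(itertools.product(*(range(k) for k in n_levels)))
raw = {g: random.random() for g in genotypes}
norm = sum(raw.values())
P = {g: v / norm for g, v in raw.items()}  # arbitrary phased genotype frequencies


def freq(fixed):
    """Marginal frequency that factor f carries allele fixed[f] for every f in `fixed`."""
    return sum(pr for g, pr in P.items() if all(g[f] == a for f, a in fixed.items()))


def diseq(fixed):
    """Disequilibrium as the central moment E[prod_f (z_f - p_f)]."""
    p = {f: freq({f: a}) for f, a in fixed.items()}
    return sum(pr * math.prod((1.0 if g[f] == a else 0.0) - p[f] for f, a in fixed.items())
               for g, pr in P.items())


def quadrigenic_from_frequencies(fixed):
    """Expand the four-allele central moment in terms of marginal frequencies."""
    factors = list(fixed)
    p = {f: freq({f: fixed[f]}) for f in factors}
    value = freq(fixed)
    for f in factors:
        value -= p[f] * freq({h: fixed[h] for h in factors if h != f})
    for f1, f2 in itertools.combinations(factors, 2):
        value += p[f1] * p[f2] * freq({h: fixed[h] for h in factors if h not in (f1, f2)})
    return value - 3.0 * math.prod(p.values())


alleles = {0: 1, 1: 0, 2: 2, 3: 1}  # i = 1, j = 0, k = 2, l = 1 (zero-based labels)
gap = abs(diseq(alleles) - quadrigenic_from_frequencies(alleles))
sum_over_i = sum(diseq({**alleles, 0: i}) for i in range(n_levels[0]))
sum_over_l_trigenic = sum(diseq({0: 1, 2: 2, 3: l}) for l in range(n_levels[3]))

print(f"direct vs expanded quadrigenic D: gap = {gap:.2e}")
print(f"sum over i of quadrigenic D:      {sum_over_i:.2e}")
print(f"sum over l of trigenic D:         {sum_over_l_trigenic:.2e}")
```

All three printed quantities should be zero up to rounding for any genotype distribution.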

With Hardy-Weinberg and genotypic equilibria but linkage disequilibrium, model (7) leads to the following expression for the overall mean

$\begin{matrix} E (G) = μ + \sum_{i, k} (α^{i} β^{k}) D_{..}^{i k} + \sum_{j, l} (α_{j} β_{l}) D_{j l}^{..} \\ + \sum_{i, j, k, l} (δ_{j}^{i} γ_{l}^{k}) D_{..}^{i k} D_{j l}^{..} \end{matrix}$

where μ is the mean genotypic value under linkage equilibrium, and $Δ_{μ} = \sum_{i, k} (α^{i} β^{k}) D_{..}^{i k} + \sum_{j, l} (α_{j} β_{l}) D_{j l}^{..} + \sum_{i, j, k, l} (δ_{j}^{i} γ_{l}^{k}) D_{..}^{i k} D_{j l}^{..}$ represents the departure from μ due to linkage disequilibrium and epistasis. If there is no epistasis, linkage disequilibrium does not affect the mean genotypic value. Similar results were given by [17]. Note that for marginal means of the genotypic values, we have

$\begin{array}{l} G_{..}^{..} = \sum_{i, j, k, l} P_{..}^{i k} P_{j l}^{..} G_{j l}^{i k}, G_{..}^{i .} = \frac{1}{p^{i}} \sum_{j, k, l} P_{..}^{i k} P_{j l}^{..} G_{j l}^{i k}, \\ G_{..}^{i k} = \sum_{j, l} P_{j l}^{..} G_{j l}^{i k}, G_{j .}^{i .} = \frac{1}{p^{i} p_{j}} \sum_{k, l} P_{j l}^{..} P_{..}^{i k} G_{j l}^{i k}, \\ G_{j .}^{i k} = \frac{1}{p_{j}} \sum_{l} P_{j l}^{..} G_{j l}^{i k}, ... \end{array}$

and so on.
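The expression for the mean under linkage disequilibrium, E(G) = μ + Δ_μ, can be verified numerically. The following sketch assumes two biallelic loci, hypothetical paternal and maternal gametic frequencies that carry linkage disequilibrium, and arbitrary genotypic values. Genotype frequencies are products of paternal and maternal gametic frequencies, which corresponds to Hardy-Weinberg and genotypic equilibria. The effects are defined with allele-frequency weights, that is, in reference to the corresponding equilibrium population. Helper names are illustrative.

```python
import itertools
import math
import random

# Hypothetical gametic frequencies P[(allele at A, allele at B)] with linkage disequilibrium.
P_pat = {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.15, (1, 1): 0.35}
P_mat = {(0, 0): 0.25, (0, 1): 0.30, (1, 0): 0.05, (1, 1): 0.40}


def allele_freqs(gametes, locus):
    """Allele frequencies at one locus (0 = A, 1 = B) from a gametic table."""
    return [sum(pr for hap, pr in gametes.items() if hap[locus] == a) for a in (0, 1)]


# Factors: 0 = paternal A, 1 = maternal A, 2 = paternal B, 3 = maternal B.
weights = [allele_freqs(P_pat, 0), allele_freqs(P_mat, 0),
           allele_freqs(P_pat, 1), allele_freqs(P_mat, 1)]

random.seed(3)
genotypes = list(itertools.product((0, 1), repeat=4))  # (i, j, k, l)
G = {g: random.gauss(0.0, 1.0) for g in genotypes}     # arbitrary genotypic values


def marginal_mean(fixed):
    """Mean of G over the factors not in `fixed`, weighted as under equilibrium."""
    total = 0.0
    for g in genotypes:
        if all(g[f] == a for f, a in fixed.items()):
            total += math.prod(weights[f][g[f]] for f in range(4) if f not in fixed) * G[g]
    return total


def effect(fixed):
    """Least-squares effect by inclusion-exclusion over marginal means."""
    factors = list(fixed)
    return sum((-1) ** (len(factors) - r) * marginal_mean({f: fixed[f] for f in subset})
               for r in range(len(factors) + 1)
               for subset in itertools.combinations(factors, r))


D_pat = {(i, k): P_pat[(i, k)] - weights[0][i] * weights[2][k] for (i, k) in P_pat}
D_mat = {(j, l): P_mat[(j, l)] - weights[1][j] * weights[3][l] for (j, l) in P_mat}

mu = marginal_mean({})
mean_with_ld = sum(P_pat[(i, k)] * P_mat[(j, l)] * G[(i, j, k, l)]
                   for (i, j, k, l) in genotypes)

delta_mu = (
    sum(effect({0: i, 2: k}) * D_pat[(i, k)] for (i, k) in D_pat)
    + sum(effect({1: j, 3: l}) * D_mat[(j, l)] for (j, l) in D_mat)
    + sum(effect({0: i, 1: j, 2: k, 3: l}) * D_pat[(i, k)] * D_mat[(j, l)]
          for (i, j, k, l) in genotypes)
)

print(f"E(G) under LD : {mean_with_ld:.6f}")
print(f"mu + Delta_mu : {mu + delta_mu:.6f}")
```

If every epistatic effect were zero, delta_mu would vanish and the two means would coincide regardless of the amount of linkage disequilibrium.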

Then the question is what the genetic effects are in a disequilibrium population. Do Hardy-Weinberg and linkage disequilibria change the definition and values of genetic effects? The short answer to this question is "no" in a fully characterized model, but "yes" in a model that ignores some QTL or genetic effects. This is proved and discussed in [10]. With Hardy-Weinberg and linkage disequilibria, the genetic effects no longer correspond to the deviations from marginal means of genotypic values in a disequilibrium population. In a multiple regression model (7), the genetic effects are partial regression coefficients. These partial regression coefficients correspond to the simple regression coefficients, or deviations from marginal means of genotypic values, only in an equilibrium population. In a disequilibrium population, a direct analysis on the partial regression coefficients can be very complex (see the appendix of [10] for a relatively simple example). However, in a full model which includes all relevant loci and genetic effects, the model parameters depend only on how the regressors, i.e. x variables in (7), are defined and are independent of correlations between x variables, i.e. Hardy-Weinberg and linkage disequilibria. So, the genetic effects are still the same as those defined in the equilibrium population, although the population mean and marginal means of genotypic values are changed in a disequilibrium population.

Hardy-Weinberg and linkage disequilibria introduce correlations between the x variables, and thus covariances between different genetic effect components. Define

$\begin{array}{l} A_{1} = \sum_{i = 1}^{n_{1}} α^{i} x_{M_{i}}^{(1)} + \sum_{j = 1}^{n_{1}} α_{j} x_{F_{j}}^{(1)} \\ D_{1} = \sum_{i, j} δ_{j}^{i} x_{M_{i}}^{(1)} x_{F_{j}}^{(1)} \\ A_{2} = \sum_{k = 1}^{n_{2}} β^{k} x_{M_{k}}^{(2)} + \sum_{l = 1}^{n_{2}} β_{l} x_{F_{l}}^{(2)} \\ D_{2} = \sum_{k, l} γ_{l}^{k} x_{M_{k}}^{(2)} x_{F_{l}}^{(2)} \\ A_{1} A_{2} = \sum_{i, k} (α^{i} β^{k}) x_{M_{i}}^{(1)} x_{M_{k}}^{(2)} + \sum_{i, l} (α^{i} β_{l}) x_{M_{i}}^{(1)} x_{F_{l}}^{(2)} \\ + \sum_{j, k} (α_{j} β^{k}) x_{F_{j}}^{(1)} x_{M_{k}}^{(2)} + \sum_{j, l} (α_{j} β_{l}) x_{F_{j}}^{(1)} x_{F_{l}}^{(2)} \\ A_{1} D_{2} = \sum_{i, k, l} (α^{i} γ_{l}^{k}) x_{M_{i}}^{(1)} x_{M_{k}}^{(2)} x_{F_{l}}^{(2)} \\ + \sum_{j, k, l} (α_{j} γ_{l}^{k}) x_{F_{j}}^{(1)} x_{M_{k}}^{(2)} x_{F_{l}}^{(2)} \\ A_{2} D_{1} = \sum_{i, j, k} (δ_{j}^{i} β^{k}) x_{M_{i}}^{(1)} x_{F_{j}}^{(1)} x_{M_{k}}^{(2)} \\ + \sum_{i, j, l} (δ_{j}^{i} β_{l}) x_{M_{i}}^{(1)} x_{F_{j}}^{(1)} x_{F_{l}}^{(2)} \\ D_{1} D_{2} = \sum_{i, j, k, l} (δ_{j}^{i} γ_{l}^{k}) x_{M_{i}}^{(1)} x_{F_{j}}^{(1)} x_{M_{k}}^{(2)} x_{F_{l}}^{(2)} \end{array}$

Then we can write

G = μ + A₁ + A₂ + D₁ + D₂ + A₁A₂ + A₁D₂ + A₂D₁ + D₁D₂

In a disequilibrium population, the partition of the genotypic variance becomes

$V_{G} = \sum_{i = 1}^{8} \sum_{j = 1}^{8} V_{i j} = 1^{T} V 1$

where

V = (V_ij)_{8 × 8}

It is a symmetric matrix. In Appendix C, we give the detailed result for each component of the matrix with linkage disequilibrium, but assuming Hardy-Weinberg equilibrium.
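The identity V_G = 1ᵀV1 can be illustrated by computing the 8 × 8 matrix directly as the covariance matrix of the eight components. The sketch below assumes two biallelic loci, hypothetical gametic frequencies with linkage disequilibrium, Hardy-Weinberg and genotypic equilibria (genotype frequency = paternal gametic frequency × maternal gametic frequency), and arbitrary genotypic values. It does not reproduce the closed-form entries of Appendix C; it only evaluates them by enumeration. Helper names are illustrative.

```python
import itertools
import math
import random

# Hypothetical gametic frequencies P[(allele at A, allele at B)] carrying linkage disequilibrium.
P_pat = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.10, (1, 1): 0.40}
P_mat = {(0, 0): 0.20, (0, 1): 0.30, (1, 0): 0.35, (1, 1): 0.15}


def allele_freqs(gametes, locus):
    return [sum(pr for hap, pr in gametes.items() if hap[locus] == a) for a in (0, 1)]


# Factors: 0 = paternal A, 1 = maternal A, 2 = paternal B, 3 = maternal B.
weights = [allele_freqs(P_pat, 0), allele_freqs(P_mat, 0),
           allele_freqs(P_pat, 1), allele_freqs(P_mat, 1)]

random.seed(4)
genotypes = list(itertools.product((0, 1), repeat=4))  # (i, j, k, l)
G = {g: random.gauss(0.0, 1.0) for g in genotypes}


def marginal_mean(fixed):
    total = 0.0
    for g in genotypes:
        if all(g[f] == a for f, a in fixed.items()):
            total += math.prod(weights[f][g[f]] for f in range(4) if f not in fixed) * G[g]
    return total


def effect(fixed):
    factors = list(fixed)
    return sum((-1) ** (len(factors) - r) * marginal_mean({f: fixed[f] for f in subset})
               for r in range(len(factors) + 1)
               for subset in itertools.combinations(factors, r))


# Factor sets making up each component, in the order used in G = mu + A1 + A2 + ...
groups = {
    "A1": [(0,), (1,)], "A2": [(2,), (3,)], "D1": [(0, 1)], "D2": [(2, 3)],
    "A1A2": [(0, 2), (0, 3), (1, 2), (1, 3)],
    "A1D2": [(0, 2, 3), (1, 2, 3)],
    "A2D1": [(0, 1, 2), (0, 1, 3)],
    "D1D2": [(0, 1, 2, 3)],
}
names = list(groups)

# Value of every component for every phased genotype.
values = {a: {g: sum(effect({f: g[f] for f in s}) for s in groups[a]) for g in genotypes}
          for a in names}

prob = {g: P_pat[(g[0], g[2])] * P_mat[(g[1], g[3])] for g in genotypes}  # HWE and GE, with LD
mean = {a: sum(prob[g] * values[a][g] for g in genotypes) for a in names}
V = [[sum(prob[g] * (values[a][g] - mean[a]) * (values[b][g] - mean[b]) for g in genotypes)
      for b in names] for a in names]

EG = sum(prob[g] * G[g] for g in genotypes)
V_G = sum(prob[g] * (G[g] - EG) ** 2 for g in genotypes)
total = sum(sum(row) for row in V)
print(f"1'V1 = {total:.6f}   V_G = {V_G:.6f}")
```

The off-diagonal entries of V are the covariances induced by linkage disequilibrium; with equilibrium gametic frequencies they would all be zero and the diagonal would reduce to the orthogonal partition.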

For the rest of the paper, when we discuss disequilibrium, we mainly discuss linkage disequilibrium and assume Hardy-Weinberg and genotypic equilibria, which can be achieved by one generation of random mating. Hardy-Weinberg disequilibrium could also be taken into account, but it makes the results more complex and is therefore omitted.

Reduced models

In many genetic applications, the experimental population has a regular genetic structure by design. In these cases, the genetic model can be further simplified to reflect the structure of the experimental design. Sometimes we may also want to simplify the genetic model by imposing certain constraints or assumptions, such as on the number of alleles, to make the analysis more feasible. In this section, we give a few reduced genetic models that are relevant to many genetic applications.

1. Backcross population or recombinant inbred population (haploid model)

Backcross and recombinant inbred populations are common experimental designs for QTL mapping studies. By crossing two inbred lines, we can create an F₁ population. If we randomly backcross the F₁ to one of the inbred lines, we have a backcross population. Let us assume that the cross is AA (paternal) × Aa (maternal). In a random-mating backcross population, there are only two possible genotypes at each segregating locus, A_rA_r or A_ra_r, for r = 1, 2, ..., m, where m is the number of QTL. Since for the paternal gametes,

$z_{M_{1}}^{(r)} = \begin{cases} 1, & \text{for } A_{r} \text{ allele from paternal gamete} \\ 0, & \text{otherwise} \end{cases} = 1,$

and

$z_{M_{2}}^{(r)} = \begin{cases} 1, & \text{for } a_{r} \text{ allele from paternal gamete} \\ 0, & \text{otherwise} \end{cases} = 0,$

thus $x_{M_{1}}^{(r)} = z_{M_{1}}^{(r)} - 1 = 0$ and $x_{M_{2}}^{(r)} = z_{M_{2}}^{(r)} = 0$ for r = 1, 2, ..., m. For maternal gametes however,

$\begin{matrix} x_{F_{1}}^{(r)} = {\begin{array}{l} 1 / 2, for A_{r} from maternal gamete \\ - 1 / 2, otherwise \end{array} \\ = - x_{F_{2}}^{(r)}, for r = 1, 2, \cdot \cdot \cdot, m . \end{matrix}$

Thus the model becomes

$\begin{matrix} G = μ + \sum_{r = 1}^{m} a_{r} x_{F_{1}}^{(r)} + \sum_{r < s} b_{r s} (x_{F_{1}}^{(r)} x_{F_{1}}^{(s)}) \\ + \sum_{r < s < t} c_{r s t} (x_{F_{1}}^{(r)} x_{F_{1}}^{(s)} x_{F_{1}}^{(t)}) + \dots \end{matrix} (8)$

where $a_{r} = α_{1}^{(r)} - α_{2}^{(r)}$ is the substitution effect between homozygote genotype $A_{r} A_{r}$ and heterozygote genotype $A_{r} a_{r}$, $b_{r s} = (α_{1}^{(r)} α_{1}^{(s)}) - (α_{1}^{(r)} α_{2}^{(s)}) - (α_{2}^{(r)} α_{1}^{(s)}) + (α_{2}^{(r)} α_{2}^{(s)})$ is the interaction effect between loci r and s, $c_{r s t} = (α_{1}^{(r)} α_{1}^{(s)} α_{1}^{(t)}) - (α_{1}^{(r)} α_{1}^{(s)} α_{2}^{(t)}) - (α_{1}^{(r)} α_{2}^{(s)} α_{1}^{(t)}) + (α_{1}^{(r)} α_{2}^{(s)} α_{2}^{(t)}) - (α_{2}^{(r)} α_{1}^{(s)} α_{1}^{(t)}) + (α_{2}^{(r)} α_{1}^{(s)} α_{2}^{(t)}) + (α_{2}^{(r)} α_{2}^{(s)} α_{1}^{(t)}) - (α_{2}^{(r)} α_{2}^{(s)} α_{2}^{(t)})$, and so on. Taking constraint conditions into account, we have α₁ = -α₂, β₁ = -β₂, and so on. Then $a_{r} = 2 α_{1}^{(r)}$, $b_{r s} = 4 (α_{1}^{(r)} α_{1}^{(s)})$, $c_{r s t} = 8 (α_{1}^{(r)} α_{1}^{(s)} α_{1}^{(t)})$, and so on. With linkage equilibrium, the genetic effects, as the partial regression coefficients of the model, correspond to the simple regression coefficients. For example, the substitution effect of locus r, $a_{r}$, is the covariance between the genotypic value G and the substitution effect design variable $x_{F_{1}}^{(r)}$ divided by the variance of $x_{F_{1}}^{(r)}$. So in general, we have

$\begin{matrix} a_{r} = E [(G - μ) x_{F_{1}}^{(r)}] / E (x_{F_{1}}^{(r) 2}) \\ = E [G (z_{F_{1}}^{(r)} - 1 / 2)] / (1 / 4) \\ = 2 [E (G | z_{F_{1}}^{(r)} = 1) - E (G)] \\ b_{r s} = E [(G - μ) x_{F_{1}}^{(r)} x_{F_{1}}^{(s)}] / [E (x_{F_{1}}^{(r) 2}) E (x_{F_{1}}^{(s) 2})] \\ \begin{array}{l} = 4 [E (G | z_{F_{1}}^{(r)} = z_{F_{1}}^{(s)} = 1) - E (G | z_{F_{1}}^{(r)} = 1) \\ - E (G | z_{F_{1}}^{(s)} = 1) + E (G)] \end{array} \\ c_{r s t} = E [(G - μ) x_{F_{1}}^{(r)} x_{F_{1}}^{(s)} x_{F_{1}}^{(t)}] / \\ [E (x_{F_{1}}^{(r) 2}) E (x_{F_{1}}^{(s) 2}) E (x_{F_{1}}^{(t) 2})] \\ \begin{array}{l} = 8 [E (G | z_{F_{1}}^{(r)} = z_{F_{1}}^{(s)} = z_{F_{1}}^{(t)} = 1) \\ - E (G | z_{F_{1}}^{(r)} = z_{F_{1}}^{(s)} = 1) \\ - E (G | z_{F_{1}}^{(r)} = z_{F_{1}}^{(t)} = 1) \\ - E (G | z_{F_{1}}^{(s)} = z_{F_{1}}^{(t)} = 1) \\ + E (G | z_{F_{1}}^{(r)} = 1) + E (G | z_{F_{1}}^{(s)} = 1) \\ + E (G | z_{F_{1}}^{(t)} = 1) - E (G)] \end{array} \end{matrix}$

The orthogonal partition of the genotypic variance in an equilibrium population is

$V_{G} = \frac{1}{4} \sum_{r = 1}^{m} a_{r}^{2} + \frac{1}{4^{2}} \sum_{r < s} b_{r s}^{2} + \frac{1}{4^{3}} \sum_{r < s < t} c_{r s t}^{2} + \dots (9)$
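As a numerical illustration of the conditional-mean expressions and of partition (9), the following minimal sketch enumerates the eight maternal gametes for m = 3 loci in linkage equilibrium. The effect values `mu`, `a`, `b` and `c` are hypothetical numbers chosen only for illustration, and the helper `cond_mean` is an assumption of this sketch, not part of the original analysis.

```python
import itertools
import numpy as np

# Hypothetical effect values for m = 3 loci (illustrative only).
mu = 10.0
a = np.array([1.0, -0.5, 2.0])                 # substitution effects a_r
b = {(0, 1): 0.8, (0, 2): -1.2, (1, 2): 0.4}   # digenic effects b_rs
c = 1.6                                        # trigenic effect c_123

# All 2^3 maternal gametes, equally likely under linkage equilibrium.
z = np.array(list(itertools.product([0, 1], repeat=3)), dtype=float)
x = z - 0.5                                    # centred design variables x_F1
prob = np.full(len(z), 1.0 / len(z))

# Genotypic values from model (8).
G = mu + x @ a
for (r, s), b_rs in b.items():
    G = G + b_rs * x[:, r] * x[:, s]
G = G + c * x[:, 0] * x[:, 1] * x[:, 2]

def cond_mean(loci):
    """E(G | z = 1 at every locus listed in `loci`)."""
    keep = np.all(z[:, loci] == 1, axis=1)
    return np.sum(prob[keep] * G[keep]) / np.sum(prob[keep])

EG = np.sum(prob * G)

# Simple-regression (conditional-mean) expressions for a_r and b_rs.
a_hat = np.array([2 * (cond_mean([r]) - EG) for r in range(3)])
b_hat = {
    (r, s): 4 * (cond_mean([r, s]) - cond_mean([r]) - cond_mean([s]) + EG)
    for (r, s) in b
}

# Direct variance versus the orthogonal partition (9).
V_direct = np.sum(prob * (G - EG) ** 2)
V_partition = np.sum(a**2) / 4 + sum(v**2 for v in b.values()) / 16 + c**2 / 64
```

Under these assumptions, `a_hat` and `b_hat` recover the effects used to generate `G`, and the two variance values agree.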

As noted above, linkage disequilibrium does not change the values of genetic effects in a full model. The model parameters are still the same as those defined in the equilibrium population. However, in this case there is a simple relationship between the substitution effects at multiple loci and the marginal means of genotypic values in a disequilibrium population [18], which we note here. Let $P_{r s} = P \{z_{F_{1}}^{(r)} = z_{F_{1}}^{(s)} = 1\}$, and let the digenic linkage disequilibrium be defined as

$D_{r s} = Cov(z_{F_{1}}^{(r)}, z_{F_{1}}^{(s)}) = E (x_{F_{1}}^{(r)} x_{F_{1}}^{(s)})$

Ignoring trigenic and higher order linkage disequilibria, we have

$E [(G - μ) x_{F_{1}}^{(r)}] = \frac{1}{4} a_{r} + \sum_{r' \neq r} D_{r r'} a_{r'}$

$E [(G - μ) x_{F_{1}}^{(r)} x_{F_{1}}^{(s)}] = \frac{1}{4^{2}} b_{r s}$

Therefore, the digenic interaction effects can be expressed as

$\begin{array}{l} b_{r s} = 4^{2} [P_{r s} E (G | z_{F_{1}}^{(r)} = z_{F_{1}}^{(s)} = 1) - D_{r s} μ] \\ - 4 [E (G | z_{F_{1}}^{(r)} = 1) + E (G | z_{F_{1}}^{(s)} = 1) - E (G)] \end{array}$

Then the substitution effects can be expressed as a function of marginal means in the disequilibrium population as

$\begin{pmatrix} a_{1} \\ a_{2} \\ ⋮ \\ a_{m} \end{pmatrix} = {(I + 4 D)}^{- 1} (2 q)$

where I is an m × m identity matrix, $D = (D_{i j})_{m × m}$ with all diagonal elements being zeros;

$q = (q_{1}, q_{2}, ..., q_{m})^{T}$, with $q_{i} = E (G | z_{F_{1}}^{(i)} = 1) - E (G)$, for i = 1, 2, ..., m.
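The relationship between marginal means and substitution effects can be checked numerically. The sketch below assumes three linked loci with no crossing-over interference, hypothetical recombination frequencies `lam`, and hypothetical effects `a` and `b`; it builds the gamete distribution, computes q and D, and solves the linear system.

```python
import itertools
import numpy as np

m = 3
lam = [0.1, 0.3]                               # hypothetical recombination frequencies (adjacent loci)
mu = 5.0
a = np.array([1.0, 0.5, -2.0])                 # hypothetical substitution effects
b = {(0, 1): 0.6, (0, 2): -0.4, (1, 2): 1.0}   # hypothetical digenic effects

# Gamete probabilities assuming no interference: recombination events
# in adjacent intervals are treated as independent.
z = np.array(list(itertools.product([0, 1], repeat=m)), dtype=float)
prob = np.full(len(z), 0.5)
for k in range(m - 1):
    same = z[:, k] == z[:, k + 1]
    prob *= np.where(same, 1 - lam[k], lam[k])
x = z - 0.5

# Genotypic values from the model with main and digenic effects.
G = mu + x @ a
for (r, s), b_rs in b.items():
    G = G + b_rs * x[:, r] * x[:, s]

# Marginal means: q_i = E(G | z_i = 1) - E(G).
EG = prob @ G
q = np.array([(prob * z[:, i]) @ G / (prob @ z[:, i]) - EG for i in range(m)])

# Digenic disequilibria D_rs = E(x_r x_s), with zeros on the diagonal.
D = (x * prob[:, None]).T @ x
np.fill_diagonal(D, 0.0)

a_recovered = np.linalg.solve(np.eye(m) + 4 * D, 2 * q)
a_naive = 2 * q                                # ignores linkage disequilibrium
```

In this setting `a_recovered` matches `a`, whereas `a_naive`, which treats the marginal contrasts as if the loci were in equilibrium, absorbs the effects of linked loci.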

The partition of genetic variance with linkage disequilibrium is complex. Here we give details of the partition of genotypic variance for the following model

$G = μ + \sum_{r = 1}^{m} a_{r} x_{F_{1}}^{(r)} + \sum_{r < s} b_{r s} (x_{F_{1}}^{(r)} x_{F_{1}}^{(s)}) (10)$

Let $x_{r} = x_{F_{1}}^{(r)}$ and $x_{s} = x_{F_{1}}^{(s)}$ to simplify the notation here. The genotypic variance is

$\begin{aligned} V_{G} ={}& V (\sum_{r} a_{r} x_{r}) + 2 Cov(\sum_{r} a_{r} x_{r}, \sum_{r < s} b_{r s} x_{r} x_{s}) + V (\sum_{r < s} b_{r s} x_{r} x_{s}) \\ ={}& \sum_{r} a_{r}^{2} p_{r} (1 - p_{r}) + 2 \sum_{r < s} a_{r} a_{s} D_{r s} \\ & + 2 \sum_{r < s} [a_{r} b_{r s} (1 - 2 p_{r}) D_{r s} + a_{s} b_{r s} (1 - 2 p_{s}) D_{r s}] \\ & + 2 \sum_{r < s < t} (a_{r} b_{s t} + a_{s} b_{r t} + a_{t} b_{r s}) D_{r s t} \\ & + \sum_{r < s} b_{r s}^{2} [(1 - 2 p_{r}) (1 - 2 p_{s}) D_{r s} + p_{r} (1 - p_{r}) p_{s} (1 - p_{s}) - D_{r s}^{2}] \\ & + 2 \sum_{r < s < t} \{ b_{r s} b_{r t} [(1 - 2 p_{r}) D_{r s t} + p_{r} (1 - p_{r}) D_{s t} - D_{r s} D_{r t}] \\ & \quad + b_{r s} b_{s t} [(1 - 2 p_{s}) D_{r s t} + p_{s} (1 - p_{s}) D_{r t} - D_{r s} D_{s t}] \\ & \quad + b_{r t} b_{s t} [(1 - 2 p_{t}) D_{r s t} + p_{t} (1 - p_{t}) D_{r s} - D_{r t} D_{s t}] \} \\ & + 2 \sum_{r < s < t < u} [b_{r t} b_{s u} (D_{r s t u} - D_{r t} D_{s u}) + b_{r u} b_{s t} (D_{r s t u} - D_{r u} D_{s t}) \\ & \quad + b_{r s} b_{t u} (D_{r s t u} - D_{r s} D_{t u})] \qquad (11) \end{aligned}$

where

$D_{r s t} = E (x_{r} x_{s} x_{t})$ and $D_{r s t u} = E (x_{r} x_{s} x_{t} x_{u})$

are the three-locus and four-locus linkage disequilibria. This is a general partition of genetic variance for a haploid model.

For the backcross population, it can be shown that $D_{r s t} = 0$ (see Appendix D for both backcross and F₂ populations) and $D_{r s t u} = D_{r s} D_{t u}$ for loci r, s, t and u in this order under the assumption of no crossing-over interference. Also with this assumption, $D_{r t} = 4 D_{r s} D_{s t}$ and $D_{r s} = (1 - 2 λ_{r s}) / 4$, where $λ_{r s}$ is the recombination frequency between loci r and s. Since $p_{r} = p_{s} = 1 / 2$, the variance becomes

$\begin{aligned} V_{G} ={}& \frac{1}{4} \sum_{r} a_{r}^{2} + 2 \sum_{r < s} a_{r} a_{s} D_{r s} + \frac{1}{16} \sum_{r < s} b_{r s}^{2} (1 - 16 D_{r s}^{2}) \\ & + \frac{1}{2} \sum_{r < s < t} [b_{r s} b_{r t} (1 - 16 D_{r s}^{2}) D_{s t} + b_{r t} b_{s t} (1 - 16 D_{s t}^{2}) D_{r s}] \\ & + 2 \sum_{r < s < t < u} (b_{r t} b_{s u} + b_{r u} b_{s t}) (1 - 16 D_{s t}^{2}) D_{r s} D_{t u} \qquad (12) \end{aligned}$

In this partition of variance, the first summation term is the genetic variance due to the substitution effect of each QTL, the second summation term is the covariance between substitution effects of QTL pairs due to linkage disequilibrium, the third summation term is the genetic variance due to epistatic effects of QTL, and the fourth and fifth summation terms are the covariance between different epistatic effects of QTL due to linkage disequilibrium. There is no covariance between the main substitution effects and epistatic effects (see also [19]).
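The backcross partition (12) can be compared with a direct computation of the genotypic variance. The sketch below assumes four linked loci in the order r, s, t, u, no crossing-over interference, and hypothetical values for the recombination frequencies `lam` and the effects `a` and `b`. It also checks the disequilibrium relations used to obtain (12).

```python
import itertools
import numpy as np

m = 4
lam = [0.05, 0.2, 0.35]                        # hypothetical adjacent recombination frequencies
a = np.array([1.0, -0.7, 0.4, 1.5])            # hypothetical substitution effects
pairs = list(itertools.combinations(range(m), 2))
b = dict(zip(pairs, [0.8, -1.2, 0.4, 0.9, -0.3, 0.6]))  # hypothetical digenic effects

# Maternal gamete distribution assuming no crossing-over interference.
z = np.array(list(itertools.product([0, 1], repeat=m)), dtype=float)
prob = np.full(len(z), 0.5)
for k in range(m - 1):
    prob *= np.where(z[:, k] == z[:, k + 1], 1 - lam[k], lam[k])
x = z - 0.5

def E(values):
    """Expectation over the gamete distribution."""
    return float(prob @ values)

# Genotypic values (mu omitted, since it does not affect the variance).
G = x @ a + sum(b_rs * x[:, r] * x[:, s] for (r, s), b_rs in b.items())
V_direct = E(G**2) - E(G) ** 2

# Digenic disequilibria D_rs = E(x_r x_s).
D = np.array([[E(x[:, r] * x[:, s]) for s in range(m)] for r in range(m)])

# Partition (12), term by term.
V12 = 0.25 * np.sum(a**2)
V12 += 2 * sum(a[r] * a[s] * D[r, s] for r, s in pairs)
V12 += sum(b[(r, s)] ** 2 * (1 - 16 * D[r, s] ** 2) for r, s in pairs) / 16
for r, s, t in itertools.combinations(range(m), 3):
    V12 += 0.5 * (b[(r, s)] * b[(r, t)] * (1 - 16 * D[r, s] ** 2) * D[s, t]
                  + b[(r, t)] * b[(s, t)] * (1 - 16 * D[s, t] ** 2) * D[r, s])
for r, s, t, u in itertools.combinations(range(m), 4):
    V12 += 2 * ((b[(r, t)] * b[(s, u)] + b[(r, u)] * b[(s, t)])
                * (1 - 16 * D[s, t] ** 2) * D[r, s] * D[t, u])

# Higher-order disequilibria used in deriving (12).
D_012 = E(x[:, 0] * x[:, 1] * x[:, 2])
D_0123 = E(x[:, 0] * x[:, 1] * x[:, 2] * x[:, 3])
```

With these inputs the direct variance and the value from (12) coincide, the trigenic disequilibrium vanishes, and the four-locus disequilibrium equals the product of the two adjacent-pair disequilibria.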

For a backcross population, the genetic interpretation of the substitution effect $a_{r}$ depends on which parental line is backcrossed. In one backcross, AA × Aa, the substitution effect is traditionally defined as the difference between the additive and dominance effects, and in the other backcross, Aa × aa, it is the sum of the additive and dominance effects. Only with both backcrosses can one estimate the additive and dominance effects separately (for example [20]).

The same model also applies to a recombinant inbred population, which is another very popular experimental design for QTL mapping studies. For a recombinant inbred population, the substitution effects of QTL are the additive effects and the epistatic effects are the additive × additive interaction effects. Statistical methods to map QTL and to estimate various components of the genetic variance due to QTL, including epistasis, have been developed through the maximum likelihood approach [19, 21]. In a few cases where the method was applied, we estimated, for the first time, how the quantitative genetic variance was partitioned into various components in designed experimental populations. For example, Weber et al. [22] reported the result of QTL mapping for wing shape on the third chromosome of Drosophila melanogaster from a cross of divergent selection lines. From 519 recombinant inbred lines, 11 QTL were mapped on the third chromosome. Nine QTL pairs showed significant epistatic effects. The total genetic variance amounts to 95.5% of the phenotypic variance in the recombinant inbred lines, with phenotypes measured and averaged over 50 male flies for each recombinant inbred line. The partition of the genetic variance is as follows (see Tables 6 and 7 of [22]): 27.4% due to the variances of additive effects (equivalent to the first summation term of (12)); 67.3% due to the covariances between additive effects (the second summation term); 7.2% due to the variances of epistatic effects (the third summation term); and -6.0% due to the covariances between epistatic effects (the fourth and fifth summation terms). The covariances between additive and epistatic effects, expected to be 0, account for -0.4% due to sampling. A similar partition of the genetic variance is also observed in a group of 701 second chromosome recombinant inbred lines from a cross of the same divergent selection lines (see Tables 4 and 5 of [23]). See also [20] for another example.
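The reported components can be tallied to confirm that they add up to the stated total. The short sketch below uses only the percentages quoted above; the dictionary keys are descriptive labels introduced here for illustration.

```python
import math

# Percentages of phenotypic variance quoted in the text for Weber et al. [22].
components = {
    "variances of additive effects": 27.4,
    "covariances between additive effects": 67.3,
    "variances of epistatic effects": 7.2,
    "covariances between epistatic effects": -6.0,
    "covariances between additive and epistatic effects": -0.4,
}

total = sum(components.values())

# Share of the total attributable to the additive terms alone.
additive_share = (components["variances of additive effects"]
                  + components["covariances between additive effects"]) / total
```

The sum reproduces the 95.5% total, and most of it comes from the additive variances and the covariances among additive effects, which reflects the strong linkage among QTL on a single chromosome.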

2. F₂ population

F₂ is created from a cross between pairs of F₁ individuals. It is also a very popular experimental design for QTL mapping study. The advantage of this design is that both additive and dominance effects of a QTL can be estimated as well as various epistatic effects. The design also has more statistical power for QTL detection as compared to a backcross population. In a random-mating F₂ population, there are only two alleles at each segregating locus and allelic frequencies are expected to be one half if there is no segregation distortion.

Let us consider only two loci first. Let A and a denote the two alleles at locus 1, and B and b at locus 2. In this case, $x_{M_{1}}^{(1)} = - x_{M_{2}}^{(1)}$ and $x_{F_{1}}^{(1)} = - x_{F_{2}}^{(1)}$. Assuming $G_{j l}^{i k} = G_{i k}^{j l} = G_{j k}^{i l} = G_{i l}^{j k}$, it also holds that α¹ = α₁ = -α² = -α₂, $δ_{1}^{1} = δ_{2}^{2} = - δ_{1}^{2}$, and so on. The additive term for locus 1 then becomes

$A_{1} = α^{1} (x_{M_{1}}^{(1)} - x_{M_{2}}^{(1)}) + α_{1} (x_{F_{1}}^{(1)} - x_{F_{2}}^{(1)}) = 2 α_{1} w_{1}$

with

$w_{1} = x_{M_{1}}^{(1)} + x_{F_{1}}^{(1)} = \begin{cases} 1, & \text{for } A A \text{ at locus 1} \\ 0, & \text{for } A a \text{ at locus 1} \\ - 1, & \text{for } a a \text{ at locus 1} \end{cases} \qquad (13)$

and the dominance term is

$D_{1} = 2 (δ_{1}^{1}) (x_{M_{1}}^{(1)} x_{F_{1}}^{(1)} + x_{M_{1}}^{(1)} x_{F_{1}}^{(1)}) = (- 2) (δ_{1}^{1}) v_{1}$

with

$v_{1} = (- 2) x_{M_{1}}^{(1)} x_{F_{1}}^{(1)} = \begin{cases} - 1 / 2, & \text{for } A A \text{ at locus 1} \\ 1 / 2, & \text{for } A a \text{ at locus 1} \\ - 1 / 2, & \text{for } a a \text{ at locus 1} \end{cases} \qquad (14)$

Note that the v variable in this section for F₂ differs, by a factor of -2, from the v variable in the next section for a general two-allele model, in order to conform to the usual definition for the F₂ model. Similarly, for locus 2

A₂ = 2β₁w₂ and D₂ = (-2)( $γ_{1}^{1}$ )v₂

with

$w_{2} = \begin{cases} 1, & \text{for } B B \text{ at locus 2} \\ 0, & \text{for } B b \text{ at locus 2} \\ - 1, & \text{for } b b \text{ at locus 2} \end{cases}$

and

$v_{2} = \begin{cases} - 1 / 2, & \text{for } B B \text{ at locus 2} \\ 1 / 2, & \text{for } B b \text{ at locus 2} \\ - 1 / 2, & \text{for } b b \text{ at locus 2} \end{cases} \qquad (15)$

The model can then be written as

$\begin{matrix} G = μ + a_{1} w_{1} + d_{1} v_{1} + a_{2} w_{2} + d_{2} v_{2} \\ + {(a a)}_{12} (w_{1} w_{2}) + {(a d)}_{12} (w_{1} v_{2}) \\ + {(d a)}_{12} (v_{1} w_{2}) + {(d d)}_{12} (v_{1} v_{2}) \end{matrix} (16)$

where the parameters are related as a₁ = 2α₁, a₂ = 2β₁, d₁ = $- 2 δ_{1}^{1}$, d₂ = $- 2 γ_{1}^{1}$, (aa)₁₂ = 4(α₁β₁), (ad)₁₂ = 8( $α_{1} γ_{1}^{1}$ ), (da)₁₂ = 8( $δ_{1}^{1} β_{1}$ ), (dd)₁₂ = 16( $δ_{1}^{1} γ_{1}^{1}$ ). With random mating and linkage equilibrium, we have

$\begin{aligned} a_{1} &= E [(G - μ) w_{1}] / E (w_{1}^{2}) = 2 (G_{1.}^{..} - G_{..}^{..}) \\ d_{1} &= E [(G - μ) v_{1}] / E (v_{1}^{2}) = (- 2) (G_{1.}^{1.} - 2 G_{1.}^{..} + G_{..}^{..}) \\ a_{2} &= 2 (G_{.1}^{..} - G_{..}^{..}) \\ d_{2} &= (- 2) (G_{.1}^{.1} - 2 G_{.1}^{..} + G_{..}^{..}) \\ (a a)_{12} &= E [(G - μ) w_{1} w_{2}] / E (w_{1}^{2} w_{2}^{2}) \\ &= 4 (G_{..}^{11} - G_{1.}^{..} - G_{.1}^{..} + G_{..}^{..}) \\ (a d)_{12} &= E [(G - μ) w_{1} v_{2}] / E (w_{1}^{2} v_{2}^{2}) \\ &= (- 4) (G_{11}^{.1} - 2 G_{11}^{..} - G_{.1}^{.1} + G_{1.}^{..} + 2 G_{.1}^{..} - G_{..}^{..}) \\ (d a)_{12} &= E [(G - μ) v_{1} w_{2}] / E (v_{1}^{2} w_{2}^{2}) \\ &= (- 4) (G_{11}^{1.} - 2 G_{11}^{..} - G_{1.}^{1.} + G_{.1}^{..} + 2 G_{1.}^{..} - G_{..}^{..}) \\ (d d)_{12} &= E [(G - μ) v_{1} v_{2}] / E (v_{1}^{2} v_{2}^{2}) \\ &= 4 (G_{11}^{11} - 2 G_{11}^{1.} - 2 G_{11}^{.1} + G_{1.}^{1.} + G_{.1}^{.1} + 4 G_{11}^{..} - 2 G_{1.}^{..} - 2 G_{.1}^{..} + G_{..}^{..}) \end{aligned}$

The orthogonal partition of the genotypic variance is

$\begin{matrix} V_{G} = \frac{1}{2} a_{1}^{2} + \frac{1}{4} d_{1}^{2} + \frac{1}{2} a_{2}^{2} + \frac{1}{4} d_{2}^{2} + \frac{1}{4} {(a a)}_{12}^{2} \\ + \frac{1}{8} {(a d)}_{12}^{2} + \frac{1}{8} {(d a)}_{12}^{2} + \frac{1}{16} {(d d)}_{12}^{2} . & (17) \end{matrix}$
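A small numerical check of the two-locus F₂ model can make the orthogonality concrete. The sketch below enumerates the nine two-locus genotypes under random mating and linkage equilibrium, using the codes in (13) to (15). The effect values in `eff` and the mean `mu` are hypothetical and chosen only for illustration.

```python
import itertools
import numpy as np

mu = 3.0
# Hypothetical effects for model (16).
eff = dict(a1=1.0, d1=0.5, a2=-0.8, d2=0.3, aa=0.6, ad=-0.4, da=0.2, dd=1.0)

# Single-locus codes (w, v) and F2 genotype frequencies for the three
# genotypes (two A alleles, one A allele, no A allele), as in (13)-(15).
codes = [(1.0, -0.5, 0.25), (0.0, 0.5, 0.5), (-1.0, -0.5, 0.25)]

rows = []
for (w1, v1, f1), (w2, v2, f2) in itertools.product(codes, repeat=2):
    g = (mu + eff["a1"] * w1 + eff["d1"] * v1 + eff["a2"] * w2 + eff["d2"] * v2
         + eff["aa"] * w1 * w2 + eff["ad"] * w1 * v2
         + eff["da"] * v1 * w2 + eff["dd"] * v1 * v2)
    rows.append((w1, v1, w2, v2, g, f1 * f2))   # linkage equilibrium: f1 * f2
w1, v1, w2, v2, G, f = map(np.array, zip(*rows))

EG = f @ G
V_direct = f @ (G - EG) ** 2

# Orthogonal partition (17).
V17 = (eff["a1"] ** 2 / 2 + eff["d1"] ** 2 / 4 + eff["a2"] ** 2 / 2
       + eff["d2"] ** 2 / 4 + eff["aa"] ** 2 / 4 + eff["ad"] ** 2 / 8
       + eff["da"] ** 2 / 8 + eff["dd"] ** 2 / 16)

# Effects recovered as simple regression coefficients.
a1_hat = f @ ((G - mu) * w1) / (f @ w1**2)
d1_hat = f @ ((G - mu) * v1) / (f @ v1**2)
dd_hat = f @ ((G - mu) * v1 * v2) / (f @ (v1**2 * v2**2))
```

Under these assumptions the population mean equals `mu`, the direct variance equals the value from (17), and each simple regression coefficient returns the effect used to generate `G`.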

Recently, Kao and Zeng [18] examined many genetic and statistical issues of the above F₂ model and the effects of linkage disequilibrium. As we have shown here, the F₂ model is a special case of the Cockerham model with two alleles at each locus and all allelic frequencies being 1/2.

Now we give the partition of genetic variance for m loci with epistasis and linkage disequilibrium in the F₂ population. Generalizing model (16) to m loci and ignoring the trigenic and higher order epistasis, we have the following model

$\begin{matrix} G = μ + \sum_{r = 1}^{m} a_{r} w_{r} + \sum_{r = 1}^{m} d_{r} v_{r} + \sum_{r < s} {(a a)}_{r s} (w_{r} w_{s}) \\ + \sum_{r \neq s} {(a d)}_{r s} (w_{r} v_{s}) + \sum_{r < s} {(d d)}_{r s} (v_{r} v_{s}) \\ = μ + A + D + A A + A D + D D & (18) \end{matrix}$

The partition of genetic variance for this model under the assumption of Hardy-Weinberg equilibrium is

$\begin{aligned} V_{G} ={}& V_{A} + V_{D} + V_{A A} + V_{A D} + V_{D D} \\ & + 2 Cov (A, D) + 2 Cov (A, A A) + 2 Cov (A, A D) \\ & + 2 Cov (A, D D) + 2 Cov (D, A A) + 2 Cov (D, A D) \\ & + 2 Cov (D, D D) + 2 Cov (A A, A D) \\ & + 2 Cov (A A, D D) + 2 Cov (A D, D D) . \qquad (19) \end{aligned}$

The detail of each component is presented in Appendix D.

The F₂ model is a special case of the general two-allele model with $p_{r} = 1 / 2$. Note the difference in the v variable used for the F₂ model and for the general two-allele model below. This partition of genetic variance can provide a basis for the interpretation of genetic variance estimation by multiple interval mapping in an F₂ population [19, 21].

3. A general two-allele model

Here, we provide details of a general two-allele model for multiple loci. This model is probably useful for studying the genetic architecture of a quantitative trait in natural populations. Let the two alleles at locus r be $A_{r}$ and $a_{r}$ for r = 1, 2, ..., m, with m the number of QTL. Assume that the frequencies and genetic effects of alleles are the same for both paternal and maternal gametes. Let $p_{r}$ denote the frequency of allele $A_{r}$ at locus r. Note that in this case $z_{M_{1}}^{(r)} = 1 - z_{M_{2}}^{(r)}$, r = 1, 2, ..., m. Also $x_{M_{1}}^{(r)} = z_{M_{1}}^{(r)} - E [z_{M_{1}}^{(r)}] = (1 - z_{M_{2}}^{(r)}) - E (1 - z_{M_{2}}^{(r)}) = - x_{M_{2}}^{(r)}$. Similarly, $x_{F_{2}}^{(r)} = - x_{F_{1}}^{(r)}$. Ignoring higher order epistasis involving at least three loci, we can define a two-allele model as

$\begin{matrix} G = μ + \sum_{r = 1}^{m} a_{r} w_{r} + \sum_{r = 1}^{m} d_{r} v_{r} + \sum_{r < s} {(a a)}_{r s} (w_{r} w_{s}) \\ + \sum_{r \neq s} {(a d)}_{r s} (w_{r} v_{s}) + \sum_{r < s} {(d d)}_{r s} (v_{r} v_{s}) & (20) \end{matrix}$

where

$w_{r} = x_{M_{1}}^{(r)} + x_{F_{1}}^{(r)} = \begin{cases} 2 (1 - p_{r}), & \text{for } A_{r} A_{r} \text{ at locus } r \\ 1 - 2 p_{r}, & \text{for } A_{r} a_{r} \text{ at locus } r \\ - 2 p_{r}, & \text{for } a_{r} a_{r} \text{ at locus } r \end{cases} \qquad (21)$

$v_{r} = x_{M_{1}}^{(r)} x_{F_{1}}^{(r)} = \begin{cases} {(1 - p_{r})}^{2}, & \text{for } A_{r} A_{r} \text{ at locus } r \\ - p_{r} (1 - p_{r}), & \text{for } A_{r} a_{r} \text{ at locus } r \\ p_{r}^{2}, & \text{for } a_{r} a_{r} \text{ at locus } r \end{cases} \qquad (22)$

for r = 1, 2, ..., m. The coefficients are associated with the original parameters in the Cockerham model as follows.

$\begin{aligned} a_{r} ={}& α_{1}^{(r)} - α_{2}^{(r)} \\ d_{r} ={}& δ_{1}^{1 (r)} - δ_{2}^{1 (r)} - δ_{1}^{2 (r)} + δ_{2}^{2 (r)} \\ {(a a)}_{r s} ={}& (α_{1}^{(r)} α_{1}^{(s)}) - (α_{1}^{(r)} α_{2}^{(s)}) - (α_{2}^{(r)} α_{1}^{(s)}) + (α_{2}^{(r)} α_{2}^{(s)}) \\ {(a d)}_{r s} ={}& (α_{1}^{(r)} δ_{1}^{1 (s)}) - (α_{1}^{(r)} δ_{2}^{1 (s)}) - (α_{1}^{(r)} δ_{1}^{2 (s)}) + (α_{1}^{(r)} δ_{2}^{2 (s)}) \\ & - (α_{2}^{(r)} δ_{1}^{1 (s)}) + (α_{2}^{(r)} δ_{2}^{1 (s)}) + (α_{2}^{(r)} δ_{1}^{2 (s)}) - (α_{2}^{(r)} δ_{2}^{2 (s)}) \\ {(d d)}_{r s} ={}& (δ_{1}^{1 (r)} δ_{1}^{1 (s)}) - (δ_{1}^{1 (r)} δ_{2}^{1 (s)}) - (δ_{1}^{1 (r)} δ_{1}^{2 (s)}) + (δ_{1}^{1 (r)} δ_{2}^{2 (s)}) \\ & - (δ_{2}^{1 (r)} δ_{1}^{1 (s)}) + (δ_{2}^{1 (r)} δ_{2}^{1 (s)}) + (δ_{2}^{1 (r)} δ_{1}^{2 (s)}) - (δ_{2}^{1 (r)} δ_{2}^{2 (s)}) \\ & - (δ_{1}^{2 (r)} δ_{1}^{1 (s)}) + (δ_{1}^{2 (r)} δ_{2}^{1 (s)}) + (δ_{1}^{2 (r)} δ_{1}^{2 (s)}) - (δ_{1}^{2 (r)} δ_{2}^{2 (s)}) \\ & + (δ_{2}^{2 (r)} δ_{1}^{1 (s)}) - (δ_{2}^{2 (r)} δ_{2}^{1 (s)}) - (δ_{2}^{2 (r)} δ_{1}^{2 (s)}) + (δ_{2}^{2 (r)} δ_{2}^{2 (s)}) \end{aligned}$

The constraint conditions further lead to

$\begin{matrix} a_{r} = α_{1}^{(r)} - \frac{(- p_{r})}{(1 - p_{r})} α_{1}^{(r)} = \frac{1}{(1 - p_{r})} α_{1}^{(r)} \\ d_{r} = \frac{1}{{(1 - p_{r})}^{2}} δ_{1}^{1 (r)} \\ (a a)_{r s} = \frac{1}{(1 - p_{r}) (1 - p_{s})} (α_{1}^{(r)} α_{1}^{(s)}) \\ (a d)_{r s} = \frac{1}{(1 - p_{r}) {(1 - p_{s})}^{2}} (α_{1}^{(r)} δ_{1}^{1 (s)}) \\ (d d)_{r s} = \frac{1}{{(1 - p_{r})}^{2} {(1 - p_{s})}^{2}} (δ_{1}^{1 (r)} δ_{1}^{1 (s)}) \end{matrix}$

With Hardy-Weinberg, linkage and genotypic equilibria, the partial regression coefficients in the above model correspond to the simple regression coefficients

$\begin{aligned} a_{r} ={}& \frac{1}{(1 - p_{r})} [E (G | z_{M_{1}}^{(r)} = 1) - E (G)] \\ d_{r} ={}& \frac{1}{{(1 - p_{r})}^{2}} [E (G | z_{M_{1}}^{(r)} = z_{F_{1}}^{(r)} = 1) - 2 E (G | z_{M_{1}}^{(r)} = 1) + E (G)] \\ {(a a)}_{r s} ={}& \frac{1}{(1 - p_{r}) (1 - p_{s})} [E (G | z_{M_{1}}^{(r)} = z_{M_{1}}^{(s)} = 1) \\ & - E (G | z_{M_{1}}^{(r)} = 1) - E (G | z_{M_{1}}^{(s)} = 1) + E (G)] \\ {(a d)}_{r s} ={}& \frac{1}{(1 - p_{r}) {(1 - p_{s})}^{2}} [E (G | z_{M_{1}}^{(r)} = z_{M_{1}}^{(s)} = z_{F_{1}}^{(s)} = 1) \\ & - 2 E (G | z_{M_{1}}^{(r)} = z_{M_{1}}^{(s)} = 1) - E (G | z_{M_{1}}^{(s)} = z_{F_{1}}^{(s)} = 1) \\ & + E (G | z_{M_{1}}^{(r)} = 1) + 2 E (G | z_{M_{1}}^{(s)} = 1) - E (G)] \\ {(d d)}_{r s} ={}& \frac{1}{{(1 - p_{r})}^{2} {(1 - p_{s})}^{2}} [E (G | z_{M_{1}}^{(r)} = z_{F_{1}}^{(r)} = z_{M_{1}}^{(s)} = z_{F_{1}}^{(s)} = 1) \\ & - 2 E (G | z_{M_{1}}^{(r)} = z_{F_{1}}^{(r)} = z_{M_{1}}^{(s)} = 1) - 2 E (G | z_{M_{1}}^{(r)} = z_{M_{1}}^{(s)} = z_{F_{1}}^{(s)} = 1) \\ & + E (G | z_{M_{1}}^{(r)} = z_{F_{1}}^{(r)} = 1) + E (G | z_{M_{1}}^{(s)} = z_{F_{1}}^{(s)} = 1) \\ & + 4 E (G | z_{M_{1}}^{(r)} = z_{F_{1}}^{(s)} = 1) - 2 E (G | z_{M_{1}}^{(r)} = 1) - 2 E (G | z_{M_{1}}^{(s)} = 1) + E (G)] \end{aligned}$

Note that in this case the genetic effects in the original model are

$\begin{aligned} α_{1}^{(r)} ={}& E (G | z_{M_{1}}^{(r)} = 1) - E (G) \\ δ_{1}^{1 (r)} ={}& E (G | z_{M_{1}}^{(r)} = z_{F_{1}}^{(r)} = 1) - 2 E (G | z_{M_{1}}^{(r)} = 1) + E (G) \\ (α_{1}^{(r)} α_{1}^{(s)}) ={}& E (G | z_{M_{1}}^{(r)} = z_{M_{1}}^{(s)} = 1) - E (G | z_{M_{1}}^{(r)} = 1) - E (G | z_{M_{1}}^{(s)} = 1) + E (G) \\ (α_{1}^{(r)} δ_{1}^{1 (s)}) ={}& E (G | z_{M_{1}}^{(r)} = z_{M_{1}}^{(s)} = z_{F_{1}}^{(s)} = 1) - 2 E (G | z_{M_{1}}^{(r)} = z_{M_{1}}^{(s)} = 1) \\ & - E (G | z_{M_{1}}^{(s)} = z_{F_{1}}^{(s)} = 1) + E (G | z_{M_{1}}^{(r)} = 1) + 2 E (G | z_{M_{1}}^{(s)} = 1) - E (G) \\ (δ_{1}^{1 (r)} δ_{1}^{1 (s)}) ={}& E (G | z_{M_{1}}^{(r)} = z_{F_{1}}^{(r)} = z_{M_{1}}^{(s)} = z_{F_{1}}^{(s)} = 1) \\ & - 2 E (G | z_{M_{1}}^{(r)} = z_{F_{1}}^{(r)} = z_{M_{1}}^{(s)} = 1) - 2 E (G | z_{M_{1}}^{(r)} = z_{M_{1}}^{(s)} = z_{F_{1}}^{(s)} = 1) \\ & + E (G | z_{M_{1}}^{(r)} = z_{F_{1}}^{(r)} = 1) + E (G | z_{M_{1}}^{(s)} = z_{F_{1}}^{(s)} = 1) \\ & + 4 E (G | z_{M_{1}}^{(r)} = z_{F_{1}}^{(s)} = 1) - 2 E (G | z_{M_{1}}^{(r)} = 1) - 2 E (G | z_{M_{1}}^{(s)} = 1) + E (G) \end{aligned}$

They are the same as the least squares definition.

Yet another form of this result is shown in Table 1 of [10]. Zeng et al. [10] also show that linkage disequilibrium does not change the values of genetic effects in a full model. This means that the partial regression coefficients in a disequilibrium population are equal to the simple regression coefficients in a corresponding equilibrium population with the same allelic frequency configuration.

The partition of genotypic variance in an equilibrium population is

$\begin{matrix} V_{G} = 2 \sum_{r = 1}^{m} p_{r} (1 - p_{r}) a_{r}^{2} + \sum_{r = 1}^{m} p_{r}^{2} {(1 - p_{r})}^{2} d_{r}^{2} \\ + 4 \sum_{r < s} p_{r} p_{s} (1 - p_{r}) (1 - p_{s}) {(a a)}_{r s}^{2} \\ + 2 \sum_{r \neq s} p_{r} (1 - p_{r}) p_{s}^{2} {(1 - p_{s})}^{2} {(a d)}_{r s}^{2} \\ + \sum_{r < s} p_{r}^{2} p_{s}^{2} {(1 - p_{r})}^{2} {(1 - p_{s})}^{2} {(d d)}_{r s}^{2} & (23) \end{matrix}$
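Partition (23) can be verified numerically for two loci. The sketch below assumes Hardy-Weinberg and linkage equilibria, codes each genotype with w and v as the sum and product of the centred allelic indicators (so the homozygote for the allele with frequency p has w = 2(1 - p)), and uses hypothetical allele frequencies `p` and effects in `eff`. The helper `locus_table` is introduced here only for illustration.

```python
import itertools
import numpy as np

p = [0.3, 0.7]                                 # hypothetical allele frequencies p_1, p_2
# Hypothetical effects; ad12 multiplies w_1 v_2 and ad21 multiplies w_2 v_1.
eff = dict(a=[1.0, -0.5], d=[0.4, 0.9], aa=0.7, ad12=-0.3, ad21=0.5, dd=1.2)

def locus_table(p_r):
    """(w, v, Hardy-Weinberg frequency) for the three genotypes at one locus."""
    q_r = 1 - p_r
    return [(2 * q_r, q_r**2, p_r**2),          # two copies of the focal allele
            (1 - 2 * p_r, -p_r * q_r, 2 * p_r * q_r),   # heterozygote
            (-2 * p_r, p_r**2, q_r**2)]         # no copy of the focal allele

rows = []
for (w1, v1, f1), (w2, v2, f2) in itertools.product(locus_table(p[0]),
                                                    locus_table(p[1])):
    g = (eff["a"][0] * w1 + eff["d"][0] * v1 + eff["a"][1] * w2 + eff["d"][1] * v2
         + eff["aa"] * w1 * w2 + eff["ad12"] * w1 * v2
         + eff["ad21"] * w2 * v1 + eff["dd"] * v1 * v2)
    rows.append((w1, v1, g, f1 * f2))           # linkage equilibrium: f1 * f2
w1, v1, G, f = map(np.array, zip(*rows))

V_direct = f @ G**2 - (f @ G) ** 2

# Partition (23) for m = 2.
pq = [p_r * (1 - p_r) for p_r in p]
V23 = (2 * sum(pq[r] * eff["a"][r] ** 2 for r in range(2))
       + sum(pq[r] ** 2 * eff["d"][r] ** 2 for r in range(2))
       + 4 * pq[0] * pq[1] * eff["aa"] ** 2
       + 2 * (pq[0] * pq[1] ** 2 * eff["ad12"] ** 2
              + pq[1] * pq[0] ** 2 * eff["ad21"] ** 2)
       + pq[0] ** 2 * pq[1] ** 2 * eff["dd"] ** 2)

# Moments of the design variables at locus 1.
var_w1 = f @ w1**2 - (f @ w1) ** 2
cov_w1_v1 = f @ (w1 * v1) - (f @ w1) * (f @ v1)
```

In this sketch the direct variance matches (23), the variance of w at locus 1 equals 2p(1 - p), and w and v at the same locus are uncorrelated, which is what makes the partition orthogonal.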

or

$\begin{aligned} V_{G} ={}& \sum_{r = 1}^{m} \frac{2 p_{r}}{(1 - p_{r})} {(α_{1}^{(r)})}^{2} + \sum_{r = 1}^{m} \frac{p_{r}^{2}}{{(1 - p_{r})}^{2}} {(δ_{1}^{1 (r)})}^{2} \\ & + \sum_{r < s} \frac{4 p_{r} p_{s}}{(1 - p_{r}) (1 - p_{s})} {(α_{1}^{(r)} α_{1}^{(s)})}^{2} \\ & + \sum_{r \neq s} \frac{2 p_{r} p_{s}^{2}}{(1 - p_{r}) {(1 - p_{s})}^{2}} {(α_{1}^{(r)} δ_{1}^{1 (s)})}^{2} \\ & + \sum_{r < s} \frac{p_{r}^{2} p_{s}^{2}}{{(1 - p_{r})}^{2} {(1 - p_{s})}^{2}} {(δ_{1}^{1 (r)} δ_{1}^{1 (s)})}^{2} \qquad (24) \end{aligned}$

The partition of the genetic variance with epistasis and linkage disequilibrium is complex. We give the result with trigenic and quadrigenic linkage disequilibria included as well. The partition of variance has a similar form as (19). The detail of each component is presented in Appendix E.

Discussion

In this paper we explore various properties of the standard quantitative genetic model with multiple interacting loci in linkage equilibrium and disequilibrium. Starting from the traditional least squares model, we represent it in the setting of multiple regression with standardized allelic indicator variables and their products as the independent variables and the trait value as the dependent variable. Then the partial regression coefficients associated with these indicator variables define the additive, dominance and epistatic effects for QTL. This is the original definition of QTL effects introduced by Fisher [6] and extended to epistasis by Cockerham [7]. We examine the properties and meaning of these QTL effects in an equilibrium population and also in a disequilibrium population. We show details of the partition of genetic variance for both equilibrium and disequilibrium populations in terms of QTL effects, allelic frequencies and disequilibrium measures. Moreover, we relate this general model to several reduced models used for QTL mapping analysis in cross populations from inbred lines, such as F₂, backcross and recombinant inbred lines. The detailed partition of genetic variance in these populations can provide a basis for the interpretation of genetic variance component estimates from multiple interval mapping [21].

The purpose of modeling QTL is to provide a meaningful and convenient framework to infer and interpret the relative significance of each QTL, and the intricate inter-relationships among QTL, for a set of quantitative traits in an experimental or natural population. The linear model provides a framework to study the effects of QTL on the mean and variance of the distribution of a trait or multiple traits in a population. With the assumption of a normal distribution for both genotypic and phenotypic values of a quantitative trait or multiple traits, this analysis of the first and second order statistics is sufficient to characterize the relationship between QTL and trait(s). Otherwise, it is an approximation of the relationship. In this model, the model parameters are partitioned into two parts: one is the effects of QTL (additive, dominance and epistatic effects), and the other is the frequencies and correlations (Hardy-Weinberg and linkage disequilibria) of QTL alleles. Together they characterize the genetic architecture of quantitative traits in a population.

This linear model also provides a framework for statistical inference of genetic model parameters. If QTL genotypes are known and directly observed, a regression analysis of trait phenotype on QTL genotypes would provide a direct estimation of the genetic model parameters. However, if QTL genotypes are unknown and are only indirectly observed through molecular markers, the statistical inference of QTL and model parameters becomes more complicated. Statistically, we can regard QTL genotypes as missing data with trait phenotypes and marker genotypes as observed data and use a mixture model through the maximum likelihood analysis to infer the conditional distribution of missing data and through that to infer QTL parameters which also include the number and genomic position of QTL [19, 24, 25]. The likely positions of QTL are searched in the whole genome if data permit and the number of significant QTL positions can be estimated through some model selection procedure.

In modeling QTL, the consistency of model parameters in a multiple-locus setting is an important consideration. It is important for a model to be multiple-locus consistent, so that the relationships within and between loci can be clearly and readily analyzed, estimated and interpreted. Here consistency means that the effect of a QTL is consistently defined in a reference equilibrium population for one, two or multiple loci. In statistics, this is the property of orthogonality. This property is particularly important for the study of epistasis. With it, the additive, dominance and epistatic effects can be independently and consistently estimated for one, two, three or multiple loci in the reference population where the model is defined and interpreted. Thus, if the number of QTL is incorrectly identified, which seems to be always the case in practice, the parameter values for those identified QTL can still be consistently estimated. However, the situation would certainly be different and more complicated if the population is not at equilibrium, for example for QTL in linkage disequilibrium. Linkage disequilibrium would complicate the partition of genetic variance, and could certainly bias the estimation of parameter values for those identified QTL if the QTL model (number and genomic position of QTL) is misidentified.

In this paper, we study extensively the composition and property of the genetic model parameters, such as genetic effects and partition of genetic variance, when both epistasis and linkage disequilibrium are considered. This would help us to understand the relationship of various genetic quantities, such as allelic frequencies and linkage disequilibrium, on the definition of genetic effects. It would also help us to understand and properly interpret estimates of the genetic effects and variance components in a QTL mapping experiment. It is important to emphasize that modeling QTL is inherently population based as it defines the variation of QTL in reference to a population, either a study population, cross population or natural population. The very basic concept of additive effect of a QTL is a population concept and is population dependent. It depends on the genotypes at other loci and depends on the genetic structure of the population (allelic frequencies, Hardy-Weinberg and linkage disequilibria).

We also clarify the connection between the general genetic model and some reduced models. By restricting the number of alleles at each locus to two and setting allelic frequencies to one half, the general genetic model is reduced to the F₂ model. This greatly simplifies the partition of genetic variance.

Another property of this F₂ population is that, if there is no crossing-over interference, the three-locus linkage disequilibrium is expected to be zero regardless of whether the loci are linked. Also the four-locus disequilibrium is reduced to the product of the two-locus disequilibria for the two non-adjacent locus pairs. If there is crossing-over interference, the three-locus linkage disequilibrium would be a good measure of the interference. As many QTL mapping experiments are performed in an F₂ population, this reduced model is very relevant to QTL mapping analysis for the interpretation of genetic architecture in an F₂ population. Another reduced model is the backcross model, which is essentially a haploid model.

We give many details for a general two-allele model with epistasis and linkage disequilibrium. Research on QTL mapping analysis has shifted in recent years from inbred line crosses to natural populations. With the availability of very dense SNP markers, it is now possible to use SNPs for fine mapping of QTL in a natural population. Currently most QTL fine mapping studies concentrate on candidate genes. It will be increasingly possible to have genome-wide SNP data for a sample of individuals from a natural population. The general two-allele model can be used as a framework to interpret and estimate the genome-wide genetic architecture for a quantitative trait in a natural population. The model can be extended to multiple alleles to take haplotypes into account if needed.

Appendix

A. Cockerham least squares estimates

In this appendix, we show that the regression coefficients (genetic effects) in model (7) are Cockerham least squares estimates under Hardy-Weinberg, linkage and genotypic equilibria. First, note that at each locus only one allele is present on a gamete. That is, if an individual inherits an allele $A_{i}$ from a parental gamete, $z_{M_{i}}^{(1)} = 1$ and the other $z_{M_{j}}^{(1)} = 0$ for j ≠ i. Therefore, when i ≠ j, we have $E (z_{M_{i}}^{(1)} z_{M_{j}}^{(1)}) = 0$, or $E (x_{M_{i}}^{(1)} x_{M_{j}}^{(1)}) = E (z_{M_{i}}^{(1)} z_{M_{j}}^{(1)}) - E (z_{M_{i}}^{(1)}) E (z_{M_{j}}^{(1)}) = - p^{i} p^{j}$. Using these relationships, we can show the following for model (7).

• Additive effects: For i = 1, 2, ..., n₁, we can show

$\begin{aligned} E [(G - μ) x_{M_{i}}^{(1)}] &= E [(\sum_{i'} α^{i'} x_{M_{i'}}^{(1)}) x_{M_{i}}^{(1)}] = α^{i} E (x_{M_{i}}^{(1) 2}) + \sum_{i' \neq i} α^{i'} E (x_{M_{i'}}^{(1)} x_{M_{i}}^{(1)}) \\ &= α^{i} p^{i} (1 - p^{i}) + \sum_{i' \neq i} α^{i'} (- p^{i'} p^{i}) = α^{i} p^{i} + (- p^{i}) \sum_{i'} α^{i'} p^{i'} = α^{i} p^{i} . \end{aligned}$

On the other hand,

$\begin{array}{l} E [(G - μ) x_{M_{i}}^{(1)}] = E (G x_{M_{i}}^{(1)}) = E [G (z_{M_{i}}^{(1)} - p^{i})] = E (G z_{M_{i}}^{(1)}) - p^{i} E (G) \\ = E [E (G z_{M_{i}}^{(1)} | z_{M_{i}}^{(1)})] - p^{i} G_{..}^{..} = p^{i} E (G | z_{M_{i}}^{(1)} = 1) - p^{i} G_{..}^{..} = p^{i} (G_{..}^{i .} - G_{..}^{..}) . \end{array}$

Therefore, $α^{i} = G_{..}^{i .} - G_{..}^{..}$ for i = 1, 2, ..., n₁. Similarly, we can show that $α_{j} = G_{j .}^{..} - G_{..}^{..}$ for j = 1, 2, ..., n₁, and $β^{k} = G_{..}^{. k} - G_{..}^{..}$ , $β_{l} = G_{. l}^{..} - G_{..}^{..}$ for k, l = 1, 2, ..., n₂.

• Dominance effects: For locus 1, we have

$\begin{array}{l} E [(G - μ) x_{M_{i}}^{(1)} x_{F_{j}}^{(1)}] = E [(\sum_{i', j'} δ_{j'}^{i'} x_{M_{i'}}^{(1)} x_{F_{j'}}^{(1)}) x_{M_{i}}^{(1)} x_{F_{j}}^{(1)}] \\ = \sum_{j'} δ_{j'}^{i} E (x_{M_{i}}^{(1) 2}) E (x_{F_{j}}^{(1)} x_{F_{j'}}^{(1)}) + \sum_{i' \neq i} \sum_{j'} δ_{j'}^{i'} E (x_{M_{i}}^{(1)} x_{M_{i'}}^{(1)}) E (x_{F_{j}}^{(1)} x_{F_{j'}}^{(1)}) \\ = p^{i} (1 - p^{i}) \sum_{j'} δ_{j'}^{i} E (x_{F_{j}}^{(1)} x_{F_{j'}}^{(1)}) + \sum_{i' \neq i} (- p^{i} p^{i'}) \sum_{j'} δ_{j'}^{i'} E (x_{F_{j}}^{(1)} x_{F_{j'}}^{(1)}) \\ = p^{i} \sum_{j'} δ_{j'}^{i} E (x_{F_{j}}^{(1)} x_{F_{j'}}^{(1)}) - \sum_{i'} p^{i} p^{i'} \sum_{j'} δ_{j'}^{i'} E (x_{F_{j}}^{(1)} x_{F_{j'}}^{(1)}) \\ = p^{i} (p_{j} δ_{j}^{i} - p_{j} \sum_{j'} δ_{j'}^{i} p_{j'}) - p^{i} \sum_{j'} E (x_{F_{j}}^{(1)} x_{F_{j'}}^{(1)}) \sum_{i'} p^{i'} δ_{j'}^{i'} \\ = p^{i} p_{j} δ_{j}^{i} \quad \text{for } i, j = 1, 2, \dots, n_{1} . \end{array}$

On the other hand,

$\begin{array}{l} E [(G - μ) x_{M_{i}}^{(1)} x_{F_{j}}^{(1)}] = E [G (z_{M_{i}}^{(1)} - p^{i}) (z_{F_{j}}^{(1)} - p_{j})] \\ = E (G z_{M_{i}}^{(1)} z_{F_{j}}^{(1)}) - p^{i} E (G z_{F_{j}}^{(1)}) - p_{j} E (G z_{M_{i}}^{(1)}) + p^{i} p_{j} E (G) \\ = E [E (G z_{M_{i}}^{(1)} z_{F_{j}}^{(1)} | z_{M_{i}}^{(1)}, z_{F_{j}}^{(1)})] - p^{i} E [E (G z_{F_{j}}^{(1)} | z_{F_{j}}^{(1)})] - p_{j} E [E (G z_{M_{i}}^{(1)} | z_{M_{i}}^{(1)})] + p^{i} p_{j} G_{..}^{..} \\ = p^{i} p_{j} (G_{j .}^{i .} - G_{..}^{i .} - G_{j .}^{..} + G_{..}^{..}) \end{array}$

Therefore, $δ_{j}^{i} = G_{j .}^{i .} - G_{..}^{i .} - G_{j .}^{..} + G_{..}^{..}$ for i, j = 1, 2, ..., n₁. Similar results can be derived for other dominance terms at locus 2.

• Additive × additive effects: Note that

$\begin{array}{l} E [(G - μ) x_{M_{i}}^{(1)} x_{M_{k}}^{(2)}] = \sum_{i^{'}, k^{'}} (α^{i^{'}} β^{k^{'}}) E (x_{M_{i}}^{(1)} x_{M_{i^{'}}}^{(1)}) E (x_{M_{k}}^{(2)} x_{M_{k^{'}}}^{(2)}) \\ = \sum_{k^{'}} (α^{i} β^{k^{'}}) E (x_{M_{i}}^{(1) 2}) E (x_{M_{k}}^{(2)} x_{M_{k^{'}}}^{(2)}) + \sum_{i^{'} \neq i} \sum_{k^{'}} (α^{i^{'}} β^{k^{'}}) (- p^{i} p^{i^{'}}) E (x_{M_{k}}^{(2)} x_{M_{k^{'}}}^{(2)}) \\ = p^{i} \sum_{k^{'}} (α^{i} β^{k^{'}}) E (x_{M_{k}}^{(2)} x_{M_{k^{'}}}^{(2)}) - p^{i} \sum_{i^{'}} \sum_{k^{'}} (α^{i^{'}} β^{k^{'}}) p^{i^{'}} E (x_{M_{k}}^{(2)} x_{M_{k^{'}}}^{(2)}) \\ = p^{i} q^{k} (α^{i} β^{k}) \end{array}$

and

$E [(G - μ) x_{M_{i}}^{(1)} x_{M_{k}}^{(2)}] = E (G x_{M_{i}}^{(1)} x_{M_{k}}^{(2)}) = E [G (z_{M_{i}}^{(1)} - p^{i}) (z_{M_{k}}^{(2)} - q^{k})] = p^{i} q^{k} (G_{..}^{i k} - G_{..}^{i .} - G_{..}^{. k} + G_{..}^{..})$

We have $(α^{i} β^{k}) = G_{..}^{i k} - G_{..}^{i .} - G_{..}^{. k} + G_{..}^{..}$ for i = 1, 2, ..., n₁ and k = 1, 2, ..., n₂. Similar results can be derived for other additive by additive terms.

• Additive × dominance effects: Note that

$\begin{array}{l} E [(G - μ) x_{M_{i}}^{(1)} x_{M_{k}}^{(2)} x_{F_{l}}^{(2)}] = E [x_{M_{i}}^{(1)} x_{M_{k}}^{(2)} x_{F_{l}}^{(2)} \sum_{i^{'}, k^{'}, l^{'}} (α^{i^{'}} γ_{l^{'}}^{k^{'}}) x_{M_{i^{'}}}^{(1)} x_{M_{k^{'}}}^{(2)} x_{F_{l^{'}}}^{(2)}] \\ = \sum_{k^{'}, l^{'}} (α^{i} γ_{l^{'}}^{k^{'}}) E (x_{M_{i}}^{(1) 2}) E (x_{M_{k}}^{(2)} x_{M_{k^{'}}}^{(2)}) E (x_{F_{l}}^{(2)} x_{F_{l^{'}}}^{(2)}) = p^{i} \sum_{k^{'}, l^{'}} (α^{i} γ_{l^{'}}^{k^{'}}) E (x_{M_{k}}^{(2)} x_{M_{k^{'}}}^{(2)}) E (x_{F_{l}}^{(2)} x_{F_{l^{'}}}^{(2)}) = p^{i} q^{k} q_{l} (α^{i} γ_{l}^{k}) \end{array}$

and

$\begin{array}{l} E [(G - μ) x_{M_{i}}^{(1)} x_{M_{k}}^{(2)} x_{F_{l}}^{(2)}] = E [G (z_{M_{i}}^{(1)} - p^{i}) (z_{M_{k}}^{(2)} - q^{k}) (z_{F_{l}}^{(2)} - q_{l})] \\ = p^{i} q^{k} q_{l} (G_{. l}^{i k} - G_{..}^{i k} - G_{. l}^{i .} - G_{. l}^{. k} + G_{..}^{i .} + G_{..}^{. k} + G_{. l}^{..} - G_{..}^{..}) \end{array}$

For i = 1, 2, ..., n₁ and k, l = 1, 2, ..., n₂, we have

$(α^{i} γ_{l}^{k}) = G_{. l}^{i k} - G_{..}^{i k} - G_{. l}^{i .} - G_{. l}^{. k} + G_{..}^{i .} + G_{..}^{. k} + G_{. l}^{..} - G_{..}^{..}$

Similar results can be derived for dominance by additive terms.

• Dominance × dominance effects: Note that

$\begin{array}{l} E [(G - μ) x_{M_{i}}^{(1)} x_{F_{j}}^{(1)} x_{M_{k}}^{(2)} x_{F_{l}}^{(2)}] = E [x_{M_{i}}^{(1)} x_{F_{j}}^{(1)} x_{M_{k}}^{(2)} x_{F_{l}}^{(2)} \sum_{i^{'}, j^{'}, k^{'}, l^{'}} (δ_{j^{'}}^{i^{'}} γ_{l^{'}}^{k^{'}}) x_{M_{i^{'}}}^{(1)} x_{F_{j^{'}}}^{(1)} x_{M_{k^{'}}}^{(2)} x_{F_{l^{'}}}^{(2)}] \\ = \sum_{i^{'}, j^{'}} E (x_{M_{i}}^{(1)} x_{M_{i^{'}}}^{(1)}) E (x_{F_{j}}^{(1)} x_{F_{j^{'}}}^{(1)}) \sum_{k^{'}, l^{'}} (δ_{j^{'}}^{i^{'}} γ_{l^{'}}^{k^{'}}) E (x_{M_{k}}^{(2)} x_{M_{k^{'}}}^{(2)}) E (x_{F_{l}}^{(2)} x_{F_{l^{'}}}^{(2)}) \\ = \sum_{i^{'}, j^{'}} E (x_{M_{i}}^{(1)} x_{M_{i^{'}}}^{(1)}) E (x_{F_{j}}^{(1)} x_{F_{j^{'}}}^{(1)}) (δ_{j^{'}}^{i^{'}} γ_{l}^{k}) q^{k} q_{l} = (δ_{j}^{i} γ_{l}^{k}) p^{i} p_{j} q^{k} q_{l} \end{array}$

On the other hand

$\begin{array}{l} E [(G - μ) x_{M_{i}}^{(1)} x_{F_{j}}^{(1)} x_{M_{k}}^{(2)} x_{F_{l}}^{(2)}] = E [G (z_{M_{i}}^{(1)} - p^{i}) (z_{F_{j}}^{(1)} - p_{j}) (z_{M_{k}}^{(2)} - q^{k}) (z_{F_{l}}^{(2)} - q_{l})] \\ = p^{i} p_{j} q^{k} q_{l} (G_{j l}^{i k} - G_{j .}^{i k} - G_{. l}^{i k} - G_{j l}^{i .} - G_{j l}^{. k} + G_{..}^{i k} + G_{j l}^{..} \\ + G_{j .}^{i .} + G_{. l}^{. k} + G_{j .}^{. k} + G_{. l}^{i .} - G_{..}^{i .} - G_{..}^{. k} - G_{j .}^{..} - G_{. l}^{..} + G_{..}^{..}) \end{array}$

For i, j = 1, 2, ..., n₁ and k, l = 1, 2, ..., n₂, we have

$(δ_{j}^{i} γ_{l}^{k}) = G_{j l}^{i k} - G_{j .}^{i k} - G_{. l}^{i k} - G_{j l}^{i .} - G_{j l}^{. k} + G_{..}^{i k} + G_{j l}^{..} + G_{j .}^{i .} + G_{. l}^{. k} + G_{j .}^{. k} + G_{. l}^{i .} - G_{..}^{i .} - G_{..}^{. k} - G_{j .}^{..} - G_{. l}^{..} + G_{..}^{..}$
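The additive and dominance identities of this appendix can be checked numerically. The following is a minimal sketch restricted to a single locus with three alleles; the allele frequencies and genotypic values are hypothetical numbers chosen for illustration only, and the names (`p_m`, `p_f`, `G`, `expect`) are ours rather than part of the model. It verifies that $E[(G - μ) x_{M_{i}}] = p^{i} α^{i}$ and $E[(G - μ) x_{M_{i}} x_{F_{j}}] = p^{i} p_{j} δ_{j}^{i}$ when the effects are defined as deviations of marginal means.

```python
from itertools import product

# Hypothetical single-locus example (all numbers are illustrative assumptions).
p_m = [0.2, 0.3, 0.5]          # p^i: allele frequencies in maternal gametes
p_f = [0.1, 0.6, 0.3]          # p_j: allele frequencies in paternal gametes
G = [[1.0, 2.5, 0.3],          # G[i][j]: value of genotype (maternal A_i, paternal A_j)
     [2.0, 4.0, 1.1],
     [0.7, 1.9, 3.2]]
alleles = range(len(p_m))


def expect(f):
    """E[f(a, b)] over genotypes (a, b) when the two gametes unite at random."""
    return sum(p_m[a] * p_f[b] * f(a, b) for a, b in product(alleles, alleles))


mu = expect(lambda a, b: G[a][b])                                   # overall mean
G_mat = [sum(p_f[j] * G[i][j] for j in alleles) for i in alleles]   # maternal marginal means
G_pat = [sum(p_m[i] * G[i][j] for i in alleles) for j in alleles]   # paternal marginal means

# Effects defined as deviations of marginal means (the Cockerham form).
alpha_m = [G_mat[i] - mu for i in alleles]
alpha_f = [G_pat[j] - mu for j in alleles]
delta = [[G[i][j] - G_mat[i] - G_pat[j] + mu for j in alleles] for i in alleles]


def x_m(i, a):
    """Centered indicator of maternal allele i for a gamete carrying allele a."""
    return (1.0 if a == i else 0.0) - p_m[i]


def x_f(j, b):
    """Centered indicator of paternal allele j for a gamete carrying allele b."""
    return (1.0 if b == j else 0.0) - p_f[j]


# Largest absolute discrepancy between the two sides of each identity.
add_err = max(
    abs(expect(lambda a, b: (G[a][b] - mu) * x_m(i, a)) - p_m[i] * alpha_m[i])
    for i in alleles
)
dom_err = max(
    abs(expect(lambda a, b: (G[a][b] - mu) * x_m(i, a) * x_f(j, b))
        - p_m[i] * p_f[j] * delta[i][j])
    for i in alleles for j in alleles
)
print(add_err, dom_err)
```

Both discrepancies should be zero up to floating-point rounding. The same enumeration extends to two loci by adding the indices k and l, at the cost of longer loops.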

B. Partition of genotypic variance in linkage equilibrium

Here we show that the genotypic variance V_G of model (7) has the orthogonal partition (4) under Hardy-Weinberg, linkage and genotypic equilibria. First, note that the index variables $x_{M_{i}}^{(1)}$ , $x_{F_{j}}^{(1)}$ , $x_{M_{k}}^{(2)}$ , $x_{F_{l}}^{(2)}$ have expectation zero. Second, the assumption of Hardy-Weinberg, linkage and genotypic equilibria means that all alleles in different gametes and loci are independent so that, for example,

$Cov (x_{M_{i}}^{(r)}, x_{F_{j}}^{(r)}) = E (x_{M_{i}}^{(r)} x_{F_{j}}^{(r)}) = E (x_{M_{i}}^{(r)}) E (x_{F_{j}}^{(r)}) = 0$

Thus the additive and dominance effects within a locus are orthogonal to each other because

$Cov (x_{M_{k}}^{(r)}, x_{M_{i}}^{(r)} x_{F_{j}}^{(r)}) = E (x_{M_{k}}^{(r)} x_{M_{i}}^{(r)} x_{F_{j}}^{(r)}) = E (x_{M_{k}}^{(r)} x_{M_{i}}^{(r)}) E (x_{F_{j}}^{(r)}) = 0$

for any 1 ≤ i, j, k ≤ n_r and locus r = 1 or 2. Similarly, the epistatic effects between loci are orthogonal to additive and dominance effects and also to other epistatic effects. Therefore, the total genotypic variance V_G can be partitioned as

$V_{G} = V_{A_{1}} + V_{A_{2}} + V_{D_{1}} + V_{D_{2}} + V_{A_{1} A_{2}} + V_{A_{1} D_{2}} + V_{D_{1} A_{2}} + V_{D_{1} D_{2}}$

with each component analyzed below.

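• The additive variance: For locus 1,

$\begin{array}{l} V_{A_{1}} = Var (\sum_{i} α^{i} x_{M_{i}}^{(1)}) + Var (\sum_{j} α_{j} x_{F_{j}}^{(1)}) \\ = \sum_{i = 1}^{n_{1}} {(α^{i})}^{2} p^{i} - (\sum_{i = 1}^{n_{1}} α^{i} p^{i})^{2} + \sum_{j = 1}^{n_{1}} {(α_{j})}^{2} p_{j} - (\sum_{j = 1}^{n_{1}} α_{j} p_{j})^{2} \\ = \sum_{i = 1}^{n_{1}} {(α^{i})}^{2} p^{i} + \sum_{j = 1}^{n_{1}} {(α_{j})}^{2} p_{j} \end{array}$

as $\sum_{i = 1}^{n_{1}} α^{i} p^{i} = 0$ (and likewise for the paternal effects) by the constraint condition (2). Similarly, for locus 2, we have $V_{A_{2}} = \sum_{k = 1}^{n_{2}} {(β^{k})}^{2} q^{k} + \sum_{l = 1}^{n_{2}} {(β_{l})}^{2} q_{l}$ .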

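• The dominance variance: For locus 1,

$V_{D_{1}} = Var (\sum_{i, j} δ_{j}^{i} x_{M_{i}}^{(1)} x_{F_{j}}^{(1)}) = \sum_{i, j} {(δ_{j}^{i})}^{2} p^{i} p_{j}$

Similarly, for locus 2, we have $V_{D_{2}} = \sum_{k, l} {(γ_{l}^{k})}^{2} q^{k} q_{l}$ .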

• The additive × additive variance:

$\begin{matrix} V_{A_{1} A_{2}} = V a r (\sum_{i, k} (α^{i} β^{k}) x_{M_{i}}^{(1)} x_{M_{k}}^{(2)}) + V a r (\sum_{i, l} (α^{i} β_{l}) x_{M_{i}}^{(1)} x_{F_{l}}^{(2)}) \\ + V a r (\sum_{j, k} (α_{j} β^{k}) x_{F_{j}}^{(1)} x_{M_{k}}^{(2)}) + V a r (\sum_{j, l} (α_{j} β_{l}) x_{F_{j}}^{(1)} x_{F_{l}}^{(2)}) \\ = \sum_{i, k} p^{i} q^{k} {(α^{i} β^{k})}^{2} + \sum_{j, l} p_{j} q_{l} {(α_{j} β_{l})}^{2} + \sum_{i, l} p^{i} q_{l} {(α^{i} β_{l})}^{2} + \sum_{j, k} p_{j} q^{k} {(α_{j} β^{k})}^{2} \end{matrix}$

• The additive × dominance variance:

$\begin{matrix} V_{A_{1} D_{2}} = V a r (\sum_{i, k, l} (α^{i} γ_{l}^{k}) x_{M_{i}}^{(1)} x_{M_{k}}^{(2)} x_{F_{l}}^{(2)}) + V a r (\sum_{j, k, l} (α_{j} γ_{l}^{k}) x_{F_{j}}^{(1)} x_{M_{k}}^{(2)} x_{F_{l}}^{(2)}) \\ = \sum_{i, k, l} p^{i} q^{k} q_{l} {(α^{i} γ_{l}^{k})}^{2} + \sum_{j, k, l} p_{j} q^{k} q_{l} {(α_{j} γ_{l}^{k})}^{2} \end{matrix}$

Similarly, the dominance × additive variance is $V_{D_{1} A_{2}} = \sum_{i, j, k} p^{i} p_{j} q^{k} {(δ_{j}^{i} β^{k})}^{2} + \sum_{i, j, l} p^{i} p_{j} q_{l} {(δ_{j}^{i} β_{l})}^{2}$ .

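• The dominance × dominance variance:

$V_{D_{1} D_{2}} = Var (\sum_{i, j, k, l} (δ_{j}^{i} γ_{l}^{k}) x_{M_{i}}^{(1)} x_{F_{j}}^{(1)} x_{M_{k}}^{(2)} x_{F_{l}}^{(2)}) = \sum_{i, j, k, l} p^{i} p_{j} q^{k} q_{l} {(δ_{j}^{i} γ_{l}^{k})}^{2}$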
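The orthogonal partition of this appendix can also be checked by brute-force enumeration. The sketch below is an illustration under our own assumptions: two loci with two and three alleles, randomly drawn allele frequencies and genotypic values, and helper names (`cond_mean`, `effect`, `component`) that are hypothetical. Effects are computed by inclusion-exclusion of marginal means, as in Appendix A, and each variance component is the frequency-weighted sum of squared effects.

```python
import random
from itertools import combinations, product

random.seed(1)  # arbitrary seed; the identity should hold for any values


def random_freqs(n):
    """Random allele frequencies that sum to one (illustrative only)."""
    w = [random.random() + 0.1 for _ in range(n)]
    total = sum(w)
    return [v / total for v in w]


n1, n2 = 2, 3
# Positions 0..3 stand for (M, locus 1), (F, locus 1), (M, locus 2), (F, locus 2).
freqs = [random_freqs(n1), random_freqs(n1), random_freqs(n2), random_freqs(n2)]
genotypes = list(product(range(n1), range(n1), range(n2), range(n2)))
G = {g: random.gauss(0.0, 1.0) for g in genotypes}  # hypothetical genotypic values


def weight(g, positions):
    """Product of allele frequencies of genotype g at the given positions."""
    w = 1.0
    for pos in positions:
        w *= freqs[pos][g[pos]]
    return w


def cond_mean(fixed, g):
    """Mean of G with the positions in `fixed` held at the alleles of g."""
    free = [pos for pos in range(4) if pos not in fixed]
    return sum(weight(h, free) * G[h] for h in genotypes
               if all(h[pos] == g[pos] for pos in fixed))


def effect(S, g):
    """Effect for the positions in S, by inclusion-exclusion of marginal means."""
    return sum((-1) ** (len(S) - r) * cond_mean(T, g)
               for r in range(len(S) + 1) for T in combinations(S, r))


def component(S):
    """Variance contributed by the effect indexed by the positions in S."""
    return sum(weight(g, range(4)) * effect(S, g) ** 2 for g in genotypes)


mu = cond_mean((), genotypes[0])
V_G = sum(weight(g, range(4)) * (G[g] - mu) ** 2 for g in genotypes)

groups = {
    "A1": [(0,), (1,)], "A2": [(2,), (3,)],
    "D1": [(0, 1)], "D2": [(2, 3)],
    "A1A2": [(0, 2), (0, 3), (1, 2), (1, 3)],
    "A1D2": [(0, 2, 3), (1, 2, 3)],
    "D1A2": [(0, 1, 2), (0, 1, 3)],
    "D1D2": [(0, 1, 2, 3)],
}
V = {name: sum(component(S) for S in subsets) for name, subsets in groups.items()}
print(V_G, sum(V.values()))
```

The two printed numbers should agree up to rounding, which is the statement that the eight components add up to V_G when all alleles are independent. The grouping of index subsets into the eight components follows the sums written out above.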

C. Partition of genotypic variance in linkage disequilibrium

We present the partition of the genotypic variance based on model (7) under the assumption of Hardy-Weinberg equilibrium but linkage disequilibrium.

• The additive variance:

$V_{A_{1}} = \sum_{i = 1}^{n_{1}} p^{i} {(α^{i})}^{2} + \sum_{j = 1}^{n_{1}} p_{j} {(α_{j})}^{2}, V_{A_{2}} = \sum_{k = 1}^{n_{2}} q^{k} {(β^{k})}^{2} + \sum_{l = 1}^{n_{2}} q_{l} {(β_{l})}^{2}$

• The dominance variance:

$V_{D_{1}} = \sum_{i, j} {(δ_{j}^{i})}^{2} p^{i} p_{j}, V_{D_{2}} = \sum_{k, l} {(γ_{l}^{k})}^{2} q^{k} q_{l}$

• The additive × additive variance:

$\begin{array}{l} V_{A_{1} A_{2}} = \sum_{i, k} (p^{i} q^{k} + D_{..}^{i k}) {(α^{i} β^{k})}^{2} - [\sum_{i, k} (α^{i} β^{k}) D_{..}^{i k}]^{2} + \sum_{j, l} (p_{j} q_{l} + D_{j l}^{..}) {(α_{j} β_{l})}^{2} \\ - [\sum_{j, l} (α_{j} β_{l}) D_{j l}^{..}]^{2} + \sum_{i, l} p^{i} q_{l} {(α^{i} β_{l})}^{2} + \sum_{j, k} p_{j} q^{k} {(α_{j} β^{k})}^{2} + 2 \sum_{i, j, k, l} (α^{i} β^{k}) (α_{j} β_{l}) D_{j l}^{..} D_{..}^{i k} \\ + 2 \sum_{i, j, k, l} (α^{i} β_{l}) (α_{j} β^{k}) D_{j l}^{..} D_{..}^{i k} + 2 (\sum_{i, k} (α^{i} β^{k}) D_{..}^{i k}) (\sum_{j, l} (α_{j} β_{l}) D_{j l}^{..}) \end{array}$

• The additive × dominance variance:

$V_{A_{1} D_{2}} = \sum_{i, k, l} q_{l} (p^{i} q^{k} + D_{..}^{i k}) {(α^{i} γ_{l}^{k})}^{2} + \sum_{j, k, l} q^{k} (p_{j} q_{l} + D_{j l}^{..}) {(α_{j} γ_{l}^{k})}^{2} + 2 \sum_{i, j, k, l} (α^{i} γ_{l}^{k}) (α_{j} γ_{l}^{k}) D_{j l}^{..} D_{..}^{i k}$

• The dominance × additive variance:

$V_{D_{1} A_{2}} = \sum_{i, j, k} p_{j} (p^{i} q^{k} + D_{..}^{i k}) {(δ_{j}^{i} β^{k})}^{2} + \sum_{i, j, l} p^{i} (p_{j} q_{l} + D_{j l}^{..}) {(δ_{j}^{i} β_{l})}^{2} + 2 \sum_{i, j, k, l} (δ_{j}^{i} β^{k}) (δ_{j}^{i} β_{l}) D_{j l}^{..} D_{..}^{i k}$

• The dominance × dominance variance:

$V_{D_{1} D_{2}} = \sum_{i, j, k, l} {(δ_{j}^{i} γ_{l}^{k})}^{2} (D_{..}^{i k} + p^{i} q^{k}) (D_{j l}^{..} + p_{j} q_{l})$

• The covariances related to additive and dominance effects:

$\begin{array}{l} C o v (A_{1}, A_{2}) = \sum_{i, k} (α^{i} β^{k}) D_{..}^{i k} + \sum_{j, l} (α_{j} β_{l}) D_{j l}^{..} \\ C o v (D_{1}, D_{2}) = \sum_{i, j, k, l} (δ_{j}^{i}) (γ_{l}^{k}) D_{j l}^{..} D_{..}^{i k} \\ C o v (A_{1}, D_{1}) = C o v (A_{1}, D_{2}) = C o v (A_{2}, D_{1}) = C o v (A_{2}, D_{2}) = 0 \end{array}$

• The covariances related to additive × additive effects:

$\begin{array}{l} C o v (A_{1}, A_{1} A_{2}) = \sum_{i, k} (α^{i}) (α^{i} β^{k}) D_{..}^{i k} + \sum_{j, l} (α_{j}) (α_{j} β_{l}) D_{j l}^{..} \\ C o v (A_{2}, A_{1} A_{2}) = \sum_{i, k} (β^{k}) (α^{i} β^{k}) D_{..}^{i k} + \sum_{j, l} (β_{l}) (α_{j} β_{l}) D_{j l}^{..} \\ C o v (D_{1}, A_{1} A_{2}) = \sum_{i, j, k} p_{j} (δ_{j}^{i}) (α_{j} β^{k}) D_{..}^{i k} + \sum_{i, j, l} p^{i} (δ_{j}^{i}) (α^{i} β_{l}) D_{j l}^{..} \\ C o v (D_{2}, A_{1} A_{2}) = \sum_{i, k, l} q_{l} (α^{i} β_{l}) (γ_{l}^{k}) D_{..}^{i k} + \sum_{j, k, l} q^{k} (α_{j} β^{k}) (γ_{l}^{k}) D_{j l}^{..} \end{array}$

• The covariances related to additive × dominance and dominance × additive effects:

$\begin{array}{lcl} Cov (A_{1}, A_{1} D_{2}) & = & \sum_{i, j, k, l} [(α^{i}) (α_{j} γ_{l}^{k}) + (α_{j}) (α^{i} γ_{l}^{k})] D_{j l}^{..} D_{..}^{i k} \\ Cov (A_{2}, A_{1} D_{2}) & = & \sum_{i, k, l} q_{l} (β_{l}) (α^{i} γ_{l}^{k}) D_{..}^{i k} + \sum_{j, k, l} q^{k} (β^{k}) (α_{j} γ_{l}^{k}) D_{j l}^{..} \\ Cov (A_{1}, D_{1} A_{2}) & = & \sum_{i, j, k} p_{j} (α_{j}) (δ_{j}^{i} β^{k}) D_{..}^{i k} + \sum_{i, j, l} p^{i} (α^{i}) (δ_{j}^{i} β_{l}) D_{j l}^{..} \\ Cov (A_{2}, D_{1} A_{2}) & = & \sum_{i, j, k, l} [(β^{k}) (δ_{j}^{i} β_{l}) + (β_{l}) (δ_{j}^{i} β^{k})] D_{j l}^{..} D_{..}^{i k} \\ Cov (D_{1}, A_{1} D_{2}) & = & \sum_{i, j, k, l} (δ_{j}^{i}) [(α^{i} γ_{l}^{k}) + (α_{j} γ_{l}^{k})] D_{j l}^{..} D_{..}^{i k} \\ Cov (D_{2}, A_{1} D_{2}) & = & \sum_{i, k, l} q_{l} (γ_{l}^{k}) (α^{i} γ_{l}^{k}) D_{..}^{i k} + \sum_{j, k, l} q^{k} (γ_{l}^{k}) (α_{j} γ_{l}^{k}) D_{j l}^{..} \\ Cov (D_{1}, D_{1} A_{2}) & = & \sum_{i, j, k} p_{j} (δ_{j}^{i}) (δ_{j}^{i} β^{k}) D_{..}^{i k} + \sum_{i, j, l} p^{i} (δ_{j}^{i}) (δ_{j}^{i} β_{l}) D_{j l}^{..} \\ Cov (D_{2}, D_{1} A_{2}) & = & \sum_{i, j, k, l} (γ_{l}^{k}) [(δ_{j}^{i} β^{k}) + (δ_{j}^{i} β_{l})] D_{j l}^{..} D_{..}^{i k} \\ Cov (A_{1} A_{2}, A_{1} D_{2}) & = & \sum_{i, k, l} q_{l} (α^{i} β_{l}) (α^{i} γ_{l}^{k}) D_{..}^{i k} + \sum_{j, k, l} q^{k} (α_{j} β^{k}) (α_{j} γ_{l}^{k}) D_{j l}^{..} \\ & & + \sum_{i, j, k, l} [(α^{i} β^{k}) (α_{j} γ_{l}^{k}) + (α^{i} β_{l}) (α_{j} γ_{l}^{k}) + (α_{j} β^{k}) (α^{i} γ_{l}^{k}) + (α_{j} β_{l}) (α^{i} γ_{l}^{k})] D_{j l}^{..} D_{..}^{i k} \\ Cov (A_{1} A_{2}, D_{1} A_{2}) & = & \sum_{i, j, k} p_{j} (α_{j} β^{k}) (δ_{j}^{i} β^{k}) D_{..}^{i k} + \sum_{i, j, l} p^{i} (α^{i} β_{l}) (δ_{j}^{i} β_{l}) D_{j l}^{..} \\ & & + \sum_{i, j, k, l} [(α^{i} β^{k}) (δ_{j}^{i} β_{l}) + (α_{j} β^{k}) (δ_{j}^{i} β_{l}) + (α^{i} β_{l}) (δ_{j}^{i} β^{k}) + (α_{j} β_{l}) (δ_{j}^{i} β^{k})] D_{j l}^{..} D_{..}^{i k} \\ Cov (A_{1} D_{2}, D_{1} A_{2}) & = & \sum_{i, j, k, l} [(α^{i} γ_{l}^{k}) (δ_{j}^{i} β^{k}) (D_{..}^{i k} + p^{i} q^{k}) D_{j l}^{..} + (α^{i} γ_{l}^{k}) (δ_{j}^{i} β_{l}) D_{j l}^{..} D_{..}^{i k} \\ & & + (α_{j} γ_{l}^{k}) (δ_{j}^{i} β^{k}) D_{j l}^{..} D_{..}^{i k} + (α_{j} γ_{l}^{k}) (δ_{j}^{i} β_{l}) (D_{j l}^{..} + p_{j} q_{l}) D_{..}^{i k}] \end{array}$

• The covariances related to dominance × dominance effects:

$\begin{array}{l} Cov (A_{1}, D_{1} D_{2}) = \sum_{i, j, k, l} (δ_{j}^{i} γ_{l}^{k}) (α^{i} + α_{j}) D_{j l}^{..} D_{..}^{i k} \\ Cov (A_{2}, D_{1} D_{2}) = \sum_{i, j, k, l} (δ_{j}^{i} γ_{l}^{k}) (β^{k} + β_{l}) D_{j l}^{..} D_{..}^{i k} \\ Cov (D_{1}, D_{1} D_{2}) = \sum_{i, j, k, l} (δ_{j}^{i} γ_{l}^{k}) (δ_{j}^{i}) D_{j l}^{..} D_{..}^{i k} \\ Cov (D_{2}, D_{1} D_{2}) = \sum_{i, j, k, l} (δ_{j}^{i} γ_{l}^{k}) (γ_{l}^{k}) D_{j l}^{..} D_{..}^{i k} \\ Cov (A_{1} A_{2}, D_{1} D_{2}) = - (\sum_{i, k} (α^{i} β^{k}) D_{..}^{i k} + \sum_{j, l} (α_{j} β_{l}) D_{j l}^{..}) (\sum_{i, j, k, l} (δ_{j}^{i} γ_{l}^{k}) D_{j l}^{..} D_{..}^{i k}) \\ + \sum_{i, j, k, l} (δ_{j}^{i} γ_{l}^{k}) \{ [(α^{i} β^{k}) + (α^{i} β_{l}) + (α_{j} β^{k}) + (α_{j} β_{l})] D_{j l}^{..} D_{..}^{i k} + (α^{i} β^{k}) p^{i} q^{k} D_{j l}^{..} + (α_{j} β_{l}) p_{j} q_{l} D_{..}^{i k} \} \\ Cov (A_{1} D_{2}, D_{1} D_{2}) = \sum_{i, j, k, l} (δ_{j}^{i} γ_{l}^{k}) [(α^{i} γ_{l}^{k}) (D_{..}^{i k} + p^{i} q^{k}) D_{j l}^{..} + (α_{j} γ_{l}^{k}) (D_{j l}^{..} + p_{j} q_{l}) D_{..}^{i k}] \\ Cov (D_{1} A_{2}, D_{1} D_{2}) = \sum_{i, j, k, l} (δ_{j}^{i} γ_{l}^{k}) [(δ_{j}^{i} β^{k}) (D_{..}^{i k} + p^{i} q^{k}) D_{j l}^{..} + (δ_{j}^{i} β_{l}) (D_{j l}^{..} + p_{j} q_{l}) D_{..}^{i k}] \end{array}$
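The role of the disequilibrium coefficients in these covariances can be illustrated on a single gamete. The sketch below is a simplified illustration under our own assumptions: hypothetical haplotype frequencies `P[i][k]` for the maternal gamete, hypothetical coefficients `alpha` and `beta`, and the summand of Cov(A₁, A₂) read as the product of the two additive effects, which is an interpretation on our part. It computes $D_{..}^{i k} = P^{i k} - p^{i} q^{k}$ and checks that the covariance between the two additive parts carried by that gamete equals $\sum_{i, k} α^{i} β^{k} D_{..}^{i k}$.

```python
from fractions import Fraction as F

# Hypothetical haplotype frequencies of the maternal gamete:
# P[i][k] = frequency of gametes with allele i at locus 1 and allele k at locus 2.
P = [[F(3, 10), F(1, 10)],
     [F(1, 10), F(1, 2)]]
n1, n2 = len(P), len(P[0])

p = [sum(P[i][k] for k in range(n2)) for i in range(n1)]   # p^i
q = [sum(P[i][k] for i in range(n1)) for k in range(n2)]   # q^k
D = [[P[i][k] - p[i] * q[k] for k in range(n2)] for i in range(n1)]

# Hypothetical additive coefficients (not centered on purpose).
alpha = [F(1), F(-2)]
beta = [F(3), F(1, 2)]

# Direct covariance of sum_i alpha^i x_Mi and sum_k beta^k x_Mk over gametes.
# For a gamete carrying alleles (i, k) these sums equal alpha[i] - a_bar and beta[k] - b_bar.
a_bar = sum(alpha[i] * p[i] for i in range(n1))
b_bar = sum(beta[k] * q[k] for k in range(n2))
cov_direct = sum(P[i][k] * (alpha[i] - a_bar) * (beta[k] - b_bar)
                 for i in range(n1) for k in range(n2))

# The same covariance written with the disequilibrium coefficients.
cov_ld = sum(alpha[i] * beta[k] * D[i][k] for i in range(n1) for k in range(n2))
print(cov_direct, cov_ld)
```

Exact rational arithmetic is used so that the two expressions can be compared with equality rather than a tolerance. Setting every `D[i][k]` to zero, that is taking `P[i][k] = p[i] * q[k]`, returns the equilibrium case of Appendix B in which this covariance vanishes.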

D. Partition of genotypic variance in F₂ population with linkage disequilibrium

The details of each component of equation (19) for the F₂ population are presented here.

• The additive variance:

$V_{A} = \frac{1}{2} \sum_{r = 1}^{m} a_{r}^{2} + 2 \sum_{r \neq s} a_{r} a_{s} D_{r s}$

• The dominance variance:

$V_{D} = \frac{1}{4} \sum_{r = 1}^{m} d_{r}^{2} + 4 \sum_{r \neq s} d_{r} d_{s} D_{r s}^{2}$

• The additive × additive variance:

$\begin{matrix} V_{A A} = \frac{1}{4} \sum_{r < s} {(a a)}_{r s}^{2} + \sum_{r \neq s \neq s^{'}} {(a a)}_{r s} {(a a)}_{r s^{'}} D_{s s^{'}} \\ + \frac{1}{2} \sum_{r \neq s \neq r^{'} \neq s^{'}} {(a a)}_{r s} {(a a)}_{r^{'} s^{'}} (D_{r r^{'} s s^{'}} + D_{r s^{'}} D_{r^{'} s} + D_{r r^{'}} D_{s s^{'}} - D_{r s} D_{r^{'} s^{'}}) \end{matrix}$

• The additive × dominance variance:

$\begin{matrix} V_{A D} = \frac{1}{8} \sum_{r \neq s} {(a d)}_{r s}^{2} + \frac{1}{2} \sum_{r \neq s} {(a d)}_{r s} {(a d)}_{s r} D_{r s} + 2 \sum_{r \neq s \neq s^{'}} {(a d)}_{r s} [{(a d)}_{r s^{'}} D_{s s^{'}}^{2} + {(a d)}_{s s^{'}} D_{s s^{'}} D_{r s^{'}} \\ + {(a d)}_{s^{'} r} D_{r s} D_{s s^{'}} + \frac{1}{4} {(a d)}_{s^{'} s} D_{r s^{'}}] + 8 \sum_{r \neq s \neq r^{'} \neq s^{'}} {(a d)}_{r s} {(a d)}_{r^{'} s^{'}} D_{r s r^{'} s^{'}} D_{s s^{'}} \end{matrix}$

• The dominance × dominance variance:

$\begin{array}{l} V_{D D} = \sum_{r < s} {(d d)}_{r s}^{2} (\frac{1}{16} - 16 D_{r s}^{4}) + \sum_{r \neq s \neq s'} {(d d)}_{r s} {(d d)}_{r s'} (D_{s s'}^{2} - 16 D_{r s}^{2} D_{r s'}^{2}) \\ + 4 \sum_{r \neq s \neq r' \neq s'} {(d d)}_{r s} {(d d)}_{r' s'} (D_{r s r' s'}^{2} - D_{r s}^{2} D_{r' s'}^{2}) \end{array}$

• The covariances: Cov(A, D) = Cov(A, AA) = Cov(A, DD) = Cov(D, AD) = Cov(D, DD) = Cov(AA, AD) = Cov(AD, DD) = 0, and

$\begin{matrix} Cov (A, A D) = - \sum_{r \neq s} a_{r} D_{r s} [4 {(a d)}_{r s} D_{r s} + {(a d)}_{s r}] - 4 \sum_{r \neq s \neq r'} a_{r'} {(a d)}_{r s} D_{r s} D_{r' s} \\ Cov (D, A A) = - \sum_{r < s} (d_{r} + d_{s}) {(a a)}_{r s} D_{r s} - 4 \sum_{r \neq s \neq r'} d_{r'} {(a a)}_{r s} D_{r r'} D_{r' s} \\ Cov (A A, D D) = \sum_{r < s} {(a a)}_{r s} {(d d)}_{r s} (\frac{1}{2} D_{r s} - 8 D_{r s}^{3}) + \sum_{r \neq s \neq s'} {(a a)}_{r s} {(d d)}_{r s'} (2 D_{r s'} D_{s s'} - 8 D_{r s} D_{r s'}^{2}) \\ + 2 \sum_{r \neq s \neq r' \neq s'} {(a a)}_{r s} (d d)_{r' s'} (D_{r s r' s'} D_{r' s'} - D_{r s} D_{r' s'}^{2}) \end{matrix}$

where (aa)_sr = (aa)_rs and (dd)_sr = (dd)_rs for r < s.

In this presentation, we utilized the assumption of no crossing-over interference, which results in the third order linkage disequilibrium of three loci being zero, i.e. D_rss' = 0. It may be instructive to show this result.

$\begin{matrix} D_{r s t} = E (x_{r} x_{s} x_{t}) = E ((z_{r} - p_{r}) (z_{s} - p_{s}) (z_{t} - p_{t})) \\ = E (z_{r} z_{s} z_{t} - p_{r} z_{s} z_{t} - z_{r} p_{s} z_{t} - z_{r} z_{s} p_{t} + p_{r} p_{s} z_{t} + p_{r} z_{s} p_{t} + z_{r} p_{s} p_{t} - p_{r} p_{s} p_{t}) \\ = E (z_{r} z_{s} z_{t} - \frac{1}{2} z_{s} z_{t} - \frac{1}{2} z_{r} z_{t} - \frac{1}{2} z_{r} z_{s} + \frac{1}{4}) \end{matrix}$

Let r₁ be the recombination frequency between loci r and s, r₂ be that between s and t, and r₁₂ be that between r and t. Under the assumption of no crossing-over interference, for loci r, s, t in this order, we have E(z_r z_s z_t) = $\frac{1}{2}$ (1 - r₁)(1 - r₂), E(z_r z_s) = $\frac{1}{2}$ (1 - r₁), E(z_s z_t) = $\frac{1}{2}$ (1 - r₂), and E(z_r z_t) = $\frac{1}{2}$ (1 - r₁₂). Since the assumption of no crossing-over interference implies r₁₂ = r₁ + r₂ - 2r₁r₂, substituting these expectations into the last expression gives D_rst = 0.
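The vanishing of the third order disequilibrium can be confirmed by enumerating the eight gamete types. The following is a minimal sketch under the assumptions stated above (an F₁ parent, loci in the order r, s, t, independent recombination in the two intervals); the function names are ours. It uses exact rational arithmetic and also checks the pairwise value $D_{r t} = \frac{1}{4} (1 - 2 r_{12})$ with r₁₂ = r₁ + r₂ - 2r₁r₂.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)


def gamete_prob(z, r1, r2):
    """P(z_r, z_s, z_t) for a gamete of an F1 parent, with no interference."""
    zr, zs, zt = z
    prob = HALF                               # either parental allele at locus r
    prob *= r1 if zs != zr else 1 - r1        # recombination between r and s
    prob *= r2 if zt != zs else 1 - r2        # independent recombination between s and t
    return prob


def third_order_ld(r1, r2):
    """D_rst = E[(z_r - 1/2)(z_s - 1/2)(z_t - 1/2)]."""
    return sum(gamete_prob(z, r1, r2) * (z[0] - HALF) * (z[1] - HALF) * (z[2] - HALF)
               for z in product((0, 1), repeat=3))


def pairwise_ld_rt(r1, r2):
    """D_rt = E[(z_r - 1/2)(z_t - 1/2)] for the two outer loci."""
    return sum(gamete_prob(z, r1, r2) * (z[0] - HALF) * (z[2] - HALF)
               for z in product((0, 1), repeat=3))


r1, r2 = Fraction(1, 10), Fraction(1, 5)
r12 = r1 + r2 - 2 * r1 * r2
print(third_order_ld(r1, r2), pairwise_ld_rt(r1, r2), (1 - 2 * r12) / 4)
```

The first printed value is exactly zero for any r₁ and r₂ between 0 and 1/2, and the last two values coincide.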

E. Partition of genotypic variance for the general two-allele model with linkage disequilibrium

The details of each component of equation (19) for the general two-allele model are presented here.

• The additive variance:

$V_{A} = 2 \sum_{r = 1}^{m} p_{r} (1 - p_{r}) a_{r}^{2} + 2 \sum_{r \neq s} a_{r} a_{s} D_{r s}$

• The dominance variance:

$V_{D} = \sum_{r = 1}^{m} p_{r}^{2} {(1 - p_{r})}^{2} d_{r}^{2} + \sum_{r \neq s} d_{r} d_{s} D_{r s}^{2}$

• The additive × additive variance:

$\begin{matrix} V_{A A} = 2 \sum_{r < s} {(a a)}_{r s}^{2} [(1 - 2 p_{r}) (1 - 2 p_{s}) D_{r s} + 2 p_{r} (1 - p_{r}) p_{s} (1 - p_{s})] \\ + 2 \sum_{r \neq s \neq s'} {(a a)}_{r s} {(a a)}_{r s'} [(1 - 2 p_{r}) D_{r s s'} + 2 p_{r} (1 - p_{r}) D_{s s'}] \\ + \frac{1}{2} \sum_{r \neq s \neq r' \neq s'} {(a a)}_{r s} {(a a)}_{r' s'} (D_{r r' s s'} + D_{r s'} D_{r' s} + D_{r r'} D_{s s'} - D_{r s} D_{r' s'}) \end{matrix}$

• The additive × dominance variance:

$\begin{array}{l} V_{A D} = 2 \sum_{r \neq s} {(a d)}_{r s}^{2} [{(1 - 2 p_{s})}^{2} D_{r s}^{2} + (1 - 2 p_{r}) (1 - 2 p_{s}) p_{s} (1 - p_{s}) D_{r s} + p_{r} (1 - p_{r}) p_{s}^{2} {(1 - p_{s})}^{2}] \\ + 2 \sum_{r \neq s} {(a d)}_{r s} {(a d)}_{s r} D_{r s} [2 (1 - 2 p_{r}) (1 - 2 p_{s}) D_{r s} + p_{r} (1 - p_{r}) p_{s} (1 - p_{s})] \\ + 2 \sum_{r \neq s \neq s'} {(a d)}_{r s} \{ {(a d)}_{r s'} [D_{r s s'}^{2} + (1 - 2 p_{r}) D_{r s s'} D_{s s'} + p_{r} (1 - p_{r}) D_{s s'}^{2}] \\ + {(a d)}_{s s'} [2 (1 - 2 p_{s}) D_{r s s'} D_{s s'} + p_{s} (1 - p_{s}) D_{s s'} D_{r s'}] \\ + {(a d)}_{s' r} [2 (1 - 2 p_{r}) D_{r s s'} D_{r s} + p_{r} (1 - p_{r}) D_{r s} D_{s s'}] \\ + {(a d)}_{s' s} [{(1 - 2 p_{s})}^{2} D_{r s} D_{s s'} + (1 - 2 p_{s}) p_{s} (1 - p_{s}) D_{r s s'} + p_{s}^{2} {(1 - p_{s})}^{2} D_{r s'}] \} \\ + 2 \sum_{r \neq s \neq r' \neq s'} {(a d)}_{r s} {(a d)}_{r' s'} (D_{r s r' s'} D_{s s'} + D_{r s s'} D_{s r' s'}) \end{array}$

• The dominance × dominance variance:

$\begin{array}{l} V_{D D} = \sum_{r < s} {(d d)}_{r s}^{2} \{ {[(1 - 2 p_{r}) (1 - 2 p_{s}) D_{r s} + p_{r} (1 - p_{r}) p_{s} (1 - p_{s})]}^{2} - D_{r s}^{4} \} \\ + \sum_{r \neq s \neq s'} {(d d)}_{r s} {(d d)}_{r s'} \{ {[(1 - 2 p_{r}) D_{r s s'} + p_{r} (1 - p_{r}) D_{s s'}]}^{2} - D_{r s}^{2} D_{r s'}^{2} \} \\ + \frac{1}{4} \sum_{r \neq s \neq r' \neq s'} {(d d)}_{r s} {(d d)}_{r' s'} (D_{r s r' s'}^{2} - D_{r s}^{2} D_{r' s'}^{2}) \end{array}$

• The covariances:

$\begin{array}{lcl} Cov (A, D) & = & 0 \\ Cov (A, A A) & = & 2 \sum_{r < s} {(a a)}_{r s} D_{r s} [(1 - 2 p_{r}) a_{r} + (1 - 2 p_{s}) a_{s}] + \sum_{r \neq s \neq r'} a_{r'} {(a a)}_{r s} D_{r' r s} \\ Cov (A, A D) & = & 2 \sum_{r \neq s} a_{r} D_{r s} [{(a d)}_{r s} D_{r s} + p_{r} (1 - p_{r}) {(a d)}_{s r}] + 2 \sum_{r \neq s \neq r'} a_{r'} {(a d)}_{r s} D_{r s} D_{r' s} \\ Cov (A, D D) & = & 2 \sum_{r < s} {(d d)}_{r s} D_{r s}^{2} [(1 - 2 p_{r}) a_{r} + (1 - 2 p_{s}) a_{s}] + 2 \sum_{r \neq s \neq r'} a_{r'} {(d d)}_{r s} D_{r s r'} D_{r s} \\ Cov (D, A A) & = & 2 \sum_{r < s} {(a a)}_{r s} D_{r s} [p_{r} (1 - p_{r}) d_{r} + p_{s} (1 - p_{s}) d_{s}] + 2 \sum_{r \neq s \neq r'} d_{r'} {(a a)}_{r s} D_{r r'} D_{r' s} \\ Cov (D, A D) & = & 2 \sum_{r \neq s} d_{r} D_{r s} (1 - 2 p_{r}) [D_{r s} {(a d)}_{r s} + p_{r} (1 - p_{r}) {(a d)}_{s r}] + 2 \sum_{r \neq s \neq r'} d_{r'} {(a d)}_{r s} D_{r' r s} D_{r' s} \\ Cov (D, D D) & = & \sum_{r < s} {(d d)}_{r s} D_{r s}^{2} [{(1 - 2 p_{r})}^{2} d_{r} + {(1 - 2 p_{s})}^{2} d_{s}] + 2 \sum_{r \neq s \neq r'} d_{r'} {(d d)}_{r s} D_{r' r s}^{2} \\ Cov (A A, A D) & = & 2 \sum_{r < s} {(a a)}_{r s} D_{r s} \{ {(a d)}_{r s} [(1 - 2 p_{r}) p_{s} (1 - p_{s}) + 2 (1 - 2 p_{s}) D_{r s}] \\ & & + {(a d)}_{s r} [(1 - 2 p_{s}) p_{r} (1 - p_{r}) + 2 (1 - 2 p_{r}) D_{r s}] \} \\ & & + 2 \sum_{r \neq s \neq s'} {(a a)}_{r s} \{ {(a d)}_{s s'} [(1 - 2 p_{s}) D_{s s'} D_{r s'} + 2 D_{r s s'} D_{s s'}] \\ & & + {(a d)}_{s' s} [2 (1 - 2 p_{s}) D_{r s} D_{s s'} + p_{s} (1 - p_{s}) D_{r s s'}] \} \\ & & + \sum_{r \neq s \neq r' \neq s'} {(a a)}_{r s} {(a d)}_{r' s'} (D_{r s s'} D_{r' s} + D_{r r' s'} D_{s s'} + D_{r' s s'} D_{r s'}) \\ Cov (A A, D D) & = & 2 \sum_{r < s} {(a a)}_{r s} {(d d)}_{r s} [2 (1 - 2 p_{r}) (1 - 2 p_{s}) D_{r s}^{2} + p_{r} (1 - p_{r}) p_{s} (1 - p_{s}) D_{r s} - D_{r s}^{3}] \\ & & + 2 \sum_{r \neq s \neq s'} {(a a)}_{r s} {(d d)}_{r s'} [2 (1 - 2 p_{r}) D_{r s s'} D_{r s'} + p_{r} (1 - p_{r}) D_{r s'} D_{s s'} - D_{r s} D_{r s'}^{2}] \\ & & + \frac{1}{2} \sum_{r \neq s \neq r' \neq s'} {(a a)}_{r s} {(d d)}_{r' s'} (D_{r s r' s'} D_{r' s'} + D_{r r' s'} D_{s r' s'} - D_{r s} D_{r' s'}^{2}) \\ Cov (A D, D D) & = & 2 \sum_{r \neq s} {(a d)}_{r s} {(d d)}_{r s} D_{r s} (1 - 2 p_{s}) [(1 - 2 p_{r}) (1 - 2 p_{s}) D_{r s} + p_{r} (1 - p_{r}) p_{s} (1 - p_{s})] \\ & & + 2 \sum_{r \neq s \neq s'} {(a d)}_{r s} \{ {(d d)}_{r s'} D_{r s s'} [(1 - 2 p_{r}) D_{r s s'} + p_{r} (1 - p_{r}) D_{s s'}] \\ & & + {(d d)}_{s s'} D_{s s'} (1 - 2 p_{s}) [(1 - 2 p_{s}) D_{r s s'} + p_{s} (1 - p_{s}) D_{r s'}] \} \\ & & + \sum_{r \neq s \neq r' \neq s'} {(a d)}_{r s} {(d d)}_{r' s'} D_{r s s'} D_{r s r' s'} \end{array}$

where (aa)_sr = (aa)_rs and (dd)_sr = (dd)_rs for r < s.

The result in Appendix D for the F₂ model, with p_r = 1/2 for r = 1, ..., m and also assuming D_rss' = 0, is a special case of the results presented here. There is a difference, by a factor of -2, in the specification of the v variable for the dominance effect between the F₂ model and the general two-allele model, which carries over to the comparison of the results in Appendices D and E.
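The special-case relationship can be illustrated for the additive and dominance variances. The sketch below is a check under stated assumptions rather than a derivation: the effect sizes and disequilibrium values are hypothetical, the sums over r ≠ s are read as sums over ordered pairs, and the factor of -2 is applied by substituting d_r = -2 d_r(F₂) into the general formula, which is our reading of the remark above. With p_r = 1/2, the expressions for V_A and V_D in this appendix then reproduce those of Appendix D.

```python
from fractions import Fraction
from itertools import permutations


def va_general(p, a, D):
    """V_A of the general two-allele model (Appendix E)."""
    m = len(a)
    return (2 * sum(p[r] * (1 - p[r]) * a[r] ** 2 for r in range(m))
            + 2 * sum(a[r] * a[s] * D[r][s] for r, s in permutations(range(m), 2)))


def vd_general(p, d, D):
    """V_D of the general two-allele model (Appendix E)."""
    m = len(d)
    return (sum(p[r] ** 2 * (1 - p[r]) ** 2 * d[r] ** 2 for r in range(m))
            + sum(d[r] * d[s] * D[r][s] ** 2 for r, s in permutations(range(m), 2)))


def va_f2(a, D):
    """V_A of the F2 model (Appendix D)."""
    m = len(a)
    return (Fraction(1, 2) * sum(x ** 2 for x in a)
            + 2 * sum(a[r] * a[s] * D[r][s] for r, s in permutations(range(m), 2)))


def vd_f2(d, D):
    """V_D of the F2 model (Appendix D)."""
    m = len(d)
    return (Fraction(1, 4) * sum(x ** 2 for x in d)
            + 4 * sum(d[r] * d[s] * D[r][s] ** 2 for r, s in permutations(range(m), 2)))


# Hypothetical effects and pairwise disequilibria for m = 3 loci.
a = [Fraction(1), Fraction(-1, 2), Fraction(2)]
d_f2 = [Fraction(1, 3), Fraction(1), Fraction(-1, 4)]
D = [[0, Fraction(1, 5), Fraction(1, 10)],
     [Fraction(1, 5), 0, Fraction(3, 20)],
     [Fraction(1, 10), Fraction(3, 20), 0]]
p_half = [Fraction(1, 2)] * 3

d_general = [-2 * x for x in d_f2]   # assumed rescaling of the dominance effect
print(va_general(p_half, a, D) == va_f2(a, D))
print(vd_general(p_half, d_general, D) == vd_f2(d_f2, D))
```

The additive effects need no rescaling, while the dominance effects must be rescaled for the two expressions for V_D to agree.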

References

Yule GU: Mendel's laws and their probable relations to intra-racial heredity. New Phytologist. 1902, 1: 193-207, 222-238. 10.1111/j.1469-8137.1902.tb06590.x.
Yule GU: On the theory of inheritance of quantitative compound characters on the basis of Mendel's laws – A preliminary note. Int Conf Genet 3rd, Rep. Edited by: Wilks W. 1906, Royal Horticultural Society, London, 140-142.
Pearson K: Mathematical contributions to the theory of evolution XII. On a generalized theory of alternative inheritance, with special reference to Mendel's laws. R Soc Philos Trans A. 1904, 203: 53-86.
Provine WB: The Origins of Theoretical Population Genetics. 1971, Chicago: University of Chicago Press
Hill WG: Quantitative Genetics, Part I: Explanation and Analysis of Continuous Variation. Benchmark Papers in Genetics. 1984, New York: Van Nostrand Reinhold Company, 15:
Fisher RA: The correlation between relatives on the supposition of Mendelian inheritance. Trans Roy Soc. 1918, 52: 399-433.
Cockerham C: An extension of the concept of partitioning hereditary variance for analysis of covariances among relatives when epistasis is present. Genetics. 1954, 39: 859-882. [http://www.genetics.org/cgi/reprint/39/6/859?maxtoshow=&HITS=10&hits=10&RESULTFORMAT=&author1=Cockerham+CC&searchid=1140647473818_2969&FIRSTINDEX=30&journalcode=genetics]
Kempthorne O: The correlation between relatives in a random mating population. Proc Royal Soc B. 1954, 143: 103-113.
Kempthorne O: An Introduction to Genetic Statistics. 1957, New York: John Wiley & Sons
Zeng ZB, Wang T, Zou W: Modeling quantitative trait loci and interpretation of models. Genetics. 2005, 169: 1711-1725. 10.1534/genetics.104.035857.
Cheverud JM, Routman EJ: Epistasis and its contribution to genetic variance components. Genetics. 1995, 139: 1455-1461.
Cheverud JM: Detecting epistasis among quantitative trait loci. Epistasis and the Evolutionary Process. Edited by: Wolf JB, Brodie EDI, Wade MJ. 2000, Oxford Univ. Press, Oxford, UK, 58-81.
Cockerham C: Effects of linkage on the covariances between relatives. Genetics. 1956, 41: 138-141. [http://www.genetics.org/cgi/reprint/41/1/138?maxtoshow=&HITS=10&hits=10&RESULTFORMAT=&author1=Cockerham+CC&searchid=1140647473818_2969&FIRSTINDEX=30&journalcode=genetics]
Weir BS: Genetic Data Analysis II. 1996, Sunderland, Massachusetts: Sinauer Associates, Inc
Slatkin M: On treating the chromosome as the unit of selection. Genetics. 1972, 72: 157-168.
Weir BS, Cockerham CC: Mixed self and random mating at two loci. Genet Res. 1973, 21: 247-262.
Weir BS, Cockerham CC: Two-locus theory in quantitative genetics. Proceedings of the international conference on quantitative genetics. Edited by: Pollack E, Kempthorne O, Bailey TB. 1977, Iowa State University Press, 247-269.
Kao CH, Zeng ZB: Modeling epistasis of quantitative trait loci using Cockerham's model. Genetics. 2002, 160: 1243-1261.
Kao CH, Zeng ZB, Teasdale RD: Multiple interval mapping for quantitative trait loci. Genetics. 1999, 152: 1203-1216.
Zeng ZB, Liu J, Stam LF, Kao CH, Mercer JM, Laurie CC: Genetic architecture of a morphological shape difference between two Drosophila species. Genetics. 2000, 154: 299-310.
Zeng ZB, Kao CH, Basten CJ: Estimating the genetic architecture of quantitative traits. Genet Res. 1999, 74: 279-289. 10.1017/S0016672399004255.
Weber K, Eisman R, Higgins S, Kukl L, Patty A, Sparks J, Zeng ZB: An analysis of polygenes affecting wing shape on chromosome three in Drosophila melanogaster. Genetics. 1999, 153: 773-786.
Weber K, Eisman R, Higgins S, Morey L, Patty A, Tausek M, Zeng ZB: An analysis of polygenes affecting wing shape on chromosome 2 in Drosophila melanogaster. Genetics. 2001, 159: 1045-1057.
Lander ES, Botstein D: Mapping Mendelian Factors Underlying Quantitative Traits Using RFLP Linkage Maps. Genetics. 1989, 121: 185-199.
Zeng ZB: Precision mapping of quantitative trait loci. Genetics. 1994, 136: 1457-1468.

Acknowledgements

We are grateful to Bill Hill and Bruce Weir for very helpful discussions and comments and to Wei Zou, Sheng Feng, Sheng-Mao Cheng and Jessica Maia for checking some results. This work was partially supported by NIH GM45344 and by the National Research Initiative of the USDA Cooperative State Research, Education and Extension Service, grant number #2005-00754.

Author information

Authors and Affiliations

Bioinformatics Research Center & Department of Statistics, North Carolina State University, Raleigh, NC, 27695, USA
Tao Wang & Zhao-Bang Zeng
Department of Genetics, North Carolina State University, Raleigh, NC, 27695, USA
Zhao-Bang Zeng
Division of Biostatistics & Human Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI, 53226, USA
Tao Wang

Corresponding author

Correspondence to Zhao-Bang Zeng.

Additional information

Authors' contributions

TW conducted the initial derivation of models and partition of variance and drafted the initial manuscript. ZBZ extended the derivation of models and partition of variance, revised and finalized the manuscript. Both authors have read and approved the final manuscript. Comments and requests should be addressed to ZBZ or TW.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Wang, T., Zeng, ZB. Models and partition of variance for quantitative trait loci with epistasis and linkage disequilibrium. BMC Genet 7, 9 (2006). https://doi.org/10.1186/1471-2156-7-9

Received: 29 June 2005
Accepted: 10 February 2006
Published: 10 February 2006
DOI: https://doi.org/10.1186/1471-2156-7-9
