Mapping QTL based on the mean phenotype
Take a simple F2 population of size n derived from two homozygous lines as an example. There are three possible genotypes at a quantitative trait locus Q, denoted Q1Q1, Q1Q2, and Q2Q2. The phenotypic value of individual i is usually described by the following linear model,
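$y_i = \mu + z_i a + w_i d + e_i$, (1)
where $\mu$ is the population mean, $a$ and $d$ are the additive and dominance effects of the QTL, $e_i$ is the residual error with a $N(0, \sigma^2)$ distribution, and $z_i$ and $w_i$ are indicator variables determined by the QTL genotype of individual $i$.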
If $m_i$ records are repeatedly sampled from each individual and the phenotypic value of individual $i$ is measured by the average of its $m_i$ records, the model becomes
$\bar{y}_i = \mu + z_i a + w_i d + \bar{e}_i$ (2)
where $\bar{y}_i = \frac{1}{m_i}\sum_{j=1}^{m_i} y_{ij}$ and $\bar{e}_i = \frac{1}{m_i}\sum_{j=1}^{m_i} e_{ij}$, and a variable with the additional subscript $j$ denotes the corresponding variable for the $j$th record of the $i$th F2 individual. The residual error now follows a $N(0, \sigma^2/m_i)$ distribution, given that $e_{ij} \sim N(0, \sigma^2)$.
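The variance reduction from averaging can be checked numerically. The following is a minimal simulation sketch and not part of the original method: the function names, the parameter values, and the genotype coding (z = 1, 0, -1 and w = 0, 1, 0 for Q1Q1, Q1Q2, Q2Q2, a common convention assumed here) are illustrative assumptions.

```python
import random

# Assumed genotype coding (z_i, w_i); a common convention, not taken from the text.
CODING = {"Q1Q1": (1, 0), "Q1Q2": (0, 1), "Q2Q2": (-1, 0)}

def simulate_mean_residuals(n, m, mu, a, d, sigma2, seed=0):
    """Simulate n F2 individuals with m records each.

    Returns the residual of each individual's record mean around its
    genotypic expectation mu + z*a + w*d.
    """
    rng = random.Random(seed)
    # 1:2:1 segregation of the three genotypes in an F2 population.
    genotypes = rng.choices(list(CODING), weights=[1, 2, 1], k=n)
    residuals = []
    for genotype in genotypes:
        z, w = CODING[genotype]
        expected = mu + z * a + w * d
        records = [expected + rng.gauss(0.0, sigma2 ** 0.5) for _ in range(m)]
        residuals.append(sum(records) / m - expected)
    return residuals

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

if __name__ == "__main__":
    res = simulate_mean_residuals(n=4000, m=4, mu=10.0, a=2.0, d=1.0, sigma2=1.0)
    print(variance(res))  # should be close to sigma2 / m = 0.25
```

With a common number of records per individual, the empirical variance of the averaged residuals should approach the residual variance divided by the number of records.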
Let
(3)
be the conditional density of $\bar{y}_i$ given $z_i$ and $w_i$, where $\theta = [\mu \; a \; d \; \sigma^2]^T$ are the parameters; the log-likelihood function defined under the missing variables $z_i$ and $w_i$ is
(4)
The expectation-maximization (EM) algorithm [8] can be used to obtain the MLE, as shown below,
(5)
and
(6)
The expectation shown in Equation 6 can be further expressed as
Define the posterior probabilities of the three QTL genotypes for the $i$th individual as
(7)
where $p_{ik}$ are the conditional probabilities inferred from marker information, then
(8)
Because the posterior probabilities in equation (7) are functions of the unknown parameters, the EM algorithm must be iterated. The iterations proceed as follows.
Step 0: Set initial values $\theta^{(0)}$.
Step 1: Calculate the posterior probabilities with equation (7).
Step 2: Substitute (8) into equation (5) to estimate $\mu$, $a$ and $d$:
(9)
Step 3: Substitute (8) into equation (6) to estimate $\sigma^2$.
Step 4: Return to step 1. Steps 1 to 3 complete one round of iteration.
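The iteration above can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not the authors' implementation: the function name `em_mean_model` and its arguments are hypothetical, the genotype coding z = 1, 0, -1 and w = 0, 1, 0 is assumed, and the M-step is written in terms of the three genotypic values (which are then mapped back to mu, a and d) rather than through equations (5), (6) and (9).

```python
import math

def em_mean_model(ybar, m, prior, max_iter=500, tol=1e-10):
    """EM sketch for the mean-phenotype model.

    ybar[i]  : mean phenotype of individual i
    m[i]     : number of records averaged for individual i
    prior[i] : marker-based probabilities of (Q1Q1, Q1Q2, Q2Q2) for individual i
    """
    n = len(ybar)
    # Step 0: crude starting values, genotypic values spread around the grand mean.
    grand = sum(ybar) / n
    sd = math.sqrt(sum((y - grand) ** 2 for y in ybar) / n)
    g = [grand + sd, grand, grand - sd]  # genotypic values of Q1Q1, Q1Q2, Q2Q2
    sigma2 = sum(mi * (y - grand) ** 2 for mi, y in zip(m, ybar)) / n
    trace = []  # observed-data log-likelihood at each E-step
    for _ in range(max_iter):
        # Step 1: posterior genotype probabilities by Bayes' rule.
        post, loglik = [], 0.0
        for y, mi, p in zip(ybar, m, prior):
            var = sigma2 / mi
            joint = [pk * math.exp(-(y - gk) ** 2 / (2 * var))
                     / math.sqrt(2 * math.pi * var) for pk, gk in zip(p, g)]
            total = sum(joint)
            loglik += math.log(total)
            post.append([j / total for j in joint])
        trace.append(loglik)
        # Step 2: posterior- and m-weighted means give the genotypic values.
        for k in range(3):
            weights = [p[k] * mi for p, mi in zip(post, m)]
            g[k] = sum(w * y for w, y in zip(weights, ybar)) / sum(weights)
        # Step 3: residual variance from the expected weighted squared deviations.
        sigma2 = sum(p[k] * mi * (y - g[k]) ** 2
                     for p, mi, y in zip(post, m, ybar) for k in range(3)) / n
        # Step 4: stop when the log-likelihood stabilizes, otherwise iterate again.
        if len(trace) > 1 and abs(trace[-1] - trace[-2]) < tol:
            break
    mu = (g[0] + g[2]) / 2  # under the assumed coding
    return {"mu": mu, "a": (g[0] - g[2]) / 2, "d": g[1] - mu,
            "sigma2": sigma2, "loglik": trace}
```

Working with genotypic values avoids a matrix inverse; because the map between the three genotypic values and (mu, a, d) is one-to-one under the assumed coding, the maximizer is the same. The stored log-likelihood values should be non-decreasing, which is a convenient check on any EM implementation.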
Mapping QTL based on the repeatability model
Partitioning the residual error $e_i$ in model (1) into an individual-specific permanent environmental effect $\zeta_i$ and a random environmental effect $\varepsilon_{ij}$, the $j$th phenotypic value of individual $i$ is represented as
$y_{ij} = \mu + z_i a + w_i d + \zeta_i + \varepsilon_{ij}$ (10)
This is a mixed effects model, also called the repeatability model, with $a$ and $d$ treated as fixed effects and $\zeta_i$ as the random effect. The $\zeta_i$ are assumed i.i.d. $N(0, \sigma_\zeta^2)$ and the $\varepsilon_{ij}$ i.i.d. $N(0, \sigma_\varepsilon^2)$.
We use an $m_i \times 1$ vector $y_i = [y_{i1} \; y_{i2} \; \dots \; y_{im_i}]^T$, for $i = 1, 2, \dots, n$, to denote the array of phenotypic values for the $i$th individual and define $\phi_i = [1 \; 1 \; \dots \; 1]^T$ as a vector of dimension $m_i$. In matrix notation, model (10) can be written as
$y_i = \phi_i \mu + z_i \phi_i a + w_i \phi_i d + \phi_i \zeta_i + \varepsilon_i$ (11)
where $\varepsilon_i = [\varepsilon_{i1} \; \varepsilon_{i2} \; \dots \; \varepsilon_{im_i}]^T$ is an $m_i \times 1$ vector of random environmental effects that follows $N(0, I_i \sigma_\varepsilon^2)$, with $I_i$ being an $m_i \times m_i$ identity matrix. The conditional expectation of model (11) given the fixed effects is
$E(y_i) = M_i = \phi_i \mu + z_i \phi_i a + w_i \phi_i d$ (12)
and the variance-covariance matrix is
(13)
which applies to all i = 1, 2, …, n.
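Model (11) implies that the records of one individual share the permanent environmental effect, so their covariance matrix has a constant off-diagonal element. The sketch below builds that matrix and its closed-form inverse for illustration. The function names and the arguments `var_perm` (variance of the permanent environmental effect) and `var_resid` (variance of the random environmental effect) are hypothetical names introduced here.

```python
def repeatability_covariance(m_i, var_perm, var_resid):
    """Return phi_i phi_i^T * var_perm + I_i * var_resid as an m_i x m_i nested list."""
    return [[var_perm + (var_resid if r == c else 0.0) for c in range(m_i)]
            for r in range(m_i)]

def repeatability_inverse(m_i, var_perm, var_resid):
    """Closed-form inverse (I_i - c * phi_i phi_i^T) / var_resid,
    with c = var_perm / (var_resid + m_i * var_perm)."""
    c = var_perm / (var_resid + m_i * var_perm)
    return [[((1.0 if r == k else 0.0) - c) / var_resid for k in range(m_i)]
            for r in range(m_i)]

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

if __name__ == "__main__":
    v = repeatability_covariance(3, var_perm=2.0, var_resid=0.5)
    v_inv = repeatability_inverse(3, var_perm=2.0, var_resid=0.5)
    for row in matmul(v, v_inv):
        print([round(x, 10) for x in row])  # rows of the 3 x 3 identity, up to rounding
```

The closed-form inverse means the likelihood can be evaluated from each individual's record mean and within-individual sum of squares, without numerical matrix inversion.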
The conditional density of $y_i$ based on $M_i$ and $V_i$ is
(14)
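where $\theta = [\mu \; a \; d \; \sigma_\zeta^2 \; \sigma_\varepsilon^2]$, with $\sigma_\zeta^2$ and $\sigma_\varepsilon^2$ the variances of $\zeta_i$ and $\varepsilon_{ij}$. The corresponding log-likelihood function is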
(15)
Taking derivatives with respect to $\mu$, $a$ and $d$, we obtain
(16)
but explicit equations for the two variance components cannot be derived in the same way. Instead of the above likelihood function, we construct the following likelihood function from the joint conditional density of $y_i$ and $\zeta_i$,
(17)
where $\theta_1 = [\mu \; a \; d \; \zeta_i]$.
Taking derivatives with respect to $\theta_1$, we obtain
(18)
and
(19)
where
(20)
so we can simply use an existing mixed model EM algorithm to find the MLE of the parameters [9]. The EM steps for the mixed model analysis are as follows.
Step 0: Initialize all parameters with values inside their admissible ranges, denoted by $\theta^{(0)}$.
Step 1: Compute the posterior probabilities of the three genotypes for each individual
(21)
Step 2: Compute all the expectations involved in the following maximization steps (as in equation (8)).
Step 3: Find the posterior distribution of the random effect $\zeta_i$ from equation (18). This posterior distribution turns out to be a mixture of three normal distributions with a mean
(22)
and a variance
(23)
Step 4: Update the population mean, additive effect and dominance effect by equation (16). The resulting equations are equivalent to equations (9) replacing m
i
with .
Step 5: Update the covariance matrix of the random effect
(24)
Step 6: Update the residual variance by equation (19)
Step 7: Repeat steps 1 to 6 until a convergence criterion is met.
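The steps above can be sketched as follows. This is a textbook-style EM that treats both the QTL genotype and the permanent environmental effect as missing data. It is offered as an assumption-laden illustration, not as a reproduction of equations (16) to (24), and its update for the fixed effects may differ in form from the authors' Step 4. The function name, argument names and the coding z = 1, 0, -1, w = 0, 1, 0 are assumptions; each individual is assumed to have at least two records.

```python
import math

def em_repeatability(records, prior, max_iter=500, tol=1e-9):
    """EM sketch for the repeatability model.

    records[i] : list of repeated phenotypic records of individual i
    prior[i]   : marker-based probabilities of (Q1Q1, Q1Q2, Q2Q2)
    """
    n = len(records)
    m = [len(r) for r in records]
    ybar = [sum(r) / len(r) for r in records]
    ssw = [sum((y - yb) ** 2 for y in r) for r, yb in zip(records, ybar)]
    n_total = sum(m)
    # Step 0: starting values inside the admissible ranges.
    grand = sum(ybar) / n
    sd = math.sqrt(sum((yb - grand) ** 2 for yb in ybar) / n)
    g = [grand + sd, grand, grand - sd]  # genotypic values
    var_perm = var_resid = max(sd * sd / 2, 1e-6)
    trace = []
    for _ in range(max_iter):
        post, zeta_hat, zeta_var, loglik = [], [], [], 0.0
        for i in range(n):
            tau = var_resid + m[i] * var_perm
            # Step 1: posterior genotype probabilities (marginal over zeta_i).
            kern = [prior[i][k] * math.exp(-m[i] * (ybar[i] - g[k]) ** 2 / (2 * tau))
                    for k in range(3)]
            total = sum(kern)
            post.append([x / total for x in kern])
            loglik += (math.log(total) - ssw[i] / (2 * var_resid)
                       - 0.5 * ((m[i] - 1) * math.log(var_resid) + math.log(tau)
                                + m[i] * math.log(2 * math.pi)))
            # Steps 2-3: given genotype k, zeta_i is normal with these moments,
            # so its posterior is a mixture of three normals.
            shrink = m[i] * var_perm / tau
            zeta_hat.append([shrink * (ybar[i] - g[k]) for k in range(3)])
            zeta_var.append(var_perm * var_resid / tau)
        trace.append(loglik)
        # Step 4: update genotypic values (hence mu, a, d).
        for k in range(3):
            num = sum(post[i][k] * m[i] * (ybar[i] - zeta_hat[i][k]) for i in range(n))
            g[k] = num / sum(post[i][k] * m[i] for i in range(n))
        # Step 5: update the variance of the permanent environmental effect.
        var_perm = sum(post[i][k] * (zeta_hat[i][k] ** 2 + zeta_var[i])
                       for i in range(n) for k in range(3)) / n
        # Step 6: update the residual variance.
        var_resid = sum(post[i][k] * (ssw[i] + m[i] * zeta_var[i]
                        + m[i] * (ybar[i] - g[k] - zeta_hat[i][k]) ** 2)
                        for i in range(n) for k in range(3)) / n_total
        # Step 7: stop once the log-likelihood stabilizes.
        if len(trace) > 1 and abs(trace[-1] - trace[-2]) < tol:
            break
    mu = (g[0] + g[2]) / 2
    return {"mu": mu, "a": (g[0] - g[2]) / 2, "d": g[1] - mu,
            "var_perm": var_perm, "var_resid": var_resid, "loglik": trace}
```

The sketch uses the record mean and the within-individual sum of squares of each individual, which is sufficient here because all records of an individual share the same expectation and the same permanent effect.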
The MLEs of the parameters in models (2) and (10) are solved iteratively at specific locations on the chromosomes using the EM algorithm, and the QTL position and effects are determined by likelihood ratio statistics in a chromosome or genome scan.
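A scan of this kind reduces to comparing, at each tested position, the maximized log-likelihood under the QTL model with that under a no-QTL model. The helper below is a minimal sketch: the function name, the argument names, and the input numbers in the usage example are hypothetical and do not come from the study. It uses the usual definition of the likelihood ratio statistic as twice the log-likelihood difference.

```python
def likelihood_ratio_scan(positions, loglik_qtl, loglik_null):
    """Return (best_position, best_lr, lr_profile).

    positions   : tested chromosome positions (e.g. in cM)
    loglik_qtl  : maximized log-likelihood of the QTL model at each position
    loglik_null : maximized log-likelihood of the model without a QTL
    """
    # Likelihood ratio statistic at every scanned position.
    profile = [2.0 * (ll - loglik_null) for ll in loglik_qtl]
    # The position with the largest statistic is the estimated QTL location.
    best = max(range(len(profile)), key=profile.__getitem__)
    return positions[best], profile[best], profile

if __name__ == "__main__":
    # Made-up log-likelihood values, for illustration only.
    pos, lr, profile = likelihood_ratio_scan(
        positions=[0, 10, 20],
        loglik_qtl=[-105.0, -100.0, -103.0],
        loglik_null=-106.0,
    )
    print(pos, lr, profile)  # 10 12.0 [2.0, 12.0, 6.0]
```

In practice the per-position log-likelihoods would come from running the EM fit at each position with the marker-based genotype probabilities for that position.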