- Methodology article
- Open Access

# Mapping quantitative trait loci in line cross with repeat records

*BMC Genetics*
**volume 8**, Article number: 47 (2007)

## Abstract

### Background

Phenotypes with repeat records from one or multiple individuals are often encountered when mapping QTL in line crosses. The current genetic mapping approach for a trait with repeat records simply replaces the phenotype with the average of the repeat records. This simple treatment does not fully use the information provided by replication and ignores the impact of permanent environmental effects on the accuracy of the estimated QTL.

### Results

We propose to map QTL by using the repeatability model to analyze the repeat records directly rather than analyzing only the mean phenotype. This improves the efficiency of QTL detection because it makes fuller use of the data and allows for permanent environmental effects. A maximum likelihood method implemented via the expectation-maximization (EM) algorithm is used to estimate the parameters of the repeatability model. A series of simulations demonstrated the superiority of the mapping method based on the repeatability model over simple analysis of the mean phenotype.

### Conclusion

Our results suggest that the proposed method can serve as a powerful alternative to existing methods. By means of the repeatability model, using the repeat records on each individual may improve the efficiency of QTL detection in line crosses.

## Background

Replication is fundamental to experimental design. Its main advantages are that it allows experimental error to be estimated and increases the reliability of the information obtained at each experimental point [1, 2]. Replication denotes sampling or measuring multiple times under the same experimental condition (within one treatment), where the experimental unit may be either one individual or multiple individuals with identical genetic background.

Plants or animals are often observed more than once for a particular trait. Examples include fleece weight of sheep in different years, blood pressure and pulse of a human over time, litter size of sows over time, antler size of deer in different seasons, racing results of horses from several races, and exam scores of students during university. Such records are replicates if they are not influenced by the measuring environment, such as year, season, parity, or race.

In classical quantitative genetics, a trait with repeat records is generally analysed by means of the repeatability model [3, 4], which includes a permanent environmental effect in addition to an individual's additive genetic value for the trait. The permanent environmental effect, a measure of the differences among experimental units, is a non-genetic effect common to all observations on the same individual [5]. Such environmental effects are usually accounted for in the model to ensure accurate prediction of breeding values [4]. However, the repeatability model has received little attention in QTL mapping with repeat records.

The current genetic mapping approach for a trait with repeat records simply replaces the phenotype with the average of the repeat records [6, 7]. Although this simple treatment improves the power of detecting QTL to a certain extent, it does not fully use the information provided by replication and ignores the impact of permanent environmental effects on the accuracy of the estimated QTL.

In this study, we apply the repeatability model to mapping quantitative trait loci with repeat records and demonstrate the higher efficiency of this model by simulation.

## Theory and methods

### Mapping QTL based on the mean phenotype

Take a simple F_{2} population of size *n* derived from two homozygous lines as an example. There are three possible genotypes at a quantitative trait locus *Q*, denoted by *Q*_{1}*Q*_{1}, *Q*_{1}*Q*_{2}, and *Q*_{2}*Q*_{2}. The phenotypic value of an individual *i* is usually described by the following linear model,

$$y_i = \mu + z_i a + w_i d + e_i, \qquad (1)$$

where *μ* is the population mean, *a* and *d* are the additive and dominance effects of the QTL, *z*_{i} and *w*_{i} are the corresponding genotype indicator variables of individual *i*, and *e*_{i} is the residual error with a *N*(0, *σ*^{2}) distribution.

If *m*_{i} records are repeatedly sampled from each individual and the phenotypic value of individual *i* is measured by the average of its *m*_{i} records, the model becomes

$$\overline{y}_i = \mu + z_i a + w_i d + \overline{e}_i, \qquad (2)$$

where

$$\overline{y}_i = \frac{1}{m_i}\sum_{j=1}^{m_i} y_{ij}, \qquad \overline{e}_i = \frac{1}{m_i}\sum_{j=1}^{m_i} e_{ij},$$

and a variable with the additional subscript *j* denotes the corresponding variable for the *j* th record of the *i* th F_{2} individual. The residual error now follows a *N*(0, *σ*^{2}/*m*_{i}) distribution, given that *e*_{ij} ~ *N*(0, *σ*^{2}).

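Let

$$f(\overline{y}_i \mid z_i, w_i; \theta) = \sqrt{\frac{m_i}{2\pi\sigma^{2}}}\,\exp\left[-\frac{m_i}{2\sigma^{2}}\left(\overline{y}_i - \mu - z_i a - w_i d\right)^{2}\right]$$

be the conditional density of ${\overline{y}}_{i}$, where *θ* = [*μ a d* *σ*^{2}]^{T} are the parameters; the log-likelihood function defined under the missing variables *z*_{i} and *w*_{i} is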

The expectation-maximization (EM) algorithm [8] can be used to obtain the MLE, as shown below,

and

The expectation shown in Equation 6 can be further expressed as

Define the posterior probabilities of the three QTL genotypes for the *i* th individual as

where *p*_{ik} are the conditional probabilities inferred from marker information, then

Because ${p}_{ik}^{\ast}$ is a function of the unknown parameters, the EM algorithm requires iteration. The iterations proceed as follows:

*Step 0*: Set initial values *θ*^{(0)}.

*Step 1*: Calculate the posterior probabilities ${p}_{ik}^{\ast}$ with equation (7).

*Step 2*: Substituting (8) into equation (5), estimate

*Step 3*: Substituting (8) into equation (6), estimate

*Step 4*: Return to Step 1, which completes one round of iteration, and repeat until convergence.
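The iteration above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' program: the genotype coding `Z`/`W` (1, 0, -1 and 0, 1, 0) and the function name `em_mean_phenotype` are hypothetical choices, and the M-step is written in terms of the three genotype means, which is an equivalent reparameterization of (*μ*, *a*, *d*) under that coding.

```python
import math

# Assumed genotype coding for Q1Q1, Q1Q2, Q2Q2 (the text does not spell it out).
Z = (1.0, 0.0, -1.0)  # additive indicator z
W = (0.0, 1.0, 0.0)   # dominance indicator w


def em_mean_phenotype(ybar, m, prior, max_iter=500, tol=1e-10):
    """EM for the mean-phenotype model: ybar_i ~ N(mu + z*a + w*d, sigma2 / m_i).

    ybar  : mean phenotype of each individual
    m     : number of records behind each mean
    prior : [p_i1, p_i2, p_i3] per individual, inferred from markers
    """
    n = len(ybar)
    # Step 0: starting values (no QTL effect, pooled variance).
    mu = sum(ybar) / n
    a, d = 0.0, 0.0
    sigma2 = sum(mi * (y - mu) ** 2 for y, mi in zip(ybar, m)) / n
    for _ in range(max_iter):
        g = [mu + Z[k] * a + W[k] * d for k in range(3)]  # genotype means
        # Step 1: posterior genotype probabilities (log scale for stability).
        post = []
        for y, mi, p in zip(ybar, m, prior):
            logw = [math.log(p[k]) - 0.5 * mi * (y - g[k]) ** 2 / sigma2
                    if p[k] > 0 else float("-inf") for k in range(3)]
            top = max(logw)
            w = [math.exp(v - top) for v in logw]
            total = sum(w)
            post.append([v / total for v in w])
        # Step 2: weighted genotype means, mapped back to (mu, a, d).
        new_g = []
        for k in range(3):
            den = sum(pi[k] * mi for pi, mi in zip(post, m))
            num = sum(pi[k] * mi * y for pi, mi, y in zip(post, m, ybar))
            new_g.append(num / den if den > 0 else g[k])
        new_mu = 0.5 * (new_g[0] + new_g[2])
        new_a = 0.5 * (new_g[0] - new_g[2])
        new_d = new_g[1] - new_mu
        # Step 3: residual variance, weighting each squared deviation by m_i.
        new_sigma2 = sum(pi[k] * mi * (y - new_g[k]) ** 2
                         for pi, mi, y in zip(post, m, ybar)
                         for k in range(3)) / n
        # Step 4: stop once the parameters no longer change.
        change = max(abs(new_mu - mu), abs(new_a - a),
                     abs(new_d - d), abs(new_sigma2 - sigma2))
        mu, a, d, sigma2 = new_mu, new_a, new_d, new_sigma2
        if change < tol:
            break
    return mu, a, d, sigma2
```

The normalizing constant of the normal density is the same for all three genotypes of one individual, so it cancels in Step 1 and is omitted. With identical priors for every individual and a start at *a* = *d* = 0, the iteration stays at the null solution, so informative marker priors or perturbed starting values are needed in practice.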

### Mapping QTL based on the repeatability model

Partitioning the residual error *e*_{i} in model (1) into an individual-specific permanent environmental effect *ζ*_{i} and a random environmental effect *ε*_{ij}, the *j* th phenotypic value of individual *i* is represented as

$$y_{ij} = \mu + z_i a + w_i d + \zeta_i + \epsilon_{ij} \qquad (10)$$

This is a mixed effects model, also called the repeatability model, with *a* and *d* treated as fixed effects and *ζ*_{i} as the random effect, where *ζ*_{i} follows an i.i.d. *N*(0, ${\sigma}_{\zeta}^{2}$) distribution and *ε*_{ij} an i.i.d. *N*(0, ${\sigma}_{\epsilon}^{2}$) distribution.

We use an *m*_{i} × 1 vector $y_i = [y_{i1}\; y_{i2}\; \dots\; y_{im_i}]^{T}$, for *i* = 1, 2, …, *n*, to denote the array of phenotypic values for the *i* th individual and define *ϕ*_{i} = [1 1 … 1]^{T} as a vector of dimension *m*_{i}. In matrix notation, model (10) can be written as

$$y_i = \phi_i \mu + z_i \phi_i a + w_i \phi_i d + \phi_i \zeta_i + \epsilon_i \qquad (11)$$

where $\epsilon_i = [\epsilon_{i1}\; \epsilon_{i2}\; \dots\; \epsilon_{im_i}]^{T}$ is an *m*_{i} × 1 vector of random environmental effects which follows *N*(0, $I_i\sigma_{\epsilon}^{2}$), with *I*_{i} being an *m*_{i} × *m*_{i} identity matrix. The conditional expectation of model (11) given the fixed effects is

$$E(y_i) = M_i = \phi_i \mu + z_i \phi_i a + w_i \phi_i d \qquad (12)$$

and the variance-covariance matrix is

$$V_i = \phi_i \phi_i^{T} \sigma_{\zeta}^{2} + I_i \sigma_{\epsilon}^{2}, \qquad (13)$$

which applies to all *i* = 1, 2, …, *n*.
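Because model (11) adds one shared effect *ζ*_{i} to every record of an individual, its covariance matrix has the compound-symmetry form $V_i = \phi_i\phi_i^{T}\sigma_{\zeta}^{2} + I_i\sigma_{\epsilon}^{2}$, which has a closed-form inverse. The sketch below (function names are illustrative, not from the paper) builds both matrices and the scalar $\phi_i^{T}V_i^{-1}\phi_i = m_i/(\sigma_{\epsilon}^{2} + m_i\sigma_{\zeta}^{2})$, so no numerical matrix inversion is needed.

```python
def covariance_matrix(m, var_zeta, var_eps):
    """V_i = phi phi^T var_zeta + I var_eps for an individual with m records."""
    return [[var_zeta + (var_eps if r == c else 0.0) for c in range(m)]
            for r in range(m)]


def covariance_inverse(m, var_zeta, var_eps):
    """Closed-form inverse of V_i: (I - shrink * phi phi^T) / var_eps."""
    shrink = var_zeta / (var_eps + m * var_zeta)
    return [[((1.0 if r == c else 0.0) - shrink) / var_eps for c in range(m)]
            for r in range(m)]


def phi_vinv_phi(m, var_zeta, var_eps):
    """Scalar phi^T V_i^{-1} phi, i.e. the sum of all elements of the inverse."""
    return m / (var_eps + m * var_zeta)


def matmul(a, b):
    """Plain nested-list matrix product, used only to check the inverse."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


# Example: five records per individual, variance ratio 2.5:2.5.
V = covariance_matrix(5, 2.5, 2.5)
V_inv = covariance_inverse(5, 2.5, 2.5)
product = matmul(V, V_inv)  # numerically the 5 x 5 identity matrix
```

When the permanent environmental variance is zero the scalar reduces to $m_i/\sigma_{\epsilon}^{2}$, which is the weight carried by a mean of *m*_{i} independent records.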

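The conditional density of *y*_{i} based on *M*_{i} and *V*_{i} is

$$f(y_i \mid \theta) = (2\pi)^{-m_i/2}\,|V_i|^{-1/2}\exp\left[-\tfrac{1}{2}\left(y_i - M_i\right)^{T} V_i^{-1}\left(y_i - M_i\right)\right],$$

where *θ* = [*μ a d* ${\sigma}_{\zeta}^{2}$ ${\sigma}_{\epsilon}^{2}$]. The corresponding log-likelihood function is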

Taking the derivatives with respect to *μ*, *a* and *d* and setting them to zero, we obtain

but explicit equations for ${\sigma}_{\zeta}^{2}$ and ${\sigma}_{\epsilon}^{2}$ cannot be derived in the same way. Instead of the above likelihood function, we construct the following likelihood function from the joint conditional density of ${y}_{i}$,

where *θ*_{1} = [*μ a d ζ*_{i} ${\sigma}_{\epsilon}^{2}$].

Taking the derivatives with respect to *θ*_{1} and setting them to zero, we obtain

and

where

so we can simply use the existing mixed model EM algorithm to find the MLE of the parameters [9]. The EM steps for the mixed model analysis are as follows.

*Step 0*: Initialize all parameters with values in their admissible ranges, denoted by *θ*^{(0)}.

*Step 1*: Compute the posterior probabilities of the three genotypes for each individual

*Step 2*: Compute all the expectations involved in the following maximization steps (as in equation (8)).

*Step 3*: Find the posterior distribution of the random effect *ζ*_{i} from equation (18). This posterior distribution turns out to be a mixture of three normal distributions with a mean

and a variance

*Step 4*: Update the population mean, additive effect and dominance effect by equation (16). The resulting equations are equivalent to equations (9) with *m*_{i} replaced by $\phi_i^{T} V_i^{-1} \phi_i$.

*Step 5*: Update the covariance matrix of the random effect

*Step 6*: Update the residual variance by equation (19)

*Step 7*: Repeat from step 1 to step 6 until a certain convergence criterion is reached.

The MLEs of the parameters in both model (2) and model (10) are solved iteratively by the EM algorithm at each tested location on the chromosome, and the QTL position and effects are determined from the likelihood ratio statistics in a chromosome or genome scan.
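The mixed model iteration can be illustrated with the sketch below. It is a hedged illustration rather than the authors' implementation: it uses the textbook complete-data EM updates, treating both the QTL genotype and *ζ*_{i} as missing, and these need not match the paper's update equations term by term. The genotype coding `Z`/`W`, all function names, and the starting values are assumptions made for the example.

```python
import math

# Assumed coding of z and w for genotypes Q1Q1, Q1Q2, Q2Q2 (illustrative).
Z = (1.0, 0.0, -1.0)
W = (0.0, 1.0, 0.0)


def genotype_log_terms(y, p, g, var_zeta, var_eps):
    """ln[p_k f(y_i | genotype k)] for the three genotypes, zeta_i integrated out."""
    m = len(y)
    ybar = sum(y) / m
    ssw = sum((v - ybar) ** 2 for v in y)  # within-individual sum of squares
    tot = var_eps + m * var_zeta
    const = -0.5 * (m * math.log(2.0 * math.pi) + (m - 1) * math.log(var_eps)
                    + math.log(tot) + ssw / var_eps)
    return [math.log(p[k]) + const - 0.5 * m * (ybar - g[k]) ** 2 / tot
            if p[k] > 0 else float("-inf") for k in range(3)]


def log_likelihood(records, prior, g, var_zeta, var_eps):
    """Observed-data log-likelihood summed over individuals."""
    total = 0.0
    for y, p in zip(records, prior):
        terms = genotype_log_terms(y, p, g, var_zeta, var_eps)
        top = max(terms)
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total


def em_repeatability(records, prior, n_iter=100):
    """records[i] holds the repeat records of individual i; prior[i] = [p_i1, p_i2, p_i3]."""
    n = len(records)
    n_rec = sum(len(y) for y in records)
    grand = sum(sum(y) for y in records) / n_rec
    spread = sum((v - grand) ** 2 for y in records for v in y) / n_rec
    # Step 0: no QTL effect, total variance split evenly (arbitrary start).
    g, var_zeta, var_eps = [grand] * 3, 0.5 * spread, 0.5 * spread
    trace = []
    for _ in range(n_iter):
        trace.append(log_likelihood(records, prior, g, var_zeta, var_eps))
        num, den, s_zeta, rows = [0.0] * 3, [0.0] * 3, 0.0, []
        for y, p in zip(records, prior):
            m = len(y)
            ybar = sum(y) / m
            ssw = sum((v - ybar) ** 2 for v in y)
            terms = genotype_log_terms(y, p, g, var_zeta, var_eps)
            top = max(terms)
            wts = [math.exp(t - top) for t in terms]
            post = [w / sum(wts) for w in wts]  # posterior genotype probabilities
            tot = var_eps + m * var_zeta
            v_i = var_zeta * var_eps / tot      # posterior variance of zeta_i
            c = [var_zeta * m * (ybar - g[k]) / tot for k in range(3)]  # posterior means
            for k in range(3):
                num[k] += post[k] * m * (ybar - c[k])
                den[k] += post[k] * m
                s_zeta += post[k] * (c[k] ** 2 + v_i)
            rows.append((m, ybar, ssw, post, c, v_i))
        # Update genotype means, then the two variance components.
        g = [num[k] / den[k] if den[k] > 0 else g[k] for k in range(3)]
        var_zeta = s_zeta / n
        var_eps = sum(post[k] * (ssw + m * (ybar - g[k] - c[k]) ** 2 + m * v_i)
                      for m, ybar, ssw, post, c, v_i in rows
                      for k in range(3)) / n_rec
    trace.append(log_likelihood(records, prior, g, var_zeta, var_eps))
    mu = 0.5 * (g[0] + g[2])
    return {"mu": mu, "a": 0.5 * (g[0] - g[2]), "d": g[1] - mu,
            "var_zeta": var_zeta, "var_eps": var_eps, "loglik_trace": trace}
```

In a scan, such a fit would be repeated at each tested position with the priors recomputed from the flanking markers, and the final log-likelihood compared with that of a fit constrained to *a* = *d* = 0 to form the likelihood ratio statistic. The log-likelihood trace should never decrease between iterations, which is a convenient check on any EM implementation.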

## Simulation studies

A series of simulation experiments was used to compare the efficiency and behaviour of the two mapping methods, the repeatability model and simple analysis of the mean phenotype, for a trait with repeat records. We simulated a single chromosome 100 cM long with 11 evenly spaced codominant markers for an F_{2} population of size *n* = 100, and a single QTL was placed at position 25 cM (between markers 3 and 4). Under the null model, the QTL was assigned a value of zero for both the additive and dominance effects. The empirical critical values of the likelihood ratio statistics for testing the presence of the QTL were obtained by simulating 1000 replicates. Under the alternative model, nonzero and equal additive and dominance effects were simulated, and the simulations were replicated 100 times. Empirical power was calculated by counting the number of runs in which the test statistic was greater than the critical value.
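The critical-value and power calculations described above amount to a quantile and a count. A minimal sketch follows; the function names are illustrative, and the significance level `alpha` is a parameter of the sketch because the text does not state which level was used.

```python
def empirical_critical_value(null_stats, alpha):
    """Upper-alpha empirical quantile of test statistics simulated under the null."""
    ordered = sorted(null_stats)
    # Index of the (1 - alpha) quantile, clamped to the largest statistic.
    index = min(len(ordered) - 1, int((1.0 - alpha) * len(ordered)))
    return ordered[index]


def empirical_power(alt_stats, critical_value):
    """Fraction of alternative-model runs whose statistic exceeds the critical value."""
    hits = sum(1 for s in alt_stats if s > critical_value)
    return hits / len(alt_stats)
```

With the design in the text, `null_stats` would hold the 1000 likelihood ratio statistics from the null simulations (for example the maximum over the scanned chromosome in each replicate) and `alt_stats` the 100 statistics simulated under the alternative model.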

The factors considered include the QTL size, measured as the proportion of the phenotypic variance explained by the QTL (also called the QTL heritability), the number of replicates, and ${\sigma}_{\zeta}^{2}$:${\sigma}_{\epsilon}^{2}$, i.e. the variance ratio of the permanent environmental effect to the random environmental effect. The QTL size was set at three levels: *a* = *d* = 0.265, 0.577, 0.943, corresponding to *h*^{2} = 0.05, 0.10, 0.20, respectively. The number of replicates was examined at five levels, *m* = 1, 3, 5, 10, 15, and the variance ratio at five levels, ${\sigma}_{\zeta}^{2}$:${\sigma}_{\epsilon}^{2}$ = 1:4, 2:3, 2.5:2.5, 3:2, 4:1, keeping ${\sigma}_{\zeta}^{2}$ + ${\sigma}_{\epsilon}^{2}$ = 5.0.

The *j* th phenotypic value of individual *i* was simulated by using the repeatability model:

$$y_{ij} = \mu + z_i a + w_i d + \xi_i \sigma_{\zeta} + \eta_{ij} \sigma_{\epsilon} \qquad (25)$$

where both *ξ*_{i} and *η*_{ij} are random numbers drawn from the standard normal distribution.
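Equation (25) translates directly into code. The sketch below is a simplified illustration, not the simulation program used in the study: it draws each QTL genotype directly in the F_{2} ratio 1:2:1 instead of simulating markers along a chromosome, and the genotype coding `Z`/`W` and the function name are assumptions.

```python
import random

# Assumed coding of z and w for genotypes Q1Q1, Q1Q2, Q2Q2 (illustrative).
Z = (1.0, 0.0, -1.0)
W = (0.0, 1.0, 0.0)


def simulate_records(n, m, mu, a, d, var_zeta, var_eps, rng):
    """Simulate m repeat records for each of n F2 individuals following equation (25)."""
    sd_zeta = math_sqrt(var_zeta)
    sd_eps = math_sqrt(var_eps)
    genotypes, records = [], []
    for _ in range(n):
        k = rng.choices((0, 1, 2), weights=(1, 2, 1))[0]  # F2 ratio 1:2:1
        xi = rng.gauss(0.0, 1.0)                          # one draw per individual
        base = mu + Z[k] * a + W[k] * d + xi * sd_zeta
        # eta is drawn afresh for every record of the individual.
        records.append([base + rng.gauss(0.0, 1.0) * sd_eps for _ in range(m)])
        genotypes.append(k)
    return genotypes, records


def math_sqrt(value):
    """Square root without extra imports; variances are non-negative."""
    return value ** 0.5


# Example: 100 individuals, 5 records each, variance ratio 2.5:2.5.
# The population mean of 10.0 is an arbitrary illustrative value.
genotypes, records = simulate_records(100, 5, 10.0, 0.577, 0.577, 2.5, 2.5,
                                      random.Random(2007))
```

Passing an explicit `random.Random` instance keeps each simulated data set reproducible, which is convenient when the same replicate has to be analysed by both mapping methods.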

The results of all simulations consistently show that, under the same experimental conditions, (1) using the repeatability model can significantly increase the statistical power of QTL detection compared with simple analysis of the mean phenotype, and (2) the position and effects of the QTL, especially the proportion of phenotypic variance contributed by the QTL, were estimated more accurately by the repeatability model than by the genetic mapping model without permanent environmental effects applied to the mean phenotype. The superiority of the repeatability model over simple analysis of the mean phenotype is most evident when the QTL heritability is low.

The effects of the number of replications on the efficiency and behaviour of the two methods were investigated only at a variance ratio of permanent environmental effect to random environmental effect of 1:1. The simulation results are listed in Tables 1 and 2, one table per mapping method. Note that the simulated results at *m* = 1 (no replication) correspond only to the mapping method based on the mean phenotype, because the repeatability model has no solution without replication. As expected, the statistical power of QTL detection is higher with replication than without, whether based on the mean phenotype or on the repeatability model. The estimates of the QTL parameters generally improve as the number of replications increases.

We have also investigated the impact of the variance ratio of permanent environmental effect to random environmental effect on the difference in mapping performance between the two methods. The simulation results with the number of replications fixed at five are listed in Table 3. With the total variance of the random effects fixed, the greater the difference between the variances of the permanent and random environmental effects, the clearer the superiority of the repeatability model over the mean phenotype in the statistical power of QTL detection. Possible reasons are that a large random environmental variance lowers the reliability of an individual's mean phenotypic value, and that the residual variance in model (2) increases as the variance of the permanent environmental effect increases.
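The second reason can be made concrete. If the records follow the repeatability model, averaging *m* records shrinks only the random environmental part, so the residual of the mean phenotype has variance $\sigma_{\zeta}^{2} + \sigma_{\epsilon}^{2}/m$. The short sketch below (the function name is illustrative) evaluates this for the five simulated variance ratios at *m* = 5.

```python
def mean_residual_variance(var_zeta, var_eps, m):
    """Residual variance of the mean of m records under the repeatability model."""
    return var_zeta + var_eps / m


# The five simulated settings, each with var_zeta + var_eps = 5.0.
RATIOS = [(1.0, 4.0), (2.0, 3.0), (2.5, 2.5), (3.0, 2.0), (4.0, 1.0)]

table = [(var_zeta, var_eps, mean_residual_variance(var_zeta, var_eps, 5))
         for var_zeta, var_eps in RATIOS]
# Residual variances of the mean: about 1.8, 2.6, 3.0, 3.4 and 4.2.
```

Although the total variance is held at 5.0, the residual variance seen by the mean-phenotype analysis rises from about 1.8 to about 4.2 as the permanent environmental share grows, because that share is not reduced by averaging.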

## Discussion

For a trait with repeat records, we propose using the repeatability model to map QTL. This differs from simple analysis of the mean phenotype not only in the data analyzed but, more fundamentally, in the model adopted. Simple analysis of the mean phenotype is based on the regular genetic model for mapping QTL in line crosses, which excludes permanent environmental effects. The excluded permanent environmental effects are absorbed into the residual error, which decreases the accuracy of the estimated QTL parameters, as is rigorously shown in standard texts on statistical models [e.g., 10, 11]. The loss of information in the data also affects the performance of QTL mapping based on the mean phenotype.

Replication requires either that the experimental conditions be the same when multiple records are observed on one individual, or that the genetic backgrounds be identical when the records come from multiple individuals. If the former is not satisfied, the "repeat" records become longitudinal data, such as test-day records of milk production and body weight in cattle, which are genetically analysed using the random regression model, essentially the repeatability model with nested submodels of time [12–14]. Apart from cloned individuals and the progenies of each plant in RIL, the latter is hard to satisfy. For example, the genetic backgrounds are not completely the same among individuals within a family or among F_{3} progenies from one F_{2} individual. To improve the efficiency of detecting QTL using such data, the genetic backgrounds should at least be taken into account in the analysis [7]; furthermore, the repeatability model may be a good choice for directly analyzing such "repeat" records.

Although we demonstrate the statistical method of QTL mapping using an F_{2} population as an example, it can also be extended to other simpler or more complex designs, such as backcross populations and full-sib families. Only one QTL is assumed in the model considered here so that the efficiency of the presented method can be conveniently investigated based on various estimates. If a trait is controlled by multiple loci, composite interval mapping [15, 16] or Bayesian mapping [e.g., 17, 18] can be used to map those QTLs by incorporating marker cofactors outside the scanning interval, or all the QTLs, into model (10).

## References

- 1.
Fisher RA: The design of experiments. 1971, New York, Hafner Publishing Company, 9th edition

- 2.
Steel RGD, Torrie JH: Principles and procedures of statistics: a biometrical approach. 1980, Tokyo, McGraw-Hill Kogakusha, 2nd edition

- 3.
Henderson CR: Applications of Linear Models in Animal Breeding. 1984, Guelph ON Univ of Guelph

- 4.
Mrode RA: Linear Models for the Prediction of Animal Breeding Values. 1996, UK, CAB International

- 5.
Falconer DS: Introduction to Quantitative Genetics. 1960, London,Oliver & Boyd

- 6.
Zhang TY, Yuan J, Yu W, Guo Z, Kohel RJ: Molecular tagging of a major QTL for fiber strength in upland cotton and its marker-assisted selection. Theor Appl Genet. 2003, 106: 262-268.

- 7.
Zhang Y, Xu S: Mapping Quantitative Trait Loci in F2 Incorporating Phenotypes of F3 Progeny. Genetics. 2004, 166: 1981-1993. 10.1534/genetics.166.4.1981.

- 8.
Dempster AP, Laird NM, Rubin DB: Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B. 1977, 39: 1-38.

- 9.
Henderson CR: Recent developments in variance and covariance estimation. J Anim Sci. 1986, 63: 208-216.

- 10.
Zar JH: Biostatistical Analysis. 1996, Prentice Hall, 3rd edition

- 11.
Neter J, Kutner MH, Nachtsheim CJ, Wasserman W: Applied Linear Statistical Models. 1996, RD Irwin, Homewood, IL, 4th edition

- 12.
Henderson CR: Analysis of covariance in the mixed model: higher-level, nonhomogeneous, and random regressions. Biometrics. 1982, 38: 623-640. 10.2307/2530044.

- 13.
Schaeffer LR: Application of random regression models in animal breeding. Livest Prod Sci. 2004, 86: 35-45. 10.1016/S0301-6226(03)00151-9.

- 14.
Macgregor S, Knott SA, White I, Visscher PM: Quantitative trait locus analysis of longitudinal quantitative trait data in complex pedigrees. Genetics. 2005, 171: 1365-1376. 10.1534/genetics.105.043828.

- 15.
Jansen RC: Controlling the type I and type II errors in mapping quantitative trait loci. Genetics. 1994, 138: 871-881.

- 16.
Zeng ZB: Precision mapping of quantitative trait loci. Genetics. 1994, 136: 1457-1468.

- 17.
Satagopan JM, Yandell BS, Newton MA, Osborn TC: A Bayesian approach to detect quantitative trait loci using Markov chain Monte Carlo. Genetics. 1996, 144: 805-816.

- 18.
Yi N, Xu S: Bayesian mapping of quantitative trait loci under complicated mating designs. Genetics. 2001, 157: 1759-1771.

## Additional information

### Authors' contributions

RQY coordinated the study, developed the foundational principle of the method and wrote the computing program and the paper. FM was responsible for the simulation experiment and carried out the analysis of results.

## Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## About this article

### Cite this article

Yang, R., Fang, M. Mapping quantitative trait loci in line cross with repeat records.
*BMC Genet* **8**, 47 (2007). https://doi.org/10.1186/1471-2156-8-47

### Keywords

- Residual Error
- Mapping Quantitative Trait Locus
- Dominance Effect
- Conditional Density
- Likelihood Ratio Statistic