Multitrait analysis of quantitative trait loci using Bayesian composite space approach

Fang, Ming; Jiang, Dan; Pu, Li Jun; Gao, Hui Jiang; Ji, Peng; Wang, Hong Yi; Yang, Run Qing

doi:10.1186/1471-2156-9-48

Methodology article
Open access
Published: 18 July 2008

BMC Genetics volume 9, Article number: 48 (2008)

Abstract

Background

Multitrait analysis of quantitative trait loci can capture the maximum information from an experiment. Maximum-likelihood and least-squares approaches have been developed to analyze multiple traits jointly, but it is difficult for them to include multiple QTL simultaneously in one model.

Results

In this article, we extend the Bayesian composite space approach, an efficient model selection method that can easily handle multiple QTL, to multitrait mapping of QTL. The proposed method differs from Bayesian single-trait analysis in three respects. First, the parameters for all traits are updated jointly as vectors or matrices. Second, for QTL in the same interval that control different traits, the correlation between QTL genotypes is taken into account. Third, the correlation of the residual errors between traits is also exploited. The superiority of the new method over separate analysis was demonstrated with both simulated and real data. The computing program was written in FORTRAN and is available on request.

Conclusion

The results suggest that the newly developed method is more powerful than separate analysis.

Background

Multitrait analysis is defined as a method that includes all traits simultaneously in a single model [1] and can therefore take the correlation among traits into account. Many methods have been developed for mapping QTL by combining information from multiple traits. Jiang and Zeng [2] proposed a maximum-likelihood approach and concluded that joint analysis could improve the precision of parameter estimates and had higher QTL detection power than separate analysis. Knott and Haley [3] proposed a multitrait least-squares approach to detect QTL. It is easy to program and fast to compute and, compared with separate analysis of each trait, can increase the power to detect a pleiotropic QTL and improve the precision of the location estimate. Xu et al. [1] developed a maximum-likelihood approach for jointly mapping multiple binary traits, implemented via the EM algorithm. They found that the QTL detection power of joint analysis was higher than the sum of the powers of the separate analyses. However, once the power of separate analysis was redefined more reasonably as a combined power (see [1]), the power of joint analysis was almost equal to the combined power; that is, joint analysis had almost the same power as separate analysis. For QTL parameter estimation, joint analysis improved the precision of the QTL position estimates, but the estimated QTL effects and their standard deviations showed no obvious difference. Another class of multitrait approaches, based on dimension reduction, was proposed by Korol et al. [4]. Mangin et al. [5] used this technique to analyze independent principal components analysis (PCA) traits and used the PCA test values to detect QTL, a procedure that was proved to be asymptotically equivalent to the multivariate maximum-likelihood ratio test. However, the parameters of methods of this kind are often difficult to interpret biologically.
Eaves et al. [6] developed a maximum-likelihood method for multitrait mapping of QTL in outbred populations, based on an identity-by-descent (IBD) variance components model in which QTL effects are treated as random.

All the joint mapping approaches mentioned above were based on a one-QTL model. Recently, Bayesian methodology has been used for mapping QTL [7–17]; its main advantage is that it can easily handle multiple QTL simultaneously. Bayesian reversible jump MCMC (RJMCMC) has become a common method for mapping multiple QTL. Liu et al. [7] applied it to multitrait mapping of QTL in outbred populations under a random-effect model. However, because the model dimension varies in RJMCMC, the sampler is often subject to poor mixing and slow convergence. Godsill [18] developed an effective Bayesian composite space method for model selection that keeps the model dimension fixed in each round of updating; it therefore converges faster and is much easier to program. Yi et al. [15–17] successfully applied this approach to QTL mapping. In this article, we extend the Bayesian composite space approach to multitrait analysis under inbred line crosses, and use both simulated and real data to demonstrate the advantages and disadvantages of the proposed method.

Results

Simulation Study

We simulated 200 backcross individuals, each with marker information and phenotypic records for three traits. One chromosome of length 600 cM was investigated. Twenty-one markers were placed on the genome with an average distance of 20 cM. Marker genotypes were observed for all individuals. Thirteen QTL were placed on the genome, of which the loci at positions 96, 423, 487 and 584 had pleiotropic effects, while the loci at 250, 253 and 256, and those at 535 and 537, were closely linked and each controlled a different trait. The positions and effects of the QTL for each trait are listed in Table 1. The population means of all traits were set to zero. The residual (co)variances are listed in Table 2. The resulting heritabilities were 0.728 for trait 1, 0.691 for trait 2 and 0.598 for trait 3.

Table 1 QTL Parameters and their estimates obtained from the simulated data

Table 2 The true values and their estimates of residual error (co)variance obtained from the simulated data

To investigate the performance of our approach, two methods were used to analyze the simulated data: the proposed multitrait analysis and single-trait analysis. For single-trait analysis we used method 1 of [16], of which the proposed method is a direct extension. In both analyses, the prior variance and degrees of freedom of the residual error were set to zero, because no prior information was available. The prior expected number of QTL, l_k, was 3, and the maximum number of QTL, L_k, was equal to the number of marker intervals (30). The prior inclusion probability of each model indicator variable was therefore 3/30 = 0.1. For both methods, the MCMC ran for 1000 cycles as a burn-in period (discarded) and then for an additional 20,000 cycles. The chain was thinned to reduce serial correlation by saving one observation every 10 cycles, so the posterior sample contained 20,000/10 = 2000 observations for the post-MCMC analysis.

The estimates of the QTL parameters from multitrait analysis and separate analysis are listed in Table 1 and Table 2. There were no clear differences between the two methods in the estimates of the QTL positions, the QTL effects or the corresponding standard deviations. Both methods gave estimates of QTL positions and effects that were close to the true values.

Figures 1 and 2 show the profiles of the posterior probability of the QTL positions and of the $2\log_{e}\mathrm{BF}$ statistic, respectively, for multitrait analysis; Figures 3 and 4 show the same profiles for separate analysis. Both the posterior probability of QTL positions and the $2\log_{e}\mathrm{BF}$ statistic are generally higher for multitrait analysis than for separate analysis. Moreover, two additional QTL, located at 483 and 245, were detected by multitrait analysis. These results suggest that multitrait analysis may be more powerful than separate analysis.

Real data analysis

We applied the new method to data from the North American Barley Genome Mapping Project [22]. The doubled-haploid (DH) population included 150 lines (n = 150), each of which was genotyped for 223 codominant markers. These markers covered ~1500 cM of the genome along seven linkage groups, with an average marker interval of ~7 cM. Eight traits were investigated in the project: grain yield, lodging, height, heading date, grain protein, alpha amylase, diastatic power and malt extract. Agronomic traits were measured in 16 areas, and malting quality traits in 9 areas. In our study, only three traits (grain yield, height and alpha amylase) were analyzed, and only the records from Crookston, Minnesota were used.

In the analysis, the prior expected number of QTL was taken as 3 for each trait, and the maximum number of QTL was then calculated as $L_{k} \approx 3 + 3 \sqrt{l_{k}}$, giving L_k = 8. The prior inclusion probability of each model indicator variable was therefore 3/8 = 0.375. To reduce the model space, we assumed that each chromosome contained at most one QTL, except that chromosome 7 was divided at its midpoint into two parts, each containing at most one QTL, because other analyses (IM, CIM) consistently show signals of two QTL on chromosome 7 for some traits. As in the simulation study, two methods, multitrait analysis and Bayesian single-trait analysis (method 1 in [16]), were used to analyze the data. The MCMC ran for 5 × 10⁴ cycles after the first 2000 cycles were discarded. The chain was thinned by saving one observation every 10 cycles, which yielded 5000 samples for posterior analysis.

Figures 5 and 6 show the profiles of the $2\log_{e}\mathrm{BF}$ statistic for the real data under multitrait analysis and separate analysis, respectively. The profiles in Figure 5 are generally higher than those in Figure 6. For trait 1 (grain yield), no QTL was detected by separate analysis (Figure 6a), whereas eight QTL were detected by multitrait analysis (Figure 5a). For trait 2 (height), three QTL, located on chromosomes 1, 2 and 7, were detected by separate analysis; multitrait analysis gave much stronger signals for these three QTL and also detected four additional QTL on chromosomes 3, 4, 5 and 6. For trait 3 (alpha amylase), two additional QTL, located on chromosomes 1 and 3, were detected by multitrait analysis. The real data analysis thus also supports the conclusion that multitrait analysis is more powerful than separate analysis.

Discussion

The choice of the hyper-parameter of the QTL effect is important in Bayesian analysis and can influence the efficiency of model selection. For example, in the Bayesian shrinkage method [14], the hyper-parameter is treated as a variable and assigned a special distribution so that no model selection is needed. In the Bayesian composite space approach, the updating of the model indicator variables depends closely on the QTL effects, but the choice of hyper-parameter is not as strict as in Bayesian shrinkage analysis. Many approaches have been proposed for choosing the hyper-parameter; ours is simply an extension of the approach of Yi et al. [15]. We also followed Yi et al. [15] in obtaining the prior probability for the model indicator variables. However, we did not investigate the influence of different prior probabilities on the results, because the proposed method is computationally very intensive. In addition, we suggest using CIM-based multitrait analysis [2] to obtain the prior for the residual variance-covariance matrix; if no prior information is available, the noninformative prior [19], $p (Σ_{e}) \propto Σ_{e}^{- 1}$, may be used. In the simulation study, the noninformative prior was used and gave precise estimates of the residual variance-covariance matrix.

The proposed multitrait analysis is based on the Bayesian composite space approach, but other popular model selection approaches, such as the Bayesian shrinkage method [14] and the Bayesian SSVS method [23], can be extended just as easily; the details will be presented in another paper. We used BC and DH populations as examples to demonstrate the efficiency of the method. The method can be modified for other experimental designs, such as RIL and F2 designs. In addition, we took only main effects into account, although epistatic effects can also be included in the model. In that case, the model should be written as: $y_{i} = b_{0} + \sum_{q = 1}^{p} Φ_{q} X_{i q} b_{q} + \sum_{q_{1} < q_{2}}^{p} Φ_{q_{1} q_{2}} X_{i q_{1}} X_{i q_{2}} w_{q_{1} q_{2}} + e_{i}$ , where q indexes the main-effect QTL, q₁ and q₂ index two interacting QTL, and $w_{q_{1} q_{2}}$ is the (m × 1) column vector of epistatic effects between QTL q₁ and q₂. The implementation will certainly be more complicated and time-consuming, but the extension is feasible and is expected to be efficient for mapping interacting QTL.

In this paper, we have not given a test procedure to distinguish closely linked QTL from pleiotropic QTL, both of which cause genetic correlations between traits. There is some literature on this question; generally, the likelihood ratio (LR) statistic [1, 2] and the Bayes factor (BF) statistic [7] have been used. In our multitrait analysis, the LR testing procedure of [2] is fully applicable but not optimal, because it is based on a single-QTL model. A Bayesian approach can also be used for such testing, but computing time is a major concern. We hope that an efficient and fast approach to this problem will be developed.

Conclusion

The Bayesian composite space approach [18] is an effective method for model selection. Yi [16] first used it for QTL mapping and showed it to be effective for mapping multiple QTL. In this article, we extended this statistical method to multitrait mapping of QTL. Compared with separate analysis, joint analysis is preferable, because the parameters are updated as vectors or matrices and the correlation between traits is exploited. The power of the proposed multitrait method was demonstrated by both simulation experiments and real data analysis, both of which showed that multitrait analysis tends to give higher statistical power than single-trait analysis.

Methods

Multivariate linear model

Consider n individuals from a backcross population derived from two inbred lines, with observations on densely distributed codominant markers and on m quantitative traits. Suppose that the maximum number of QTL is p; the phenotypic value y_ki of individual i for the k th trait can then be described by the following multivariate linear model:

y_{k i} = b_{k 0} + \sum_{j = 1}^{p} γ_{k j} x_{k i j} b_{k j} + e_{k i},

(1)

for i = 1, 2, ..., n and k = 1, 2, ..., m, where γ_kj is the model indicator variable, indicating whether the j th QTL of the k th trait is included in (1) or excluded from (0) the model; b_k0 is the population mean; b_kj is the QTL effect; x_kij is the QTL genotype, with x_kij = 1 if the genotype is homozygous and -1 otherwise; and e_ki is the residual error, assumed to follow a multivariate normal distribution. In matrix notation, equation (1) can be expressed as:

y_{i} = b_{0} + \sum_{j = 1}^{p} Φ_{j} X_{i j} b_{j} + e_{i},

(2)

for i = 1, 2, ..., n, where y_i = [y_1i, y_2i, ..., y_mi]^T, b₀ = [b₁₀, b₂₀, ..., b_m0]^T, b_j = [b_1j, b_2j, ..., b_mj]^T and e_i = [e_1i, e_2i, ..., e_mi]^T are all (m × 1) column vectors. X_ij in equation (3) is the QTL genotype matrix and Φ_j in equation (4) is the model indicator matrix; both are (m × m) diagonal matrices.

X_{i j} = [\begin{matrix} x_{1 i j} & 0 & \dots & 0 \\ 0 & x_{2 i j} & \dots & 0 \\ ⋮ & ⋮ & ⋱ & ⋮ \\ 0 & 0 & \dots & x_{m i j} \end{matrix}]

(3)

Φ_{j} = [\begin{matrix} γ_{1 j} & 0 & \dots & 0 \\ 0 & γ_{2 j} & \dots & 0 \\ ⋮ & ⋮ & ⋱ & ⋮ \\ 0 & 0 & \dots & γ_{m j} \end{matrix}]

(4)
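As a minimal illustration of equations (1) and (2), the sketch below evaluates the expected phenotype vector of one individual with the residual omitted. Python is used here only for illustration; the function name `expected_phenotype` and all numeric values are hypothetical. Because X_ij and Φ_j are diagonal, the matrix product Φ_j X_ij b_j reduces to a trait-by-trait multiplication.

```python
def expected_phenotype(b0, gamma, x, b):
    """Return b0 + sum_j Phi_j X_ij b_j for one individual (residual omitted).

    b0    : list of m population means
    gamma : gamma[j][k] in {0, 1}, indicator of locus j for trait k
    x     : x[j][k] in {-1, 1}, genotype of locus j for trait k
    b     : b[j][k], effect of locus j on trait k
    """
    m = len(b0)
    y = list(b0)
    for g_j, x_j, b_j in zip(gamma, x, b):
        for k in range(m):
            # Diagonal Phi_j and X_ij act on each trait separately.
            y[k] += g_j[k] * x_j[k] * b_j[k]
    return y


# Hypothetical example with m = 2 traits and p = 2 loci.
b0 = [0.0, 0.0]
gamma = [[1, 0], [1, 1]]      # locus 1 is excluded for trait 2
x = [[1, -1], [-1, -1]]
b = [[0.5, 0.8], [0.3, -0.2]]
print(expected_phenotype(b0, gamma, x, b))
```

Setting an entry of `gamma` to 0 removes that locus from the model for that trait without changing the dimension of any array, which is the property the composite space approach relies on.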

Prior specification

The prior distribution of each QTL effect vector b_j is multivariate normal, b_j ~ N(0, $Σ_{B_{j}}$ ), where $Σ_{B_{j}}$ is the hyper-parameter. We take $Σ_{B_{j}} = n \cdot {[X_{\cdot j}^{T} Σ_{e}^{- 1} X_{\cdot j}]}^{- 1}$ , which is simply an extension of Bayesian single-trait analysis [15]. The importance of the choice of hyper-parameter is addressed in the Discussion. In a large backcross population, and with x_kij coded as -1 or 1, $Σ_{B_{j}}$ simplifies to $Σ_{B_{j}}$ = Σ_e. The prior of the covariance matrix of the residual error is inverse Wishart, Σ_e ~ Wishart^-1(ν_e, $S_{e}^{2}$ ), where ν_e and $S_{e}^{2}$ are the prior degrees of freedom and the prior covariance matrix of the residual error, respectively; they can be obtained from another method, such as CIM-based multitrait analysis [2]. The prior distribution of the population mean b₀ is normal, with mean and variance equal to those calculated from the phenotypic values. The prior distribution of the QTL position λ_kj is uniform between the two flanking markers, p(λ_kj) = 1/d_j, where d_j is the length of the interval to which the j th QTL is confined. Assuming that epistatic effects are absent, the prior inclusion probability for the j th effect can be expressed as $p (γ_{k j} = 1) = 1 - {[1 - l_{k} / L_{k}]}^{1 / N}$ (see also [15]), where l_k is the prior expected number of main-effect QTL, which can be roughly estimated from standard genome scans; N is the number of possible main effects for each QTL, equal to 1 in a BC family [15]; and L_k is the upper bound on the number of QTL. In our simulation study L_k equals the number of marker intervals, whereas in the approach suggested by Yi [15] L_k is taken as $3 + 3 \sqrt{l_{k}}$ , which reduces the model space dramatically [15].
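The prior inclusion probability can be checked numerically. The sketch below assumes that the inclusion probability has the form 1 - (1 - l_k/L_k)^(1/N), which reproduces the values 0.1 and 0.375 quoted in the Results, and that the bound 3 + 3·sqrt(l_k) is rounded to the nearest integer (the rounding rule is an assumption). The function names are hypothetical.

```python
import math


def max_qtl(expected_qtl):
    """Upper bound 3 + 3*sqrt(l_k), rounded to the nearest integer (assumed)."""
    return round(3.0 + 3.0 * math.sqrt(expected_qtl))


def prior_inclusion(expected_qtl, upper_bound, n_effects=1):
    """p(gamma = 1) = 1 - (1 - l/L)**(1/N); N = 1 in a backcross."""
    return 1.0 - (1.0 - expected_qtl / upper_bound) ** (1.0 / n_effects)


# Simulation study: l_k = 3, L_k = 30 marker intervals.
print(prior_inclusion(3, 30))

# Real data: l_k = 3, L_k from the square-root rule.
upper = max_qtl(3)
print(upper, prior_inclusion(3, upper))
```

With N = 1 the expression reduces to l_k/L_k, so the prior expected number of included loci equals l_k.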

Joint posterior density

The observable variables include the phenotypic values, $y = \{y_{i}\}_{i = 1}^{n}$ , and the marker information, $m = \{m_{i j}\}_{i = 1, j = 1}^{n, p}$ . The unobservable variables include the population means, $b_{0} = \{b_{k 0}\}_{k = 1}^{m}$ ; the QTL effects, $b = \{b_{j}\}_{j = 1}^{p}$ ; the QTL genotypes, $X = \{X_{i j}\}_{i = 1, j = 1}^{n, p}$ ; the model indicator variables, $Φ = \{Φ_{j}\}_{j = 1}^{p}$ ; the (co)variance matrix of the residual error, Σ_e; and the QTL positions, $λ = \{λ_{k j}\}_{k = 1, j = 1}^{m, p}$ . Let Θ = {b₀, b, Σ_e, λ, X, Φ} collect the unobservable variables and let θ be the vector of hyper-parameters; the joint prior density of the unobservable variables is then denoted by p(Θ|θ). The joint posterior probability of Θ, given the observable variables y and m, can be expressed as:

p(Θ|y, m) ∝ p(Θ|θ)·p(y, m|Θ), (5)

where, p(y, m|Θ) is the likelihood and can be written as:

p(y, m|Θ) = p(y|Θ)·p(m|Θ), (6)

where p(y|Θ) is multivariate normal density, and p(m|Θ) can be derived from a Markov model [14].

MCMC sampling

The MCMC algorithm generates samples from Markov chains that converge to the posterior distribution of the parameters, without the constant of proportionality having to be calculated. From these posterior samples, summary statistics of the posterior distribution can be calculated. The algorithm proceeds as follows:

a. Initialize all parameters with values in their legal domain.

b. Update the population mean b₀.

c. Update the QTL effect vectors $\{b_{j}\}_{j = 1}^{p}$ .

d. Update the variance-covariance matrix Σ_e of the residual error.

e. Update the QTL genotype indicator matrices $\{X_{i j}\}_{i = 1}^{n}$ and the QTL location vectors $\{λ_{k j}\}_{k = 1}^{m}$ jointly, for j = 1, 2, ..., p.

f. Update the model indicator variable matrices $\{Φ_{j}\}_{j = 1}^{p}$ .
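The steps above, together with the burn-in and thinning settings reported for the simulation study (1000 burn-in cycles, 20,000 further cycles, one sample saved every 10 cycles), can be sketched as a generic driver. `run_chain` and `toy_sweep` are hypothetical names; the toy sweep only stands in for steps b to f and is not the authors' sampler.

```python
def run_chain(sweep, state, n_burn, n_cycles, thin):
    """Run n_burn discarded sweeps, then n_cycles sweeps saving every thin-th."""
    for _ in range(n_burn):
        state = sweep(state)
    saved = []
    for cycle in range(1, n_cycles + 1):
        state = sweep(state)
        if cycle % thin == 0:
            # A real sampler with a mutable state should store a copy here.
            saved.append(state)
    return saved


def toy_sweep(state):
    # Placeholder for one pass through steps b-f; it only counts sweeps.
    return state + 1


samples = run_chain(toy_sweep, 0, n_burn=1000, n_cycles=20000, thin=10)
print(len(samples))
```

Keeping the sweep as a single function makes it easy to reuse the same driver for the single-trait and multitrait samplers.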

The conditional posterior distribution of the population mean b₀ is multivariate normal with mean

{\bar{b}}_{0} = {[\sum_{i = 1}^{n} (Σ_{e}^{- 1})]}^{- 1} \sum_{i = 1}^{n} Σ_{e}^{- 1} (y_{i} - \sum_{j = 1}^{p} Φ_{j} X_{i j} b_{j}),

(7)

and variance-covariance matrix

Σ_{b_{0}} = {[\sum_{i = 1}^{n} (Σ_{e}^{- 1})]}^{- 1} .

(8)
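Because Σ_e does not depend on i, the sums in equations (7) and (8) collapse, which gives a simple check for an implementation: the conditional mean is the average of the residuals with the QTL terms removed, and the conditional covariance shrinks with n.

```latex
\sum_{i = 1}^{n} Σ_{e}^{-1} = n\,Σ_{e}^{-1}
\quad\Longrightarrow\quad
\bar{b}_{0} = \frac{1}{n} \sum_{i = 1}^{n} \Bigl( y_{i} - \sum_{j = 1}^{p} Φ_{j} X_{i j} b_{j} \Bigr),
\qquad
Σ_{b_{0}} = \frac{1}{n}\, Σ_{e} .
```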

The conditional posterior distribution of the QTL effect b_j is multivariate normal with mean

{\bar{b}}_{j} = {[Σ_{B_{j}}^{- 1} + \sum_{i = 1}^{n} (X_{i j}^{T} Φ_{j}^{T} Σ_{e}^{- 1} Φ_{j} X_{i j})]}^{- 1} \sum_{i = 1}^{n} X_{i j}^{T} Φ_{j}^{T} Σ_{e}^{- 1} (y_{i} - \sum_{j' \neq j}^{p} Φ_{j'} X_{i j'} b_{j'} - b_{0}),

(9)

and variance-covariance matrix

Σ_{b_{j}} = {[Σ_{B_{j}}^{- 1} + \sum_{i = 1}^{n} (X_{i j}^{T} Φ_{j}^{T} Σ_{e}^{- 1} Φ_{j} X_{i j})]}^{- 1} .

(10)

The conditional posterior distribution of the covariance matrix of the residual error is inverse Wishart,

Σ_{e} ~ \mathrm{Wishart}^{- 1} (df_{e} + ν_{e}, Ω^{T} Ω + S_{e}^{2}),

(11)

where Ω is the (n × m) matrix whose i th row is ${(y_{i} - \sum_{j = 1}^{p} Φ_{j} X_{i j} b_{j} - b_{0})}^{T}$ , and df_e = n.

In step e, the QTL locations and QTL genotype matrices are updated jointly. For locus j, we first sample a new QTL position for each trait from their prior distribution (described later), then sample the QTL genotype matrices $\{X_{i j}\}_{i = 1}^{n}$ at the new positions using equation (16), and finally accept or reject them jointly with the Metropolis-Hastings algorithm [20, 21]. Because the sampling of X_ij is the more complicated part, we describe it first. Since the QTL genotype x_kij has two possible values (-1 or 1) in a BC line, when m traits are investigated jointly X_ij has 2^m possible configurations, whose general pattern can be written as:

H_{i j, z_{1} z_{2} \dots z_{m}} = [\begin{matrix} x_{1 i j} = z_{1} & 0 & \dots & 0 \\ 0 & x_{2 i j} = z_{2} & \dots & 0 \\ ⋮ & ⋮ & ⋱ & ⋮ \\ 0 & 0 & \dots & x_{m i j} = z_{m} \end{matrix}],

(12)

where z₁, z₂, ..., z_m ∈ {-1, 1}. For clarity, we omit the subscript ij from $H_{i j, z_{1} z_{2} \dots z_{m}}$ and write $H_{z_{1} z_{2} \dots z_{m}}$ for the genotype matrix of the i th individual at the j th locus. Because the QTL genotypes x_kij of the i th individual in the j th interval may be correlated across traits, the joint prior probability of the genotype matrix X_ij cannot simply be expressed as the product of independent terms shown in the following equation:

\begin{matrix} p (X_{i j} = H_{z_{1} z_{2} \dots z_{m}} | λ_{j}, m_{i, j}, m_{i, j + 1}) = p (x_{1 i j} = z_{1}, x_{2 i j} = z_{2}, \dots, x_{m i j} = z_{m} | λ_{j}, m_{i, j}, m_{i, j + 1}) \\ = \prod_{k = 1}^{m} p (x_{k i j} = z_{k} | m_{i, j}, m_{i, j + 1}) \end{matrix} .

(13)

Instead, it can be derived from the Markov model (see equation 14), assuming that the order of markers and QTL is $M_{j} Q_{1} Q_{2} \dots Q_{m} M_{j + 1}$ (see Figure 7), where Q₁, Q₂, ..., Q_m denote the QTL affecting trait 1, trait 2, ..., trait m, respectively, in the j th marker interval. The indicator variables x_1ij, x_2ij, ..., x_mij denote the genotypes of these QTL.

\begin{array}{l} p (X_{i j} = H_{z_{1} z_{2} \dots z_{m}} | m_{i, j}, λ_{j}, m_{i, j + 1}) = p (x_{1 i j} = z_{1}, x_{2 i j} = z_{2}, \dots, x_{m i j} = z_{m} | m_{i, j}, λ_{j}, m_{i, j + 1}) \\ \begin{matrix} = p (x_{1 i j} = z_{1} | m_{i, j}, λ_{1 j}, m_{i, j + 1}) \cdot p (x_{2 i j} = z_{2} | m_{i, j}, λ_{2 j}, x_{1 i j}, m_{i, j + 1}) \\ \times \cdot \dots \cdot p (x_{m i j} = z_{m} | m_{i, j}, x_{1 i j}, x_{2 i j}, \dots, x_{(m - 1) i j}, λ_{m j}, m_{i, j + 1}), \end{matrix} \end{array}

(14)

If no segregation interference is considered, the joint prior probability can be factorized as in equation (14), and each term can be derived from the Haldane map function. Only the first term in equation (14) is conditional on the two flanking markers alone; the others are conditional on the two flanking markers and also on the genotypes of all QTL preceding the one of interest. If double recombination is ignored [2], each term in equation (14) depends only on the genotype of the nearest locus to the left (marker or QTL) and on the right marker, so equation (14) simplifies to:

\begin{array}{l} p (X_{i j} = H_{z_{1} z_{2} \dots z_{m}} | m_{i, j}, λ_{j}, m_{i, j + 1}) = p (x_{1 i j} = z_{1}, x_{2 i j} = z_{2}, \dots, x_{m i j} = z_{m} | m_{i, j}, λ_{j}, m_{i, j + 1}) \\ = p (x_{1 i j} = z_{1} | m_{i, j}, λ_{1 j}, m_{i, j + 1}) \cdot p (x_{2 i j} = z_{2} | x_{1 i j}, λ_{2 j}, m_{i, j + 1}) \\ \times \cdot \dots \cdot p (x_{m i j} = z_{m} | x_{(m - 1) i j}, λ_{m j}, m_{i, j + 1}), \end{array}

(15)

Each term in equation (15) can be easily inferred.

It is worth noting that we have assumed the order of markers and QTL to be $M_{j} Q_{1} Q_{2} \dots Q_{m} M_{j + 1}$ , whereas in fact the order of the QTL may change in each round of updating. We should therefore first determine the order in each round and then construct the appropriate formula for the joint prior probability of the QTL genotype, $p (X_{i j} = H_{z_{1} z_{2} \dots z_{m}} | m_{i, j}, λ_{j}, m_{i, j + 1})$ , according to the above rules. As an example, consider three QTL, Q₁, Q₂ and Q₃, that affect three traits, respectively, in one interval. If in a certain round the order of markers and QTL is $M_{j} Q_{3} Q_{1} Q_{2} M_{j + 1}$ , the joint prior probability of the QTL genotype can be written as:

\begin{array}{l} p (X_{i j} = H_{z_{1} z_{2} z_{3}} | m_{i, j}, λ_{j}, m_{i, j + 1}) = p (x_{1 i j} = z_{1}, x_{2 i j} = z_{2}, x_{3 i j} = z_{3} | m_{i, j}, λ_{j}, m_{i, j + 1}) \\ \begin{matrix} = p (x_{3 i j} = z_{3} | m_{i, j}, λ_{3 j}, m_{i, j + 1}) \cdot p (x_{1 i j} = z_{1} | x_{3 i j}, λ_{1 j}, m_{i, j + 1}) \\ \times p (x_{2 i j} = z_{2} | x_{1 i j}, λ_{2 j}, m_{i, j + 1}) . \end{matrix} \end{array}
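The ordering rule and the Markov structure can be illustrated with a short sketch. It computes the exact no-interference chain probability that underlies equation (14), using the Haldane map function, rather than the approximation that ignores double recombination. The function names, the coding of marker genotypes as -1 or 1, and the positions are assumptions made for illustration only.

```python
import math
from itertools import product


def haldane(d_cm):
    """Recombination fraction for a map distance in cM (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def trans(a, b, r):
    """P(genotype b at one locus | genotype a at a linked locus)."""
    return 1.0 - r if a == b else r


def joint_genotype_prior(z, qtl_pos, left, right):
    """Joint prior of QTL genotypes z given the two flanking markers.

    z           : genotypes (-1 or 1), one per trait
    qtl_pos     : QTL positions in cM, one per trait, inside the interval
    left, right : (position in cM, genotype) of the flanking markers
    """
    # Determine the current order of the QTL along the interval.
    order = sorted(range(len(z)), key=lambda k: qtl_pos[k])
    prev_pos, prev_gen = left
    prob = 1.0
    for k in order:
        prob *= trans(prev_gen, z[k], haldane(qtl_pos[k] - prev_pos))
        prev_pos, prev_gen = qtl_pos[k], z[k]
    prob *= trans(prev_gen, right[1], haldane(right[0] - prev_pos))
    # Condition on the observed right marker given the left marker.
    return prob / trans(left[1], right[1], haldane(right[0] - left[0]))


left, right = (0.0, 1), (20.0, 1)
positions = [9.0, 12.0, 4.0]   # hypothetical: order along the interval is Q3, Q1, Q2
total = sum(joint_genotype_prior(z, positions, left, right)
            for z in product((-1, 1), repeat=3))
print(total)
```

Summing over all 2^m genotype patterns should give one up to rounding error, which is a convenient check that the ordering and the conditioning have been handled consistently.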

Once we obtain the joint prior probability of the QTL genotype, the joint conditional posterior probability of X_ij can be expressed as:

p (X_{i j} = H_{z_{1} z_{2} \dots z_{m}} | y_{i}, \dots) = \frac{f (y_{i} | X_{i j} = H_{z_{1} z_{2} \dots z_{m}}, \dots) p (X_{i j} = H_{z_{1} z_{2} \dots z_{m}} | λ_{j}, m_{i, j}, m_{i, j + 1})}{\sum_{h_{1} \in \{- 1, 1\}} \sum_{h_{2} \in \{- 1, 1\}} \dots \sum_{h_{m} \in \{- 1, 1\}} f (y_{i} | X_{i j} = H_{h_{1} h_{2} \dots h_{m}}, \dots) p (X_{i j} = H_{h_{1} h_{2} \dots h_{m}} | λ_{j}, m_{i, j}, m_{i, j + 1})}

(16)

where $f (y_{i} | X_{i j} = H_{z_{1} z_{2} \dots z_{m}}, \dots)$ is the likelihood, which is a multivariate normal density,

f (y_{i} | X_{i j} = H_{z_{1} z_{2} \dots z_{m}}, \dots) = \frac{1}{{(2 π)}^{m / 2} {| Σ_{e} |}^{1 / 2}} \exp \{- \frac{1}{2} {(y_{i} - \sum_{j = 1}^{p} Φ_{j} X_{i j} b_{j} - b_{0})}^{T} Σ_{e}^{- 1} (y_{i} - \sum_{j = 1}^{p} Φ_{j} X_{i j} b_{j} - b_{0})\}

(17)

Once the 2^m posterior probabilities of the corresponding QTL genotype matrices have been calculated, one genotype matrix is sampled according to these probabilities. We first construct the cumulative probability function F(d) by accumulating the 2^m probabilities in an arbitrary order for d = 1, 2, ..., 2^m, with F(0) = 0, which defines a discrete distribution. We then draw a random number from the uniform distribution, u ~ U[0,1], and compare u with F(d): if F(d - 1) < u ≤ F(d), the d th genotype matrix is selected.
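The normalisation in equation (16) and the cumulative-probability draw can be sketched as follows. The weights are hypothetical stand-ins for the products of likelihood and prior, and the function names are illustrative only.

```python
from itertools import product


def normalise(weights):
    """Scale non-negative weights (likelihood x prior) so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def sample_index(probs, u):
    """Return the 1-based index d with F(d - 1) < u <= F(d)."""
    cumulative = 0.0
    for d, p in enumerate(probs, start=1):
        cumulative += p
        if u <= cumulative:
            return d
    return len(probs)  # guard against rounding when u is very close to 1


patterns = list(product((-1, 1), repeat=2))   # the 2^m genotype patterns, m = 2
weights = [0.02, 0.10, 0.08, 0.20]            # hypothetical likelihood x prior
probs = normalise(weights)
d = sample_index(probs, u=0.4)                # in practice u = random.random()
print(d, patterns[d - 1])
```

Passing `u` as an argument rather than drawing it inside the function keeps the draw reproducible and easy to test.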

The newly sampled QTL genotype matrices $\{X_{i j}^{*}\}_{i = 1}^{n}$ are only proposal values, which are accepted or rejected together with the proposed QTL position vector by the Metropolis-Hastings algorithm [20, 21]. For each trait, the proposed position is sampled around the current one from a uniform distribution, $λ_{k j}^{*}$ ~ U(λ_kj - δ, λ_kj + δ), where δ is a tuning parameter, usually taken as 1 or 2 cM. The proposed position vector is denoted by $λ_{j}^{*} = [λ_{1 j}^{*}, λ_{2 j}^{*}, \dots, λ_{m j}^{*}]$ , in parallel with the current vector λ_j = [λ_1j, λ_2j, ..., λ_mj]. The new QTL genotype matrix $X_{i j}^{*}$ is then sampled conditionally on the new positions using equation (16). Finally, the position vector $λ_{j}^{*}$ and the genotype matrices $\{X_{i j}^{*}\}_{i = 1}^{n}$ are accepted jointly with probability min(1, α), where

α = \frac{\prod_{i = 1}^{n} p (y_{i} | X_{i j}^{*}, λ_{j}^{*}, \dots) p (X_{i j}^{*} | λ_{j}^{*}, \dots) p (λ_{j}^{*})}{\prod_{i = 1}^{n} p (y_{i} | X_{i j}, λ_{j}, \dots) p (X_{i j} | λ_{j}, \dots) p (λ_{j})} \cdot \frac{q (X_{i j} | y_{i}, \dots) q (λ_{j})}{q (X_{i j}^{*} | y_{i}, \dots) q (λ_{j}^{*})},

(18)

Here p( $λ_{j}^{*}$ ) and p(λ_j) are the prior probabilities of the new and old positions, respectively, which cancel under the uniform prior distribution; $p (X_{i j}^{*} | λ_{j}^{*}, \dots)$ and p(X_ij|λ_j, ...) are the prior probabilities of the QTL genotypes conditional on the new and old positions, as described above; and $\frac{q (X_{i j} | y_{i}, \dots)}{q (X_{i j}^{*} | y_{i}, \dots)} = \frac{p (X_{i j} | y_{i}, \dots)}{p (X_{i j}^{*} | y_{i}, \dots)}$ and $\frac{q (λ_{j})}{q (λ_{j}^{*})} = \frac{\prod_{k = 1}^{m} p (λ_{k j})}{\prod_{k = 1}^{m} p (λ_{k j}^{*})}$ are the proposal ratios.
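Because equation (18) multiplies n likelihood terms, the ratio is usually evaluated on the log scale to avoid numerical underflow. The sketch below shows only the generic accept/reject decision; the function name and arguments are hypothetical, and the four log quantities are assumed to have been computed elsewhere.

```python
import math


def mh_accept(log_target_new, log_target_old, log_q_old, log_q_new, u):
    """Metropolis-Hastings decision: accept when u < min(1, alpha).

    log_target_new / log_target_old : log(likelihood x prior) at the proposed
                                      and current values
    log_q_old / log_q_new           : log proposal density of the current and
                                      proposed values
    u                               : a draw from Uniform(0, 1)
    """
    log_alpha = (log_target_new - log_target_old) + (log_q_old - log_q_new)
    if log_alpha >= 0.0:
        return True          # alpha >= 1, always accept
    return u < math.exp(log_alpha)


# Hypothetical values: the proposal improves the target, so it is accepted.
print(mh_accept(-10.0, -12.0, -1.0, -1.5, u=0.9))
```

When a proposal is rejected, both the position vector and the genotype matrices keep their current values, since the two are updated jointly.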

In step f, block sampling of the indicator variable matrix Φ_j is expected to perform better than updating each γ_kj in Φ_j separately. Since each model indicator γ_kj has two possible values (0 or 1), when m traits are investigated jointly each model indicator matrix Φ_j has 2^m possible configurations, whose general form can be written as:

W_{j, w_{1} w_{2} \dots w_{m}} = [\begin{matrix} γ_{1 j} = w_{1} & 0 & \dots & 0 \\ 0 & γ_{2 j} = w_{2} & \dots & 0 \\ ⋮ & ⋮ & ⋱ & ⋮ \\ 0 & 0 & \dots & γ_{m j} = w_{m} \end{matrix}],

(19)

where w_k ∈ {0, 1} for k = 1, 2, ..., m. Because the prior probabilities of the γ_kj are independent, the joint prior probability of each possible configuration can be written as $p (Φ_{j} = W_{j, w_{1} w_{2} \dots w_{m}}) = \prod_{k = 1}^{m} p (γ_{k j} = w_{k})$ . The conditional posterior probability of Φ_j can then be written as

p (Φ_{j} = W_{j, w_{1} w_{2} \dots w_{m}} | \dots) = \frac{p (Φ_{j} = W_{j, w_{1} w_{2} \dots w_{m}}) \prod_{i = 1}^{n} f (y_{i} | Φ_{j} = W_{j, w_{1} w_{2} \dots w_{m}}, \dots)}{\sum_{g_{1} \in \{0, 1\}} \sum_{g_{2} \in \{0, 1\}} \dots \sum_{g_{m} \in \{0, 1\}} (p (Φ_{j} = W_{j, g_{1} g_{2} \dots g_{m}}) \prod_{i = 1}^{n} f (y_{i} | Φ_{j} = W_{j, g_{1} g_{2} \dots g_{m}}, \dots))} .

(20)

Φ_j is then sampled in the same way as the QTL genotype matrices described above.
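A minimal sketch of the block update of Φ_j is given below. It enumerates the 2^m indicator patterns, multiplies the independent prior by the likelihood, and normalises as in equation (20). The log-likelihood values are hypothetical, the function names are illustrative, and subtracting the largest log-likelihood before exponentiating is a numerical convenience rather than part of the method described above.

```python
import math
from itertools import product


def indicator_prior(w, p_incl):
    """Joint prior of pattern w under independent inclusion probabilities."""
    prob = 1.0
    for w_k, p_k in zip(w, p_incl):
        prob *= p_k if w_k == 1 else 1.0 - p_k
    return prob


def indicator_posterior(log_lik, p_incl):
    """Posterior over all 2^m patterns; log_lik maps pattern -> log-likelihood."""
    patterns = list(product((0, 1), repeat=len(p_incl)))
    top = max(log_lik[w] for w in patterns)   # rescale to avoid underflow
    weights = [indicator_prior(w, p_incl) * math.exp(log_lik[w] - top)
               for w in patterns]
    total = sum(weights)
    return {w: wt / total for w, wt in zip(patterns, weights)}


# Hypothetical example with m = 2 traits and inclusion probability 0.1 each.
log_lik = {(0, 0): -105.0, (0, 1): -104.0, (1, 0): -100.0, (1, 1): -99.5}
post = indicator_posterior(log_lik, [0.1, 0.1])
print(max(post, key=post.get))
```

In this example the pattern with the highest likelihood, (1, 1), is not the most probable a posteriori, because the prior penalises including the locus for both traits.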

Post-MCMC analysis

To summarize the posterior sample, we use the posterior mean to estimate the QTL effects and the residual (co)variances, and the mode of the posterior probability or the peak of the $2\log_{e}\mathrm{BF}$ statistic to locate QTL. The $2\log_{e}\mathrm{BF}$ statistic was introduced into QTL mapping by Yi et al. [17]; BF is defined as the ratio of the posterior odds to the prior odds for inclusion against exclusion of the locus [24]. The critical value for declaring the existence of a QTL is BF = 3, or $2\log_{e}\mathrm{BF}$ = 2.1.

In single-trait analysis, QTL can be identified by plotting the profile of the posterior probability or of the $2\log_{e}\mathrm{BF}$ statistic against the genome. In multitrait analysis, if only two traits are considered jointly, a three-dimensional graph can summarize the statistic for both traits (e.g., Figure 2 in [19]). However, if the number of traits is greater than 2, they cannot be plotted in one graph. Instead, we plot the marginal posterior probability distribution for each trait. If we divide the genome into H bins and denote each bin of the k th trait by ζ_kg, for g = 1, 2, ..., H, then the marginal posterior probability of ζ_kg is defined as p(ζ_kg|y) = p[(ζ_kg = λ_kq) ∩ (γ_kq = 1)], where q indicates the q th interval, in which locus ζ_kg resides. Then $BF (ζ_{k g}) = \frac{p (ζ_{k g} | y)}{1 - p (ζ_{k g} | y)} \cdot \frac{1 - p (ζ_{k g})}{p (ζ_{k g})}$ , which can be calculated at each possible locus for each trait separately.
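The Bayes factor for a single bin follows directly from the definition as posterior odds divided by prior odds. In the sketch below the function name and the probabilities are hypothetical; a posterior probability of exactly 0 or 1, which can occur in a finite sample, would have to be handled by the caller.

```python
import math


def two_log_bf(posterior, prior):
    """2 * log_e of (posterior odds / prior odds) for inclusion of a locus."""
    bf = (posterior / (1.0 - posterior)) * ((1.0 - prior) / prior)
    return 2.0 * math.log(bf)


# Hypothetical bin: prior probability 0.1, posterior probability 0.25 (BF = 3).
print(two_log_bf(0.25, 0.1))
```

Evaluating this function bin by bin along the genome for each trait gives the profiles that are compared with the critical value.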

References

Xu CW, Li ZK, Xu S: Joint mapping of quantitative trait loci for multiple binary characters. Genetics. 2005, 169: 1045-1059. 10.1534/genetics.103.019406.
Article PubMed Central CAS PubMed Google Scholar
Jiang C, Zeng ZB: Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics. 1995, 140: 1111-1127.
PubMed Central CAS PubMed Google Scholar
Knott SA, Haley CS: Multitrait least squares for quantitative trait loci detection. Genetics. 2000, 156: 899-911.
PubMed Central CAS PubMed Google Scholar
Korol AB, Ronin YT, Itskovich AM, Peng J, Nevo E: Enhanced efficiency of quantitative trait loci mapping analysis based on multivariate complexes of quantitative traits. Genetics. 2001, 157: 1789-1803.
Mangin B, Thoquet P, Grimsley N: Pleiotropic QTL analysis. Biometrics. 1998, 54: 88-99. 10.2307/2533998.
Eaves LJ, Neale MC, Maes H: Multivariate multipoint linkage analysis of quantitative trait loci. Behav Genet. 1996, 26: 519-525. 10.1007/BF02359757.
Liu JF, Liu YJ, Liu XG, Deng H-W: Bayesian mapping of quantitative trait loci for multiple complex traits with the use of variance components. Am J Hum Genet. 2007, 81: 304-320. 10.1086/519495.
Satagopan JM, Yandell BS, Newton MA, Osborn TC: A Bayesian approach to detect quantitative trait loci using Markov chain Monte Carlo. Genetics. 1996, 144: 805-816.
Yi N, Xu S: Bayesian mapping of quantitative trait loci for complex binary traits. Genetics. 2000, 155: 1391-1403.
Yi N, George V, Allison DB: Stochastic search variable selection for identifying multiple quantitative trait loci. Genetics. 2003, 164: 1129-1138.
Yi N, Xu S, Allison DB: Bayesian model choice and search strategies for mapping multiple epistatic quantitative trait loci. Genetics. 2003, 165: 867-883.
Yi N, Xu S, Allison DB: Bayesian model choice and search strategies for mapping interacting quantitative trait loci. Genetics. 2003, 165: 867-883.
Xu S: Derivation of the shrinkage estimates of quantitative trait locus effects. Genetics. 2007, 177: 1255-1258. 10.1534/genetics.107.077487.
Wang H, Zhang YM, Li XM, Masinde GL, Mohan S, Baylink DJ, Xu S: Bayesian shrinkage estimation of quantitative trait loci parameters. Genetics. 2005, 170: 465-480. 10.1534/genetics.104.039354.
Yi N, Yandell BS, Churchill GA, Allison DB, Eisen EJ, Pomp D: Bayesian model selection for genome-wide epistatic quantitative trait loci analysis. Genetics. 2005, 170: 1333-1344. 10.1534/genetics.104.040386.
Yi N: A unified Markov chain Monte Carlo framework for mapping multiple quantitative trait loci. Genetics. 2004, 167: 967-975. 10.1534/genetics.104.026286.
Yi N, Shriner D, Banerjee S, Mehta T, Pomp D, Yandell BS: An efficient Bayesian model selection approach for interacting quantitative trait loci models with many effects. Genetics. 2007, 176: 1865-1877. 10.1534/genetics.107.071365.
Godsill SJ: On the relationship between MCMC model uncertainty methods. J Comput Graph Stat. 2001, 10: 230-248. 10.1198/10618600152627924.
Gelman A, Carlin J, Stern H, Rubin D: Bayesian Data Analysis. 2004, London, Chapman & Hall
Hastings WK: Monte Carlo sampling methods using Markov chains and their applications. Biometrika. 1970, 57: 97-109. 10.1093/biomet/57.1.97.
Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E: Equation of state calculations by fast computing machines. J Chem Phys. 1953, 21: 1087-1091. 10.1063/1.1699114.
Tinker NA, Mather DE, Rossnagel BG, Kasha KJ, Kleinhofs A, Hayes PM, Falk DE, Ferguson T, Shugar LP, Legge WG, Irvine RB, Choo TM, Briggs KG, Ullrich SE, Franckowiak JD, Blake TK, Graf RJ, Dofing SM, Saghai Maroof MA, Scoles GJ, Hoffman D, Dahleen LS, Kilian A, Chen F, Biyashev RM, Kudrna DA, Steffenson BJ: Regions of the genome that affect agronomic performance in two-row barley. Crop Sci. 1996, 36: 1053-1062.
Yi N, George V, Allison DB: Stochastic search variable selection for identifying multiple quantitative trait loci. Genetics. 2003, 164: 1129-1138.
Kass RE, Raftery AE: Bayes factors. J Am Stat Assoc. 1995, 90: 773-795. 10.2307/2291091.

Acknowledgements

We deeply thank four anonymous reviewers for their criticisms and comments which have greatly improved the presentation of the manuscript. This work was partly supported by Heilongjiang August First Land Reclamation University.

Author information

Authors and Affiliations

Life Science College, Heilongjiang August First Land Reclamation University, Daqing, 163319, PR China
Ming Fang & Li Jun Pu
College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, PR China
Dan Jiang
College of Animal Science and Technology, Northeast Agricultural University, Harbin, 150030, PR China
Hui Jiang Gao
College of Animal Science and Technology, China Agricultural University, Beijing, 100094, PR China
Hui Jiang Gao
College of Plant Science and Technology, Heilongjiang August First Land Reclamation University, Daqing, 163319, PR China
Peng Ji & Hong Yi Wang
School of Agriculture and Biology, Shanghai Jiaotong University, Shanghai, 201101, PR China
Run Qing Yang

Corresponding author

Correspondence to Ming Fang.

Additional information

Authors' contributions

MF coordinated the study, developed the underlying principle of the method, and wrote the computing program and the paper. The other authors were responsible for the simulation experiment, carried out the analysis of the results, and helped to finalize the paper.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Fang, M., Jiang, D., Pu, L.J. et al. Multitrait analysis of quantitative trait loci using Bayesian composite space approach. BMC Genet 9, 48 (2008). https://doi.org/10.1186/1471-2156-9-48

Received: 19 September 2007
Accepted: 18 July 2008
Published: 18 July 2008
DOI: https://doi.org/10.1186/1471-2156-9-48