- Methodology article
- Open Access
- Published:

# Multitrait analysis of quantitative trait loci using Bayesian composite space approach

*BMC Genetics*
**volume 9**, Article number: 48 (2008)

## Abstract

### Background

Multitrait analysis of quantitative trait loci can capture the maximum information of experiment. The maximum-likelihood approach and the least-square approach have been developed to jointly analyze multiple traits, but it is difficult for them to include multiple QTL simultaneously into one model.

### Results

In this article, we have successfully extended Bayesian composite space approach, which is an efficient model selection method that can easily handle multiple QTL, to multitrait mapping of QTL. There are many statistical innovations of the proposed method compared with Bayesian single trait analysis. The first is that the parameters for all traits are updated jointly by vector or matrix; secondly, for QTL in the same interval that control different traits, the correlation between QTL genotypes is taken into account; thirdly, the information about the relationship of residual error between the traits is also made good use of. The superiority of the new method over separate analysis was demonstrated by both simulated and real data. The computing program was written in FORTRAN and it can be available for request.

### Conclusion

The results suggest that the developed new method is more powerful than separate analysis.

## Background

Multitrait analysis is defined as a method that includes all traits simultaneously in a single model [1], and can take into account the correlation among all traits. Many methods have been developed for mapping QTL by combining information of multiple traits. Jiang and Zeng [2] proposed a maximum likelihood approach, and concluded that joint analysis could improve the precision of parameter estimates and had higher QTL detecting power than separate analysis. A multitrait least-square approach was proposed by Knott and Haley [3] to detect QTL. It is a method that programs easily and computes fast, and compared with separate analysis of each trait, can increase the power to detect a pleiotropic QTL and improve the precision of the location estimate. Xu et al. [1] developed a maximum likelihood approach for jointly mapping multiple binary traits, which is implemented via EM algorithm. They found that the QTL detecting power of joint analysis was higher than the sum of those of separate analysis. But after the QTL detecting power for separate analysis was redefined more reasonably by a combined power (see also [1]), the power of joint analysis was almost equal to the combined power, that is, joint analysis had almost the same power as separate analysis. For QTL parameter estimation, joint analysis can improve the precision of the QTL position estimates, but the QTL effects and their standard deviations have no obvious difference. Another class of approaches for multitrait analysis that use a dimension reduction technique was proposed by Korol et al. [4]. Mangin et al. [5] used this technique to analyze independent PCA (principal components analysis) trait, and used the PCA test values to detect QTL, which was proved to be asymptotically equivalent to the multivariate maximum-likelihood ratio test. However, the parameters of this kind of methods are often too difficult to interpret biologically. A maximum-likelihood method for multitrait mapping of QTL under outbred population was developed by Eaves et al. [6], which based on identity-by-descent (IBD) variance components model approach, and QTL effects were treated as random.

All the joint mapping approaches mentioned above were based on one-QTL model. Recently, Bayesian methodology has been used for mapping QTL [7–17], and the main advantage is that it can easily handle multiple QTL simultaneously. Currently, Bayesian reversible jump MCMC (RJMCMC) has become a usual method for mapping multiple QTL. Liu et al. [7] applied the method to multitrait mapping of QTL in outbred population under random effect model. However, because the dimension of RJMCMC is variable, it is always subject to poor mixing and hard to converge. Godsill [18] developed an effective Bayesian composite space method for model selection which keeps the model dimension fixed in each round of updating, and therefore it converges faster and is much easier to program. Yi et al. [15–17] successfully applied the novel approach to map QTL. In this article, we extend Bayesian composite space approach to multitrait analysis under inbred line crosses, and use both simulated data and real data to demonstrate the advantages and disadvantages of the proposed method.

## Results

### Simulation Study

We simulated 200 backcross individuals, and each has marker information and phenotypic records for three traits. One chromosome with length of 600 cM was investigated. Twenty-one markers were put on the genome with an average distance of 20 cM. Marker genotypes were observed for all the individuals. Thirteen QTL were added onto the genome, of which locus 96, 423, 487 and 584 had pleiotropic effects, and locus 250, 253 and 256, and locus 535 and 537 were closely linked and controlled different traits respectively. The positions and the effects of QTL for each trait are listed in Table 1. The population means for all traits were set to zero. The residual (co)variances are listed in Table 2. The heritability of each trait can be calculated as 0.728 for trait 1, 0.691 for trait 2 and 0.598 for trait 3.

In order to investigate the performance of our approach, two methods were used to analyze the simulated data. The first method was the proposed multitrait analysis; the second is single-trait analysis. In single-trait analysis, we use the method 1 of [16], for the proposed method was a direct extension from it. In both multitrait analysis and single-trait analysis, the prior variance and degree of freedom of the residual error was set to zero, because no prior information was available. The prior expected number of QTL *l*_{k} was 3 and the maximum number of QTL *L*_{k} equaled to the number of marker intervals (30). Therefore, the prior inclusion probability of the model indicator variable equaled to 0.1. For both methods, the MCMC ran for 1000 cycles as burn-in period (deleted) and then for additional 20,000 cycles after the burn-in. The chain was then thinned to reduce serial correlation by one observation saved every 10 cycles. The posterior sample contained 2000 (20, 000/10 = 2000) observations for the post-MCMC analysis.

The estimates of the QTL parameters for multitrait analysis and separate analysis are listed in Table 1 and Table 2. The results showed that there were no clear differences of the two methods in the estimates of the QTL positions, QTL effects and the corresponding standard deviation. Both methods can estimate QTL positions and effects, all closed to the true values.

Figure 1 and 2 respectively show the profiles of the posterior probability of the QTL positions and the 2log_{e}BF statistic for multitrait analysis, and Figure 3 and 4 for separate analysis. From these figures, we found that both profiles of the posterior probability of QTL positions and the 2log_{e}BF statistic for multitrait analysis are generally higher than those for separate analysis. Moreover, two additional QTL located at 483 and 245 were detected by multitrait analysis. These suggested that multitrait analysis may be more powerful than separate analysis.

### Real data analysis

We applied the new method to analyze the data from the North American Barley Genome Mapping Project [22]. The DH population included 150 lines (*n* = 150), each of which was genotyped for 223 codominant markers. These markers covered ~1500 cM of the genome along seven linkage groups with an average marker interval of ~7 cM. Eight traits, grain yield, lodging, height, heading data, grain protein, alpha amylase, diastatic power, and malt extract, were investigated in this project. Agronomic traits were measured in 16 areas, and malting quality traits in 9 areas. In our research, only three traits were studied, grain yield, height, and alpha amylase, and only the records in Crookston and Minnesota were used.

In the analysis, the prior expected number of QTL was taken as 3 for each trait, then the maximum number of QTL was calculated as *L*_{
k
}≈ 3 + 3·{\sqrt{l}}_{k} or *L*_{
k
}= 8. Therefore, the prior inclusion probability of the model indicator variable equals to 0.375. To reduce the model space, we assumed each chromosome contain at most one QTL, except that the 7th was divided into two parts at the middle point and each part contains one QTL, for the results of other analysis (IM, CIM) always show signals of two QTL on 7th chromosome for some traits. Also two methods, multitrait analysis and Bayesian single-trait analysis (method 1 in [16]), were used to analyze the real data. The MCMC ran for 5 × 10^{4} cycles after the first 2000 was discarded. The chain was thinned by every 10 cycles one observation being saved, which yielded 5000 samples for posterior Bayesian analysis.

Figure 5 and Figure 6 show the profiles of 2log_{e}BF statistic with real data by multitrait analysis and separate analysis. The profiles of Figure 5 are generally higher than that of Figure 6. For trait 1 (grain yield), no QTL was detected by separate analysis (Figure 6a), while eight QTL were detected by multitrait analysis (Figure 5a); for trait 2 (height), three QTL located on chromosomes 1, 2, and 7 were detected by separate analysis, however by multitrait analysis, not only much stronger signals of these three QTL, but also four additional QTL on chromosome 3, 4, 5 and 6 were detected; for trait 3 (alpha amylase), two additional QTL located on chromosome 1, 3 were detected by multitrait analysis. The results of real data analysis also supported the conclusion that multitrait analysis was more powerful than separate analysis.

## Discussion

The selection of hyper-parameter of the QTL effect is important in Bayesian analysis, which can influence the efficiency of the model selection. For example, with Bayesian shrinkage method [14], the hyper-parameter is a variable and assigned a special distribution so that no model selection is need. In Bayesian composite space approach, the updating of model indicator variables is closely dependent on QTL effects, but the selection of hyper-parameter is not much strict as Bayesian shrinkage analysis. Many approaches have been proposed for selection of hyper-parameter, and our method is only an extension of the approach of Yi et al. [15]. Moreover, we followed the approaches developed by Yi et al. [15] to obtain the prior probability for model indicator variables. However, we didn't investigate the influence of different prior probability on the results, because the proposed method is very computationally intensive. In addition, we suggested to use CIM-based multitrait analysis [2] to obtain the prior of variance-covariance of residual, but if prior information is not indeed known, we may take the noninformative prior [19], p({\mathbf{\Sigma}}_{e})\propto {\mathbf{\Sigma}}_{e}^{-1}. In this simulation study, the noninformative prior is used and proved to be able to bring a precise estimate for variance-covariance of residual error.

The proposed multitrait analysis is based on Bayesian composite space approach, while other popular model selection approaches such as Bayesian shrinkage method [14] and Bayesian SSVS method [23] are also very easily extended, and the details will be demonstrated in another paper. We used BC and DH population as examples to demonstrate the efficiency of the method. The new method can be modified to be applied to other experiment designs, such as RIL, F2 design, *etc*. In addition, we only take the main effect into account, while the epistatic effect also can be included into the model. In that case, the model should be written as: {y}_{i}={b}_{0}+{\displaystyle \sum _{q=1}^{p}{\mathbf{\Phi}}_{q}{X}_{iq}{b}_{q}}+{\displaystyle \sum _{{q}_{1}<{q}_{2}}^{p}{\mathbf{\Phi}}_{{q}_{1}{q}_{2}}{X}_{i{q}_{1}}{X}_{i{q}_{2}}{w}_{{q}_{1}{q}_{2}}}+{e}_{i}, where *q* is main effect, *q*_{1} and *q*_{2} is two interacting QTL, and {w}_{{q}_{1}{q}_{2}} is (1 × *m*) column vectors of epistatic effect between QTL *q*_{1} and *q*_{2}. Certainly, the implementation will be complicated and quite time-consuming, but nevertheless, the extension is feasible and expected to be very efficient for mapping interacting QTL.

In this paper, we have not given a test procedure to distinguish closely linked and pleiotropic QTL which cause the genetic correlations between each trait. There have been some of literatures about it, and generally, the likelihood ratio (LR) statistic [1, 2] and Bayesian factor (BF) statistic [7] always have been used to solve the problem [7]. In our multitrait analysis, although the LR testing procedure in [2] is completely applicable, it is not optimal, because it is based on single-QTL model. Also Bayesian approach can be used for such testing, but the computing time is a big factor of concern. Hopefully, an efficient and fast approach will be developed that could solve the problem nicely.

## Conclusion

Bayesian composite space approach [18] is an effective method for model selection. Yi [16] firstly used it for QTL mapping and proved it to be effective for mapping multiple QTL. In this article, we extended this novel statistical method to multitrait mapping of QTL. Compared with separate analysis, joint analysis is optimal, because the parameters are updated by vector or matrix and the correlation information between multiple traits can be made good use of. The powerful of the proposed multitrait method also be proved by both simulation experiments and real data analysis, and they all showed that the multitrait analysis tends to give higher statistical power than the single trait analysis.

## Methods

### Multivariate linear model

Consider *n* individuals derived from a backcross population crossed from two inbred lines with observations on some densely distributed codominant markers and on *m* quantitative traits. Supposed that the maximum number of QTL is *p*, the phenotypic value *y*_{
ki
}of individual *i* for *k* th trait can be described by the following multivariate linear model:

for *i* = 1, 2, ..., *n* and *k* = 1, 2, ..., *m*, where *γ*_{
kj
}is model indicator variable, indicating the *j* th QTL of *k* th trait included (1) or excluded (0) from the model; *b*_{k 0}is population mean; *b*_{
kj
}is QTL effect; *x*_{
kij
}is QTL genotype, if QTL genotype is homozygote *x*_{
kij
}= 1, otherwise -1; *e*_{
ki
}is residual error and assumed to follow multivariate normal distribution. If we denote equation (1) by matrix, it can be expressed as:

for *i* = 1, 2, ..., *n*, where **y**_{
i
}= [*y*_{1i}, *y*_{2i}, ..., *y*_{
mi
}]^{T}, **b**_{0} = [*b*_{10}, *b*_{20}, ..., *b*_{m 0}]^{T}, **b**_{
j
}= [*b*_{1j}, *b*_{2j}, ..., *b*_{
mj
}]^{T}, **e**_{
i
}= [*e*_{1i}, *e*_{2i}, ..., *e*_{
mi
}]^{T}. They are all (1 × *m*) column vectors. Equation (3) is QTL genotype matrix and Equation (4) is model indicator matrix, they are all (*m* × *m*) diagonal matrix.

### Prior specification

The prior distribution of each QTL effect vector **b**_{
j
}is multivariate normal distribution, *p*(**b**_{
j
}) ~ *N*(0, {\mathbf{\Sigma}}_{{B}_{j}}), where {\mathbf{\Sigma}}_{{B}_{j}} is the hyper-parameter, and We take {\mathbf{\Sigma}}_{{B}_{j}}={\left[X{.}_{j}^{T}{\mathbf{\Sigma}}_{e}^{-1}X{.}_{j}\right]}^{-1}\cdot n, which is simply an extension from Bayesian single trait analysis [15]. The importance of the choice of the hyper-parameter will be discussed later. In a large backcross population and under the definition of *x*_{
mij
}(-1 or 1), {\mathbf{\Sigma}}_{{B}_{j}} can be simplified as {\mathbf{\Sigma}}_{{B}_{j}} = **Σ**_{
e
}. The prior of the covariance matrix of residual error follows Inverse Wishart distribution, **Σ**_{
e
}~ *Wishart*^{-1}(*v*_{
e
}, {S}_{e}^{2}), where, *v*_{
e
}and {S}_{e}^{2} are prior degree of freedom and covariance matrix of residual error, respectively, and can be obtained from other method, such as CIM based multitrait analysis [2], *etc*. The prior distribution of population mean **b**_{0} is normal distribution with mean and variance equal to those calculated by phenotypic values. The prior probability distribution of QTL position *λ*_{
kj
}is uniform distribution with bounds of two flanking markers, *p*(*λ*_{
kj
}) = 1/*d*_{
j
}, where *d*_{
j
}is length of the interval where *j* th QTL is confined. Assuming that epistatic effect is absent, the prior inclusion probability for *j* th effect can be expressed as *p*(*γ*_{
kj
}= 1) = 1 - *l*_{
k
}/*L*_{
k
}]^{1/N}(see also [15]), where *l*_{
k
}is the prior expected number of main-effect QTL, and could be roughly estimated with the use of standard genome scans; *N* is the number of possible main effects for each QTL and equal to 1 in BC family [15]; *L*_{
k
}is the upper bound of QTL number, and equals to the number of marker interval in our simulation study, while in another approach suggested by Yi [15]*L*_{
k
}is taken as 3 + 3·{\sqrt{l}}_{k}, which causes the model space to reduce dramatically [15].

### Joint posterior density

The observable variables include phenotypic values, y={\left\{{y}_{i}\right\}}_{i=1}^{n} and marker information, m={\left\{{m}_{ij}\right\}}_{i=1,j=1}^{n,p}. The unobservable variables include population mean, {b}_{0}={\left\{{b}_{k0}\right\}}_{k=1}^{m}; QTL effects, b={\left\{{b}_{j}\right\}}_{j=1}^{p}; QTL genotypes, X={\left\{{X}_{ij}\right\}}_{i=1,j=1}^{n,p}; model indicator variables, \mathbf{\Phi}={\left\{{\mathbf{\Phi}}_{j}\right\}}_{j=1}^{p}; (co)variance of residual error, **Σ**_{
e
}, and QTL positions, \mathbf{\lambda}={\left\{{\lambda}_{kj}\right\}}_{k=1,j=1}^{m,p}. Let **θ** be the vector of hyper-parameters, **Θ** = {**b**_{0}, **b**, **Σ**_{
e
}, **λ**, **X**, **Φ**}, then the joint prior density of the unobservable variables is denoted by *p*(**Θ**|**θ**). The joint posterior probability of **Θ**, given the observable variables **y** and **m**, can be expressed as:

*p*(**Θ**|**y**, **m**) ∝ *p*(**Θ**|**θ**)·*p*(**y**, **m**|**Θ**), (2)

where, *p*(**y**, **m**|**Θ**) is the likelihood and can be written as:

*p*(**y**, **m**|**Θ**) = *p*(**y**|**Θ**)·*p*(**m**|**Θ**), (6)

where *p*(**y**|**Θ**) is multivariate normal density, and *p*(**m**|**Θ**) can be derived from a Markov model [14].

### MCMC sampling

MCMC algorithm generates samples from Markov chains which converge to the posterior distribution of parameters, without the constant of proportionality being calculated. From these posterior samples, summary statistic of the posterior distribution can be calculated. MCMC algorithm proceeds as follows:

a. Initialize all parameters with values in their legal domain.

b. Update the population mean **b**_{0}.

c. Update the QTL effects vectors {\left\{{b}_{j}\right\}}_{j=1}^{p}.

d. Update the variance-covariance matrix **Σ**_{
e
}of the residual error.

e. Update the QTL genotype indicator matrices {\left\{{X}_{ij}\right\}}_{i=1}^{n} and the QTL location vectors {\left\{{\mathbf{\lambda}}_{kj}\right\}}_{k=1}^{m} jointly, for *j* = 1, 2,..., *p*.

f. Update the model indicator variable matrices {\left\{{\mathbf{\Phi}}_{j}\right\}}_{j=1}^{p}.

The conditional posterior distribution of the population mean **b**_{0} is multivariate normal with mean

and variance-covariance matrix

The conditional posterior distribution of the QTL effect **b**_{
j
}is sampled from multivariate normal distribution with mean

and variance-covariance matrix

The posterior distribution of the residual error follows inverted Wishart distribution,

where \mathbf{\Omega}={y}_{i}-{\displaystyle \sum _{j=1}^{p}{\mathbf{\Phi}}_{j}{X}_{ij}{b}_{j}}-{b}_{0} and *df*_{
e
}= *n*.

In step e, the QTL locations and QTL genotype matrices are updated jointly. For locus *j*, we can firstly sample a new QTL position for each trait from their prior distribution (described later), then sample the QTL genotype matrices {\left\{{X}_{ij}\right\}}_{i=1}^{n} on the new position using equation (15), and finally, they are updated by the efficient Metropolis-Hastings algorithm [20, 21]. Because the sampling of **X**_{
ij
}is too complicate and we are going to firstly describe it. Due to the QTL genotype *x*_{
kij
}has two possible values (-1 or 1) in BC line, if *m* traits are investigated jointly, **X**_{
ij
}has 2^{m}kinds of possible formations, and the general pattern of **X**_{
ij
}can be written as:

where, *z*_{1}, *z*_{2}, ..., *z*_{
m
}∈ {-1, 1}. For clarity, we omit the subscript *ij* from {H}_{ij,{z}_{1}{z}_{2}\cdots {z}_{m}} and present formulas {H}_{{z}_{1}{z}_{2}\cdots {z}_{m}} to denote the genotype matrix of *i* th individual and *j* th loci. Because the QTL genotypes *x*_{
kij
}of *i* th individual in the *j* th interval for all traits may be correlated, the joint prior probability of the genotype matrix **X**_{
ij
}can't be simply expressed by the following equation:

Instead, it can be derived from the Markov model (see Equation 14), assuming that the order of markers and QTL is *M*_{
j
}*Q*_{1}*Q*_{2} ... *Q*_{
m
}*M*_{j+1}(see Figure 7), where, *Q*_{1}, *Q*_{2}, ..., and *Q*_{
m
}denote the QTL respectively affecting trait 1, trait 2, ..., and trait *m* in *j* th marker interval. Indicator variables *x*_{1ij}, *x*_{2ij}, ..., and *x*_{
mij
}denote the genotypes of these QTL.

If no segregation interference is considered, the joint prior probability can be factorized into equation (14), and each term in equation (14) can be derived from Haldane map function. Only the first term in equation (14) is conditional on two flanking markers; others are not only conditional on two flanking markers but also on the genotypes of all the QTL prior to the interested one. If double recombination is ignored [2], each term in equation (14) can be inferred only by the genotype of the left nearest loci (marker or QTL) and the right marker, then equation (14) can be simplified as:

Each term in equation (15) can be easily inferred.

It is worth mentioning that we assume the sequence of markers and QTL is *M*_{
j
}*Q*_{1}*Q*_{2} ... *Q*_{
m
}*M*_{j+1}, and in fact, the sequence of QTL may be variable in each round of updating. Therefore, we should firstly ascertain the sequence in each round, and then construct the appropriate formula to calculate the joint prior probability of the QTL genotype *p*(**X**_{
ij
}= {H}_{{z}_{1}{z}_{2}\cdots {z}_{m}}|*m*_{i,j},**λ** *j*,*m*_{i,j+1}) according above rules. For clarity, we take an example to demonstrate it. Consider 3 QTL *Q*_{1}, *Q*_{2}, and *Q*_{3} that affect 3 traits respectively in an interval. Assuming that in a certain round the sequence of markers and QTL is *M*_{
j
}*Q*_{3}*Q*_{1}*Q*_{2}*M*_{j+1}, then the formula for calculating the joint prior probability of the QTL genotype can be written as:

Once we obtain the joint prior probability of the QTL genotype, the joint conditional posterior probability of **X**_{
ij
}can be expressed as:

where f({y}_{i}|{X}_{ij}={H}_{{z}_{1}{z}_{2}\cdots {z}_{m}},\cdots ) is likelihood, and follows multivariable normal distribution,

Once we have calculated 2^{m}possible posterior probabilities for the corresponding QTL genotype matrices, we are going to sample one genotype matrix according to their posterior probabilities. We firstly constructed the cumulative probability function *F*(*d*) by accumulating the 2^{m}probabilities in an arbitrary sequence for *d* = 1, 2, ..., 2^{m}and *F*(0) = 0, which is a discrete distribution; then sampled a random number from uniform distribution, *u* ~ *U*[0,1]; and compared *u* with *F*(*d*), if *F*(*d* - 1) <*u* ≤ *F*(*d*), then the *d* th genotype matrix is accepted.

The new sampled QTL genotype matrices {\left\{{X}_{ij}\right\}}_{i=1}^{n} are only the proposal value, which should be updated along with the proposal QTL position vector **λ**_{
j
}= [*λ*_{1j}, *λ*_{2j}, ..., *λ*_{
mj
}] by the Metropolis-Hastings algorithm [20, 21]. For each trait, the new proposal position is sampled around the existing one from uniform distributions, {\mathbf{\lambda}}_{kj}^{\ast} ~ [*λ*_{
kj
}- *δ*, *λ*_{
kj
}+ *δ*), where *δ* is tuning parameter, usually taking a value of 1 or 2 cM. The new position vector is denoted by {\mathbf{\lambda}}_{j}^{\ast}=\left[{\mathbf{\lambda}}_{1j}^{\ast},{\mathbf{\lambda}}_{2j}^{\ast},\cdots ,{\mathbf{\lambda}}_{mj}^{\ast}\right]; then the new QTL genotype matrix {X}_{ij}^{\ast} is sampled conditionally on the new position using equation (16); finally, the position vector {\mathbf{\lambda}}_{j}^{\ast} and genotype matrices {\left\{{X}_{ij}\right\}}_{i=1}^{n} are accepted jointly with probability equal to min(1,α), where

*p*({\mathbf{\lambda}}_{j}^{\ast}) and *p*(**λ**_{
j
}) is the prior probability of new and old position respectively, and they are cancelled out under uniform prior distribution; p({X}_{ij}^{\ast}|{\mathbf{\lambda}}_{j}^{\ast},\cdots ) and *p*(**X**_{
ij
}|**λ**_{
j
}, ...) is the prior probability of QTL genotype conditional on new and old position, which has been described detailed previously; \frac{q({X}_{ij}|{y}_{i},\cdots )}{q({X}_{ij}^{\ast}|{y}_{i},\cdots )}=\frac{p({X}_{ij}|{y}_{i},\cdots )}{p({X}_{ij}^{\ast}|{y}_{i},\cdots )} and \frac{q({\mathbf{\lambda}}_{j})}{q({\mathbf{\lambda}}_{j}^{\ast})}=\frac{{\displaystyle {\prod}_{k=1}^{m}p({\mathbf{\lambda}}_{kj})}}{{\displaystyle {\prod}_{k=1}^{m}p({\mathbf{\lambda}}_{kj}^{\ast})}}, are all proposal ratio.

In step f, block sampling of the indicator variable matrix **Φ**_{
j
}is expected to have a better performance than separately updating each *γ*_{
kj
}in **Φ**_{
j
}. Due to there are two possible values (0 or 1) for each model indicator *γ*_{
kj
}, if *m* traits are investigated jointly, each model indicator matrix **Φ**_{
j
}has 2^{m}kinds of formations. The general formula of it can be written as:

where, *w*_{
k
}∈ {0,1}, for *k* = 1, 2, ..., *m*. Because the prior probability of each *γ*_{
kj
}is independent, the joint prior probability for all possible formations can be written as p({\mathbf{\Phi}}_{j}={W}_{l})={\displaystyle \mathbf{\prod}_{k=1}^{m}p({\gamma}_{kj}={w}_{k})}. Then the conditional posterior probability of **Φ**_{
j
}can be written as

The approach to sample **Φ**_{
j
}is similar to QTL genotypes sampling previously mentioned.

### Post-MCMC analysis

For summarizing the posterior sample, we use the mean of the posterior sample to estimate the QTL effect and the residual (co)variance, and the mode of the posterior probability or the peak of the 2log_{e}BF statistic to localize QTL. 2log_{e}BF statistic was introduced by Yi et al.[17] into QTL mapping, and BF statistic is defined as the ratio of the posterior odds to the prior odds for inclusion against exclusion of the locus [24]. The critical value of BF is 3 or 2log_{e}BF = 2.1 for declaring the existence of a QTL.

In single-trait analysis, we can pick the QTL by plotting the profile of the posterior probability or 2log_{e}BF statistic against the genome. In multitrait analysis, if only two traits are considered jointly, we can use a three-dimension graph to summarize the statistic for all traits jointly (e.g., Figure 2 in [19]). However, if the number of trait is greater than 2, we can't plot them in one graph. Instead, we can solve the problem by plotting the marginal posterior probability distribution. If we divide the genome into *H* bins, and denote each bin of *k* th trait with *ζ*_{
kg
}, for *g* = 1,2, ..., *H*, then the marginal posterior probability distribution of *ζ*_{
kg
}is defined as *p*(*ζ*_{
kg
}|**y**) = *p*[(*ζ*_{
kg
}= *λ*_{
kq
}) ∩ (*γ*_{
kq
}= 1)], where, *q* indicates the *q* th interval that locus *ζ*_{
kg
}resides in. Then \text{BF}({\zeta}_{kg})=\frac{p({\zeta}_{kg}|y)}{1-p({\zeta}_{kg}|y)}\cdot \frac{1-p({\zeta}_{kg})}{p({\zeta}_{kg})}, which can be calculated at each possible locus for each trait, respectively.

## References

Xu CW, Li ZK, Xu S: Joint mapping of quantitative trait loci for multiple binary characters. Genetics. 2005, 169: 1045-1059. 10.1534/genetics.103.019406.

Jiang C, Zeng ZB: Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics. 1995, 140: 1111-1127.

Knott SA, Haley CS: Multitrait least squares for quantitative trait loci detection. Genetics. 2000, 156: 899-911.

Korol AB, Ronin YT, Itskovich AM, Peng J, Nevo E: Enhanced efficiency of quantitative trait loci mapping analysis based on multivariate complexs of quantitative traits. Genetics. 2001, 157: 1789-1803.

Mangin B, Thoquet P, Grimslev N: Pleiotropic QTL analysis. Biometrics. 1998, 54: 88-99. 10.2307/2533998.

Eaves LJ, Neale MC, Maes H: Multivariate multipoint linkage analysis of quantitative trait loci. Behav Genet. 1996, 26: 519-525. 10.1007/BF02359757.

Liu JF, Liu YJ, Liu XG, Deng H-W: Bayesian mapping of quantitative trait loci for multiple complex traits with the use of variance components. Am J Hum Genet. 2007, 81: 304-320. 10.1086/519495.

Satagopan JM, Yandell BS, Newton MA, Osborn TC: A Bayesian approach to detect quantitative trait loci using Markov chain Monte Carlo. Genetics. 1996, 144: 805-816.

Yi N, Xu S: Bayesian mapping of quantitative trait loci for complex binary traits. Genetics. 2000, 155: 1391-1403.

Yi N, George V, Allison DB: Stochastic search variable selection for identifying multiple quantitative trait loci. Genetics. 2003, 164: 1129-1138.

Yi N, Xu S, Allison DB: Bayesian model choice and search strategies for mapping multiple epistatic quantitative trait loci. Genetics. 2003, 165: 867-883.

Yi N, Xu S, Allison DB: Bayesian model choice and search strategies for mapping interacting quantitative trait loci. Genetics. 2003, 165: 867-883.

Xu S: Derivation of the shrinkage estimates of quantitative trait locus effects. Genetics. 2007, 177: 1255-1258. 10.1534/genetics.107.077487.

Wang H, Zhang YM, Li XM, Masinde GL, Mohan S, Baylink DJ, Xu S: Bayesian shrinkage estimation of quantitative trait loci parameters. Genetics. 2005, 170: 465-480. 10.1534/genetics.104.039354.

Yi N, Yandell BS, Churchill GA, Allison DB, Eisen EJ, Pomp D: Bayesian model selection for genome-wide epistatic quantitative trait loci analysis. Genetics. 2005, 170: 1333-1344. 10.1534/genetics.104.040386.

Yi N: A unified Markov chain Monte Carlo framework for mapping multiple quantitative trait loci. Genetics. 2004, 167: 967-975. 10.1534/genetics.104.026286.

Yi N, Shriner D, Banerjee S, Mehta T, Pomp D, Yandell BS: An efficient Bayesian model selection approach for interacting quantitative trait loci models with Many Effects. Genetics. 2007, 176: 1865-1877. 10.1534/genetics.107.071365.

Godsill SJ: On the relationship between MCMC model uncertainty methods. J Comput Graph Stat. 2001, 10: 230-248. 10.1198/10618600152627924.

Gelman A, Carlin J, Stern H, Rubin D: Bayesian Data Analysis. 2004, London, Chapman & Hall

Hastings WK: Monte Carlo sampling methods using markov chains and their applications. Biometrika. 1970, 57: 97-109. 10.1093/biomet/57.1.97.

Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E: Equations of state calculations by fast computing machines. J Chem Phys. 1953, 21: 1087-1091. 10.1063/1.1699114.

Tinker NA, Mather DE, Rossnagel BG, Kasha KJ, Kleinhofs A, Hayes PM, Falk DE, Ferguson T, Shugar LP, Legge WG, Irvine RB, Choo TM, Briggs KG, Ullrich SE, Franckowiak JD, Blake TK, Graf RJ, Dofing SM, Saghai Maroof MA, Scoles GJ, Hoffman D, Dahleen LS, Kilian A, Chen F, Biyashev RM, Kudrna DA, Steffenson BJ: Regions of the genome that affect agronomic performance in two-row barley. Crop Sci. 1996, 36: 1053-1062.

Yi N, George V, Allison DB: Stochastic search variable selection for identifying multiple quantitative trait loci. Genetics. 2003, 164: 1129-1138.

Kass RE, Raftery AE: Bayes factors. J Am Stat Assoc. 1995, 90: 773-795. 10.2307/2291091.

## Acknowledgements

We deeply thank four anonymous reviewers for their criticisms and comments which have greatly improved the presentation of the manuscript. This work was partly supported by Heilongjiang August First Land Reclamation University.

## Author information

### Authors and Affiliations

### Corresponding author

## Additional information

### Authors' contributions

MF coordinated the study, developed the foundational principle of the method and wrote the computing program and the paper. Others were responsible for the simulation experiment, carried out the analysis of results and helped to consummate the whole paper.

## Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

## Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## About this article

### Cite this article

Fang, M., Jiang, D., Pu, L.J. *et al.* Multitrait analysis of quantitative trait loci using Bayesian composite space approach.
*BMC Genet* **9**, 48 (2008). https://doi.org/10.1186/1471-2156-9-48

Received:

Accepted:

Published:

DOI: https://doi.org/10.1186/1471-2156-9-48

### Keywords

- Alpha Amylase
- Conditional Posterior Distribution
- Genotype Matrix
- Bayesian Factor
- Multitrait Analysis