Simulation model
Datasets were simulated for a pure bred population and a single trait (average daily gain of pigs, g/day). The phenotype of individuals was composed of heritable and non-heritable effects. We assumed the associative effect to be (partially) genetic and expressed in the performance of the penmates. The simulation model included one fixed effect (sex) and three random effects: genetic direct effect, genetic associative effect and random error [16].
Y = Xb + Z
d
d + Z
a
a + e
Y represented the average daily gain of the pig; b indicated the effect of sex on the average daily gain (+20 for males and -20 for females); d represented the genetic direct effect; a represented the genetic associative effect, affecting the growth of penmates; e represented the independent random residual of the individual pig; X, Z
d
, Z
a
were incidence matrix of each effect corresponding to each individual.
It was assumed that the random effects follow a multivariate normal distribution:
, were the variances of genetic direct effect and genetic associative effect respectively; σ
da
was the covariance between d &a; was the variance of the random residuals, accounting for the non-genetic effect. It was assumed that the residuals of each pig are independent. A represented the numerator relationship matrix, which accounted for the pedigree information of animals. Σ was the variance-covariance matrix of random effects d, a and e. G was the variance-covariance matrix of genetic effects d and a.
For an individual pig i, the average daily gain was simulated as
d
i
denoted the genetic direct effect of the individual i; a
ji
denoted the genetic associative effect coming from the jthpenmate of the ithindividual; p denoted the number of penmates of the ithindividual; e
i
denoted the random residual of the ithindividual.
Based on the Cholesky decomposition [19], G was separated into a product of a matrix and its transpose. For the animals from the base generation, d and a were simulated using the partitioned matrix obtained from G multiplied by standard random normal deviates. For the following generations (progeny), d and a were the sum of the average of the d and a of the parents and a random Mendelian sampling term. e was simulated as the deviation of random residuals multiplied by a standard random normal deviate.
Simulation structure
Simulation included five generations of pigs, and breeding animals for each generation were selected randomly. Litter size followed a normal distribution varying from 2 to 20 with mean equal to 9 and standard deviation equal to 2.6. Sex effect was simulated assuming a uniform distribution, i.e. the probability of male or female was 50%. Since litter size was not fixed, the total number of animals in each replication was not fixed.
In order to investigate the impact of the data structure on the estimation of variance components, a number of data sets were simulated. The influence of pen composition (assignment of pigs to pens), group size (pigs/pen), number of breeding animals and the covariance (cov) between d & a were examined. Three different scenarios were used to assign pigs to pens. The first scenario was to assign pigs randomly (p1). The second one was to assign half of the pens with full sibs and the other half non full sibs (p2) and the third one was to assign full sibs per pen (p3) [6, 20]. Within each pen assignment scenario, 162 and 324 breeding animals (S162 and S324), two different group sizes (5 and 10) and three different correlations between d & a (-0.5, +0.1, +0.5) were specified.
Statistical methods
Variance components were analysed using the Multiple Trait Gibbs Sampler for Animal Models (MTGSAM) [11] and applying the same statistical model as the one used for simulation. In its original version, MTGSAM could only handle models with direct and maternal genetic effects. For this study we modified the program and extended it to include the genetic associative effect.
Two Gibbs chains using different sets of starting values were run to check the convergence. The Geweke and Gelman diagnostic [21, 22] were applied to determine the convergence of two Markov chains. For further analysis, results were summarized over 30 replicates (mean and standard errors were calculated from the 30 groups of posterior means of variance components). Both the Wilcoxon rank sum test and Student t-test were used to test the deviation of the mean from the real parameter values. Root Mean Square Error (RMSE) was used to measure the uncertainty of the estimated values. The RMSE of the estimator was calculated as [23]:
where is the estimated value of parameters obtained in each replication; θ is the true value used for simulation and n is the number of replications.
The accuracy of the parameter estimates was quantified by the percent relative bias. The percent relative bias was calculated as [24]. To study the effects of pen assignment, group size, number of breeding animals and the correlation between d &a simultaneously on the percent relative bias of the four variance components (d, cov, a and e), multivariate analysis of variance (MANOVA) was used (GLM, SAS 9.1.3) [25].
Theoretical justification for including genetic associative effect
Ignoring the genetic associative effect (if present in the data) will often result in biases of variances of genetic direct effect () and random residual () [13]. Looking into the genetic covariance between animals may help to explain the bias. Suppose animal A has penmates P1,...,P4, animal B has penmates Q1,...,Q4, and A and B are from two different pens. Based on the statistical model that we used the genetic covariance between A and B was:
If we consider A and B to be half sibs, then the genetic relationship between A and B is 0.25, i.e. A
AB
= 0.25. To make sure that the covariance between A and B is an unbiased estimation of , should be zero. If σ
ad
and are not equal to zero, the coefficients related to σ
ad
and should be zero. That means animal A is unrelated to the penmates of B and animal B is unrelated to the penmates of A. Moreover, there should be no genetic relationships between penmates P1,...,P4 and Q1,...,Q4. However, it is unrealistic to assume that all animals are genetically unrelated. Thus the terms related to σ
ad
and will be non-zero and affect the estimation of . The magnitude of bias depends on the magnitude of the terms related to σ
ad
and . Only when σ
ad
and have the same magnitude and opposite sign, the bias is zero.
To investigate the bias of , consider animals A and B have no genetic relationship, i.e. A
AB
= 0. If the genetic associative effect is ignored, the variation caused by σ
ad
and will flow into the variance of the residual variance leading to bias of . In order to reduce the bias introduced by σ
ad
and , one possibility is to have and equal to zero, which is unrealistic as mentioned before. Similarly, the bias of estimation of depends on the magnitude of σ
ad
and . It could be zero when the magnitudes of them are equal with opposite sign.
To confirm this, we took the simulation data obtained from the parameter combination of group size 5, correlation 0.5, 324 breeding animals and mixed full sib assignment. The estimation with a model considering only genetic direct effect and random residual was compared to the true value of simulation parameters.