Genetic analyses of longitudinal phenotype data: a comparison of univariate methods and a multivariate approach

Yang, Qiong; Chazaro, Irmarie; Cui, Jing; Guo, Chao-Yu; Demissie, Serkalem; Larson, Martin; Atwood, Larry D; Cupples, L Adrienne; DeStefano, Anita L

doi:10.1186/1471-2156-4-S1-S29

Volume 4 Supplement 1

Genetic Analysis Workshop 13: Analysis of Longitudinal Family Data for Complex Diseases and Related Risk Factors

Proceedings
Open access
Published: 31 December 2003

Qiong Yang^1,2,
Irmarie Chazaro^1,3,
Jing Cui^4,
Chao-Yu Guo^1,
Serkalem Demissie^1,
Martin Larson^3,
Larry D Atwood^1,2,
L Adrienne Cupples^1 &
Anita L DeStefano^1,2

BMC Genetics volume 4, Article number: S29 (2003)

Abstract

Background

We explored three approaches to heritability and linkage analyses of longitudinal total cholesterol levels (CHOL) in the Genetic Analysis Workshop 13 simulated data without knowing the answers. The first two were univariate approaches and used 1) baseline measure at exam one or 2) summary measures such as mean and slope from multiple exams. The third method was a multivariate approach that directly models multiple measurements on a subject. A variance components model (SOLAR) was employed in the univariate approaches. A mixed regression model with polynomials was employed in the multivariate approach and implemented in SAS/IML.

Results

Using the baseline measure at exam 1, we detected (LOD > 3) all baseline and slope genes contributing a substantial proportion (at least 0.08) of the variance. Compared to the baseline measure, the mean measures yielded a slightly higher LOD at the slope genes and a lower LOD at the baseline genes. The slope measure produced a somewhat lower LOD for the slope gene than did the mean measure. With the third approach, descriptive information on the pattern of changes in gene effects with age was estimated for three linked loci.

Conclusion

We found that simple univariate methods may be effective for detecting genes affecting longitudinal phenotypes but may not fully reveal temporal trends in gene effects. The relative efficiency of the univariate methods in detecting genes depends heavily on the underlying model. Compared with the univariate approaches, the multivariate approach provided more information on temporal trends in gene effects at the cost of more complicated modelling and more intensive computation.

Background

In genetic studies, subjects may be measured repeatedly over a period of time to monitor how the quantitative traits change with age (or other time measure). These types of data offer great opportunity to evaluate whether a gene's influence on traits changes with age. Univariate variance components approaches that use a single measurement or summary statistics such as mean and slopes are easy to implement and the results have a straightforward interpretation. However, the univariate approaches may not be extracting the full information content of the data and may not provide information about differing genetic effects with age. Multivariate variance components approaches that directly model all measurements on one subject by estimating covariance structures within or between subjects may better utilize the information in the data set and provide age-specific estimates of genetic effects at the cost of greater computational burden and more complex interpretation of the linkage information.

In this work, we compared three approaches (two univariate and one multivariate) to analyze repeated measures in genetic studies. The first two approaches used univariate phenotypes that were either based on a single exam measurement or summaries from multiple exam measurements. Variance components models for univariate phenotypes were applied [1]. The third method used multiple available measurements on each subject as a multivariate phenotype. We modelled the random genetic and subject-specific random environmental effects as orthogonal polynomials of age in a mixed regression model and implemented it in SAS/IML.

We applied the three approaches to analyze total cholesterol levels (CHOL) in replicate 8 of the Genetic Analysis Workshop 13 simulated data without prior knowledge of the answers.

Methods

Univariate Approaches

Baseline Measure

The baseline measure of CHOL at Exam 1 of both cohorts was used as the dependent variable in variance components analyses implemented in SOLAR [1]. Total heritability (h²) was estimated as the proportion of the total phenotypic variance due to the additive polygenic variance. SOLAR calculates a LOD score as the log₁₀ of the ratio of the maximum likelihood of a linkage model (containing a quantitative trait locus (QTL) variance component and a residual polygenic variance component) to that of a purely polygenic model. The QTL h² was computed as the ratio of the QTL variance to the total phenotypic variance. In multipoint analyses, information from adjacent markers was also used, through a regression approach [1], to evaluate linkage at the current marker. Gender, age, systolic blood pressure, and height were adjusted for as covariates in regression models prior to the heritability and linkage analyses.
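As a minimal sketch of the two quantities defined above, the following Python fragment computes a LOD score from two maximized natural-log likelihoods and a heritability from variance components. The function names and all numeric values are hypothetical illustrations; they are not SOLAR output and not results from this study.

```python
import math


def lod_score(loglik_linkage: float, loglik_polygenic: float) -> float:
    """LOD = log10 of the likelihood ratio, given natural-log likelihoods."""
    return (loglik_linkage - loglik_polygenic) / math.log(10)


def heritability(component_variance: float, total_variance: float) -> float:
    """Proportion of the total phenotypic variance due to one component."""
    return component_variance / total_variance


# Hypothetical maximized log-likelihoods (illustration only).
lnl_polygenic = -1520.4
lnl_linkage = -1512.1
lod = lod_score(lnl_linkage, lnl_polygenic)

# Hypothetical variance components from a linkage model (illustration only).
var_qtl, var_polygenic, var_env = 20.0, 35.0, 45.0
total = var_qtl + var_polygenic + var_env
qtl_h2 = heritability(var_qtl, total)

print(f"LOD = {lod:.2f}, QTL h2 = {qtl_h2:.2f}")
```

Working on the log scale avoids underflow, since pedigree likelihoods are typically far too small to divide directly.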

Summary Measures

In calculating summary measures of the repeated CHOL measurements, we considered three definitions of the mean, each imposing different restrictions on the selection of subjects and their measurements. Definition 1 (D1) required that subjects had CHOL measured at three or more exams. Under this definition, the number of observations per subject varied widely, from 3 to 15. We were concerned that the differing numbers of exams, and hence the differing standard errors of the mean measure, would affect the genetic analysis, so we explored definitions in which each summary measure was based on a similar number of exams. To obtain an approximately equal number of exams for both cohorts, definition 2 (D2) included only the first five exams of each cohort and required that subjects had CHOL measured at two or more exams. For D2, Cohort 1 and Cohort 2 members had measures taken at approximately the same age (45 years). To obtain measures taken at approximately the same chronological time in the two cohorts, definition 3 (D3) included only exams 10, 14, 15, and 20 for Cohort 1 and exams 1–5 for Cohort 2, and required that subjects had CHOL measured at two or more exams. A slope of CHOL versus age was computed for each individual satisfying D1. Heritability and linkage analyses were conducted in the same way as for the baseline measure.
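The per-subject summaries described above can be sketched as follows. This is an illustrative Python fragment, assuming the slope is an ordinary least-squares slope of CHOL on age (the text does not state the estimator) and using a hypothetical `min_exams` parameter to stand in for the restriction in a definition such as D1. The data are made up.

```python
def summary_measures(ages, chol, min_exams=3):
    """Return (mean CHOL, OLS slope of CHOL on age) for one subject.

    Missing measurements are given as None. Returns None when the subject
    has fewer than `min_exams` observed values.
    """
    pairs = [(a, c) for a, c in zip(ages, chol) if c is not None]
    n = len(pairs)
    if n < min_exams:
        return None
    mean_age = sum(a for a, _ in pairs) / n
    mean_chol = sum(c for _, c in pairs) / n
    sxx = sum((a - mean_age) ** 2 for a, _ in pairs)
    sxy = sum((a - mean_age) * (c - mean_chol) for a, c in pairs)
    slope = sxy / sxx if sxx > 0 else None  # undefined if all ages coincide
    return mean_chol, slope


# Hypothetical subject: four exams, one missing CHOL value.
mean_chol, slope = summary_measures([40, 42, 44, 46], [200, None, 210, 215])

# Hypothetical subject with only two observed exams: excluded under D1.
excluded = summary_measures([50, 52, 54], [190, None, 195])
```

Subjects with few exams yield noisier means and slopes, which is the concern that motivated definitions D2 and D3.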

Multivariate Approach

We set up a mixed regression model as follows:

$$y_{ij} = X_{ij}\beta + g_{ij} + r_{ij} + \varepsilon_{ij},$$

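where $y_{ij}$ is the CHOL at age $j$ for subject $i$, $X_{ij}$ and $\beta$ are vectors of covariates and coefficients of fixed effects, $g_{ij}$ and $r_{ij}$ are subject-specific additive genetic and environmental effects (i.e., repeated measurement effects), respectively, and $\varepsilon_{ij}$ is the residual environmental effect of subject $i$. To allow age-varying effects, $g$ and $r$ are modelled by Legendre polynomials, similar to the approach in Meyer [2]:

$$g_{ij} = \sum_{m=0}^{k_A-1} \alpha_{im}\,\phi_m(t_j^*), \qquad r_{ij} = \sum_{m=0}^{k_R-1} \gamma_{im}\,\phi_m(t_j^*),$$

where $\{\alpha_{im} \mid m = 0, \ldots, k_A - 1\} \sim N(0, \Sigma_\alpha)$ and $\{\gamma_{im} \mid m = 0, \ldots, k_R - 1\} \sim N(0, \Sigma_\gamma)$ are random regression coefficients of the additive genetic and environmental effects for subject $i$, $\phi_m(t_j^*)$ is the $m$th Legendre polynomial [3] evaluated at $t_j^*$ (which is age $j$ standardized to the interval $[-1, 1]$ by the age range observed in the data), and $k_A$ and $k_R$ are the orders of the corresponding polynomials. Assuming $g$ and $r$ are independent of each other, the covariance between two observations on two subjects is then

$$\mathrm{Cov}(y_{ij}, y_{i'j'}) = \sum_{m=0}^{k_A-1}\sum_{l=0}^{k_A-1} \phi_m(t_j^*)\,\phi_l(t_{j'}^*)\,\mathrm{Cov}(\alpha_{im}, \alpha_{i'l}) + \sum_{m=0}^{k_R-1}\sum_{l=0}^{k_R-1} \phi_m(t_j^*)\,\phi_l(t_{j'}^*)\,\mathrm{Cov}(\gamma_{im}, \gamma_{i'l}) + \mathrm{Cov}(\varepsilon_{ij}, \varepsilon_{i'j'}). \qquad (1)$$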

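It can be further simplified by assuming $\mathrm{Cov}(\alpha_{im}, \alpha_{i'l}) = 2\Phi_{ii'}\,\mathrm{Cov}(\alpha_m, \alpha_l)$ and $\mathrm{Cov}(\gamma_{im}, \gamma_{i'l}) = 2\delta_{ii'}\,\mathrm{Cov}(\gamma_m, \gamma_l)$, where $\Phi_{ii'}$ is the kinship coefficient, $\delta_{ii'} = 1$ if $i = i'$ and 0 otherwise, and $\mathrm{Cov}(\varepsilon_{ij}, \varepsilon_{i'j'}) = \sigma_\varepsilon^2$ if $i = i'$ and $j = j'$, and 0 otherwise. The total h² at a standardized age $t^*$ is therefore

$$h^2(t^*) = \frac{\sigma_g^2(t^*)}{\sigma_g^2(t^*) + \sigma_r^2(t^*) + \sigma_\varepsilon^2},$$

where $\sigma_g^2(t^*) = \sum_{m=0}^{k_A-1}\sum_{l=0}^{k_A-1} \phi_m(t^*)\,\phi_l(t^*)\,\mathrm{Cov}(\alpha_{im}, \alpha_{il})$ and $\sigma_r^2(t^*) = \sum_{m=0}^{k_R-1}\sum_{l=0}^{k_R-1} \phi_m(t^*)\,\phi_l(t^*)\,\mathrm{Cov}(\gamma_{im}, \gamma_{il})$ are the additive genetic and subject-specific environmental variances of a single subject at $t^*$.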

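We extended the model to incorporate the effect of a QTL by adding a Legendre polynomial with random coefficients $\{\eta_m \mid m = 1, \ldots, k_Q\} \sim N(0, \Sigma_\eta)$. Assuming the QTL effect is independent of $g$ and $r$, its contribution to the covariance in equation (1) is

$$\pi_{ii'} \sum_{m}\sum_{l} \phi_m(t_j^*)\,\phi_l(t_{j'}^*)\,\mathrm{Cov}(\eta_m, \eta_l),$$

where $t_j^*$ and $t_{j'}^*$ are the standardized ages of the two observations and $\pi_{ii'}$ is the multipoint estimate of the proportion of alleles shared identical by descent by the two subjects at the QTL. The QTL h² due to this locus at a standardized age $t^*$ is then

$$h_Q^2(t^*) = \frac{\sigma_Q^2(t^*)}{\sigma_Q^2(t^*) + \sigma_g^2(t^*) + \sigma_r^2(t^*) + \sigma_\varepsilon^2}, \qquad \sigma_Q^2(t^*) = \sum_{m}\sum_{l} \phi_m(t^*)\,\phi_l(t^*)\,\mathrm{Cov}(\eta_m, \eta_l),$$

where $\sigma_g^2(t^*)$, $\sigma_r^2(t^*)$, and $\sigma_\varepsilon^2$ are the polygenic, subject-specific environmental, and residual variances at $t^*$.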

We utilized kinship coefficients and multipoint identity by descent (IBD) estimates computed in SOLAR and read these values into a matrix using SAS/IML. The other parameters (the fixed-effect coefficients β, the covariance matrices ∑_α, ∑_γ, and ∑_η, and the residual variance) were estimated via the nonlinear optimization routine NLPQN in SAS/IML [4].
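To make the age-specific heritabilities concrete, the sketch below evaluates a polynomial covariance function at a standardized age and forms the total and QTL h². It is written in Python for illustration only (the analysis itself used SAS/IML) and rests on several assumptions not stated in the text: unnormalized Legendre polynomials, coefficients indexed from 0 for all three effects, and covariance matrices that describe a single subject's own coefficients. All numeric values are hypothetical.

```python
def legendre(m: int, x: float) -> float:
    """m-th (unnormalized) Legendre polynomial P_m(x) via Bonnet's recursion."""
    if m == 0:
        return 1.0
    p_prev, p = 1.0, x
    for n in range(1, m):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p


def standardize_age(age: float, age_min: float, age_max: float) -> float:
    """Map age linearly from [age_min, age_max] onto [-1, 1]."""
    return 2.0 * (age - age_min) / (age_max - age_min) - 1.0


def variance_at(t: float, cov) -> float:
    """Quadratic form phi(t)' * cov * phi(t) for a k x k coefficient covariance."""
    k = len(cov)
    phi = [legendre(m, t) for m in range(k)]
    return sum(phi[m] * cov[m][l] * phi[l] for m in range(k) for l in range(k))


def total_h2(t, cov_a, cov_r, var_e):
    """Polygenic share of the variance at standardized age t (no QTL term)."""
    vg, vr = variance_at(t, cov_a), variance_at(t, cov_r)
    return vg / (vg + vr + var_e)


def qtl_h2(t, cov_a, cov_r, cov_q, var_e):
    """QTL share of the variance at standardized age t."""
    vg, vr, vq = (variance_at(t, c) for c in (cov_a, cov_r, cov_q))
    return vq / (vq + vg + vr + var_e)


# Hypothetical covariance matrices (illustration only, not estimates).
cov_a = [[300.0, 0.0, 0.0], [0.0, 20.0, 0.0], [0.0, 0.0, 5.0]]   # polygenic
cov_r = [[200.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 5.0]]   # environment
cov_q = [[100.0, 30.0], [30.0, 20.0]]                             # QTL
var_e = 100.0

for age in (20, 40, 60, 93):
    t = standardize_age(age, 20, 93)
    print(age, round(total_h2(t, cov_a, cov_r, var_e), 3),
          round(qtl_h2(t, cov_a, cov_r, cov_q, var_e), 3))
```

With a positive covariance between the intercept and linear QTL coefficients, as in this made-up `cov_q`, the QTL variance rises from the youngest to the oldest age, which is the kind of pattern a slope gene would be expected to show.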

Since computational load increased quickly with the number of observed ages, we divided the 70 distinct ages (ranging from 20 to 93) into five intervals: below age 30, three 10-year intervals from age 30 to 60, and above age 60. The order of the polynomials was set to 2 (i.e., k_A = k_R = k_Q = 3) for both polygenic and subject-specific environmental effects and to 1 for QTL effects. For individuals who had more than one exam in an age interval, the averages of the phenotype and covariates measured during that interval were used in the analyses. Because genome-wide analyses were time consuming, we implemented this analysis only at the three linked loci (S7, B30, B32) found in the univariate analyses.
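The age-interval reduction described above might look like the following Python sketch. The exact boundary handling is an assumption (here a subject aged exactly 30, 40, 50, or 60 falls in the higher interval), and the function names and data are hypothetical.

```python
from collections import defaultdict


def age_interval(age: float) -> int:
    """Index of the age interval: 0 (<30), 1 (30-39), 2 (40-49), 3 (50-59), 4 (60+).

    Placing the boundary ages in the higher interval is an assumption.
    """
    if age < 30:
        return 0
    if age < 40:
        return 1
    if age < 50:
        return 2
    if age < 60:
        return 3
    return 4


def average_by_interval(exams):
    """Average CHOL within each age interval for one subject.

    `exams` is a list of (age, chol) pairs; returns {interval index: mean CHOL}.
    """
    groups = defaultdict(list)
    for age, chol in exams:
        groups[age_interval(age)].append(chol)
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}


# Hypothetical subject with two exams in the 30-39 interval.
reduced = average_by_interval([(28, 190), (31, 200), (35, 210), (62, 240)])
print(reduced)
```

Covariates measured at the same exams would be averaged in the same way, so that each subject contributes at most one record per interval.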

Results

Univariate Approaches

We compared our results to the simulating model in Table 1. Because the three definitions of the mean did not differ substantially in heritability or multipoint LOD scores, we present results only for mean D2. The total h² of the baseline, mean D2, and slope measures was estimated as 0.55, 0.60, and 0.42, respectively. Using the baseline measure, we detected (LOD > 3.0) one of the three slope genes, S7 (QTL h² = 0.20), and three of the four baseline genes, B30 (QTL h² = 0.27), B31 (QTL h² = 0.21), and B32 (QTL h² = 0.30). Using mean measure D2, we were able to detect the slope gene S7 (QTL h² = 0.33) and the baseline gene B32 (QTL h² = 0.26). Using the slope measure, only slope gene S7 (QTL h² = 0.33) was detected. There were one, two, and one false-positive signals for the Exam 1, mean D2, and slope measures, respectively, with LOD scores between 3.6 and 4.3 (Table 1).

Table 1 Linkage analyses results for mean, slope and baseline measure at Exam 1

Multivariate Approach

Among the 2701 subjects who had at least one measurement of CHOL, there were 70, 670, 1950, and 10 subjects who had one, two, three, and four repeated measurements, respectively, taken over the five age intervals. The estimated total h² was 0.57, 0.59, 0.60, 0.59, and 0.55 in the five age groups. The QTL h² for S7 ranged from 0.35 to 0.56. The QTL h² for B30 and B32 ranged from 0.39 to 0.45 and from 0.35 to 0.49, respectively. The total and QTL h² estimates are presented in Figure 1. The total h² curve and the QTL h² curves of B30 and B32 were relatively flat, declining slightly with age. The slope gene, S7, showed a monotonic increase in its QTL h² with age.

Discussion

We have presented two univariate and one multivariate approach to analyze longitudinal phenotype data. The univariate approaches were successful in identifying genes for this generating model. The multivariate approach provided additional descriptive information on changes in gene effects with age.

We found that the relative efficiency of the first two approaches (baseline or summary measures) depended heavily on the generating model. CHOL was generated using a basic linear model of age (CHOL = Chol_base + Chol_slope * age + random_error). Because the age of subjects at Exam 1 ranged from 20 to 85, the baseline measure at Exam 1 enabled us to detect all slope and baseline CHOL genes except three genes with a variance <0.08. The mean measure seemed to contain more noise than the Exam 1 data for detecting the baseline genes, but produced a slightly higher LOD than the slope measure for detecting the slope genes. This observation was confirmed in an experiment [5]: when there is considerable residual random error in the trait, the slope measure can be inferior to the mean measure in power to detect a slope gene.

The results of the three definitions of means were not very different for this generating model, though they were designed to avoid possible shortcomings in the other definitions (See Methods section). In practice, one definition may be better than the others depending on the characteristics of the data.

The total h² estimates from the multivariate approach did not vary much with age and were close to those from the univariate approaches using Exam 1 or mean D2. The QTL h² estimates for B30 and B32 from the multivariate analyses were higher than those from the univariate analyses, especially at younger ages. The difference at younger ages may arise because the Exam 1 and mean D2 measures included more older subjects; under this generating model the total phenotypic variance increases with age, so the baseline genes explain a lower proportion of it in older subjects. The QTL h² for slope gene S7 estimated using the slope measure was close to that estimated using the multivariate approach for those aged 30 or less. In theory, under this generating model the QTL variance for S7 from the multivariate approach should be approximately equal to that from the slope measure multiplied by age², which explains the monotonic increase of the QTL h² for S7 observed with the multivariate approach.
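The age² scaling noted above follows directly from the linear generating model. The derivation below is a sketch under an added assumption that the baseline term $b_i$, the slope term $s_i$, and the error are mutually independent; the symbols $\sigma_b^2$, $\sigma_s^2$, $\sigma_e^2$, $\sigma_{b,Q}^2$, and $\sigma_{s,Q}^2$ are introduced here for illustration and do not appear in the simulation documentation.

```latex
% Assumption: b_i, s_i and e_i(a) are mutually independent.
% sigma_{b,Q}^2 and sigma_{s,Q}^2 are the parts of the baseline and slope
% variances attributable to a single baseline QTL and a single slope QTL.
\begin{align*}
\mathrm{CHOL}_i(a) &= b_i + s_i\,a + e_i(a), \\
\mathrm{Var}\!\left[\mathrm{CHOL}_i(a)\right] &= \sigma_b^2 + a^2\sigma_s^2 + \sigma_e^2, \\
h^2_{\text{slope QTL}}(a) &= \frac{a^2\,\sigma_{s,Q}^2}{\sigma_b^2 + a^2\sigma_s^2 + \sigma_e^2}, \\
h^2_{\text{baseline QTL}}(a) &= \frac{\sigma_{b,Q}^2}{\sigma_b^2 + a^2\sigma_s^2 + \sigma_e^2}.
\end{align*}
```

Under these assumptions the slope-gene variance at age $a$ is $a^2\sigma_{s,Q}^2$, so its share of the total variance increases with age, while a baseline gene's share declines slowly as the denominator grows. This is in line with the rising curve for S7 and the slightly declining curves for B30 and B32.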

Compared with the univariate approaches, the multivariate approach provided more information regarding the temporal trend of gene effects during aging. We were not able to tell which gene(s) affected the baseline or slope using the univariate approaches, since the univariate measures overlapped with each other in the ability to detect slope and baseline genes. Using the third approach, the QTL h² for the two baseline genes were nearly flat and slightly declining with age, but that of the slope gene showed a clear trend of monotonic increase with age, which distinguished the slope gene from the baseline genes.

In conclusion, we found that the univariate approaches were capable of discovering some of the important trait genes with simple modelling and a feasible computational load. The multivariate approach can provide additional information on age-varying effects of genes but generally involves heavy computation and complex modelling. More work is needed to further develop the multivariate approach in areas such as a sensible test of significance. Nevertheless, the multivariate approach shows promise for genetic analyses of longitudinal measures in linkage studies.

References

1. Almasy L, Blangero J: Multipoint quantitative-trait linkage analysis in general pedigrees. Am J Hum Genet. 1998, 62: 1198-1211. 10.1086/301844.
2. Meyer K: Estimating covariance functions for longitudinal data using a random regression model. Genet Select Evol. 1998, 30: 221-240.
3. Abramowitz M, Stegun IA: Handbook of Mathematical Functions. New York, Dover. 1965.
4. SAS Institute Inc.: SAS OnlineDoc, Version 8. Cary, NC, SAS Institute, Inc. 2000.
5. Gauderman WJ, Macgregor S, Briollais L, Scurrah K, Tobin M, Park T, Wang D, Rao S, John S, Bull S: Longitudinal data analysis in pedigree studies. Genet Epidemiol.

Acknowledgments

This work was supported in part by NIH grant P50-HL55001 (to ALD and CJ).

Author information

Authors and Affiliations

Departments of Biostatistics, Boston University, Boston, Massachusetts, USA
Qiong Yang, Irmarie Chazaro, Chao-Yu Guo, Serkalem Demissie, Larry D Atwood, L Adrienne Cupples & Anita L DeStefano
Departments of Neurology, Boston University, Boston, Massachusetts, USA
Qiong Yang, Larry D Atwood & Anita L DeStefano
Departments of Mathematics and Statistics, Boston University, Boston, Massachusetts, USA
Irmarie Chazaro & Martin Larson
Departments of Medicine, Boston University, Boston, Massachusetts, USA
Jing Cui

Corresponding author

Correspondence to Qiong Yang.

About this article

Cite this article

Yang, Q., Chazaro, I., Cui, J. et al. Genetic analyses of longitudinal phenotype data: a comparison of univariate methods and a multivariate approach. BMC Genet 4 (Suppl 1), S29 (2003). https://doi.org/10.1186/1471-2156-4-S1-S29
