Multilevel modeling for the analysis of longitudinal blood pressure data in the Framingham Heart Study pedigrees

Abstract

Background

The data arising from a longitudinal familial study have a complex correlation structure that cannot be modeled using classical methods for the analysis of familial data at a single time point.

Methods

To fit the longitudinal systolic blood pressure (SBP) pedigree data arising from the Framingham Heart Study, we proposed to use multilevel modeling. That approach was used to distinguish multiple levels of information with individual repeated measurements (Level 1) being made within individuals (Level 2), and individuals clustered within pedigrees (Level 3). Residuals from the subject-specific and pedigree-specific regression models were summed both for the mean SBP and slope of SBP change over time, in order to define two new outcomes that were then used in a genome-wide linkage analysis.

Results

Evidence for linkage for the two outcomes (mean SBP and slope) was found in several chromosomal regions, with maximum LOD scores of 3.6 on chromosome 8 and 3.5 on chromosome 17 for the mean SBP, and 2.5 on chromosome 1 for SBP slope. However, the linkage on chromosome 8 was detected only when the sample was restricted to subjects aged 25 to 75 years with at least four exams (Cohort 1) or three exams (Cohort 2).

Discussion

Multilevel modeling is a powerful approach for detecting genes involved in complex traits when longitudinal data are available. It takes the complex hierarchical data structure into account and therefore allows a better partitioning of random within-individual variation from other sources of variability (genetic or nongenetic).

Background

The Framingham Heart Study provides long-term repeated measurements of blood pressure and other phenotypes in two large cohorts of related individuals. Longitudinal studies are efficient designs for investigating individual changes over time. In familial studies, such designs are of particular interest for assessing the proportion of trait variability explained by within-individual variation or by other sources of variation. However, the data arising from a longitudinal familial study have a complex correlation structure that cannot be modeled using classical methods for the analysis of familial data at a single time point. In this study, we proposed to use multilevel modeling to fit the complex data structure arising from the Framingham Heart Study. Multilevel modeling, also known as hierarchical regression, generalizes ordinary regression modeling to distinguish multiple levels of information in a model [1]. It is a natural candidate for the Framingham Heart Study data, which form a hierarchy with individual repeated measurements (Level 1) made within individuals (Level 2), and individuals clustered within pedigrees (Level 3). The use of appropriate random effects at each level allows one to adjust for a wide variety of correlation structures and to estimate the variances, covariances, and correlations that are of particular interest in familial studies. In this paper, multilevel models are first used to fit the repeated systolic blood pressure (SBP) measurements. Residuals from the subject-specific and pedigree-specific regression models were then summed, both for the mean SBP and for the slope of SBP change over time, to define two new outcomes that were used in a genome-wide linkage analysis. Both phenotypes are of interest because genes involved in the variation of SBP with time could differ from genes affecting long-term mean SBP.

Methods

Data

The Framingham Heart Study data include 330 pedigrees originally selected for a genome-scan analysis. The pedigrees consisted of 4692 subjects, of whom 2885 have participated in the Framingham Heart Study. Longitudinal SBP data were analyzed for 25,263 examinations on 2662 individuals. Height, weight, gender, age, and hypertension treatment information were required; if height was missing, the most recent measurement was imputed. Because individual SBP measurements might vary substantially among younger and older subjects, we also defined a selected sample restricted to individuals aged between 25 and 75 years, as in Levy et al. [2], with two further selection criteria: 1) there had to be at least 10 years between a subject's initial and final examinations within the age range; 2) at least four examinations within the age range were required for original cohort participants and at least three for offspring cohort participants [2]. Data from 24,840 examinations on 2530 individuals were available in the selected sample. For the genome-wide scan analysis, 1702 genotyped individuals were included (394 from Cohort 1 and 1308 from Cohort 2).

Multilevel analysis of the longitudinal SBP model

Let the random variable $Y_{ijk}$ denote the SBP measurement at the ith examination for the jth individual in pedigree k. We then assume that $Y_{ijk}$ satisfies the following general multilevel model:

Within-subject model – Level 1

where i = 1,...,21 for Cohort 1 subjects and i = {11, 15, 17, 19, 21} for Cohort 2 subjects. $Age_{ijk}$, $BMI_{ijk}$, and $Treat_{ijk}$ are the age, body mass index, and hypertension treatment indicator (1 for treated subjects, 0 for untreated subjects) at the ith exam for the jth individual in pedigree k; $\overline{Age}_{jk}$ and $\overline{BMI}_{jk}$ are the mean values across all exams for the jth individual; and $\varepsilon_{ijk}$ are the error components that account for the within-individual variability. The $\varepsilon_{ijk}$ are assumed to be normally distributed with mean vector zero and variance-covariance matrix $\Sigma$ defined by a first-order autoregressive structure. The intercept $b_{0jk}$ represents the average SBP for an untreated subject of average age and BMI across all of the subject's examinations. The regression coefficient $b_{1jk}$ is used to model the linear variation of SBP with age. We found that every individual profile could be well approximated by a quadratic function of time, measured by the age at examination. We also tested a cubic effect, but it was not significant when we allowed the individual's linear time trend to differ in each treatment group (interaction between age and treatment). Random effects were added to reflect the natural heterogeneity in the population. In this model, both the intercept and the linear effect for age were allowed to vary across individuals, and the individual-specific regression coefficients (random effects) were defined at the second level:

Subject random-intercept model – Level 2

Subject random-slope model – Level 2

Here, $\overline{Age}$ and $\overline{BMI}$ are the sample means for age and body mass index, and Sex and Cohort are two indicator variables: Sex is coded 1 for males and 0 for females, and Cohort is coded 0 for Cohort 1 subjects and 1 for Cohort 2 subjects. The random components $u_{0jk}$ and $u_{1jk}$ measure the variation of each individual's mean SBP and slope from their average in pedigree k. The intercept $b_{00k}$ represents the average SBP in pedigree k for males in Cohort 1 with average age and BMI, and the intercept $b_{10k}$ represents the average slope in pedigree k for males in Cohort 1 with average BMI. To account for the correlation of individuals within a pedigree, these two intercepts were allowed to vary between pedigrees. The random effects at different levels of the model are assumed to be independent.

Pedigree random-intercept model – Level 3

$b_{00k} = \beta_{000} + v_{00k}, \quad k = 1, \ldots, N$

Pedigree random-slope model – Level 3

The random components $v_{00k}$ and $v_{01k}$ measure the variation of each pedigree's mean SBP and mean slope from their average in the whole sample.
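Taken together, the three levels can be written compactly. The following is only a sketch consistent with the definitions above: the fixed-effect labels ($b_2$ to $b_5$, $\beta_{001}$ to $\beta_{004}$, $\beta_{100}$ to $\beta_{103}$) are assumed for illustration, and the exact covariate set and centering at each level may differ from the model that was actually fitted.

```latex
\begin{align*}
\text{Level 1:}\quad Y_{ijk} &= b_{0jk} + b_{1jk}\,(Age_{ijk} - \overline{Age}_{jk})
   + b_{2}\,(Age_{ijk} - \overline{Age}_{jk})^{2}
   + b_{3}\,(BMI_{ijk} - \overline{BMI}_{jk}) \\
 &\quad + b_{4}\,Treat_{ijk}
   + b_{5}\,(Age_{ijk} - \overline{Age}_{jk})\,Treat_{ijk} + \varepsilon_{ijk} \\[4pt]
\text{Level 2:}\quad b_{0jk} &= b_{00k} + \beta_{001}\,(\overline{Age}_{jk} - \overline{Age})
   + \beta_{002}\,(\overline{BMI}_{jk} - \overline{BMI})
   + \beta_{003}\,Sex_{jk} + \beta_{004}\,Cohort_{jk} + u_{0jk} \\
 b_{1jk} &= b_{10k} + \beta_{101}\,(\overline{BMI}_{jk} - \overline{BMI})
   + \beta_{102}\,Sex_{jk} + \beta_{103}\,Cohort_{jk} + u_{1jk} \\[4pt]
\text{Level 3:}\quad b_{00k} &= \beta_{000} + v_{00k} \\
 b_{10k} &= \beta_{100} + v_{01k}
\end{align*}
```

Written this way, an individual's deviation from the sample average is $u_{0jk} + v_{00k}$ for the mean SBP and $u_{1jk} + v_{01k}$ for the slope, once the fixed effects have been accounted for.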

Statistical tests in the multilevel model

Analyses were conducted in both the unselected and selected samples, with and without adjustment for BMI. Multilevel models were fitted using SAS PROC MIXED [3]. Parameter estimates were obtained by restricted maximum likelihood (REML) estimation. An F-statistic was used to test the significance of the fixed effects, with the number of degrees of freedom computed using the containment method [4]. The likelihood ratio statistic based on REML likelihoods was used to test the significance of the random effects. The null distribution of this statistic is a mixture of $\chi^2_q$ and $\chi^2_{q+1}$ with equal weights 0.5, where q and q + 1 are the number of random effects estimated under H0 and H1, respectively.
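The mixture p-value for a random-effect test can be computed directly from the likelihood ratio statistic. The Python sketch below is illustrative only and is not the SAS procedure used in the analysis; the function names `chi2_sf` and `mixture_lrt_pvalue` and the example statistic are hypothetical. It uses only the standard library and assumes integer degrees of freedom.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df < 0:
        raise ValueError("df must be non-negative")
    if df == 0:
        # A chi-square with 0 df is a point mass at zero.
        return 1.0 if x < 0 else 0.0
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        s, tail = 1.0, math.exp(-y)              # Q(1, y)
    else:
        s, tail = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1)
    while 2 * s < df:
        tail += y ** s * math.exp(-y) / math.gamma(s + 1.0)
        s += 1.0
    return tail


def mixture_lrt_pvalue(lrt: float, q: int) -> float:
    """P-value under an equal-weight mixture of chi-square(q) and chi-square(q + 1)."""
    return 0.5 * chi2_sf(lrt, q) + 0.5 * chi2_sf(lrt, q + 1)


if __name__ == "__main__":
    # Hypothetical REML likelihood ratio statistic, with q = 1 random effect under H0.
    print(mixture_lrt_pvalue(6.2, q=1))
```

Comparing the statistic with a single $\chi^2_{q+1}$ distribution instead would give a larger p-value, so the mixture makes the test for a variance component less conservative.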

Genome-wide linkage analysis

We used the estimates of the random effects at the subject and pedigree levels to define two new outcomes that were used in the genome-wide linkage analysis. The two outcomes were defined as $\hat{u}_{0jk} + \hat{v}_{00k}$ and $\hat{u}_{1jk} + \hat{v}_{01k}$, which measure the random variation of each individual's SBP mean and slope, respectively, from the sample average after adjustment for the fixed effects. A third outcome was also defined using the residuals from a sample-wide regression in which each individual's mean SBP (across all exams) was regressed on their mean age (centered), mean BMI (centered), gender, and cohort, as in Levy et al. [2]. Estimation of heritability and two-point linkage analyses were performed on the pedigree data using the variance component models implemented in the SOLAR package [5].
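Building the two multilevel outcomes amounts to adding each subject's predicted random effects to those of the subject's pedigree. The Python sketch below illustrates that step under assumed data structures; the function name, the dictionary layout, and all numeric values are hypothetical and do not come from the study.

```python
from typing import Dict, Tuple

Effects = Tuple[float, float]  # (intercept effect, slope effect)


def linkage_outcomes(
    subject_effects: Dict[str, Effects],
    pedigree_effects: Dict[str, Effects],
    pedigree_of: Dict[str, str],
) -> Dict[str, Effects]:
    """Sum subject-level and pedigree-level random-effect estimates per subject.

    Returns, for each subject, the pair
    (mean-SBP outcome, slope outcome) = (u0 + v00, u1 + v01).
    """
    outcomes = {}
    for subject, (u0, u1) in subject_effects.items():
        v00, v01 = pedigree_effects[pedigree_of[subject]]
        outcomes[subject] = (u0 + v00, u1 + v01)
    return outcomes


if __name__ == "__main__":
    # Hypothetical estimates for two subjects in one pedigree.
    subjects = {"s1": (4.0, 0.25), "s2": (-2.0, -0.5)}
    pedigrees = {"p1": (1.5, -0.125)}
    membership = {"s1": "p1", "s2": "p1"}
    for subject, (mean_sbp, slope) in linkage_outcomes(subjects, pedigrees, membership).items():
        print(subject, mean_sbp, slope)
```

Members of the same pedigree share the pedigree-level term, so the outcomes retain the familial correlation that the linkage analysis then models.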

Results

Multivariate analysis of longitudinal SBP

All fixed effects included in the subject random-slope model were highly significant (Table 1) except for gender. Most of the SBP variability (316.8 in Model 1a, Table 1) was explained by within-subject variability (140.8, 44%) and between-subject variability in the mean SBP (146.2, 46%), and to a lesser extent by between-pedigree variability in the mean SBP (27.6, <9%). Much less variability was explained by variability in the slope (0.17 + 0.008, <0.06%). Pedigree effects on mean SBP and SBP slope were more significant when the multilevel analyses were adjusted for body mass index. As shown in Figure 1, the multilevel model fit the data well, whereas the sample-wide regression did not capture all of the SBP variability.
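Each percentage above is a component's share of the total variance in Model 1a. A minimal Python sketch of that arithmetic, using only the values quoted in the text (the dictionary labels are illustrative):

```python
# Variance components quoted in the text for Model 1a (Table 1).
total_variance = 316.8
components = {
    "within-subject": 140.8,
    "between-subject (mean SBP)": 146.2,
    "between-pedigree (mean SBP)": 27.6,
    "slope (subject + pedigree)": 0.17 + 0.008,
}

# Share of the total variance attributed to each component, in percent.
shares = {name: 100.0 * value / total_variance for name, value in components.items()}

if __name__ == "__main__":
    for name, share in shares.items():
        print(f"{name}: {share:.3f}%")
```

The listed components do not sum exactly to the quoted total, which the sketch leaves as is rather than rescaling.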

Table 1 Estimates of multilevel model fixed effects and random effects variances (± SE) in the selected and unselected samples with or without adjustment for BMI
Figure 1. Comparisons of observed SBP and predicted values for the selected sample. Panels A, B, D, and E are based on the multilevel model; panels C and F are based on the sample-wide regression for the selected sample [2].

Heritability

Heritability estimates were 54.3% (SE = 3.1) and 55.6% (SE = 3.4) for the mean SBP, 31.9% (SE = 3.5) and 28.9% (SE = 3.5) for SBP slope over time in the unselected and selected samples, respectively. The heritability estimates for the subject-specific residuals from the sample-wide regression of the mean SBP were 47.7% (SE = 3.4) and 49.7% (SE = 3.8) in the unselected and selected samples, respectively.

Genome-wide linkage analysis

Evidence for linkage for the two outcomes (mean SBP and slope) was found in several chromosomal regions, with maximum LOD scores of 3.6 on chromosome 8 and 3.5 on chromosome 17 for the mean SBP and 2.5 on chromosome 1 for SBP slope (Table 2). However, linkage on chromosome 8 for the mean SBP was found only in the selected sample. The decrease in LOD score on chromosome 17 in the unselected sample was substantial in several pedigrees that included individuals with a single SBP measurement, as illustrated in Figure 2. Adjusting the analyses for BMI gave stronger evidence for linkage, which could suggest that BMI is determined by other genetic factors (Table 3). Not adjusting the analysis for treatment effect did not change the results for the mean SBP, but yielded lower LOD scores for SBP slope (Table 3).
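To put these LOD scores on a more familiar scale, a LOD score can be converted to an approximate pointwise p-value. The sketch below assumes the usual asymptotic result for a variance component linkage test of a single variance parameter on its boundary, where 2 ln(10) × LOD follows an equal mixture of a point mass at zero and a chi-square with 1 degree of freedom. The function name is hypothetical, the p-values are pointwise rather than genome-wide, and this calculation is not part of the original analysis.

```python
import math


def lod_to_pvalue(lod: float) -> float:
    """Approximate pointwise p-value for a variance component LOD score.

    Assumes 2 * ln(10) * LOD is asymptotically a 50:50 mixture of a point
    mass at zero and a chi-square distribution with 1 degree of freedom.
    """
    if lod <= 0:
        return 1.0
    chi2 = 2.0 * math.log(10.0) * lod
    # Upper tail of chi-square(1) is erfc(sqrt(x / 2)); the point mass adds nothing for x > 0.
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    # Maximum LOD scores reported in the text.
    for label, lod in [("chromosome 8, mean SBP", 3.6),
                       ("chromosome 17, mean SBP", 3.5),
                       ("chromosome 1, SBP slope", 2.5)]:
        print(f"{label}: LOD {lod} -> p ~ {lod_to_pvalue(lod):.2e}")
```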

Table 2 Results of two-point linkage analysis (LOD scores ≥ 2 are in bold)
Figure 2. Observed and predicted SBP profiles for pedigree 32,445 members

Table 3 Results of two-point linkage analysis in the selected sample using residuals from the multilevel model (LOD scores ≥ 2 are in bold)

Discussion

Our study demonstrates the value of multilevel modeling in the search for genetic determinants of complex traits when longitudinal pedigree data are available. For the mean SBP, we were able to replicate the linkage result on chromosome 17 previously reported by Levy et al. [2] and to detect a linkage on chromosome 8 that had not been reported before. We also found suggestive evidence of linkage for both mean SBP and SBP slope in several other chromosomal regions, including chromosomes 1, 2, 3, 11, and 13. Using residuals from the multilevel model in a genome-wide linkage analysis gave stronger evidence for linkage than using residuals from a sample-wide regression as in Levy et al. [2]. This might be because the latter approach does not correctly account for within-individual and between-individual variability. Multilevel modeling, which can take into account the hierarchical structure of the data, may help disentangle the proportion of the trait variability explained by fundamental variation in the mean SBP and in the SBP slope from the proportion explained by random within-individual variability. A more general hierarchical structure could have included a nuclear family level nested within the pedigree level. However, such a multilevel model would be more difficult to fit. In our analysis, we only included a fixed cohort effect that could account for differences between generations within a pedigree. Treating the pedigrees as random effects also allowed for between-pedigree heterogeneity in our model, which improved the accuracy of the random effect estimates at the individual level. Although there may be some concern about using a two-stage approach for detecting linkage, other studies based on similar strategies using linear mixed models in simulated data did not report an inflation of type I error for the test of linkage in the context of genome-wide linkage analysis [6, 7].
The linkage on chromosome 17 for mean SBP was only found in the selected sample. An important decrease in LOD score (>0.1) in the unselected sample was observed in several pedigrees comprising individuals with a single extreme SBP measurement, as illustrated in Figure 2. This suggests that a single SBP measurement may not provide a reliable characterization of an individual, especially when a familial study of SBP is designed. Adjusting the analyses for BMI gave stronger evidence for linkage, which could suggest that BMI is determined by other genetic factors. No correction was applied to the SBP value of subjects who received hypertension treatment. The analyses with the multilevel model were adjusted for treatment effect so that the residuals obtained from this model correspond to the untreated group. Taking into account an interaction between age and treatment in the multilevel model may also have reduced the bias due to treatment effect. However, our linkage results for mean SBP were insensitive to whether the analyses were adjusted for treatment effect. The multilevel modeling approach is also known to be robust to missing data, under the assumption that they are missing at random [4]. Future work could include the development of an integrated approach to perform linkage analysis within the multilevel framework.

References

  1. Leyland AH, Goldstein H, Eds: Multilevel Modelling of Health Statistics. Chichester, John Wiley and Sons. 2001.

  2. Levy D, DeStefano AL, Larson MG, O'Donnell CJ, Lifton RP, Gavras H, Cupples LA, Myers RH: Evidence for a gene influencing blood pressure on chromosome 17. Genome scan linkage results for longitudinal blood pressure phenotypes in subjects from the Framingham Heart Study. Hypertension. 2000, 36: 477-483.

  3. Littell R, Milliken G, Stroup W, Wolfinger R: SAS System for mixed models. Cary, NC, SAS Institute, Inc. 1996.

  4. Verbeke G, Molenberghs G: Linear Mixed Models in Practice. A SAS-Oriented Approach. New York, Springer. 1997.

  5. Almasy L, Blangero J: Multipoint quantitative trait linkage analysis in general pedigrees. Am J Hum Genet. 1998, 62: 1198-1211. 10.1086/301844.

  6. Palmer L, Jacobs K, Scurrah K, Xu X, Horvath S, Weiss S: Genome-wide linkage analysis in a general population sample using sigma 2A random effects (SSARs) fitted by Gibbs sampling. Genet Epidemiol. 2001, 21 (suppl 1): S674-S679.

  7. Scurrah K, Tobin T, Burton P: Longitudinal variance components models for systolic blood pressure, fitted using Gibbs sampling. BMC Genetics. 2003, 4 (suppl 1): S25. 10.1186/1471-2156-4-S1-S25.

Acknowledgments

This research was partially supported by a project grant from the Network of Centres of Excellence in Mathematics (Canada). SBB is a Senior Investigator of the Canadian Institutes for Health Research.

Corresponding author

Correspondence to Laurent Briollais.

Cite this article

Briollais, L., Tzontcheva, A. & Bull, S. Multilevel modeling for the analysis of longitudinal blood pressure data in the Framingham Heart Study pedigrees. BMC Genet 4 (Suppl 1), S19 (2003). https://doi.org/10.1186/1471-2156-4-S1-S19
