General approach
Let Y
ij
denote the SBP of the ith subject at the jth study visit, T
ij
be the corresponding age of the subject at the visit, and let X
ij
denote a matrix of time-dependent covariates. We propose using a first-stage model of the form
Y
ij
= a
i
+ b
i
(T
ij
-
) + γ' X
ij
+ e
ij
, (1)
where
is the overall mean age in the sample and e
ij
are residuals, assumed to be independent and normally distributed with mean 0 and variance ω2. The goal of this first-stage model is to estimate the subject-specific intercepts (a
i
) and slopes on age (b
i
), and their corresponding standard errors. We denote these standard errors by s
ai
for intercepts and s
bi
for slopes. Note that the intercepts have the interpretation as the predicted mean Y at age
when all X values are zero. We center any continuous X variables on their corresponding sample means to increase interpretability of the intercepts.
The second-stage model utilizes the first-stage intercepts a
i
and slopes b
i
as continuous phenotype data in a genetic analysis. We first perform segregation analysis to determine the evidence for a Mendelian gene and to estimate the associated model parameters. For analysis of the intercepts, the penetrance model used in the segregation analysis has the form
a
i
= α + βG
i
+ η' X
i
+ e
i
, (2)
where G
i
is a covariate based on an unobserved major gene g
i
, and X
i
is a matrix of time-independent covariates. An analogous model was used for the slopes. The residual e
i
is assumed to be normally distributed with mean 0 and variance (σ2 + s
ai
2), where s
ai
2 is the square of the first-stage standard error of the intercept, and σ2 is the between-subject residual variance to be estimated. Note that this variance expression has the effect of weighting each subject's contribution to the genetic analysis based on the precision (standard error) of their intercept estimate. We thus denote the use of this variance for e
i
as a 'weighted' analysis. Generally speaking, these first-stage standard errors will be smallest for those with many measurements, and with measurements at ages that span the overall average age
. One could also perform an 'unweighted' analysis by assuming that the variance of e
i
was simply σ2, which would treat the intercepts for all subjects as equally informative.
To estimate the parameters of the above model, we maximized the likelihood
where the F indexes family, gF is a vector of unobserved major genotypes, and YF and XF are the trait and covariate data for family F. The parameters Ω = {α, β, η, σ } are the parameters of the penetrance model, qA is the population frequency of the variant allele 'A', and τ = {τAA, τAa, τaa} are the probabilities that a parent with the subscripted genotype transmits an 'A' to their offspring. Computation of the above likelihood requires use of the peeling algorithm [5, 6]. We considered six models in the segregation analysis: four Mendelian models (dominant, recessive, additive, and codominant), a no-major-gene model that included only measured covariates, and a general transmission model. In the general transmission model, τAA, τAa, and τaa were treated as free parameters to be estimated. This general model was compared to the Mendelian models, in which τAA, τAa, and τaa were constrained to their theoretical values of 1.0, 0.5, and 0.0, respectively. Likelihood ratio tests (LRTs) were used to compare the general model to the Mendelian models, and also to the no-major-gene model. We also computed Akaike's Information Criteria (AIC) for each model as -2(log-likelihood at the maximum likelihood estimator (MLE)) + 2(number of model parameters estimated). A lower AIC indicates a more parsimonious model.
Application to the Genetic Analysis Workshop 13 (GAW13) Framingham Heart Study data set
The GAW13 data set of the Framingham Heart Study included a total of 4692 subjects, of which 1213 subjects provided longitudinal observation data from the first cohort, and 1672 subjects from the offspring cohort. The outcome variable of interest in this paper was systolic blood pressure (SBP). A natural log transform was used to linearize the SBP relationship with age; thus Y
ij
in equation (1) is ln(SBP
ij
). Only observations with age in the range 30 to 80 were utilized, to further linearize the relationship between ln(SBP) and age. The average age was
= 53.7. Time-dependent covariates defining X
ij
in equation (1) included body mass index (BMI), calendar year (CY), CY2, hypertension treatment (HRX), CY × HRX, CY × male, CY × cohort, CY × BMI, CY × age, male × age, and BMI × HRX. The continuous variables BMI and CY were centered on their respective sample means, while HRX and male were indicators of treatment status and male sex, respectively. The CY2 term was included to account for observed nonlinearity between SBP and CY. The intercepts from the first-stage model have interpretation as the subject-specific mean ln(SBP) adjusted to a female, untreated person of average age (53.7 years) and BMI (26.3 kg/m2) in calendar year 1969.5. PROC MIXED in SAS, Release 8.2 (SAS Institute, Cary NC), was used to fit the first-stage model and obtain person-specific intercepts and slopes, and their respective standard errors.
A total of 2883 person-specific intercepts (a
i
values) and 2787 person-specific slopes (b
i
values) were obtained from the first-stage analysis. These estimates were used as trait data in the second-stage segregation and linkage analyses. Covariates X
i
in equation (2) included male sex and cohort, the latter an indicator of membership in Cohort 2. We fit the segregation and linkage models using a version of the Genetic Analysis Package (GAP, Epicenter Software, Pasadena, CA), modified by one of the authors (WJG) to utilize s
ai
2 (and s
bi
2) in a weighted analysis. As will be demonstrated below, a Mendelian model was supported for the intercepts, but not for the slopes. We therefore focused our linkage analysis only on the intercepts. We fixed the segregation-model parameters to their MLEs from the weighted analysis, and performed two-point LOD-score linkage analysis to estimate the recombination fraction (θ) between g and each of 399 markers. Allele frequencies at each marker locus were fixed to the values provided with the data. For comparison, we also performed an unweighted linkage analysis, in which a segregation analysis was re-run without standard error weights, and these MLEs then used in linkage analysis.