Suppose that yi denotes a quantitative trait for individual i = 1, ⋯, n, Xi = (Xi1, ⋯, Xiq)′ is a vector of q covariates, and Gi = (Gi1, ⋯, Gip)′ is a vector of genotypes at p SNPs in a gene for individual i. An LMM is constructed as
$$ {y}_i={\boldsymbol{X}}_i\boldsymbol{\alpha} +{\boldsymbol{G}}_i\boldsymbol{\beta} +{b}_i+{\varepsilon}_i $$
(1)
where α and β are the unknown regression coefficient vectors for the covariates and the SNPs, respectively, bi is a random intercept, and εi is an error term; bi and εi are independent of each other. We further assume that the error terms εi are independently distributed across individuals, whereas the random intercepts bi are not. Specifically,
$$ \boldsymbol{b}={\left({b}_1,\dots, {b}_n\right)}^{\prime}\sim \mathcal{N}\left(\mathbf{0},\tau \cdot \boldsymbol{\Psi} \right)\quad \mathrm{and}\quad \boldsymbol{\varepsilon} ={\left({\varepsilon}_1,\dots, {\varepsilon}_n\right)}^{\prime}\sim \mathcal{N}\left(\mathbf{0},\phi \cdot \boldsymbol{I}\right) $$
(2)
where Ψ is a known n × n genetic relationship matrix, which reflects the genetic relatedness among the subjects in the data. The null hypothesis to be tested for association between the group of the SNPs and the trait is H0 : β = 0.
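To make the roles of τ, Ψ, and φ concrete, the following minimal sketch draws one data set from models (1) and (2). The function name `simulate_lmm`, the sample sizes, the parameter values, and the block-diagonal sib-pair form of Ψ are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def simulate_lmm(X, G, alpha, beta, Psi, tau, phi, rng):
    """Draw y = X alpha + G beta + b + eps, with b ~ N(0, tau*Psi) and eps ~ N(0, phi*I)."""
    n = X.shape[0]
    # Random intercepts are correlated across individuals through Psi
    b = rng.multivariate_normal(np.zeros(n), tau * Psi)
    # Error terms are independent with common variance phi
    eps = rng.normal(0.0, np.sqrt(phi), size=n)
    return X @ alpha + G @ beta + b + eps

rng = np.random.default_rng(0)
n, p = 200, 5                                            # hypothetical sizes
X = np.column_stack([np.ones(n), rng.normal(size=n)])    # intercept + one covariate
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)      # additive genotype codes 0/1/2
# Toy relationship matrix (assumption): 100 independent pairs with relatedness 0.5
Psi = np.kron(np.eye(n // 2), np.array([[1.0, 0.5], [0.5, 1.0]]))
# beta = 0 corresponds to the null hypothesis H0
y = simulate_lmm(X, G, np.array([1.0, 0.5]), np.zeros(p), Psi, tau=0.6, phi=0.4, rng=rng)
```

Setting `beta` to a nonzero vector in this sketch would generate data under the alternative hypothesis.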
Fitting (generalized) LMMs can be computationally demanding. However, fitting the model by penalized quasi-likelihood (PQL) allows us to extract the quantities needed for score-based tests, including the aSPU test [6]. It is known that, for quantitative traits, maximizing the PQL is equivalent to maximizing the likelihood. Specifically, we first fit the LMM under the null hypothesis,
$$ {y}_i={\boldsymbol{X}}_i\boldsymbol{\alpha} +{b}_i+{\varepsilon}_i, $$
(3)
from which the score vector U = (U1, ⋯, Up)′, used to construct various gene-level score-based tests, can be expressed as
$$ {U}_j={\sum}_{i=1}^n{G}_{ij}\left(\frac{y_i-\left({\boldsymbol{X}}_i\hat{\boldsymbol{\alpha}}+{\hat{b}}_i\right)}{\hat{\phi}}\right) $$
(4)
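As an illustration of how the score vector is built from the null model (3), the sketch below treats the variance components τ and φ as known, computes the generalized least squares estimate of α and the best linear unbiased predictor of b, and then forms U from the null-model residuals yi − Xiα̂ − b̂i scaled by φ. In practice τ and φ are estimated (for example by GMMAT); the function name `null_scores`, the toy data, and the covariance formula V = G′PG with P = Σ⁻¹ − Σ⁻¹X(X′Σ⁻¹X)⁻¹X′Σ⁻¹ and Σ = τΨ + φI are assumptions made here for illustration.

```python
import numpy as np

def null_scores(y, X, G, Psi, tau, phi):
    """Score vector U and covariance V for H0: beta = 0, given tau and phi (assumed known)."""
    n = len(y)
    Sigma = tau * Psi + phi * np.eye(n)                 # marginal covariance of y
    Si_X = np.linalg.solve(Sigma, X)                    # Sigma^{-1} X
    XtSiX = X.T @ Si_X
    alpha_hat = np.linalg.solve(XtSiX, Si_X.T @ y)      # GLS estimate of alpha
    r = y - X @ alpha_hat                               # residual after fixed effects
    b_hat = tau * Psi @ np.linalg.solve(Sigma, r)       # BLUP of the random intercepts
    U = G.T @ (y - X @ alpha_hat - b_hat) / phi         # score for each SNP
    Si_G = np.linalg.solve(Sigma, G)                    # Sigma^{-1} G
    V = G.T @ Si_G - (Si_G.T @ X) @ np.linalg.solve(XtSiX, X.T @ Si_G)  # G' P G
    return U, V

# Toy data generated under H0 (hypothetical sizes and parameter values)
rng = np.random.default_rng(1)
n, p = 100, 4
X = np.column_stack([np.ones(n), rng.normal(size=n)])
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)
Psi = np.kron(np.eye(n // 2), np.array([[1.0, 0.5], [0.5, 1.0]]))
tau, phi = 0.6, 0.4
y = X @ np.array([1.0, 0.5]) + rng.multivariate_normal(
    np.zeros(n), tau * Psi + phi * np.eye(n))
U, V = null_scores(y, X, G, Psi, tau, phi)
```

Because neither `alpha_hat` nor `b_hat` depends on G, the null model needs to be fitted only once and can then be reused for every gene.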
The aSPU test statistic can be obtained from the score vector U and its covariance matrix V under the null hypothesis, which can also be written in closed form. Because the score vector is asymptotically normally distributed with mean zero and covariance V under the null hypothesis, one can use the Monte Carlo method to compute p-values. Note that both U and V depend only on the null model (3), which provides computational efficiency when the number of tests is large, as in a genome-wide scan. The R package GMMAT can be used to obtain U and V [7].
We briefly introduce the idea of the aSPU test here. All score-based association tests require U and V, and each nonadaptive test has its own advantages and disadvantages. For example, consider two cases: (a) every SNP in a gene is associated with the trait, with equal effect size and direction, and (b) only one or a small proportion of the SNPs are associated. The burden test, which takes \( {\sum}_{j=1}^p{U}_j \) as a test statistic, is well suited to the first case but loses power in the second. On the other hand, the UminP test, which takes max{|U1|, ⋯, |Up|} as a test statistic when the variances of the score elements are the same, is advantageous in the second case but not in the first. Thus, applying a single nonadaptive score-based test might not be powerful in gene-level analysis. The aSPU test offers a way to combine various score-based tests; it is based on a class of sum of powered score (SPU) tests indexed by a positive integer γ. Specifically, the SPU(γ) test statistic is
$$ {T}_{SPU\left(\gamma \right)}={\sum}_{j=1}^p{U}_j^{\gamma}\mathrm{and}\;{T}_{SPU\left(\infty \right)}=\max \left\{\left|{U}_1\right|,\dots, \left|{U}_p\right|\right\} $$
(5)
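The statistics in (5) are simple to compute once U is available. The sketch below, with a hypothetical helper `spu_statistic` and a made-up score vector, shows how the power γ changes the statistic.

```python
import math

def spu_statistic(U, gamma):
    """T_SPU(gamma) = sum_j U_j**gamma; gamma = math.inf gives max_j |U_j|."""
    if gamma == math.inf:
        return max(abs(u) for u in U)
    return sum(u ** gamma for u in U)

U = [1.0, -2.0, 0.5]                      # made-up score vector with mixed signs
t1 = spu_statistic(U, 1)                  # -0.5   (burden-type sum; signs cancel)
t2 = spu_statistic(U, 2)                  # 5.25   (sum of squares; signs ignored)
t3 = spu_statistic(U, 3)                  # -6.875 (largest |U_j| dominates)
t_inf = spu_statistic(U, math.inf)        # 2.0    (maximum absolute score)
```

With odd γ the signs of the score elements are retained, so effects in opposite directions can cancel; with even γ they cannot.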
It is easy to see that the burden test and the sum of squared score (SSU) test are equivalent to the SPU(1) and SPU(2) tests, respectively. It was also shown that SPU(2) is equivalent to the sequence kernel association test (SKAT) with the linear kernel and to Multivariate Distance Matrix Regression (MDMR) with the Euclidean distance (under the framework of LMM) [8]. Furthermore, assuming equal variances of the score elements, the UminP test is equivalent to the SPU test with γ = ∞. One can treat γ as a factor that controls the weight on each score element: as γ increases, the elements with larger |Uj| increasingly dominate the statistic. The aSPU test uses the minimum p-value of the SPU tests as its test statistic, which provides a general data-adaptive method to test for associations. The set γ ∈ {1, 2, ⋯, 8, ∞} was proposed by Pan et al. based on empirical experience [2].
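The Monte Carlo computation of the aSPU p-value can be sketched as follows. This is a minimal illustration, assuming that null score vectors are drawn from N(0, V) and that the same draws are reused both for the individual SPU(γ) p-values and for the null distribution of their minimum; the function name `aspu_pvalue`, the number of draws B, and the toy inputs are assumptions rather than the authors' implementation.

```python
import numpy as np

def aspu_pvalue(U, V, gammas=(1, 2, 3, 4, 5, 6, 7, 8, np.inf), B=2000, seed=0):
    """Monte Carlo p-value of the aSPU test from a score vector U and covariance V."""
    rng = np.random.default_rng(seed)
    U = np.asarray(U, dtype=float)
    # B draws of the score vector under H0: U0 ~ N(0, V)
    U0 = rng.multivariate_normal(np.zeros(len(U)), V, size=B)

    def abs_spu(u, g):
        # |T_SPU(g)| computed over the last axis
        if np.isinf(g):
            return np.abs(u).max(axis=-1)
        return np.abs((u ** g).sum(axis=-1))

    p_obs = np.empty(len(gammas))
    p_null = np.empty((len(gammas), B))
    for k, g in enumerate(gammas):
        t_obs, t_null = abs_spu(U, g), abs_spu(U0, g)
        # p-value of the observed SPU(g) statistic
        p_obs[k] = (1 + np.sum(t_null >= t_obs)) / (B + 1)
        # p-value of each null draw relative to the other B - 1 draws
        rank = np.empty(B)
        rank[np.argsort(-t_null)] = np.arange(B)   # number of larger null statistics
        p_null[k] = (rank + 1) / B
    t_aspu = p_obs.min()                 # observed aSPU statistic
    t_aspu_null = p_null.min(axis=0)     # its null distribution
    p_aspu = (1 + np.sum(t_aspu_null <= t_aspu)) / (B + 1)
    return p_aspu, dict(zip(gammas, p_obs))

# Toy inputs (assumption): three SNPs with independent, unit-variance scores
p_strong, _ = aspu_pvalue([10.0, 10.0, 10.0], np.eye(3))
p_none, _ = aspu_pvalue([0.0, 0.0, 0.0], np.eye(3))
```

Because the minimum p-value is calibrated against the same null draws, no additional layer of simulation is needed to account for having searched over γ.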