 Methodology article
 Open Access
 Published:
A statistical measure for the skewness of X chromosome inactivation for quantitative traits and its application to the MCTFR data
BMC Genomic Data volume 22, Article number: 24 (2021)
Abstract
Background
X chromosome inactivation (XCI) is that one of two chromosomes in mammalian females is silenced during early development of embryos. There has been a statistical measure for the degree of the skewness of XCI for qualitative traits. However, no method is available for such task at quantitative trait loci.
Results
In this article, we extend the existing statistical measure for the skewness of XCI for qualitative traits, and the likelihood ratio, Fieller’s and delta methods for constructing the corresponding confidence intervals, and make them accommodate quantitative traits. The proposed measure is a ratio of two linear regression coefficients when association exists. Noting that XCI may cause variance heterogeneity of the traits across different genotypes in females, we obtain the point estimate and confidence intervals of the measure by incorporating such information. The hypothesis testing of the proposed methods is also investigated. We conduct extensive simulation studies to assess the performance of the proposed methods. Simulation results demonstrate that the median of the point estimates of the measure is very close to the prespecified true value. The likelihood ratio and Fieller’s methods control the size well, and have the similar test power and accurate coverage probability, which perform better than the delta method. So far, we are not aware of any association study for the Xchromosomal loci in the Minnesota Center for Twin and Family Research data. So, we apply our proposed methods to these data for their practical use and find that only the rs792959 locus, which is simultaneously associated with the illicit drug composite score and behavioral disinhibition composite score, may undergo XCI skewing. However, this needs to be confirmed by molecular genetics.
Conclusions
We recommend the Fieller’s method in practical use because it is a noniterative procedure and has the similar performance to the likelihood ratio method.
Background
In genomewide association study (GWAS), many human diseases have been found to be associated with Xchromosomal genes, such as autoimmune diseases [1, 2], asthma [3], Duchenne muscular dystrophy [4, 5], adrenoleukodystrophy [6], WiskottAldrich syndrome [7] and some cancers [8,9,10,11,12]. However, development of methods for identifying association with genetic variants on X chromosome still lags behind that on autosomes due to the unique inheritance pattern of X chromosome [13]. The number of X chromosomes is different between males and females in mammals. There are two copies of X chromosome in mammalian females, one of which is paternal and the other is maternal, while mammalian males have only one maternal X chromosome. To compensate for this X chromosome dosage difference between sexes, one of two chromosomes in females is silenced during the early development of embryos, which is called X chromosome inactivation (XCI) [14,15,16,17,18]. Random XCI (XCIR) is a process that either the paternal or the maternal allele at an Xchromosomal locus is randomly chosen to be silenced in all cells, which is common in most females [19]. However, skewed XCI (XCIS) is also observed in a proportion of females, which is a nonrandom process and is defined as the observation of inactivation of the same allele in more than 75% cells [9, 20,21,22,23]. In addition, not all of the Xlinked genes undergo XCI and the pseudoautosomal region on both sex chromosomes does not require dosage compensation. In humans, over 15% Xlinked genes have been shown to escape from XCI (XCIE) [24, 25].
In population genetics, there has been an increasing interest in the incorporation of the information on XCI into association analysis for qualitative traits [26,27,28,29,30] and quantitative traits [31,32,33,34], which may greatly improve the test power. For qualitative traits, Clayton [26] first took account of XCI in detecting the association between Xchromosomal markers and diseases by regarding males as homozygous females. However, the Clayton’s method only considers the XCIR pattern and does not incorporate the XCIS and XCIE patterns. So, Wang et al. [27] developed a resamplingbased approach for casecontrol data simultaneously combining the information on three XCI patterns (XCIR, XCIS and XCIE) by coding three genotypes in females as 0, γ and 2, where γ is an unknown parameter, takes possible values between 0 and 2, and can be used to measure the degree of the skewness of XCI. For Xlinked quantitative trait loci (QTL), Zhang et al. [31] proposed a familybased association test, where the quantitative trait under study is required to follow a normal distribution. Although the involved variances of the trait value for males and females are assumed to be different, those for three genotypes in females are fixed to be the same. However, according to Ma et al. [32], XCI may lead to variance heterogeneity of the traits across different genotypes in females and the variance of the trait in heterozygous females is generally higher than that in homozygous females. So, based on only unrelated females, Ma et al. [32] suggested a test for Xlinked association via inflated variance in heterozygous females, a weighted test for Xlinked association which considers different variances, and the combined test of these two tests using the Stouffer’s Zscore method. Gao et al. [33] further developed the XWAS software toolset to facilitate GWAS on X chromosome, which includes the three test statistics proposed by Ma et al. [32]. Deng et al. [34] put forward a sexspecific Levene’s test, and a generalized Levene’s test based on a twostage regression model accounting for sexspecific mean and variance effects, to test for association. The original Levene’s test is robust to certain types of nonnormal distribution, particularly when data are nonnormal but symmetric [34], while the generalized Levene’s test may not. It should be noted that the above methods for QTL only incorporate the XCIR and XCIE patterns and do not consider the XCIS pattern. On the other hand, Wang et al. [35] has recently proposed a statistical measure available for the degree of XCI skewing for casecontrol data and developed three methods (likelihood ratio (LR), Fieller’s and delta) to construct the corresponding confidence intervals (CIs). However, they are only applicable to qualitative traits and are not suitable for quantitative traits.
Therefore, in this article, we first extend the existing statistical measure for the degree of XCI skewing (i.e., γ) for qualitative traits [35] and make it accommodate quantitative traits. It is shown that the proposed γ is a ratio of two linear regression coefficients in the presence of association between the traits under study and the genotypes. We estimate the linear regression coefficients by incorporating the information on the variance heterogeneity across different genotypes in females and then obtain the point estimate of γ. Then, we extend the existing LR, Fieller’s and delta methods for constructing the CIs of γ and make them suitable for quantitative traits. The simulation studies under various simulation settings are conducted to assess the performance of the proposed methods. We also apply the proposed methods to the Minnesota Center for Twin and Family Research (MCTFR) data for their practical use. Note that so far, we are not aware of any association study for the Xchromosomal markers in the MCTFR data, although there have been some previous association studies which only focused on autosomal markers [36,37,38,39,40,41,42,43].
Results
Sizes and powers
The empirical type I error rates of the corresponding tests for the proposed LR, Fieller’s and delta methods based on the sample size n = 1,000 and 2,000 are respectively given in Tables 1 and 2, where the additive effect size a = 0.1 and 0.3, the allele frequency p = 0.1 and 0.3, and the inbreeding coefficient ρ = 0. Under all the situations considered, the sizes of the proposed LR and Fieller’s methods stay close to the prespecified nominal level of 5%, irrespective of the values of n, a and p, which verifies their validity. However, the delta method has the inflated or conservative type I error rates in most scenarios. Additional file 1: Tables S1 and S2 show the sizes for the proposed LR, Fieller’s and delta methods with ρ = 0.05 based on the sample size n = 1,000 and 2,000, respectively, which are similar to those in Tables 1 and 2. This demonstrates that the HardyWeinberg disequilibrium almost has no effect on the sizes.
Note that the delta method does not control the sizes well. So, we only simulate the powers of the LR and Fieller’s methods. Figures 1, 2 and 3 display the estimated powers for the LR and Fieller’s methods against γ (γ ≠ γ_{0}) with a = 0.1 and 0.3, p = 0.1 and 0.3, n = 1,000, and ρ = 0 when γ_{0} = 0, 1 and 2, respectively. Figures 4, 5 and 6 plot the corresponding estimated powers with a = 0.1 and 0.3, p = 0.1 and 0.3, n = 2,000, and ρ = 0 when γ_{0} = 0, 1 and 2, respectively. The other power results are shown in Additional file 1: Figures S1S14. It can be seen from these figures that the power of the LR method is almost the same as that of the Fieller’s method. The powers of the LR and Fieller’s methods gradually but asymmetrically become larger with ∣γ − γ_{0}∣ increasing. When other parameters are unchanged, the powers with p = 0.3 are bigger than those with p = 0.1 (e.g., Fig. 1b vs. Fig. 1a, Fig. 1d vs. Fig. 1c). However, note that in \( {\sigma}_1^2=\theta \left(1\theta \right){a}^2+1.1 \), θ(1 − θ)a^{2} attains its maximum 0.25a^{2} when θ = 0.5 (i.e., γ = 1). The corresponding values of \( {\sigma}_1^2 \) for a = 0.1 and 0.3 are 1.1025 and 1.1225, respectively, which are not so different from each other. Furthermore, when γ = 0 or 2, θ(1 − θ)a^{2} = 0, which is not related to the value of a. So, the powers with a = 0.1 and those with a = 0.3 are close to each other (e.g., Fig. 1a vs. Fig. 1c, Fig. 1b vs. Fig. 1d). When the sample size n is changed from 1,000 to 2,000, the LR and Fieller’s methods are more powerful (e.g., Fig. 4 vs. Fig. 1). Finally, we find that the HardyWeinberg disequilibrium has little influence on the power results, e.g., by comparing Fig. 1 (ρ = 0) with Additional file 1: Figure S3 (ρ = 0.05).
Median of point estimate and statistical properties of confidence intervals
Tables 3 and 4 show the estimated median of the point estimates of γ, CP, ML, MR, ML/(ML + MR), DP and EP of the twosided 95% CIs of γ for the LR, Fieller’s and delta methods against γ, with a = 0.1 and 0.3, p = 0.1 and 0.3, and ρ = 0 based on 10,000 replicates for n = 1,000 and 2,000, respectively. From these two tables, we find that in all the cases considered, the median of \( \hat{\gamma} \) maintains very close to the true value of γ. As for the CI, the LR and Fieller’s methods have similar performance in the CP and the CPs of both methods are controlled around 95%, regardless of the values of a, p, γ and n. However, the CP of the delta method is underestimated or overestimated in most of the considered situations. The values of the ML/(ML + MR) for the LR and Fieller’s methods generally stay close to 0.5, except for the cases of p = 0.1 and n = 1,000, and the situations of γ = 0 and 2, while the ML/(ML + MR) of the delta method always gets far way from 0.5. This indicates that the LR and Fieller’s methods achieve more balance between ML and MR than the delta method. The LR and Fieller’s methods have comparable performance in the DP and EP. The values of the DP of both methods are zero or close to zero, except for p = 0.1 and γ = 0, which is indicative of few discontinuous CIs to occur. However, the EP results of the LR and Fieller’s methods show that there still exist a few CIs which are empty sets or reduced to be a point. On the other hand, the DP and EP of the delta method are zero for all the simulation settings. This is because the CI based on the delta method is always bounded and is a continuous interval. The ML/(ML + MR), DP and EP of the LR and Fieller’s methods appear not to be greatly affected by the values of a (0.1 or 0.3), while the LR and Fieller’s methods perform worse in the ML/(ML + MR) and the DP when p = 0.1, compared to those with p = 0.3. When the sample size increases from 1,000 (Table 3) to 2,000 (Table 4), the LR and Fieller’s methods have more balance of two tail errors and the values of the DP for p = 0.1 and γ = 0 are less. When γ= 0.5, 1 and 1.5, the values of the EP of both methods with p = 0.3 are less than those with p = 0.1, and the LR and Fieller’s methods with n = 2,000 have smaller EP values than n = 1,000. However, when γ= 0 and 2, the corresponding values of the EP with p = 0.3 are a little larger than those with p = 0.1 and the values of the EP with n = 2,000 are a little bigger than those with n = 1,000. This may be because γ= 0 and 2 are the endpoints of the interval [0, 2], which are the extreme cases. The corresponding results of the median of \( \hat{\gamma} \), CP, ML, MR, ML/(ML + MR), DP and EP of the 95% CIs of γ for the LR, Fieller’s and delta methods with ρ = 0.05 for n = 1,000 and 2,000 are given in Additional file 1: Tables S3 and S4, respectively. By comparing Table 3 with Additional file 1: Table S3 (or comparing Table 4 with Additional file 1: Table S4), we can see that the results in both tables are similar to each other, which means that the HardyWeinberg disequilibrium has no great effect on the point estimation and the interval estimation of γ.
Application to MCTFR data
We applied the proposed LR, Fieller’s and delta methods to the data from the MCTFR GWAS of Behavioral Disinhibition for their practical use, and considered the following five quantitative traits: the nicotine composite score, alcohol consumption composite score, alcohol dependence composite score (DEP), illicit drug composite score (DRG) and behavioral disinhibition composite score (BD). The MCTFR data are made available for download from the database of Genotypes and Phenotypes (accession number: phs000620.v1.p1). In the MCTFR data, there are 2183 families (7377 subjects consisting of 3546 males and 3831 females), including 182 families with 1 member (182 subjects), 290 families with 2 members (580 subjects), 294 families with 3 members (882 subjects), 1352 families with 4 members (5408 subjects), and 65 families with 5 members (325 subjects). Among them, nuclear families are composed of the parents and two offspring who are monozygotic twins, full biological nontwin siblings, adopted siblings and mixed siblings with 1 biological offspring and 1 adopted offspring. Figure 7 shows more details of the family structure in the MCTFR data. Twelve thousand three hundred fiftyfour single nucleotide polymorphisms (SNPs) on the X chromosome were genotyped. Note that our proposed methods are applicable in the presence of association between the SNPs and the quantitative traits of interest. So, we first conducted the association analysis for each locus and each trait. When only analyzing a single trait for all the 12,354 SNPs, the significance level was set to be α^{′} = 0.05/12,354 = 4.047 × 10^{−6} based on Bonferroni correction. When simultaneously analyzing multiple traits, Deng et al. [34] and McGue et al. [41] fixed the significance level at 1 × 10^{−3} for their association analysis. Therefore, in this application, we also used this significance level for the association study when simultaneously considering multiple traits. Then, we calculated the point estimate and the corresponding CIs of the skewness of XCI at the 95% confidence level for all the SNPs which are associated with a single trait at the 4.047 × 10^{−6} level or are simultaneously associated with two or more traits at the 1 × 10^{−3} level. However, we found that all these traits, and the transformed traits (e.g., log(y + 1)) do not satisfy the normality assumption. As such, we used the existing Levene’s test [34] to detect the association between the SNPs and the traits, which is robust to certain types of nonnormal distribution.
The following quality control rules are used to filter the data. First, note that the proposed three methods for the interval estimation of γ only utilize unrelated females. On the other hand, although the adopted offspring in the nuclear families are biologically independent of their adopted parents, they might come from a subpopulation which is different from that of their parents. So, we deleted all the males in the data and all the offspring in the nuclear families, including the biological offspring and the adopted offspring. Second, genotyped female individuals with missing genotype rate over 10% were excluded. Third, the SNPs with missing genotype rate over 10% were deleted. Finally, we applied the PLINK software to carry out the HWE tests for SNPs [44] and the significance level is set to be 1 × 10^{−4} [45]. The SNPs with the minor allele frequency (MAF) less than 5% or those out of HWE were also excluded. As such, a total of 1955 unrelated females and 11,355 SNPs were included in this application.
The Levene’s test identified one SNP (rs17261621) which is only associated with the DRG trait at the 4.047 × 10^{−6} level, two SNPs (rs792959 and rs17261621) which are associated with both the DRG and BD traits and three SNPs (rs4825722, rs4825726 and rs2196260) which are associated with both the DEP and BD traits at the 1 × 10^{−3} level. The corresponding P values of the Levene’s test and the HWE test together with the position, the MAF, the point estimates and the CIs of γ based on the LR, Fieller’s and delta methods for these five SNPs are given in Table 5. For the DRG trait and the rs792959 locus, the point estimate of γ, and the 95% CIs of the LR, Fieller’s and delta methods are 2, (1.0294, 2], (1.0293, 2] and [0, 2], respectively. For the BD trait and the rs792959 locus, the point estimate of γ, and the corresponding 95% CIs are 2, (1.0306, 2], (1.0304, 2] and [0, 2], respectively. The CIs of the LR and Fieller’s methods for the DRG and BD traits are very similar and do not contain 1. Thus, \( \hat{\gamma} \) being 2 indicates that at rs792959, 100% (2/2) of cells in heterozygous females have allele G active, and 0% of cells express allele A, which demonstrates the XCIS towards allele G. However, the CIs of the delta method at rs792959 contain 1 (i.e., XCIR). The conclusions drawn from the LR and Fieller’s methods here are similar to those drawn from our simulation study. However, the truncated point estimate \( \hat{\gamma} \) is 2, which is the right endpoint of the interval [0, 2]. This may be because the proposed LR and Fieller’s methods require that the traits under study follow a normal distribution, while the DRG and BD traits are not normally distributed. Further, all the CIs for the other four SNPs contain 1, indicating random XCI. Particularly, for the BD trait and the rs4825722 locus, the CIs of the LR, Fieller’s and delta methods are [0, 2], which provides no information on the XCI pattern.
Discussion
In this article, we extended the existing statistical measure for the degree of XCI skewing (i.e., γ) and the existing LR, Fieller’s and delta methods for constructing the CIs of γ for qualitative traits [35], and made them suitable for quantitative traits. The proposed γ is a ratio of two linear regression coefficients in the presence of association between the traits under study and the genotypes. According to Ma et al. [32], XCI may cause variance heterogeneity of the traits across different genotypes in females. As such, we estimated the linear regression coefficients by incorporating the information on the variance heterogeneity and then obtained the point estimate of γ. The Fieller’s and delta methods for calculating the CIs are simple and noniterative procedures, while the LR method is an iterative one which needs more computing time. On the other hand, the hypothesis testing of the LR, Fieller’s and delta methods was also investigated. We conducted extensive simulation studies (two different values of additive effect, two groups of allele frequencies, five different values of γ, two different sample sizes, and two different values of inbreeding coefficient) to assess the validity of the proposed methods. Simulation results demonstrate that the median of the point estimates of γ is very close to the prespecified true value of γ. The LR and Fieller’s methods have similar performance in the CP, ML/(ML + MR), DP and EP. The CPs of both methods are controlled around 95% for all the simulated scenarios, and the values of the ML/(ML + MR) for both methods generally maintain close to 0.5, except for the cases of p = 0.1 and n = 1,000, and the situations of γ = 0 and 2. Besides, both methods perform better than the delta method in the CP and ML/(ML + MR). On the other hand, the LR and Fieller’s methods control the size well and almost have the same test powers. However, the type I error rate of the delta method is inflated or conservative under most simulation settings. This may be because the distribution of the point estimate \( \hat{\gamma} \) is asymmetric after being cut off by 0 and 2, and then \( \frac{\hat{\gamma}{\gamma}_0}{\sqrt{\hat{\mathrm{Var}}\left(\hat{\gamma}\right)}}\sim N\left(0,1\right) \) is not so strictly correct. Another possible reason why the delta method performs so poorly is that the first order Taylor expansion of \( \hat{\gamma} \) does not suffice. To investigate the performance of the delta method with higher order Taylor expansion, we used the second order Taylor expansion of \( \hat{\gamma} \) to calculate the asymptotic variance of \( \hat{\gamma} \), which can be implemented in the “propagate” package in R software [46]. However, most of the estimated type I error rates for the delta method are still inflated or conservative, even though they appear to be controlled better than those in Tables 1 and 2 (data not shown for brevity). Therefore, in practical use, we recommend the Fieller’s method because it is a noniterative procedure and has the similar performance to the LR method.
So far, we are not aware of any association study for the Xchromosomal SNPs in the MCTFR data. In fact, we also found that all the five traits in the MCTFR data are not normally distributed. On the other hand, when simultaneously analyzing multiple traits for the Xchromosomal SNPs, Deng et al. [34] fixed the significance level at 1 × 10^{−3} for their association analysis. So, in our real data application, we used the existing Levene’s test [34] to test for the association between the Xchromosomal SNPs and the five traits at the significance level of 1 × 10^{−3}, which does not require the normality assumption for the traits. However, when only analyzing a single trait for all the 12,354 SNPs, the significance level is set to be α^{′} = 0.05/12,354 = 4.047 × 10^{−6} based on Bonferroni correction. One SNP (rs17261621) is shown to be only associated with the DRG trait at the 4.047 × 10^{−6} level, two SNPs (rs792959 and rs17261621) are identified to be associated with both the DRG and BD traits, and three SNPs (rs4825722, rs4825726 and rs2196260) are found to be associated with both the DEP and BD traits at the 1 × 10^{−3} level. In addition, we applied the proposed LR, Fieller’s and delta methods to these five SNPs and calculated the CIs of the skewness of XCI at the 95% confidence level. The CIs based on the LR and Fieller’s methods show that only rs792959 undergoes XCIS. However, these conclusions need to be further confirmed by molecular genetics. On the other hand, the proposed LR and Fieller’s methods require that the traits under study follow a normal distribution, while the DEP, DRG and BD traits are not normally distributed. Since we have no suitable data of this kind available, it is of future interest to apply the three proposed methods to datasets with traits following normal distributions and to further confirm their practical use.
Besides, the proposed methods have the following issues to discuss. First, to make the point estimate and the CIs of γ more interpretable, we simply use the interval [0, 2] to truncate the original point estimate and the original CIs, which may cause potential loss of information, and may also lead to the truncated CIs being empty sets when the original CIs lie outside [0, 2]. Fortunately, from our simulation study, the proportion of the CIs being empty sets or being reduced to be a point among all the simulation replications is all less than 2.7%. On the other hand, to incorporate the interval constraint of [0, 2] into statistical inference, we will develop a future Bayesian method to estimate the skewness of XCI by considering such constraint as prior information. Second, the proposed methods require the association between the traits and the SNPs being present. As such, in genomewide association study, we could regard the screening of the associated SNPs as a preliminary step before estimating the skewness of XCI. If such association is not statistically significant, the LR and Fieller’s methods may result in the discontinuous CIs, which is difficult to interpret. Third, the normality assumption of the traits under study is needed in the proposed methods. In future, we will extend them to accommodate the traits not normally distributed. Finally, the proposed methods are only applicable to unrelated female subjects. Thus, we will extend the proposed methods and make them suitable for data with family or pedigree structure in future studies.
Conclusions
We recommend the Fieller’s method in practical use because it is a noniterative procedure and almost has the same performance as the LR method. On the other hand, only rs792959, which is identified to be associated with both the DRG and BD traits, may undergo XCIS, which needs to be confirmed by molecular genetics.
Methods
Notations and point estimate of γ
Consider an Xlinked diallelic QTL. Let D and d represent the mutant allele and the normal allele at the QTL with the frequencies being p and q (p + q = 1), respectively. Note that XCI is unrelated to males and we only consider females here. Then, there are three possible genotypes dd, Dd and DD at the QTL in females. The corresponding frequencies are g_{0} = q^{2} + ρpq, g_{1} = 2(1 − ρ)pq and g_{2} = p^{2} + ρpq, respectively, where ρ is the inbreeding coefficient. ρ = 0 means that HardyWeinberg equilibrium (HWE) holds in the female population, while ρ ≠ 0 denotes HardyWeinberg disequilibrium. Suppose that Y and G are the value of the quantitative trait under study and the genotype of a female subject, respectively. Notice that XCI may lead to variance heterogeneity of Y across different genotypes [32]. So, we assume that \( {\left.Y\right}_{G= dd}\sim N\left({\mu}_0,{\sigma}_0^2\right) \), \( {\left.Y\right}_{G= Dd}\sim N\left({\mu}_1,{\sigma}_1^2\right) \) and \( {\left.Y\right}_{G= DD}\sim N\left({\mu}_2,{\sigma}_2^2\right). \) Further, let X_{1} = I_{{G = Dd or DD}} and X_{2} = I_{{G = DD}}. As such, X_{1} denotes that this female carries at least one mutant allele D and X_{2} indicates that she is a homozygote DD. Then, to construct the statistical measure of the skewness of XCI, we consider the following linear regression model
where Z is a vector of covariates which need to be adjusted, β_{0} is the intercept, β_{1} and β_{2} respectively are the regression coefficients of X_{1} and X_{2}, and b is a vector of the regression coefficients for Z. From Eq. (1), we have μ_{0} = β_{0} + b^{T}Z, μ_{1} = β_{0} + β_{1} + b^{T}Z and μ_{2} = β_{0} + β_{1} + β_{2} + b^{T}Z. Under XCIR, μ_{1} should lie midway between μ_{0} and μ_{2}, which is \( {\beta}_0+\frac{\beta_1+{\beta}_2}{2}+{\boldsymbol{b}}^{\mathrm{T}}\boldsymbol{Z} \). Hence, for heterozygous females, any statistically significant deviation from such value can be regarded as an evidence of XCIS. This is equivalent to that \( \frac{\mu_1{\mu}_0}{\mu_2{\mu}_0}=\frac{\beta_1}{\beta_1+{\beta}_2} \) is far away from 0.5. Therefore, we define the following parameter γ to measure the skewness of XCI
with β_{1} + β_{2} ≠ 0. And θ = γ/2, on the average, is indicative of the proportion of cells in a Dd female keeping the mutant allele D active. Thus, γ = 1 denotes XCIR. 1 < γ ≤ 2 means the XCIS towards D and 0 ≤ γ < 1 represents the XCIS against D. For example, when γ = 1.5, then θ = 75%, which means that 75% cells have mutant allele D active and the other 25% cells express the normal allele d.
Let β = (β_{1} + β_{2})/2, then γ = β_{1}/β. In this regard, we obtain β_{1} = γβ and β_{2} = (2 − γ)β, where β ≠ 0. Let X = γX_{1} + (2 − γ)X_{2}, then Eq. (1) becomes
Here, the genotypic value X equals 0, γ and 2 for genotypes dd, Dd and DD, respectively, which implies that the definition of γ coincides with the coding strategy of Wang et al. [27] for XCI. On the other hand, from Eq. (2), we observe that γ can be well defined when the association between Y and the allele of interest is present (i.e., β ≠ 0).
Assume that we collect a sample of n independent females. Let n_{0}, n_{1} and n_{2} be the number of females with genotypes dd, Dd and DD, respectively. So, n_{0} + n_{1} + n_{2} = n. Let y_{ij}, x_{ij1}, x_{ij2} and z_{ij} denote the values of Y, X_{1}, X_{2} and Z of female j (j = 1, 2, ⋯, n_{i}), where i = 0, 1, 2 respectively correspond to genotypes dd, Dd and DD. According to Eq. (1), the loglikelihood function of the sample is
Then, by maximizing the above equation, the maximum likelihood estimates \( {\hat{\beta}}_0,{\hat{\beta}}_1,{\hat{\beta}}_2,{\hat{\sigma}}_0,{\hat{\sigma}}_1,{\hat{\sigma}}_2 \) and \( \hat{\boldsymbol{b}} \) of β_{0}, β_{1}, β_{2}, σ_{0}, σ_{1}, σ_{2} and b can be derived. As such, according to Eq. (2), the point estimate of γ can be given as \( \frac{2{\hat{\beta}}_1}{{\hat{\beta}}_1+{\hat{\beta}}_2} \). Notice that γ is bounded in [0, 2]. Then, the final point estimate of γ is cut off by 0 and 2, and denoted by \( \hat{\gamma} \).
Confidence interval of γ
Here, we extend the three methods proposed by Wang et al. [35] to construct the CI of γ to quantitative traits as follows. To obtain the CI of γ based on the LR method, we first develop a likelihood ratio test for testing the null hypothesis H_{0} : γ = γ_{0} below, where γ_{0} ∈ [0, 2] is a prespecified constant, e.g., γ_{0} = 1 (XCIR). From Eq. (3), the loglikelihood function of the sample under H_{0} is
By maximizing the above equation, the maximum likelihood estimates \( {\overset{\sim }{\beta}}_0,\overset{\sim }{\beta },{\overset{\sim }{\sigma}}_0,{\overset{\sim }{\sigma}}_1,{\overset{\sim }{\sigma}}_2 \) and \( \overset{\sim }{\boldsymbol{b}} \) of β_{0}, β, σ_{0}, σ_{1}, σ_{2} and b under H_{0} can be obtained. So, the likelihood ratio test is
which asymptotically follows the chisquare distribution with the degree of freedom being one (i.e., \( {\chi}_1^2 \)).
Then, the 100(1 − α)% CI of γ based on λ(γ_{0}) is \( \left\{{\gamma}_0:P\left(\lambda \left({\gamma}_0\right)<{\chi}_{1\alpha, 1}^2\right)\right\} \) and the confidence limits satisfy
That is, the 100(1 − α)% CI of γ is the interval satisfying f(γ_{0}) < 0. Note that γ should be bounded in [0, 2]. As such, the bisection method is applied to find all the roots of Eq. (4) within [0, 2] by using the “rootSolve” package in R software [46]. This indicates that the LR method is an iterative procedure. If Eq. (4) has no root in [0, 2] and f(γ_{0}) < 0, the CI is taken to be [0, 2]. On the contrary, when Eq. (4) has no root in [0, 2], but f(γ_{0}) > 0, the resulting CI is an empty set. When Eq. (4) has only one root γ^{LR} in [0, 2] and f(γ_{0}) ≥ 0, the CI is reduced to be a point. When Eq. (4) has only one root γ^{LR} in [0, 2] and f(0)f(2) < 0, there are two different situations. If f(0) > 0 and f(2) < 0, then f(γ_{0}) < 0 will be satisfied within (γ^{LR}, 2] and the CI is taken as (γ^{LR}, 2]; otherwise, the CI is set to be [0, γ^{LR}). When Eq. (4) has two unequal roots \( {\gamma}_L^{LR} \) and \( {\gamma}_U^{LR} \) in [0, 2] with \( {\gamma}_L^{LR}<{\gamma}_U^{LR} \), f(γ_{0}) < 0, \( {\gamma}_0\in \left({\gamma}_L^{LR},{\gamma}_U^{LR}\right) \) means that the CI is \( \left({\gamma}_L^{LR},{\gamma}_U^{LR}\right) \). Otherwise, the CI is \( \left[0,{\gamma}_L^{LR}\right)\cup \left({\gamma}_U^{LR},2\right] \), the union of two disjoint intervals, which is a discontinuous CI.
Since \( \hat{\gamma} \) is a ratio estimate, borrowing the idea of Wang et al. [35], we find that the standard error of \( \hat{\gamma} \) can be approximated by using the delta method. Specifically, take a first order Taylor expansion of γ around the point (β_{1}, β) and evaluate it at \( \left({\hat{\beta}}_1,\hat{\beta}\right) \), which yields \( \hat{\gamma}\approx \frac{\beta_1}{\beta }+\left({\hat{\beta}}_1{\beta}_1\right)\frac{1}{\beta }\left(\hat{\beta}\beta \right)\frac{\beta_1}{\beta^2} \), where \( \hat{\beta}=\frac{{\hat{\beta}}_1+{\hat{\beta}}_2}{2} \). Then,
where \( \mathrm{Var}\left(\hat{\beta}\right)=\mathrm{Var}\left(\frac{{\hat{\beta}}_1+{\hat{\beta}}_2}{2}\right)=\frac{1}{4}\left[\mathrm{Var}\left({\hat{\beta}}_1\right)+\mathrm{Var}\left({\hat{\beta}}_2\right)+2\mathrm{Cov}\left({\hat{\beta}}_1,{\hat{\beta}}_2\right)\right] \) and \( \mathrm{Cov}\left({\hat{\beta}}_1,\hat{\beta}\right)=\mathrm{Cov}\left({\hat{\beta}}_1,\frac{{\hat{\beta}}_1+{\hat{\beta}}_2}{2}\right)=\frac{1}{2}\mathrm{Var}\left({\hat{\beta}}_1\right)+\frac{1}{2}\mathrm{Cov}\left({\hat{\beta}}_1,{\hat{\beta}}_2\right) \). Here, \( \mathrm{Var}\left({\hat{\beta}}_1\right) \), \( \mathrm{Var}\left({\hat{\beta}}_2\right) \) and \( \mathrm{Cov}\left({\hat{\beta}}_1,{\hat{\beta}}_2\right) \) are the elements of the variancecovariance matrix of \( {\hat{\beta}}_1 \) and \( {\hat{\beta}}_2 \), which can be computed by using the “glm” function in R software [46]. Respctively replacing β_{1} and β by \( {\hat{\beta}}_1 \) and \( \hat{\beta} \) in Eq. (5), we get the estimate of the variance of \( \hat{\gamma} \) as \( \hat{\mathrm{Var}}\left(\hat{\gamma}\right)=\frac{1}{{\hat{\beta}}^2}\mathrm{Var}\left({\hat{\beta}}_1\right)+\frac{{\hat{\beta}}_1^2}{{\hat{\beta}}^4}\mathrm{Var}\left(\hat{\beta}\right)\frac{2{\hat{\beta}}_1}{{\hat{\beta}}^3}\mathrm{Cov}\left({\hat{\beta}}_1,\hat{\beta}\right) \). From the fact that \( \frac{\hat{\gamma}{\gamma}_0}{\sqrt{\hat{\mathrm{Var}}\left(\hat{\gamma}\right)}}\sim N\left(0,1\right) \) asymptotically, the 100(1 − α)% CI of γ based on the delta method is \( \left(\hat{\gamma}{z}_{\alpha /2}\sqrt{\hat{\mathrm{Var}}\Big(\hat{\gamma}}\Big),\hat{\gamma}+{z}_{\alpha /2}\sqrt{\hat{\mathrm{Var}}\Big(\hat{\gamma}}\Big)\right)\cap \left[0,2\right] \), where z_{α/2} is the upper α/2 quantile of the standard normal distribution.
Now, we consider the CI of γ based on the Fieller’s method, just like Wang et al. [35]. Under H_{0} : γ = γ_{0}, we have β_{1} − γ_{0}β = 0. Then, we can construct the following Wald test for testing H_{0} : γ = γ_{0}
The confidence limits of the 100(1 − α)% CI based on the Fieller’s method satisfy \( \frac{{\hat{\beta}}_1{\gamma}_0\hat{\beta}}{\sqrt{\mathrm{Var}\left({\hat{\beta}}_1\right)+{\gamma}_0^2\mathrm{Var}\left(\hat{\beta}\right)2{\gamma}_0\mathrm{Cov}\left({\hat{\beta}}_1,\hat{\beta}\right)}}={z}_{\alpha /2} \), which is equivalent to the quadratic equation \( A{\gamma}_0^2+B{\gamma}_0+C=0 \) with respect to γ_{0}. Here, \( A={\hat{\beta}}^2{z}_{\alpha /2}^2\mathrm{Var}\left(\hat{\beta}\right) \), \( B=2{z}_{\alpha /2}^2\mathrm{Cov}\left({\hat{\beta}}_1,\hat{\beta}\right)2{\hat{\beta}}_1\hat{\beta} \) and \( C={{\hat{\beta}}_1}^2{z}_{\alpha /2}^2\mathrm{Var}\left({\hat{\beta}}_1\right) \). Assume that Δ = B^{2} − 4AC. From Fieller’s theorem, A > 0 implies Δ > 0. Further, A > 0 and \( \left\hat{\beta}/\sqrt{\mathrm{Var}\left(\hat{\beta}\right)}\right>{z}_{\alpha /2} \) are equivalent to each other, which mean that the association is present at the significance level of α. In addition, when Δ = 0 or A = 0, the CI is reduced to a point. As such, the 100(1 − α)% CI based on the Fieller’s method is
It should be noted that if Δ > 0, the above CI may be an empty set. When Δ > 0 and A < 0, the corresponding CI may be the union of two disjoint intervals, which is the discontinuous CI.
Simulation settings
For simplicity, we do not include any covariate in the model. The frequency p of allele D at the locus on X chromosome is fixed at 0.1 and 0.3. The inbreeding coefficient ρ is taken as 0 and 0.05 to respectively simulate the situation of HWE and that of HardyWeinberg disequilibrium. Let β_{0} = 0.1 and β = 0.3. Further, β_{1} = γβ and β_{2} = (2 − γ)β are calculated from β = 0.3 and γ, where γ takes values of 0, 0.5, 1, 1.5 and 2. γ_{0} is also assigned to be 0, 0.5, 1, 1.5 and 2 with γ = γ_{0} to simulate the type I error rates of the proposed methods and γ ≠ γ_{0} for simulating the their test powers. As mentioned in Ma et al. [32], the variance \( {\sigma}_1^2 \) of the trait value for heterozygous females is generally larger than \( {\sigma}_0^2 \) and \( {\sigma}_2^2 \) for homozygous females due to XCI. So, we set \( {\sigma}_0^2={\sigma}_2^2=1 \), and \( {\sigma}_1^2=\theta \left(1\theta \right){a}^2+1.1 \), where a is the additive effect of the QTL, θ = γ/2 is the inactivation ratio as mentioned before, and the variance caused by other factors is fixed to be 1.1. Here, a is set to be 0.1 and 0.3. The sample size n is selected to be 1,000 and 2,000. The genotype of each female is simulated according to the allele frequency p and the inbreeding coefficient ρ. Then, the trait value Y of this female given her genotype is generated by \( {\left.Y\right}_{G= dd}\sim N\left({\beta}_0,{\sigma}_0^2\right) \), \( {\left.Y\right}_{G= Dd}\sim N\left({\beta}_0+{\beta}_1,{\sigma}_1^2\right) \) or \( {\left.Y\right}_{G= DD}\sim N\left({\beta}_0+{\beta}_1+{\beta}_2,{\sigma}_2^2\right) \). For each simulation setting, the simulations are conducted based on K = 10,000 replications and the significance level α is fixed at 5%. The simulation study is implemented in R software (version 3.2.5) [46].
Notice that the distribution of the point estimate \( \hat{\gamma} \) may be asymmetric. So, we list the median of \( \hat{\gamma} \) ’s over K replications to describe the central tendency of this skewed distribution. We assess the statistical properties of the CIs of γ by the following indexes. The coverage probability (CP) is the proportion that the CIs contain the true value γ among K replications, irrespective of the CI being continuous or discontinuous. DP and EP are the proportion of the discontinuous CIs and that of the CIs being an empty set or being reduced to be a point among K replications, respectively. Simulation study is also carried out to investigate the probabilities of the CI missing the true value γ on the left (ML) and on the right (MR), and the value of the ratio ML/(ML + MR), which is close to 0.5 when the balance between ML and MR is achieved. Here, \( \mathrm{ML}=\frac{\#\left[\left(\gamma <{\gamma}_L\right)\cap \left({\gamma}_L\le \hat{\gamma}\le {\gamma}_U\right)\right]}{K} \) and \( \mathrm{MR}=\frac{\#\left[\left(\gamma >{\gamma}_U\right)\cap \left({\gamma}_L\le \hat{\gamma}\le {\gamma}_U\right)\right]}{K} \), where # is the counting measure and (γ_{L}, γ_{U}) ’s are the continuous CIs. We only consider the continuous CIs when computing the ML and MR, because we cannot distinguish between the left side and the right side of the discontinuous CIs.
Availability of data and materials
The Minnesota Center for Twin and Family Research data used for the analyses described in this article can be found on the database of Genotypes and Phenotypes with accession number phs000620.v1.p1 (https://www.ncbi.nlm.nih.gov/projects/gap/cgibin/study.cgi?study_id=phs000620.v1.p1).
Abbreviations
 BD:

Behavioral disinhibition composite score
 CI:

Confidence interval
 CP:

Coverage probability
 DEP:

Alcohol dependence composite score
 DP:

Proportion of the discontinuous CIs
 DRG:

Illicit drug composite score
 EP:

Proportion of the CIs being empty sets or being reduced to be a point
 GWAS:

Genomewide association study
 HWE:

HardyWeinberg equilibrium
 LR:

Likelihood ratio
 MAF:

Minor allele frequency
 MCTFR:

Minnesota Center for Twin and Family Research
 ML:

Left tail error
 MR:

Right tail error
 QTL:

Quantitative trait loci
 SNP:

Single nucleotide polymorphism
 XCI:

X chromosome inactivation
 XCIE:

Escape from X chromosome inactivation
 XCIR:

Random X chromosome inactivation
 XCIS:

Skewed X chromosome inactivation
References
 1.
Chabchoub G, Uz E, Maalej A, Mustafa CA, Rebai A, Mnif M, et al. Analysis of skewed Xchromosome inactivation in females with rheumatoid arthritis and autoimmune thyroid diseases. Arthritis Res Ther. 2009;11(4):R106. https://doi.org/10.1186/ar2759.
 2.
Ortona E, Pierdominici M, Maselli A, Veroni C, Aloisi F, Shoenfeld Y. Sexbased differences in autoimmune diseases. Ann Ist Super Sanita. 2016;52(2):205–12. https://doi.org/10.4415/ANN_16_02_12.
 3.
BraschAndersen C, Møller MU, Haagerup A, Vestbo J, Kruse TA. Evidence for an asthma risk locus on chromosome Xp: a replication linkage study. Allergy. 2008;63(9):1235–8. https://doi.org/10.1111/j.13989995.2008.01699.x.
 4.
Jacobs PA, Hunt PA, Mayer M, Bart RD. Duchenne muscular dystrophy (DMD) in a female with an X/autosome translocation: further evidence that the DMD locus is at Xp21. Am J Hum Genet. 1981;33(4):513–8.
 5.
Quan F, Janas J, TothFejel S, Johnson DB, Wolford JK, Popovich BW. Uniparental disomy of the entire X chromosome in a female with Duchenne muscular dystrophy. Am J Hum Genet. 1997;60(1):160–5.
 6.
Migeon BR, Moser HW, Moser AB, Axelman J, Sillence D, Norum RA. Adrenoleukodystrophy: evidence for X linkage, inactivation, and selection favoring the mutant allele in heterozygous cells. Proc Natl Acad Sci U S A. 1981;78(8):5066–70. https://doi.org/10.1073/pnas.78.8.5066.
 7.
Goodship J, Carter J, Espanol T, Boyd Y, Malcolm S, Levinsky RJ. Carrier detection in WiskottAldrich syndrome: combined use of M27 beta for Xinactivation studies and as a linked probe. Blood. 1991;77(12):2677–81. https://doi.org/10.1182/blood.V77.12.2677.2677.
 8.
Spatz A, Borg C, Feunteun J. Xchromosome genetics and human cancer. Nat Rev Cancer. 2004;4(8):617–29. https://doi.org/10.1038/nrc1413.
 9.
Kristiansen M, Knudsen GP, Maguire P, Margolin S, Pedersen J, Lindblom A, et al. High incidence of skewed X chromosome inactivation in young patients with familial nonBRCA1/BRCA2 breast cancer. J Med Genet. 2005;42(11):877–80. https://doi.org/10.1136/jmg.2005.032433.
 10.
Li G, Su Q, Liu GQ, Gong L, Zhang W, Zhu SJ, et al. Skewed X chromosome inactivation of blood cells is associated with early development of lung cancer in females. Oncol Rep. 2006;16(4):859–64.
 11.
Medema RH, Burgering BM. The X factor: skewing X inactivation towards cancer. Cell. 2007;129(7):1253–4. https://doi.org/10.1016/j.cell.2007.06.008.
 12.
Panning B. X chromosome inactivation and breast cancer: epigenetic alteration in tumor initiation and progression. Toxicon. 2007;54(2):121–7.
 13.
Wise AL, Gyi L, Manolio TA. Exclusion: toward integrating the X chromosome in genomewide association analyses. Am J Hum Genet. 2013;92(5):643–7. https://doi.org/10.1016/j.ajhg.2013.03.017.
 14.
Lyon MF. Gene action in the Xchromosome of the mouse (Mus musculus L.). Nature. 1961;190(4773):372–3. https://doi.org/10.1038/190372a0.
 15.
Lyon MF. Xchromosome inactivation and developmental patterns in mammals. Biol Rev Camb Philos Soc. 1972;47(1):1–35. https://doi.org/10.1111/j.1469185X.1972.tb00969.x.
 16.
Kay GF, Barton SC, Surani MA, Rastan S. Imprinting and X chromosome counting mechanisms determine Xist expression in early mouse development. Cell. 1994;77(5):639–50. https://doi.org/10.1016/00928674(94)900493.
 17.
Wong CC, Caspi A, Williams B, Houts R, Craig IW, Mill J. A longitudinal twin study of skewed X chromosomeinactivation. PLoS One. 2011;6(3):e17873. https://doi.org/10.1371/journal.pone.0017873.
 18.
Manzardo AM, Henkhaus R, Hidaka B, Penick EC, Poje AB, Butler MG. X chromosome inactivation in women with alcoholism. Alcohol Clin Exp Res. 2012;36(8):1325–9. https://doi.org/10.1111/j.15300277.2012.01740.x.
 19.
AmosLandgraf JM, Cottle A, Plenge RM, Friez M, Schwartz CE, Longshore J, et al. X chromosomeinactivation patterns of 1,005 phenotypically unaffected females. Am J Hum Genet. 2006;79(3):493–9. https://doi.org/10.1086/507565.
 20.
Kay GF. Xist and X chromosome inactivation. Mol Cell Endocrinol. 1998;140(1–2):71–6. https://doi.org/10.1016/S03037207(98)00032X.
 21.
Sharp A, Robinson D, Jacobs P. Age and tissuespecific variation of X chromosome inactivation ratios in normal women. Hum Genet. 2000;107(4):343–9. https://doi.org/10.1007/s004390000382.
 22.
Plenge RM, Stevenson RA, Lubs HA, Schwartz CE, Willard HF. Skewed Xchromosome inactivation is a common feature of Xlinked mental retardation disorders. Am J Hum Genet. 2002;71(1):168–73. https://doi.org/10.1086/341123.
 23.
Minks J, Robinson WP, Brown CJ. A skewed view of X chromosome inactivation. J Clin Invest. 2008;118(1):20–3. https://doi.org/10.1172/JCI34470.
 24.
Peeters SB, Cotton AM, Brown CJ. Variable escape from Xchromosome inactivation: identifying factors that tip the scales towards expression. Bioessays. 2014;36(8):746–56. https://doi.org/10.1002/bies.201400032.
 25.
Berletch JB, Ma W, Yang F, Shendure J, Noble WS, Disteche CM, et al. Escape from X inactivation varies in mouse tissues. PLoS Genet. 2015;11(3):e1005079. https://doi.org/10.1371/journal.pgen.1005079.
 26.
Clayton D. Testing for association on the X chromosome. Biostatistics. 2008;9(4):593–600. https://doi.org/10.1093/biostatistics/kxn007.
 27.
Wang J, Yu R, Shete S. Xchromosome genetic association test accounting for Xinactivation, skewed Xinactivation, and escape from Xinactivation. Genet Epidemiol. 2014;38(6):483–93. https://doi.org/10.1002/gepi.21814.
 28.
Wang P, Xu SQ, Wang BQ, Fung WK, Zhou JY. A robust and powerful test for casecontrol genetic association study on X chromosome. Stat Methods Med Res. 2019;28(10–11):3260–72. https://doi.org/10.1177/0962280218799532.
 29.
Liu W, Wang BQ, LiuFu G, Fung WK, Zhou JY. Xchromosome genetic association test incorporating Xchromosome inactivation and imprinting effects. J Genet. 2019;98(4):99. https://doi.org/10.1007/s1204101911466.
 30.
Zhang Y, Xu SQ, Liu W, Fung WK, Zhou JY. A robust test for Xchromosome genetic association accounting for Xchromosome inactivation and imprinting. Genet Res. 2020;102:e2. https://doi.org/10.1017/S0016672320000026.
 31.
Zhang L, Martin ER, Morris RW, Li YJ. Association test for Xlinked QTL in familybased designs. Am J Hum Genet. 2009;84(4):431–44. https://doi.org/10.1016/j.ajhg.2009.02.010.
 32.
Ma L, Hoffman G, Keinan A. Xinactivation informs variancebased testing for Xlinked association of a quantitative trait. BMC Genomics. 2015;16(1):241. https://doi.org/10.1186/s128640151463y.
 33.
Gao F, Chang D, Biddanda A, Ma L, Guo Y, Zhou Z, et al. XWAS: a software toolset for genetic data analysis and association studies of the X chromosome. J Hered. 2015;106(5):666–71. https://doi.org/10.1093/jhered/esv059.
 34.
Deng WQ, Mao S, Kalnapenkis A, Esko T, Mägi R, Paré G, et al. Analytical strategies to include the Xchromosome in variance heterogeneity analyses: evidence for traitspecific polygenic variance structure. Genet Epidemiol. 2019;43(7):815–30. https://doi.org/10.1002/gepi.22247.
 35.
Wang P, Zhang Y, Wang BQ, Li JL, Wang YX, Pan D, et al. A statistical measure for the skewness of X chromosome inactivation based on casecontrol design. BMC Bioinformatics. 2019;20(1):11. https://doi.org/10.1186/s1285901825872.
 36.
McGue M, Keyes M, Sharma A, Elkins I, Legrand L, Johnson W, et al. The environments of adopted and nonadopted youth: evidence on range restriction from the sibling interaction and behavior study (SIBS). Behav Genet. 2007;37(3):449–62. https://doi.org/10.1007/s1051900791427.
 37.
Keyes MA, Malone SM, Elkins IJ, Legrand LN, McGue M, Iacono WG. The enrichment study of the Minnesota twin family study: increasing the yield of twin families at high risk for externalizing psychopathology. Twin Res Hum Genet. 2009;12(5):489–501. https://doi.org/10.1375/twin.12.5.489.
 38.
Hicks BM, Schalet BD, Malone SM, Iacono WG, McGue M. Psychometric and genetic architecture of substance use disorder and behavioral disinhibition measures for gene association studies. Behav Genet. 2011;41(4):459–75. https://doi.org/10.1007/s1051901094172.
 39.
Miller MB, Basu S, Cunningham J, Eskin E, Malone SM, Oetting WS, et al. The Minnesota center for twin and family research genomewide association study. Twin Res Hum Genet. 2012;15(6):767–74. https://doi.org/10.1017/thg.2012.62.
 40.
Vrieze SI, McGue M, Iacono WG. The interplay of genes and adolescent development in substance use disorders: leveraging findings from GWAS metaanalyses to test developmental hypotheses about nicotine consumption. Hum Genet. 2012;131(6):791–801. https://doi.org/10.1007/s0043901211671.
 41.
McGue M, Zhang Y, Miller MB, Basu S, Vrieze S, Hicks B, et al. A genomewide association study of behavioral disinhibition. Behav Genet. 2013;43(5):363–73. https://doi.org/10.1007/s105190139606x.
 42.
Vrieze SI, McGue M, Miller MB, Hicks BM, Iacono WG. Three mutually informative ways to understand the genetic relationships among behavioral disinhibition, alcohol use, drug use, nicotine use/dependence, and their cooccurrence: twin biometry, GCTA, and genomewide scoring. Behav Genet. 2013;43(2):97–107. https://doi.org/10.1007/s105190139584z.
 43.
Derringer J, Corley RP, Haberstick BC, Young SE, Demmitt BA, Howrigan DP, et al. Genomewide association study of behavioral disinhibition in a selected adolescent sample. Behav Genet. 2015;45(4):375–81. https://doi.org/10.1007/s105190159705y.
 44.
Purcell S, Neale B, ToddBrown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for wholegenome association and populationbased linkage analyses. Am J Hum Genet. 2007;81(3):559–75. https://doi.org/10.1086/519795.
 45.
Chung RH, Ma D, Wang K, Hedges DJ, Jaworski JM, Gilbert JR, et al. An X chromosomewide association study in autism families identifies TBL1X as a novel autism spectrum disorder candidate gene in males. Mol Autism. 2011;2(1):18. https://doi.org/10.1186/20402392218.
 46.
Team RC. R: a language and environment for statistical computing, vol. 2020. Vienna: R Foundation for Statistical Computing; 2013. http://www.rproject.org.
Acknowledgements
The authors would like to thank the editor and two anonymous reviewers for their valuable comments which highly improved the presentation of the article.
Funding
This work was supported by the National Natural Science Foundation of China 81773544, and the Science and Technology Planning Project of Guangdong Province 2020B1212030008. Minnesota Center for Twin and Family Research (MCTFR) was supported by the National Institute on Drug Abuse U01 DA024417. The sample ascertainment and data collection in MCTFR were supported by the National Institute on Drug Abuse R37 DA05147, R01 DA13240, the National Institute on Alcohol Abuse and Alcoholism R01 AA09367, R01 AA11886, and the National Institute of Mental Health R01 MH66140. All the funding supporters had no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
Author information
Affiliations
Contributions
BHL, WYY and JYZ all contributed to the data analysis, the interpretation of the results of the data analysis, and the writing and the revision of the manuscript. BHL and WYY conducted the simulation study. JYZ helped design the study and directed its implementation. All authors read and approved this version of the manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1: Tables S1S2.
Estimated sizes for testing H_{0} : γ = γ_{0} for the LR, Fieller’s and delta methods with a = 0.1 and 0.3, p = 0.1 and 0.3, and ρ = 0.05 based on 10,000 replicates and 5% significance level when n = 1,000 and 2,000, respectively. Tables S3S4. Estimated median of the point estimates of γ, CP, ML, MR, ML/(ML + MR), DP and EP of twosided 95% CIs of γ for the LR, Fieller’s and delta methods against γ, with a = 0.1 and 0.3, p = 0.1 and 0.3, and ρ = 0.05 based on 10,000 replicates when n = 1,000 and 2,000, respectively. Figures S1S2. Estimated powers for the LR and Fieller’s methods against γ based on 10,000 replicates and 5% significance level with a = 0.1 and 0.3, p = 0.1 and 0.3, ρ = 0 and n = 1,000 when γ_{0} = 0.5 and 1.5, respectively. Figures S3S7. Estimated powers for the LR and Fieller’s methods against γ based on 10,000 replicates and 5% significance level with a = 0.1 and 0.3, p = 0.1 and 0.3, ρ = 0.05 and n = 1,000 when γ_{0} = 0, 0.5, 1, 1.5 and 2, respectively. Figures S8S9. Estimated powers for the LR and Fieller’s methods against γ based on 10,000 replicates and 5% significance level with a = 0.1 and 0.3, p = 0.1 and 0.3, ρ = 0 and n = 2,000 when γ_{0} = 0.5 and 1.5, respectively. Figures S10S14. Estimated powers for the LR and Fieller’s methods against γ based on 10,000 replicates and 5% significance level with a = 0.1 and 0.3, p = 0.1 and 0.3, ρ = 0.05 and n = 2,000 when γ_{0} = 0, 0.5, 1, 1.5 and 2, respectively.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Li, BH., Yu, WY. & Zhou, JY. A statistical measure for the skewness of X chromosome inactivation for quantitative traits and its application to the MCTFR data. BMC Genom Data 22, 24 (2021). https://doi.org/10.1186/s1286302100978z
Received:
Accepted:
Published:
Keywords
 X chromosome inactivation
 Skewness
 Quantitative trait
 Variance heterogeneity
 Minnesota Center for Twin and Family Research data