 Methodology article
 Open Access
 Published:
A novel pathspecific effect statistic for identifying the differential specific paths in systems epidemiology
BMC Genetics volume 21, Article number: 85 (2020)
Abstract
Background
Biological pathways play an important role in the occurrence, development and recovery of complex diseases, such as cancers, which are multifactorial complex diseases that are generally caused by mutation of multiple genes or dysregulation of pathways.
Results
We propose a pathspecific effect statistic (PSE) to detect the differential specific paths under two conditions (e.g. case VS. control groups, exposure Vs. nonexposure groups). In observational studies, the pathspecific effect can be obtained by separately calculating the average causal effect of each directed edge through adjusting for the parent nodes of nodes in the specific path and multiplying them under each condition. Theoretical proofs and a series of simulations are conducted to validate the pathspecific effect statistic. Applications are also performed to evaluate its practical performances. A series of simulation studies show that the Type I error rates of PSE with Permutation tests are more stable at the nominal level 0.05 and can accurately detect the differential specific paths when comparing with other methods. Specifically, the power reveals an increasing trends with the enlargement of pathspecific effects and its effect differences under two conditions. Besides, the power of PSE is robust to the variation of parent or child node of the nodes on specific paths. Application to real data of Glioblastoma Multiforme (GBM), we successfully identified 14 positive specific pathways in mTOR pathway contributing to survival time of patients with GBM. All codes for automatic searching specific paths linking two continuous variables and adjusting set as well as PSE statistic can be found in supplementary materials.
Conclusion
The proposed PSE statistic can accurately detect the differential specific pathways contributing to complex disease and thus potentially provides new insights and ways to unlock the black box of disease mechanisms.
Background
Biological pathways play a key role in the occurrence, development and recovery of complex diseases, such as cancers, which are multifactorial complex diseases that are generally caused by mutation of multiple genes or dysregulation of pathways [1]. Besides biological pathways, improving clinical treatment, and discovering drug targets and biomarkers, are a series of actions among molecules (including genes and protein, etc.) in a cell that lead to a certain product or a change in the cell [2,3,4,5,6]. Recently, with the stillongoing development of highthroughput sequencing technology, the price is obviously falling, large numbers of relatedpathway omics data are exponentially growing, thus it has become one of the most important resource to analyze biological pathways via highthroughput omics data [7]. During the past 10 years, several pathway knowledge databases have been built, such as KEGG, BioCyc, MetaCyc, Reactome, RegulonDB and PantherDB [8,9,10,11,12,13]. The establishment of these knowledge databases laid the foundation to study pathways contributing to the occurrence, development and recovery of complex diseases [14]. Pathwayrelated knowledge databases and omics data contain a wealth of diseaserelated knowledge and information, such as information on the relatedpathway genes, molecular interactions in the same pathway, topology structure of pathways, gene expression, and so on. However, how to reveal the mechanism of biological regulation (e.g. SNP, gene and protein) on complex disease from observational pathwayrelated omics databases has become a great challenge.
Recently, some pathway analytical methods have been proposed to study human physiology, systems biology and modern drug development that provided the computational framework for data pathway analysis and biomarker selection [11,12,13,14,15,16,17]. These methods include functional enrichment analysis or gene set analysis (GSA) [14], pathway analysis within a Bayesian Network framework [15], pathway analysis approaches based on the adaptive rank truncated product statistic [16] and a subpathwaybased approach to studying the joint effects of multiple genetic variants [17]. Although, these methods are suitable for omics data analysis in systems epidemiology, most of them fail to take into account the correlation degree and topological structure between nodes (e.g. gene, SNP, etc.) from biological network. Despite, Pathway Effect Measures (PEM) with a casecontrol design [13] fully utilizes the correlation relationship between nodes, it only considers the chainspecific effects and encounters difficulties in nonlinear and interaction models. Specially, the estimation of chainspecific effect is different from the pathspecific effect extracted from a complex network, the former one does not take into account the influence of other adjacent paths or nodes (e.g. parent or child nodes). Besides the chain effect is solely marginal statistical association, but the specific path effect is developed based on causal inference and needs to adjust for necessary covariates affecting specific path. Pearl [18] firstly defined pathspecific effects in the terms of causal diagrams. And Avin et al. [19] provided general necessary and sufficient conditions for their identification for a single exposure and outcome, while Shpitser [20] generalized these definitions and conditions to settings with multiple exposures, multiple outcomes, and possible hidden variables. Miles [21] developed a suite of multiplyrobust, semiparametric efficient estimator for the pathspecific effect. However, these methods tend to require a number of strict assumptions which are difficult to be verified in practical applications, especially for complex network structures in biological fields.
In order to reduce the computational burden, we proposed a series of simplification process for the topology structure of complex networks. Of note, the nodes on specific path are only influenced by their parent nodes according to Markov Independence property. After simplification, the pathspecific effect statistic PSE is estimated under two conditions to detect the differential specific paths. Therefore, the statistic PSE combined the causal effect calculation under causal inference framework with the network comparison in systems epidemiology designs. To assess the performances of the statistic PSE, theoretical proofs and statistical simulations are conducted to evaluate the stability of type I error and power, and a real gene expression data in Mammalian Target Of Rapamycin (mTOR) pathway on survival time of glioblastoma multiforme (GBM) patients are further analyzed to validate the practicability of PSE statistic.
Application
Gliomas are the most common type of primary brain tumor, and are histologically differentiated as astrocytomas, oligodendrogliomas, and ependymomas. The World Health Organization (WHO) classifies central nervous system tumors into four different grades: I, II, III and IV. Grade IV glioblastoma multiforme (GBM) is the most frequent, devastating, and malignant astrocytic glioma. It is characterized by a high degree of cellularity, vascular proliferation, tumor cell chemoresistance, and necrosis. Even after neurosurgical resection, followed by aggressive chemotherapy and radiotherapy, GBM is still considered an incurable malignancy. No effective treatment agent against GBM has been identified [22,23,24].
The proposed PSE statistic was applied to analyze gene expression data in Mammalian target of rapamycin (mTOR) signal pathway (Fig. 1) of 461 white people from TCGA datasets containing 12,071 genes by comparing the survival time (i.e. more VS. less than the mean survival time), and 39 genes are successfully mapped to this signaling pathway. The pathway mTOR, a key mediator of phosphatidylinositol3kinase (PI3K) signaling, has emerged as a compelling molecular target in glioblastoma patients, although clinical efforts to target mTOR have not been successful. Here, we support the evidence demonstrating that mTOR is a compelling molecular target for the survival event with GBM. It was approved by ethical committee of Medical Ethical Committee of Qilu Hospital, Shandong University, China.
Results
Simulation
Type I error rate
Tables 1 and 2 showed the type I error rates of total causal effects (TCE) of Calorific Excess on Myocardial Infarction and the pathspecific effects along selected the specific path: Calorific Excess→Visceral Adiposit→Inflammatory Milieu→Atherosclerosis→Myocardial Infarction (Fig. 2), respectively. Table 1 revealed that the type I error rates of five methods are close to the given nominal level (α = 0.05) when sample size reached 1000 for total causal effects. While Table 2 illustrated that only permutation tests remained stable at the nominal level of 0.05, other methods deviated from the 0.05 nominal level, when sample size reached 1000 for pathspecific effects. Therefore, PSE statistic with permutation tests had better performances for testing total causal effect or pathspecific effect.
Statistical power
Table 3 showed that the powers of five methods almost remained invariant for testing total causal effects when varying across the average causal effects of edges on specific path and given the fixed effect difference δ = 1 (Case group vs. Control group). Table 4 showed the power of permutation tests got larger for pathspecific effects when the average causal effect of each edge on target path became larger.
Besides, Tables 5 and 6 showed that varying across the effect difference δ of pathspecific effects in case and control group, the powers of total causal effect and pathspecific effect obviously elevated. Furthermore, we performed sensitivity analysis to observe whether the PSE statistic would be influenced by the parent nodes or child nodes of nodes on specific path. Tables 7 and 8 revealed that in most cases pathspecific effect statistic PSE was not influenced by effects of their parent nodes or child nodes. According to above simulation performances, our proposed PSE with permutation test had better performances and kept robust in sensitivity analysis.
For the scenario of continuous variables, when comparing with the PEM [13] statistic with Bootstrap tests, our proposed PSE statistics accurately detected the differential pathway effects X_{1} → X_{2} → Y linking X_{1} and Y under two conditions for the fixed effect difference. The PEM with Bootstrap tests detected some false positive specific pathways (Fig. 3).
Application results
Mammalian Target Of Rapamycin (mTOR), a key mediator of phosphatidylinositol3kinase (PI3K) signaling, has emerged as a compelling molecular target in glioblastoma patients, although clinical efforts to target mTOR have not been successful [22,23,24]. Figure 1 showed the mTOR pathway from KEGG dataset (www.kegg.jp) that have been verified to be associated with the survival time of glioblastoma multiforme (GBM). The data (sample size N = 461 white people) of this pathway (Fig. 1) containing 39 genes in red boxes were downloaded from “The Cancer Genome Atlas” (TCGA, https://cancergenome.nih.gov/). We stratified the pathspecific effects according to the survival time T (T = 1 if survival time larger than mean survival time 16.65 months of patients diagnosed with GBMs, otherwise T = 0) and adjusted for confounders including age and sex in white people.
Furthermore, we found 14 specific pathways with statisical significance (Table 9) contributing to GBM and corresponding 17 genes, SLC7A5, mLST8, Lipin1, Tel2, CLIP170, ATG1, SLC3A2, RNF152, eIF4B, GATOR1, STRAD, IGFR, IRS1, PDK1, TSC1/6, Rheb. These genes have also been verified in many studies. The pathway mTOR works through the PI3K pathway through 2 important complexes, each characterized by distinct binding partners that confer different activities. In complex with PRAS40, raptor, and mLST8/GbL, mTOR works as a downstream activator of PI3K/Akt signaling, associating growth factor signals with protein translation, cell growth, proliferation, and survival state. This complex is known as mTORC1. In complex with rictor, mSIN1, protor, and mLST8 (mTORC2), mTOR works as an upstream effector of Akt [24]. Upon growth factor receptormediated activation of PI3K, Akt is recruited to the membrane through the interaction of its pleckstrin homology domain with PIP3, thus exposing its activation loop and enabling phosphorylation at threonine 308 (Thr308) via the constitutively active phosphoinositide dependent protein kinase 1 (PDK1) [25,26,27]. Akt activates mTORC1 via inhibitory phosphorylation of TSC2, which along with TSC1, negatively regulates mTORC1 through inhibiting the Rheb GTPase, a positive regulator of mTORC1. mTORC1 impairs PI3K activation in growth factorstimulated cells by downregulating IRS 1 and 2 and PDGFR [24, 28, 29]. The pathway mTORC1 regulates SREBP via regulating the nuclear entry of lipin 1, a phosphatidic acid phosphatase. Dephosphorylated, nuclear, catalytically active lipin 1 help nuclear remodeling and mediate the effects of mTORC1 on SREBP target gene, SREBP promoter activity, and nuclear SREBP protein abundance. Inhibition of mTORC1 in the liver significantly impairs SREBP function and makes mice resistant, in a lipin 1dependent fashion, to the hepatic steatosis and hypercholesterolemia induced by a high fat and cholesterol diet. These findings developed lipin 1 as a key component of the mTORC1SREBP pathway [25]. Some studies provided evidence that ATG1 was the preferred translation initiation site in 8MGBA, and that endogenous SETMAR were very stable proteins [25]. In summary these data allowed us to propose that endogenous SETMAR proteins can contain the αpeptide in their Nterminal part, at least at some stages of GBM biogenesis [26]. The gene Rheb acts downstream of TSC1/TSC2 and upstream of mTOR to regulate cell growth. Both IGFIR and IGFIIR were overexpressed in GBMs compared with normal brain (P < 10(− 4) and P = 0.002, respectively). Moreover, with regard to standard clinical factors, IGFIR positivity was identified as an independent prognostic factor associated with shorter survival (P = 0.016) and was associated with a less favourable response to temozolomide [27]. The pathway mTOR regulates eIF4B phosphorylation in response to aminoacid refeeding [30]. Glioblastoma is the most aggressive brain cancer with the poor survival rate. A microRNA, miR451, and its downstream molecules, STRAD, are known to play a centrol role in regulating a biochemical balance between rapid proliferation and invasion in the presence of metabolic stress in microenvironment [31].
Discussion
System epidemiology couples traditional epidemiology with modern highthroughput technologies which seek to integrate pathwaybased (or networkbased) analysis into observational study designs to enhance the understanding of biological processes in the human organism. It provides a ways to organize and study the interdependencies of factors (e.g., genes, proteins, metabolites) at a human population level. Within this framework, the identification of pathways effects responsible for specific diseases has been one of the essential tasks. In the framework of bioinformatics, various methods existed for inferring biological networks aiming to mine underlying networks for identifying biological modules, clustering interactions, and topological features of the network such as degree and betweenness centrality [32,33,34]. Despite these procedures for distinguishing specific pathway (or network) topology between different disease status, statistical inference at a population level remains unsolved, and further development is still necessary.
Because, in practice, complexity of network tend to render it difficult to accurately detect the pathway contribution to disease, the simplification process of complex network is very crucial for identifying the target pathways. Based on the aim of identification of pathspecific effects, we proposed a series of simplification process to simplify and abstract the topology structure of complex network (Fig. 4). Of note, the nodes of pathspecific is only influenced by their parent nodes according to Markov Independence property. This simplification process greatly reduce the complexity of network structure and maintain the key factors affecting the target specific paths. Currently in the field of causal inference, most methods mainly focus on the simple and easily understandable causal diagrams, but the simplification is the crucial first step to feasibly serve to real world.
After simplification, calculation and tests of pathspecific effect also became feasible. We proposed a nonparameter riverway confluxbased nonparameter causal diagram model for identifying the pathspecific effects in systems epidemiology. Simulation studies revealed our proposed PSE with permutation tests could efficiently identify the statistically different pathways. Table 1 revealed that the type I error of five methods are close to the given nominal level (α = 0.05) when sample Jsize reached 1000. While Table 2 illustrated that only PSE with permutation tests remained stable, other methods deviated from the nominal level 0.05, when sample size were larger than 1000. Therefore, PSE statistic with permutation tests had better performances for testing total causal effect or pathspecific effect. Table 3 revealed that the powers of five methods almost remained invariant for total causal effect when the average causal effects β of edge on the specific path became larger and the effect difference δ was set to 1. On the contrary, the power of permutation tests got larger and was close to 1 for the pathspecific effect as average causal effect went up. Besides, Tables 5 and 6 revealed that varying across the effect differences δ, the power on total causal effect and pathspecific effect obviously elevated. Furthermore, we performed sensitivity analysis to observe that in most situations, PSE statistic would be not influenced by the parent nodes or child nodes of nodes on specific path (Tables 7 and 8).
In application analysis, the proposed typical PSE statistic was applied to analyze gene expression data in Mammalian target of rapamycin (mTOR) signal pathway (Fig. 1) of 461 white people from TCGA datasets containing 12,071 genes, 39 genes are successfully mapped to this signaling pathway. The pathway mTOR, a key mediator of phosphatidylinositol3kinase (PI3K) signaling, has emerged as a compelling molecular target in glioblastoma patients, although clinical efforts to target mTOR have not been successful. Here, we support the evidence demonstrating that mTOR is a compelling molecular target in GBM.
Figure 1 showed the mTOR pathway from KEGG dataset (www.kegg.jp) that have been verified to be associated with the survival time of glioblastoma multiforme (GBM) [32,33,34]. The data (sample size N = 461 white people) concerning this pathway containing 39 genes in red boxes were downloaded from “The Cancer Genome Atlas” (TCGA). Our studies results obtained 14 statistically significant positive pathways (Table 9). We stratified the pathspecific effects according to the survival time T (T = 1 if survival time larger than mean survival time 16.65 months of patients diagnosed with GBMs, otherwise T = 0) and adjusted for confounders including age and sex. Furthermore, we found 14 statistically positive specific pathways (Table 9) and most gene have been verified in many studies.
Conclusion
In the framework of systems epidemiology, the proposed permutationbased PSE are valid and powerful for detecting the specific pathway effect contributing to disease, thus potentially providing new insights and ways to unlock the black box of disease mechanism.
Methods
Complex network simplification rules (Fig. 4)
For specific path in complex network, we proposed some rules to simplify complex networks and extract specific path from complex network. Remove irrelevant variables from the causal diagram including (i) no causal effect on the variables of target path (e.g. C_{1} in Fig. 4) and (ii) no causal effect on any variable in the adjustment set (e.g. C_{2} in Fig. 4). These variables will not induce spurious association so can be ignored. Besides considering the influence of direct and indirect causal effect, we need to merge all direct and indirect causal paths between two variables. Finally, confounding paths or noncausal path remained in simplified causal diagram paths.
Pathspecific effect statistic PSE
For the sake of illustration, we take the specific path X_{1} → X_{2} → Y (Fig. 5) as an example. We wish to calculate the pathspecific effect based on the average causal effect. Firstly, according to the expectation dependence of X_{1} and Y, we have
and \( \frac{\partial E\left(Y{x}_1\right)}{\partial {x}_1}={\int}_{\infty}^{+\infty}\frac{\partial F\left(y{x}_1\right)}{\partial {x}_1} dy \).
Take the causal diagram depicted in Fig. 5a as a special case, the average causal effect (ACE) of X_{1} on Y is used to compare the effects of two different levels of X_{1}, i.e., \( {x}_1^{\prime } \) and \( {x}_1^{{\prime\prime} } \). Since p(c do(x_{1})) = p(c) and p(y c, do(x_{1})) = p(y c, x_{1}), we obtain
Similarly, we obtain the ACE of X_{1} on X_{2} as
From p(c do(x_{2})) = p(c) and p(y c, do(x_{2})) = p(y c, x_{2}), we have
Case of continuous variables
We first consider the case of continuous variables depicted in Fig. 5b. By C = (C_{1}, C_{2}), X_{1} ⊥ Y ∣ (C_{2}, X_{2}), we obtain
Applying integration by parts and definite integration:
If the distribution dependence is nondecreasing, so is the expectation dependence.
Theorem 1: For the specific path X_{1} → X_{2} → Y with confounders C, any \( {x}_2^{\prime }>{x}_2^{{\prime\prime} } \), we have
if satisfy the conditions:

1)
\( \frac{\partial E\left(Yc,{x}_2\right)}{\partial {x}_2}\perp C \)
or

2)
\( \left[E\left({X}_2c,{x}_1^{\prime}\right)E\left({X}_2c,{x}_1^{{\prime\prime}}\right)\right]\perp C \)
Proof: For condition 1, according to Eqs. 1 and 2, we have
By Eq. 2, the effect of X_{2} on Y can be written as
From above two equations, we obtain
Similarly, for condition 2, we can obtain
We also have
and
Thus we obtain
In observational studies, in order to estimate the causal effect, we need to adjust for the parent nodes of nodes on the specific path. For instance, for the causal diagram in Fig. 6, according to the backdoor criteria and docalculus [18], the specific path effect of X_{1} → X_{2} → Y, we need to separately adjust for C_{1} and C_{2} by linear regression as follows,
Case of discrete variables
Similarly the results for case of discrete variables can be proved by substituding the partial differentiation and the integration into differencing between adjacent level and summation, respectively. We have
Thus, similar to Eq. 4, we obtain
Similar to Eq. (1), we have
\( ACE\left\{{X}_1\to {X}_2 do\left({x}_1^{\prime}\right), do\left({x}_1^{{\prime\prime}}\right)\right\}=\sum \limits_cp(c)\sum \limits_{x_2}\left\{p\left({X}_2\le {x}_2c,{x}_1^{\prime}\right)p\left({X}_2\le {x}_2c,{x}_1^{{\prime\prime}}\right)\right\}. \)
From Eq. (2), for binary X_{1}, X_{2} and Y, and C is a discrete variable which may have multiple values under the condition of [E(Y c, X_{2} = m + 1) − E(Y c, X_{2} = m)] ⊥ C. We have
Extension to the case of multiple mediators
In specific path X_{1} → X_{2} → ⋯ → X_{K} → Y with continuous confounders C, for any \( {x}_i^{\prime }>{x}_i^{{\prime\prime} },\kern0.5em i=1,2,\cdots, K \), we have
For a discrete variable C, we have
In observational studies, according to backdoor criteria and docalculus [18], for the causal diagram in Fig. 5, the specific path effect of X_{1} → X_{2} → Y via adjusting for the binary parent nodes C_{1} and C_{2} is
Pathspecific statistic for two comparisons
The proposed pathspecific effect statistic (PSE) was based on product of average causal effect (ACE) of each directed edge, and took difference under two conditions (e.g., exposure vs. nonexposure, case vs. control). In order to identify the specific path X_{1} → Y the standardized pathspecific effect in the exposure or case group was defined as
While for nonexposure or control group, the standardized pathspecific effect was
where \( {ACE}^1\left\{{X}_1\to Y do\left({x}_1^{\prime}\right), do\left({x}_1^{{\prime\prime}}\right)\right\} \), \( {ACE}^0\left\{{X}_1\to Y do\left({x}_1^{\prime}\right), do\left({x}_1^{{\prime\prime}}\right)\right\} \) denoted separately the average causal effect in case and control group, \( {S}_{ACE^1\left\{{X}_1\to Y do\left({x}_1^{\prime}\right), do\left({x}_1^{{\prime\prime}}\right)\right\}} \) and \( {S}_{ACE^0\left\{{X}_1\to Y do\left({x}_1^{\prime}\right), do\left({x}_1^{{\prime\prime}}\right)\right\}} \) denoted the standard error of \( {ACE}^1\left\{{X}_1\to Y do\left({x}_1^{\prime}\right), do\left({x}_1^{{\prime\prime}}\right)\right\} \) and \( {ACE}^0\left\{{X}_1\to Y do\left({x}_1^{\prime}\right), do\left({x}_1^{{\prime\prime}}\right)\right\} \) in case and control group, respectively. Hypothesis tests were performed to test whether the two pathspecific effects had significant statistical difference. The null hypothesis and alternative hypothesis were separately equal to
and the test statistic PSE was
The proposed PSE statistic was developed to test the difference of pathspecific effects under two conditions.
Nonparametric permutation and bootstrap tests of PSE
To test whether the specific path contributed to the disease end point, we conducted a series of hypothesis tests. The permutationbased hypothesis tests were conducted as follows: 1) draw a large number of data on disease status (e.g., case and control group) without replacement and estimate PSE in each group, and make difference between two groups and then forms our statistic PSE; 2) Repeat this process to form a permutation distribution under the condition H_{0} is true; 3) obtain the value of statistic actually observed in study and locate the value in permutation distribution to get the P value; 4) reject the null hypothesis (H_{0} : PSE_{1} = PSE_{0}) if the P value is less than 0.05 [22]. While bootstrap tests were performed as follows: 1) draw a large number of bootstrap samples (e.g., 1000) and estimate PSE by two groups to form a bootstrap distribution; 2) define the 95% confidence interval (CI) of the bootstrap distribution; and 3) reject the null hypothesis (H_{0} : PSE_{1} = PSE_{0}) if the CI does not include zero [23]. Three bootstrap interval methods were used including,

1)
The Standard Norm Bootstrap Confidence Interval:

2)
The Basic Bootstrap Confidence Interval:

3)
The Percentile Bootstrap Confidence Interval:

4)
Bias correct confidence interval:
where\( a=\left({\overset{\frown }{z}}_0+\frac{{\overset{\frown }{z}}_0+{z}_{\alpha /2}}{1\overset{\frown }{a}\left({\overset{\frown }{z}}_0+{z}_{\alpha /2}\right)}\right),b=\Phi \left({\overset{\frown }{z}}_0+\frac{{\overset{\frown }{z}}_0+{z}_{1\alpha /2}}{1\overset{\frown }{a}\left({\overset{\frown }{z}}_0+{z}_{1\alpha /2}\right)}\right) \). All codes for automatic searching specific paths linking two continuous variables and adjusting set as well as PSE statistic can be found in supplementary materials.
Simulation
Simulations were conducted to evaluate the performances of PSE statistic in the situation of varying across pathspecific effect difference of PSE_{1} and PSE_{0} (e.g., Case group vs. Control group) and effects of parent nodes or child nodes on nodes on specific path as well as average causal effect value of every edge on specific path. We simulated a complex network on Myocardial Infarction and selected the target specific path: Calorific excess → Visceral adiposit → Inflammatory z ` milieu → Atherosclerosis → Myocardial infarction (Fig. 2) to test our statistic. The simulation process was mainly based on their parent nodes to generate corresponding child nodes by logistic regression model. For instance, to generate the child node Y (Visceral) depends on corresponding parent nodes X_{1} (Calorific) and X_{2}(Physical inactivity), logit(P(Y = 1 X_{1}, X_{2})) = α_{0} + β_{1}X_{1} + β_{2}X_{2}. Furthermore, we set different effects under two conditions on some specific paths and then identify the specific paths with different effects using PSE statistic.
Under the null hypothesis (PSE_{1} = PSE_{0}), given the varied sample sizes (N = 200, 400, 600, 800, 1000), 1000 simulations were conducted to assess the type I error of the PSE by Permutation test and the nonparametric bootstrap tests with confidence interval (CI) estimated by Basic bootstrap, the percentile bootstrap and biascorrected bootstrap methods and asymptotic normal distribution statistic CI. Under H_{1} (PSE_{1} ≠ PSE_{0}). Given the sample sizes 1000, 1000 simulations were repeated to assess the power under varied pathspecific effect difference (Case group vs. Control group) of specific path itself and their parent nodes or child nodes as well as average causal effect value of every edge on specific path, respectively.
Similarly, for continuous variales, according to the causal diagram in Fig. 6, we generated the simulated data via linear regression. We specified the differential directed edge X_{2} → Y in case and control designs and thus led to one differential specific path X_{1} → X_{2} → Y linking X_{1} to Y. The specific path effect in each group can be calculated as follows.

(1)
The specific path effect X_{1} → X_{2} → Y by adjusting for the parent node X_{3}:

(2)
The specific path effect X_{1} → X_{2} → X_{3} → Y by separately adjusting for the parent nodes X_{1} and X_{2}:

(3)
The specific path effect X_{1} → X_{3} → Y by separately adjusting for the parent nodes X_{2} and X_{3}:
Based on the pathway effects in two groups, we can develop PSE statistic via differenting the pathway effects in two groups.
All simulation codes were generated by R software available from CRAN (http://cran.rproject.org/).
Availability of data and materials
The data were downloaded from https://cancergenome.nih.gov/.
Abbreviations
 PSE:

Pathspecific effect
 MTOR:

Mammalian target of rapamycin
 ACE:

Average causal effect
 WHO:

The World Health Organization
 GBM:

Grade IV glioblastoma multiforme
References
 1.
Knox S S. From 'omics' to complex disease: a systems biology approach to geneenvironment interactions in cancer[J]. Cancer Cell International. 2010;10(1):11.
 2.
Nishi A, Milner DA Jr, Giovannucci EL, et al. Integration of molecular pathology, epidemiology and social science for global precision medicine. Expert Rev Mol Diagn. 2016;16(1):11–23.
 3.
Westreich D, Greenland S. The table 2 fallacy: presenting and interpreting confounder and modifier coefficients. Am J Epidemiol. 2013;177(4):292–8.
 4.
VanderWeele TJ, Robins JM. Directed acyclic graphs, sufficient causes and the properties of conditioning on a common effect. Am J Epidemiol. 2007;166:1096–104.
 5.
Robins JM, Hernán MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000;11:550–6.
 6.
Leung EL, Cao ZW, Jiang ZH, et al. Networkbased drug discovery by integrating systems biology and computational technologies. Brief Bioinform. 2013;14:491–505.
 7.
Reuter JA, Spacek D, Snyder MP. Highthroughput sequencing technologies. Mol Cell. 2015;58(4):586–97.
 8.
Dammann O, Gray P, Gressens P, et al. Systems epidemiology: what’s in a name? Online J Public Health Inform. 2014;6(3):e198.
 9.
Berg EL. Systems biology in drug discovery and development. Drug Discov Today. 2014;19:113–25.
 10.
Zhang X, Wang W, Xiao K, et al. Translational medicine: application of omics for drug target discovery and validation. In: William CS, editor. An omics perspective on cancer research. The Netherlands: Springer; 2010. p. 235–47.
 11.
Wu X, Jiang R, Zhang MQ, et al. Networkbased global inference of human disease genes. Mol Syst Biol. 2008;4:189.
 12.
Yates PD, Mukhopadhyay ND. An inferential framework for biological network hypothesis tests. BMC Bioinformatics. 2013;14:94.
 13.
Ji J, Yuan Z, Zhang X, et al. Detection for pathway effect contributing to disease in systems epidemiology with a case–control design. BMJ Open. 2015;5(1):e006721.
 14.
Wang K, Li M, Bucan M. Pathwaybased approaches for analysis of genomewide association studies. Am J Hum Genet. 2007;81:1278–83.
 15.
Isci S, Ozturk C, Jones J, et al. Pathway analysis of highthroughput biological data within a Bayesian network framework. Bioinformatics. 2011;27:1667–74.
 16.
Yu K, Li Q, Bergen AW, et al. Pathway analysis by adaptive combination of p values. Genet Epidemiol. 2009;33:70–9.
 17.
Li C, Han J, Shang D, et al. Identifying disease related subpathways for analysis of genomewide association studies. Gene. 2012;503:101–9.
 18.
Pearl J. Direct and indirect effects. In: Proceedings of UAI01; 2001. p. 411–20.
 19.
Avin C, Shpitser I, Pearl J, et al. Identifiability of pathspecific effects[C]. international joint conference on artificial intelligence. 2005. p. 357–363.
 20.
Shpitser I. Counterfactual graphical models for longitudinal mediation analysis with unobserved confounding. Cogn Sci. 2013;37(6):1011–35.
 21.
Miles C, Shpitser I, Kanki P, et al. Quantifying an adherence pathspecific effect of antiretroviral therapy in the Nigeria PEPFAR program. arXiv preprint arXiv:1411.6028, 2014.
 22.
Good PI. Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses[J]. Technometrics. 1995;37(3):341–42.
 23.
Efron B, Tibshirani RJ. An introduction to the bootstrap. New York: Chapman & Hall; 1993. Direct and Indirect Effect.
 24.
Guertin DA, Sabatini DM. The pharmacology of mTOR inhibition. Sci Signal. 2009;2:e24.
 25.
Sabatini DM. mTOR and cancer: insights into a complex relationship. Nat Rev Cancer. 2006;6:729–34.
 26.
Zhang H, Bajraszewski N, Wu E, et al. PDGFRs are critical for PI3K/Akt activation and negatively regulated by mTOR. J Clin Invest. 2007;117:730–8.
 27.
Guo D, Bell EH, Chakravarti A. Lipid metabolism emerges as a promising target for malignant glioma therapy; 2013.
 28.
Zoncu R, Sabatini DM, Efeyan A. mTOR: from growth signal integration to cancer, diabetes and ageing. Nat Rev Mol Cell Biol. 2011;12(1):21.
 29.
Sarbassov DD, Guertin DA, Ali SM, Sabatini DM. Phosphorylation and regulation of Akt/PKB by the rictormTOR complex. Science. 2005;307:1098–101.
 30.
DussaussoisMontagne A, Jaillet J, Babin L, et al. SETMAR isoforms in glioblastoma: a matter of protein stability. Oncotarget. 2017;8(6):9835.
 31.
Inoki K, Li Y, Xu T, et al. Rheb GTPase is a direct target of TSC2 GAP activity and regulates mTOR signaling. Genes Dev. 2003;17(15):1829–34.
 32.
Maris C, D'Haene N, Trépant AL, et al. IGFIR: a new prognostic biomarker for human glioblastoma. Br J Cancer. 2015;113(5):729.
 33.
Van Gorp AGM, Van Der Vos KE, Brenkman AB, et al. AGC kinases regulate phosphorylation and activation of eukaryotic translation initiation factor 4B. Oncogene. 2009;28(1):95.
 34.
Kim Y. Regulation of cell proliferation and migration in glioblastoma: new therapeutic approach. Front Oncol. 2013;3:53.
Acknowledgments
The authors wish to acknowledge the editor and anonymous reviewers for their invaluable work and the participants who participate in the study.
Funding
Publication of this article is funded by 863 Program 2015AA020507. The design of the study and collection, analysis are funded by 973 program 2015CBB56000, and the interpretation of data and writing the manuscript are funded by National Natural Science Foundation of China 11331011, 11771028, 91630314, 81573259, 81773547, 81873439.
Author information
Affiliations
Contributions
HKL and ZG helped conduct the literature review and prepare the Methods and the theoretical sections of the text. XRS and YYY designed the study’s simulation strategy. FZX and HKL designed the study and directed its implementation. All authors have read and approved the manuscript.
Corresponding authors
Ethics declarations
Ethics approval and consent to participate
No human, animal or plant experiments were performed in this study, and ethics committee approval was therefore not required.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Additional file 1.
Codes for automatic calculating PSE statistic of all specific paths linking any two continuous variables.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Li, H., Geng, Z., Sun, X. et al. A novel pathspecific effect statistic for identifying the differential specific paths in systems epidemiology. BMC Genet 21, 85 (2020). https://doi.org/10.1186/s1286302000876w
Received:
Accepted:
Published:
Keywords
 Causal diagram model
 Causal inference
 Identification
 Pathspecific effect