Skip to main content

Network-based cancer genomic data integration for pattern discovery

Abstract

Background

Since genes involved in the same biological modules usually present correlated expression profiles, lots of computational methods have been proposed to identify gene functional modules based on the expression profiles data. Recently, Sparse Singular Value Decomposition (SSVD) method has been proposed to bicluster gene expression data to identify gene modules. However, this model can only handle the gene expression data where no gene interaction information is integrated. Ignoring the prior gene interaction information may produce the identified gene modules hard to be biologically interpreted.

Results

In this paper, we develop a Sparse Network-regularized SVD (SNSVD) method that integrates a prior gene interaction network from a protein protein interaction network and gene expression data to identify underlying gene functional modules. The results on a set of simulated data show that SNSVD is more effective than the traditional SVD-based methods. The further experiment results on real cancer genomic data show that most co-expressed modules are not only significantly enriched on GO/KEGG pathways, but also correspond to dense sub-networks in the prior gene interaction network. Besides, we also use our method to identify ten differentially co-expressed miRNA-gene modules by integrating matched miRNA and mRNA expression data of breast cancer from The Cancer Genome Atlas (TCGA). Several important breast cancer related miRNA-gene modules are discovered.

Conclusions

All the results demonstrate that SNSVD can overcome the drawbacks of SSVD and capture more biologically relevant functional modules by incorporating a prior gene interaction network. These identified functional modules may provide a new perspective to understand the diagnostics, occurrence and progression of cancer.

Background

With the rapid development of (single-cell) RNA-Seq and microarray technologies, huge number of cancer genomic data have been generated [13]. The data provide some new opportunities to study on the gene cooperative mechanisms [48]. Based on the hypothesis that genes with similar functions may show similar expression patterns, clustering techniques have been used to identify co-expressed gene sets in which genes present similar expression patterns across all samples. However, these traditional clustering techniques face with the limitation that some genes can co-regulate across some samples rather than all samples in the real biological systems [9]. Therefore, many biclustering methods [4, 1013] are proposed to discover some co-expressed gene sets in which genes present similar expression patterns across some samples.

Recently, several Sparse Singular Value Decomposition (SSVD) based methods have been proposed for biclustering gene expression data to discover gene functional modules (biclusters) [14], such as ALSVD [4], L0SVD [15], and so on. However, most of them ignore the prior gene interaction network knowledge from a protein protein interaction (PPI) network, whereas such information is very useful to improve biological interpretability of discovered gene modules [1618]. The PPI network has been used in many biological applications for accurate discovery or better biological interpretability [16, 1922]. However, as far as we know, there is very little work to incorporate the gene interaction network knowledge from PPI network into the bi-clustering framework. To address it, we integrate the gene network in the SSVD model for biclustering gene expression [23].

In this paper, we develop a sparse network-regularized SVD (SNSVD) model to identify gene functional modules by integrating gene expression data and a prior gene interaction network from a PPI network (Fig. 1). To ensure the discovered gene modules in which genes are co-expressed and densely connected in the prior PPI network, we introduce a sparse network-regularized penalty [20] in the model. Compared with the traditional regularized penalties (e.g., LASSO [24]), the sparse network-regularized penalty can make the biclustering process tend to select correlated and interacted genes for enhancing biological interpretability of gene modules. We present an alternating iterative algorithm to solve the SNSVD model. We evaluate the performance of SNSVD using a set of artificial data sets and gene expression data from the Cancer Genome Project (CGP) [3], and compare its performance with other representative SSVD methods. We investigate the functionality of these identified modules from multiple perspectives. The results show that SNSVD can identify more biologically relevant gene sets and improve their biological interpretations.

Fig. 1
figure 1

Overall workflow of Sparse Network-regularized Singular Value Decomposition (SNSVD). SNSVD integrates both a gene expression and a normalized Laplacian matrix L encoding a protein-protein interaction (PPI) network to identify gene functional modules. Based on the output of SNSVD (i.e., sparse singular vectors u and v), we can identify a gene module whose members are from the nonzero elements of u and v. Herein, we show a toy example to explain how SNSVD works. The gene module identified by SNSVD contains four genes (g1,g2,g3,g4) and five samples (s1,s2,s3,s4,s5), where the four genes are correlated across the five samples and the four genes correspond to a dense subnetwork of PPI network

Additionally, we present a framework based on SNSVD for analyzing matched miRNA and mRNA expression data to identify differentially co-expressed miRNA-gene modules. Extensive results when applying onto TCGA breast cancer data demonstrate that the identified miRNA-gene modules provide a new perspective for diagnosis and discrimination between two status of breast cancer.

Results

Simulation study

We evaluated the performance of SNSVD on the simulated data by comparing it with other sparse SVD based methods including L0SVD [15], ALSVD [4] and SCADSVD [4, 25]. Without loss of generality, we define a rank-one true signal matrix as uvT where u and v are vectors of size p×1 and n×1, respectively. The observed matrix is defined as X=uvT+γε, where ε is a noise matrix each element in which is randomly sampled from a standard normal distribution and γ is a nonnegative parameter to control the signal-to-noise ratio (SNR).

To generate the simulated data, we first generated two sparse singular vectors u and v with p=200,n=100 whose first 50 elements equal to 1 (non-zeros), and the remaining ones are zeros. Then we created a series of observation matrices X for each γ ranging from 0.02 to 0.06 in steps of 0.005. In addition, we created a prior “PPI” network for row variables of X, where any two nodes in first 50 vertices are connected with probability p11=0.3, and remaining ones are connected with probability p12=0.1. For each γ, we generated 50 different noise matrices ε to got 50 observed matrices X for testing. The average sensitivity, specificity and accuracy of u (or v) on the 50 matrices X were calculated. Moreover, we set σ=0.5 according to 5-fold cross validation test, and forced u and v to contain 50 non-zero elements with same sparsity level for each method by tuning the parameters so that the results of different methods are comparable. The average sensitivities, specificities and accuracies of u (or v) with different γ were compared in Fig. 2. We found that the performance of our proposed method (SNSVD) is superior to that of other methods. The results illustrate that SNSVD model can enhance the power of variable selection by integrating the prior network knowledge.

Fig. 2
figure 2

Performance of different methods on simulated data when γ is varied (γ is a parameter to control the signal-to-noise ratio). “Sensitivity” denotes the percentage of true non-zero entries in the identified vector, “Specificity” denotes the percentage of true zero entries in the identified vector, and “Accuracy” denotes classification accuracy

Application to the CGP gene expression data sets

We further investigated the performance of SNSVD by using the gene expression data with 641 cell lines including diverse cancer types and tissues from the Cancer Genome Project (CGP) (http://www.cancerrxgene.org/downloads) [3], and a PPI network from the Pathway-Commons database [26]. In total, there are 13,321 genes and 262,462 interactions in the PPI network. The 641 cell lines are from 16 tissues or 52 cancer types in the CGP data, where a tissue type contains about 40 cell lines and a cancer type contains about 12 cell lines.

Identifying functional modules

We set σ=100 according to 5-fold cross validation test, and set kv=50 (control the sample sparsity). In addition, we also selected a suitable λ to force the estimated u only containing 200 nonzero elements (control the gene sparsity). Using Algorithm 3, we identified the first 40 pairs of singular vectors {(u1,v1),,(u40,v40)}. Let U=[u1; ;u40] and V=[v1; ;v40], where the ith column of U and V correspond to the ith pair of sparse singular vectors. To reduce the false positive cases, we first calculated absolute z-score for each column of U (or V) according to Eq. (1). For each non-zero xij, we define the following formula:

$$ z_{{ij}} = \frac{||x_{{ij}}|-\mu_{j}|}{\sigma_{j}}, $$
(1)

where xij is i-th element in uj (or vj), μj is the average value of all non-zero elements in uj (or vj), and σj is their standard deviation. If zij is greater than a given threshold, the corresponding gene (or sample) is then selected into the module j. Herein, we obtained 40 gene functional modules with 160 genes and 40 samples in average (Fig. 3).

Fig. 3
figure 3

Distribution of the number of samples (i.e., cell lines) and genes from the identified modules by SNSVD on the CGP data

Functional analysis of the genes in modules

Firstly, we investigated whether the genes within the same modules are significantly co-expressed by calculating the modularity score in Eq. (16), the result showed that all identified modules were statistically significant with p-value <0.01 by using one-sided Wilcoxon signed rank test (Fig. 4).

Fig. 4
figure 4

The modularity scores of the identified modules by SNSVD on the CGP data. The red line in plot denotes an average modularity score of randomized modules (gene sets)

Secondly, we also investigated whether the genes within the same modules are connected with each other in the prior PPI network via the gene-gene interaction enrichment score. The result showed that 57% of the 40 modules were significantly inter-connected with each other in the PPI network, illustrating that our method tend to cluster genes interacting with each other.

Finally, we also checked the biological relevance of all the identified gene modules using gene functional enrichment analysis via DAVID online web tool [27]. By selecting the GO BP (Gene Ontology Biological Process) and KEGG pathways with Benjamini-Hochberg adjusted p-values<0.05 as significant ones, we obtained 766 significant GO BP pathways and 70 significant KEGG pathways. By statistically, 62.5% modules are significantly related with at least one GO BP pathways and 42.5% modules are significantly related with at least one KEGG pathways.

Functional analysis of the samples in modules

To evaluate the subtype-specific of samples in the identified modules, we computed the overlapping significance level of between module-samples and cancer/tissue specific samples. For each gene module, we first collected a sample set from the module. We then computed the overlapping significance levels between the sample set and any one tissue-sample set using the right hypergeometric test (Fig. 5A), and the overlapping significance levels between the sample set and any one cancer-sample set (Fig. 5B). We found that most of the identified gene modules can be seen as subtype-specific gene functional modules, which provide insights into the mechanisms of the relationship between different tissues and cancers.

Fig. 5
figure 5

These identified gene modules by SNSVD are subtype-specific related to some tissues or cancer types. A is a heatmap in term of different tissues. B is a heatmap in term of different cancer types. Note that each blue square in the two heatmaps corresponds to a significance overlapping relationship (Hypergeometric test, p <0.05)

Additionally, we also found that the cancer/tissue types of some modules are consistent with their corresponding functional pathways. Some examples are listed in Table 1 (See also Fig. 5). For example, module 1 contains 47 cell lines significantly overlapping with blood tissue and some blood-related cancers (e.g., AML, B cell leukemia, B cell lymphoma, lymphoblastic leukemia, lymphoblastic T cell leukaemia, lymphoid_neoplasm other), while the top enriched GO/KEGG pathways of 174 genes in module 1 are related to the immune system. Some previous work have reported that the development of blood-related cancers are associated with immune pathway abnormalities [28, 29]. Similarly, these samples in module 2 are also significantly related with some blood-related cancers (B cell leukemia, B cell lymphoma, Burkitt lymphoma, lymphoblastic leukemia, and lymphoid_neoplasm other), while some genes in which are significantly enriched in some immune-related pathways. These samples in module 4 are significantly related with central nervous system (CNS), while some genes in which are significantly enriched in nervous system related GO/KEGG pathways.

Table 1 The first five enriched GO/KEGG pathways of top ten modules identified by SNSVD on the CGP data where “P-value” denotes Benjamini-Hochberg adjusted P-value

Finally, we also evaluated whether the identified 40 modules are greatly overlapped. Since each module contains a gene set and a sample set. To assess the overlapping relationship between two different modules. For any two gene modules, we computed the overlapping significance level p1 and p2 between their gene sets and sample sets respectively by using the right-tailed hypergeometric test. If p1<0.05 and p2<0.05, then we considered that the two modules are significant overlapped. Among all 780 module-module pairs for the identified 40 modules, we found that only 17 out of the 780 module-module pairs are significantly overlapped (Fig. 6), showing that our method can find diverse functional modules.

Fig. 6
figure 6

The overlapping significance level between any two identified modules by SNSVD on the CGP data. Each gray corresponds to a significance overlapping relationship (Hypergeometric test, p <0.05)

Comparison with sparse SVD on the CGP gene expression data sets

Since L0SVD have shown good performances in simulation study compared to other sparse SVD methods, we compared it with our method to further illustrate the importance of integrating the PPI network. To this end, we also identified 40 gene modules on the CGP data by using L0SVD and Fig. 7 shows the comparing results. We found that the interaction enrichment scores of the identified modules by SNSVD were significantly higher than that by L0SVD (one-sided Wilcoxon signed rank test p-value <0.01) (Fig. 7A). These results demonstrate that SNSVD can find more tightly connected genes than L0SVD by integrating the PPI network. Furthermore, SNSVD obtains a greater number of significant GO BP terms at different levels than L0SVD (one-sided Wilcoxon signed rank test p-value <0.001) (Fig. 7B), showing that incorporating the PPI network does help SNSVD to discover more biological interpretable modules.

Fig. 7
figure 7

(A) Comparison of the gene-gene interaction enrichment scores of the identified modules by SNSVD and L0SVD, respectively, on the CGP data. (B) Functional enrichment comparison based on the number of GO BP (Gene Ontology Biological Process) terms

Application to the BRCA data sets

Data and preprocessing

We downloaded the processed RNA-seq and miRNA-seq data of Breast invasive carcinoma (BRCA) from TCGA database [30] (Broad GDAC Firehose: http://firebrowse.org/). We firstly filtered out the genes and the miRNAs which are not expressed in more than 70% samples and the raw gene/miRNA expression values were log2-transformed. Secondly, we used the wilcoxon rank sum test to identify differentially expressed genes/miRNAs with bonferroni adjusted p-value <0.05 between cancer and adjacent normal samples. It causes 9896 differentially expressed genes and 320 differentially expressed miRNAs to be preserved. Thirdly, we imputed the missing values of miRNA and gene expression data by using k-nearest neighbors [31]. Finally, we extracted the matched gene and miRNA expression matrices across cancer and adjacent normal samples, where A1 and B1 represent gene and miRNA expression data of cancer samples, respectively and (ii) A2 and B2 represent gene and miRNA expression data of adjacent normal samples, respectively (Fig. 8A). There are 9896 genes and 320 miRNAs, 760 cancer samples and 87 adjacent normal samples in the BRCA data sets.

Fig. 8
figure 8

A framework based on SNSVD method for identifying differentially co-expressed miRNA-gene modules on the BRCA data. A Calculating two miRNA-gene correlation matrices X1 and X2 using the Pearson correlation method based on the four expression matrices A1,A2,B1 and B2 whose rows were centered and scaled. B We first obtain the differential correlation matrix X by using X=X1X2. Then, SNSVD is used to identify top ten differentially co-expressed miRNA-gene modules by integrating the differentially co-expressed matrix X and a PPI network

Additionally, we also downloaded a PPI network from Pathway-Commons database [26], and collected a set of cancer genes from the allOnco database (http://www.bushmanlab.org/links/genelists) which merges some different cancer genes from several databases, and a set of cancer miRNAs from the reference [32].

Identifying differentially co-expressed miRNA-gene modules

Recent research revealed that some abnormal miRNA-gene regulatory relationship plays key roles in tumor progression and development [3335]. Some computational methods have been proposed for identifying miRNA-gene co-expressed modules by using matched miRNA and mRNA expression data of cancer [13, 3640]. Though power, these methods do not ensure that the miRNAs and genes in a module are differentially expressed between two biological conditions. Besides, some methods have already been developed for differential co-expression analysis [4144]. However, these methods only focus on single gene expression data analysis. To this end, we proposed a new framework based on SNSVD for analyzing matched miRNA and mRNA expression data between two biological conditions to identify differentially co-expressed miRNA-gene modules (Fig. 8).

Herein, we applied SNSVD to the BRCA data and empirically set λ,kv in SNSVD to yield top ten differentially co-expressed modules for each σ. Each identified miRNA-gene module contain about 10 miRNAs and 100 genes. Formally, a miRNA-gene module contains a miRNA set and a gene set. We found that as σ becomes larger, the modules identified by SNSVD contain more edges (Table 2). The results showed SNSVD could overcome the drawbacks of sparse SVD (SNSVD with σ=0 in Table 2) to capture the modules with more edges by incorporating the PPI network.

Table 2 Application of SNSVD to the BRCA data. “edge.avg” represents the average of number of edges of modules in the PPI network, and “Fold Change” represents the fold change of “edge.avg” between the identified modules and random modules, and “d.avg” denotes the average of singular values of modules

Functional analysis of modules

Without loss of generality, the ten modules identified by SNSVD with σ=60 (See Table 2) were considered for further biological analysis. We found that (i) the average adPCC (absolute differential Pearson Correlation Coefficient) for the identified modules by SNSVD on the BRCA data is larger than the average of all absolute elements of X (Wilcoxon rank-sum test, p<1e−16) (Fig. 9A); (ii) more than half of the miRNAs in the 70% (7 of 10) modules are cancer miRNAs, and 80% (8 of 10) modules are significantly enriched at least one KEGG/GO BP pathway (Benjamini-Hochberg adjusted p<0.05); (iii) three modules (module 2, 3 and 8) contain significantly more cancer genes with hypergeometric test, p<0.05. More results are shown in Fig. 9B. Additionally, we obtained 39 miRNAs and 961 genes by combining the identified ten modules. We found that about 50% (19 of 39) miRNAs are cancer miRNAs, and about 21% (203 of 961) genes are cancer genes (hypergeometric test, p<4.3e−6).

Fig. 9
figure 9

Biological function analysis of the identified ten differentially co-expressed miRNA-gene modules by SNSVD with σ=60. A For each module, the distribution (pink area) is fitted based on the absolute values of all the elements in the differentially co-expressed matrix X, and the distribution (light blue area) is fitted based on the absolute values of the elements from the module corresponding submatrix in X. P-values were computed by using permutation test. B For each module, “Average of adPCC” is the average of the absolute values of the elements from the module corresponding submatrix in X. “#Gene edges”, “#Cancer gene”, “#Cancer miRNA”, “#KEGG pathways” and “#GOBP pathways” represent the number of interaction edges, cancer genes, cancer miRNAs, significantly enriched KEGG pathways and GO BP pathways (Benjamini-Hochberg adjusted p<0.05), respectively

Discussion

In our previous work, SSVD has been developed for module discovery and its effectiveness has been demonstrated [13]. However, it cannot integrate the gene network data from PPI network. To this end, we develop the SNSVD method that integrates gene expression data and a gene interaction network to identify underlying gene functional modules. In the SNSVD, we define a sparse network regularized function which is a combination of L1-regularized norm and network-regularized norm to make the biclustering process tend to select interacted genes in the prior gene interaction network. Experimental results on the CGP and BRCA data demonstrate that SNSVD can overcome the drawbacks of SSVD. Although SNSVD is an effective method, some further studies are deserved to investigate: (1) extend SNSVD to identify non-linear relationships; (2) extend SNSVD to integrate other omics data, such as DNA methylation data; (3) apply SNSVD to other biological problems.

Conclusions

In this paper, we presented a Sparse Network-regularized SVD (SNSVD) model for network-based cancer genomics data integration analysis and developed an alternating iterative algorithm to solve the model. By comparing with other representative methods on the simulated data and the real data, we found that SNSVD could find modules with high qualities by integrating the PPI interaction network. By investigating the modules identified by SNSVD on the CGP data, we found that all the genes within the same modules are co-expressed, and most genes in the same modules are connected with each other in the prior PPI network and enriched in at least one gene functional term. Besides, we also applied our method to the BRCA data from TCGA database for identifying ten differentially co-expressed miRNA-gene modules. Some breast cancer related miRNA-gene modules were discovered. To sum up, our work provides a promising way to integrate the network information into the sparse SVD framework, which can help to find biologically significant functional modules and makes the results easily interpreted. An R package of SNSVD is available at https://github.com/wenwenmin/SNSVD.

Methods

Sparse network-regularized SVD (SNSVD) model

Let \(\boldsymbol {X}\in \mathbb {R}^{p\times n}\) (p genes and n samples) be the gene expression data. Suppose \(\boldsymbol {A} \in \mathbb {R}^{p\times p}\) is an adjacency matrix of a PPI network, where Aij=1 if vertex i and j is connected and Aij=0 otherwise. Thus, the normalized Laplacian matrix L=(Lij)p×p encoding the PPI network can be defined as:

$$\begin{array}{@{}rcl@{}} \boldsymbol{L}_{{ij}}= \left\{\begin{array}{ll} 1, &\text{if}~i=j~\text{and}~d_{i}\neq 0,\cr -\frac{\boldsymbol{A}_{{ij}}}{\sqrt{d_{i}d_{j}}}, &\text{if}~i~\text{and}~j~\text{are~adjacent},\cr 0, &\text{otherwise}. \end{array}\right. \end{array} $$
(2)

where \(d_{i} = \sum _{j=1}^{p} \boldsymbol {A}_{{ij}}\). Correspondingly, we have \(\boldsymbol {u}^{T}\boldsymbol {L}\boldsymbol {u} = \frac {1}{2}\sum _{i}\sum _{j}\boldsymbol {A}_{{ij}}\left (\frac {u_{i}}{\sqrt {d_{i}}}-\frac {u_{j}}{\sqrt {d_{j}}}\right)^{2}\), which encourages the estimated coefficients of u to be smooth over adjacent genes in the PPI network A [20]. To further force u to be sparse, we introduce a sparse network-regularized penalty:

$$ P_{1}(\boldsymbol{u}) = \lambda \|\boldsymbol{u}\|_{1}+ \sigma \boldsymbol{u}^{T}\boldsymbol{L}\boldsymbol{u}, $$
(3)

where λ and σ are two parameters. In the penalty (3), the L1 norm (u1) is to induce sparsity in u; and the quadratic Laplacian norm (uTLu) makes the selected genes tend to connect with each other in the PPI network.

To integrate the network information in SVD framework, we present a sparse network-regularized SVD (SNSVD) model as follows:

$$ \begin{aligned} & \underset{\boldsymbol{u},\boldsymbol{v},d}{\text{minimize}} && \|\boldsymbol{X} - d\boldsymbol{uv}^{T}\|_{F}^{2}\\ & \text{subject to} && \|\boldsymbol{u}\|_{2}^{2} \leq 1, \lambda \|\boldsymbol{u}\|_{1} + \sigma \boldsymbol{u}^{T}\boldsymbol{L}\boldsymbol{u} \leq c_{1},\\ & && \|\boldsymbol{v}\|_{2}^{2} \leq 1, \|\boldsymbol{v}\|_{0} \leq k_{v}, \end{aligned} $$
(4)

where c1 and kv are two parameters to control the number of selected genes and samples separately. As for the samples, we simply use a L0-regularized penalty on v sample variables (corresponding to sample variables) to induce sparseness. Compared to L1-norm, L0-norm is known as the most essential sparsity measure and has nice theoretical properties [15, 45].

SNSVD algorithm

Since \(||\boldsymbol {X}-d\boldsymbol {u}\boldsymbol {v}^{T}||_{F}^{2} = tr\left (\boldsymbol {X}\boldsymbol {X}^{T}\right)+d^{2}tr\left (\boldsymbol {u}\boldsymbol {v}^{T}\boldsymbol {v}\boldsymbol {u}^{T}\right)-2d\boldsymbol {u}^{T}\boldsymbol {X}\boldsymbol {v}\), where tr(·) denotes the trace of a matrix; Both u and v are guaranteed to be two unit vectors, tr(uvTvuT)=tr(uTuvTv)=1. Minimizing \(||\boldsymbol {X}-d\boldsymbol {u}\boldsymbol {v}^{T}||_{F}^{2}\) in Eq. (4) is equivalent to minimizing −uTXv. Although there are three parameters u,v and d to be optimized in Eq. (4). It is notable that once u and v are fixed, then d can be determined d=uTXv in Eq. (4). Thus, to solve Eq. (4), we just need to optimize u and v. Inspired by Ref. [46], we present an alternating iterative strategy to solve u and v, i.e., fixing v to update u and fixing u to update v.

Fixing v in Eq. (4), it is equivalent to solve the following sub-problem:

$$ \begin{aligned} & \underset{\boldsymbol{u}}{\text{minimize}} && {-\boldsymbol{u}^{T}\boldsymbol{X}\boldsymbol{v}}\\ & \text{subject to} && \|\boldsymbol{u}\|_{2}^{2} \leq 1, \lambda \|\boldsymbol{u}\|_{1}+ \sigma \boldsymbol{u}^{T}\boldsymbol{L}\boldsymbol{u} \leq c_{1}. \end{aligned} $$
(5)

Let z=Xv, the optimization problem in (5) can be redefined as follows:

$$ \begin{aligned} & \underset{\boldsymbol{u}}{\text{minimize}} && -\boldsymbol{u}^{T}\boldsymbol{z}\\ & \text{subject to} && \|\boldsymbol{u}\|_{2}^{2} \leq 1, \lambda \|\boldsymbol{u}\|_{1}+ \sigma \boldsymbol{u}^{T}\boldsymbol{L}\boldsymbol{u} \leq c_{1}. \end{aligned} $$
(6)

To solve it, we write its Lagrangian form as follows:

$$ \mathcal{L}(\boldsymbol{u}) = -\boldsymbol{u}^{T}\boldsymbol{z}+ \eta \boldsymbol{u}^{T}\boldsymbol{u} + \lambda \|\boldsymbol{u}\|_{1} + \sigma \boldsymbol{u}^{T}\boldsymbol{L}\boldsymbol{u}, $$
(7)

where λ≥0, η≥0,σ≥0. In order to facilitate the calculation without loss of generality, we use \(\frac {1}{2} \eta \) instead of \(\eta, \frac {1}{2} \sigma \) instead of σ, then Eq. (7) can be rewritten as:

$$ \mathcal{L}(\boldsymbol{u}) = -\boldsymbol{u}^{T}\boldsymbol{z}+\frac{1}{2}\eta \boldsymbol{u}^{T}\boldsymbol{u} + \lambda \|\boldsymbol{u}\|_{1} + \frac{1}{2} \sigma \boldsymbol{u}^{T}\boldsymbol{L}\boldsymbol{u}. $$
(8)

It is a convex function with respect to u, therefore its optimal solution can be characterized by some sub-gradient equations (see e.g., [47]). Since L=ID−1/2AD−1/2 (based on Eq. 2). For convenience, let W=IL=D−1/2AD−1/2 (D is a diagonal matrix and \(\boldsymbol {D}_{{ii}} = \sum _{j} \boldsymbol {A}_{{ij}}\)), then we have the sub-gradient equations of Eq. (8) as:

$$ \frac{\partial \mathcal{L}}{\partial \boldsymbol{u}_{j}} = -\boldsymbol{z}_{j} + \eta \boldsymbol{u}_{j} + \lambda s_{j} + \sigma \boldsymbol{u}_{j} -\sigma \boldsymbol{W}_{j}\boldsymbol{u} = 0,~j = 1,\cdots,p. $$
(9)

where sj=sign(uj) if uj≠0 and sj{t,|t|≤1} otherwise; and Wj is the jth row of matrix W. Let the solution of (8) be \( \widehat {\boldsymbol {u}}=\left (\widehat {\boldsymbol {u}}_{1}, \widehat {\boldsymbol {u}}_{2}, \cdots, \widehat {\boldsymbol {u}}_{p}\right)\). By using a coordinate descent method [48, 49], we obtain the following coordinate update rule for \(\widehat {\boldsymbol {u}}_{j}\):

$$\begin{array}{@{}rcl@{}} \widehat{\boldsymbol{u}}_{j}= \left\{\begin{array}{ll} 0 &\text{if}~|\boldsymbol{z}_{j} + \sigma \boldsymbol{W}_{j}\widehat{\boldsymbol{u}}| \leq \lambda,\cr \frac{\boldsymbol{z}_{j} + \sigma \boldsymbol{W}_{j}\widehat{\boldsymbol{u}}- \lambda \text{sign}(\widehat{\boldsymbol{u}}_{j})}{\eta+\sigma} &\text{otherwise}. \end{array}\right. \end{array} $$
(10)

Define \(\mathcal {S}(a,\lambda) = \text {sign}(a)(|a|-\lambda)_{+}\), we have \(\widehat {\boldsymbol {u}}_{j} = \mathcal {S}(\boldsymbol {z}_{j} + \sigma \boldsymbol {W}_{j} \widehat {\boldsymbol {u}}, \lambda)/(\eta +\sigma)\). Let \(\breve {\boldsymbol {u}}_{j} = \mathcal (\boldsymbol {z}_{j} + \sigma \boldsymbol {W}_{j}\widehat {\boldsymbol {u}}, \lambda)\) and \(\breve {\boldsymbol {u}}=\left (\breve {\boldsymbol {u}}_{1},...,\breve {\boldsymbol {u}}_{p}\right)^{T}\), we can obtain a normalized solution \(\boldsymbol {u}=\frac {\widehat {\boldsymbol {u}}}{\| \widehat {\boldsymbol {u}} \|_{2}}=\frac {\breve {\boldsymbol {u}}}{\| \breve {\boldsymbol {u}} \|_{2}}\). In a word, we use a coordinate descent method to minimize Eq. (8) and update one uj at a time while keeping uk fixed for all kj.

Fixing u in Eq. (4), it is equivalent to solve the following sub-problem:

$$ \begin{aligned} & \underset{\boldsymbol{v}}{\text{minimize}} && \|\boldsymbol{X} - d\boldsymbol{uv}^{T}\|_{F}^{2}\\ & \text{subject to} && \|\boldsymbol{v}\|_{2}^{2} \leq 1, \|\boldsymbol{v}\|_{0} \leq k_{v}. \end{aligned} $$
(11)

Let \(\boldsymbol {z}_{v} = \boldsymbol {X}^{T}\boldsymbol {u}, \boldsymbol {\widehat {v}} = d\boldsymbol {v}\), we thus have \(\|\boldsymbol {X} - d\boldsymbol {uv}^{T}\|_{F}^{2} = \|\boldsymbol {z}_{v}-\boldsymbol {\widehat {v}}\|_{2}^{2} + c \), where c=tr(XTX)−uTXXTu. Obviously c is a constant value with respect to v. Thus problem (11) is equivalent to:

$$ \min_{\boldsymbol{\widehat{v}}} \|\boldsymbol{z}_{v} - \boldsymbol{\widehat{v}}\|_{2}^{2},~~\text{subject to}~\|\boldsymbol{\widehat{v}}\|_{0} \leq k_{v}. $$
(12)

Its optimal solution is \(\boldsymbol {\widehat {v}} = \boldsymbol {z}_{v} \bullet I(|\boldsymbol {z}_{v}| \geq |\boldsymbol {z}_{v}|_{(k_{v})})\) where I(·) is the indicator function and "" is point multiplication function, and |zv|(i) denotes the i-th order statistic of |zv|, i.e. |zv|(1)≥|zv|(2)≥,...,≥|zv|(n). In other words, we only keep the kv variables of zv corresponding to its kv largest absolute values. The normalized optimal solution of Eq. (11) is \(\boldsymbol {v} = \widehat {\boldsymbol {v}}/\|\widehat {\boldsymbol {v}}\|_{2}\), i.e.,

$$ \boldsymbol{v} = \frac{\boldsymbol{z}_{v}\bullet I(|\boldsymbol{z}_{v}| \geq |\boldsymbol{z}_{v}|_{(k_{v})})}{\|\boldsymbol{z}_{v}\bullet I(|\boldsymbol{z}_{v}| \geq |\boldsymbol{z}_{v}|_{(k_{v})})\|_{2}}. $$
(13)

Finally, we develop an alternating iterative algorithm by alternately updating u and v to solve SNSVD model. The details of this algorithm is given in Algorithm 1, and its time complexity is \(\mathcal {O}\left (Tnp + Tp^{2} + Tn^{2}\right)\), where T is the number of iterations.

Convergence analysis of SNSVD algorithm

Next, we give the convergence analysis of Algorithm 1. In fact, Algorithm 1 is to solve the Lagrangian form of problem (4) as follows:

$$ \begin{aligned} & \underset{\boldsymbol{u},\boldsymbol{v},d}{\text{minimize}} && -\boldsymbol{u}^{T}\boldsymbol{X}\boldsymbol{v} + \lambda \|\boldsymbol{u}\|_{1} + \sigma \boldsymbol{u}^{T}\boldsymbol{L}\boldsymbol{u}\\ & \text{subject to} && \|\boldsymbol{u}\|_{2}^{2} \leq 1, \|\boldsymbol{v}\|_{2}^{2} \leq 1, \|\boldsymbol{v}\|_{0} \leq k_{v}. \end{aligned} $$
(14)

Let H(u,v)=−uTXv+σuTLu,f(u)=ρ(u)+λu1 and g(v)=ρ(v)+τ(v,kv) where

$$\begin{array}{*{20}l} \rho(\boldsymbol{u}) & = \left\{\begin{array}{ll} 0, &\text{if}~\|\boldsymbol{u}\|_{2}^{2} \leq 1\cr +\infty, &\text{otherwise}. \end{array}\right. \end{array} $$
(15a)
$$\begin{array}{*{20}l} \rho(\boldsymbol{v}) & = \left\{\begin{array}{ll} 0, &\text{if}~\|\boldsymbol{v}\|_{2}^{2} \leq 1\cr +\infty, &\text{otherwise}. \end{array}\right. \end{array} $$
(15b)
$$\begin{array}{*{20}l} \tau(\boldsymbol{v},k_{v}) & = \left\{\begin{array}{ll} 0,&\text{if}~\|\boldsymbol{v}\|_{0}\leq k_{v}\cr +\infty,&\text{otherwise}. \end{array}\right. \end{array} $$
(15c)

Therefore the Lagrangian form of problem (4) can be written as F(u,v)=H(u,v)+f(u)+g(v) which is a semialgebraic function [46]. Based on the Theorem 1 in [46], Algorithm 1 converges to a critical point of F(u,v).

Parameter selection of SNSVD algorithm

As to λ and kv’s choice in Algorithm 1 when it is applied to the CGP gene expression data, we select a suitable λ to force the estimated u only containing 200 nonzero elements which is beneficial for further analysis of the biological function of the module and set kv=50 (control the sample sparsity) which ensures that the number of samples within in the module is approximately the same as the number of samples of a subtype. As to σ’s choice in Algorithm 1, we present a 5-fold cross-validation framework (Algorithm 2). To this end, we define a binary matrix is.na(X) with the same size of X and the elements are 1 if they are missing in X, 0 otherwise.

Learning multiple pairs of singular vectors using SNSVD

It is notable that every run of Algorithm 1 can only obtain a pair of sparse singular vectors u and v (Fig. 1). In order to identify multiple modules, we can repeat running Algorithm 1. After each turn of the iteration, we use the obtained u,v and d to modify the gene expression data X, (X:=XduvT), the modified X is then used as the new input data for the next run to obtain the next pair of singular vectors. Moreover, we notice that Algorithm 1 may get different local optima with different initials, we run Algorithm 1 five times with different initials which are generated according to the multivariate standard normal distribution and choose the best one as the final solution of each turn. The detailed procedure is described in Algorithm 3.

Modularity score

To assess whether the genes within the same module are co-expressed/correlated, we use a modularity score to describe the overall co-expression of genes within the module. For a given module k containing pk genes and nk samples, we first calculate the correlation between gene i and j across the nk samples, denoted as wij. For convenience, we force to set wii=0 for each i. Then the modularity score of the module can be defined as:

$$ \text{Modularity} = \frac{1}{p_{k}\cdot(p_{k}-1)} \sum\limits_{i=1}^{p_{k}} \sum\limits_{j=1}^{p_{k}} |w_{{ij}}|. $$
(16)

Intuitively, if a module has a high modularity score, then the genes within the module is highly co-expressed.

Gene-gene interaction enrichment score

In order to evaluate whether the genes within the same module are tightly connected in the prior PPI network, we use the right tailed hypergeometric test to compute a significance level of each module. Suppose that the PPI network contains n genes and m edges, and module i contains ni genes and mi edges, then the significance level of module k can be calculated via the following equation:

$$ p(i) = \sum_{x< m_{i}}\frac{\binom{m}{x} \binom{N-m}{m_{i}-x}}{\binom{N}{N_{i}}}, $$
(17)

where \(N = \binom {n}{2}\) and \(N_{i} = \binom {n_{i}}{2}\). Accordingly, we can define the gene-gene interaction enrichment score s(i) of the module i by the following formula:

$$ s(i) = -\text{log}_{10}(p(i)). $$
(18)

The higher the gene-gene interaction enrichment score is, the denser the genes connect with each other. If the score is higher than 1.3, then the genes are significantly inter-connected with each other in the PPI network.

Availability of data and materials

The CGP datasets used during the current study are available in the Cancer Genome Project (CGP) repository (http://www.cancerrxgene.org/downloads). And the BRCA datasets used during the current study are available in the TCGA repository (Broad GDAC Firehose: http://firebrowse.org/).

Abbreviations

SVD:

Singular value decomposition

SSVD:

Sparse singular value decomposition

SNSVD:

Sparse network-regularized singular value decomposition

LASSO:

Least absolute shrinkage and selection operator

SNR:

Signal-to-noise ratio

CGP:

Cancer genome project

TCGA:

The cancer genome atlas

PPI:

Protein protein interaction

KEGG:

Kyoto encyclopedia of genes and genomes

GO:

Gene ontology

References

  1. Wan Q, Dingerdissen H, Fan Y, Gulzar N, Pan Y, Wu TJ, Yan C, Zhang H, Mazumder R. Bioxpress: An integrated RNA-seq-derived gene expression database for pan-cancer analysis. Database (Oxford). 2015; 2015:1–13.

    CAS  Google Scholar 

  2. Vallejos CA, Risso D, Scialdone A, Dudoit S, Marioni JC. Normalizing single-cell RNA sequencing data: challenges and opportunities. Nat Methods. 2017; 14(6):565–71.

    CAS  PubMed  PubMed Central  Google Scholar 

  3. Iorio F, Knijnenburg TA, Vis DJ, Bignell GR, Menden MP, Schubert M, Aben N, Gonçalves E, Barthorpe S, Lightfoot H, et al. A landscape of pharmacogenomic interactions in cancer. Cell. 2016; 166(3):740–54.

    CAS  PubMed  PubMed Central  Google Scholar 

  4. Lee M, Shen H, Huang JZ, Marron J. Biclustering via sparse singular value decomposition. Biometrics. 2010; 66(4):1087–95.

    PubMed  Google Scholar 

  5. Liquet B, de Micheaux PL, Hejblum BP, Thiébaut R. Group and sparse group partial least square approaches applied in genomics context. Bioinformatics. 2015; 32(1):35–42.

    PubMed  Google Scholar 

  6. Min W, Liu J, Zhang S. Edge-group sparse pca for network-guided high dimensional data analysis. Bioinformatics. 2018; 34(20):3479–87.

    CAS  PubMed  Google Scholar 

  7. Liu X, Chang X, Liu R, Yu X, Chen L, Aihara K. Quantifying critical states of complex diseases using single-sample dynamic network biomarkers. PLoS Comput Biol. 2017; 13(7):1005633.

    Google Scholar 

  8. Yu X, Zhang J, Sun S, Zhou X, Zeng T, Chen L. Individual-specific edge-network analysis for disease prediction. Nucleic Acids Res. 2017; 45(20):170.

    Google Scholar 

  9. Eren K, Deveci M, Küçüktunç O, Ümit V. Çatalyürek: A comparative analysis of biclustering algorithms for gene expression data. Brief Bioinforma. 2013; 14(3):279–92.

    CAS  Google Scholar 

  10. Sill M, Kaiser S, Benner A, Kopp-Schneider A. Robust biclustering by sparse singular value decomposition incorporating stability selection. Bioinformatics. 2011; 27(15):2089–97.

    CAS  PubMed  Google Scholar 

  11. Oghabian A, Kilpinen S, Hautaniemi S, Czeizler E. Biclustering methods: Biological relevance and application in gene expression analysis. PLoS ONE. 2014; 9(3).

  12. Chen S, Liu J, Zeng T. Measuring the quality of linear patterns in biclusters. Methods. 2015; 83:18–27.

    CAS  PubMed  Google Scholar 

  13. Min W, Liu J, Luo F, Zhang S. A two-stage method to identify joint modules from matched microRNA and mRNA expression data. IEEE Trans Nanobiosci. 2016; 15(4):362–370.

    Google Scholar 

  14. Yang D, Ma Z, Buja A. Rate optimal denoising of simultaneously sparse and low rank matrices. J Mach Learn Res. 2016; 17(1):3163–89.

    Google Scholar 

  15. Asteris M, Kyrillidis A, Koyejo O, Poldrack R. A simple and provable algorithm for sparse diagonal CCA. In: International Conference on Machine Learning: 2016. p. 1148–1157.

  16. Sokolov A, Carlin DE, Paull EO, Baertsch R, Stuart JM. Pathway-based genomics prediction using generalized elastic net. PLoS Comput Biol. 2016; 12(3):e1004790.

    PubMed  PubMed Central  Google Scholar 

  17. Hill SM, Heiser LM, et al. Inferring causal molecular networks: empirical assessment through a community-based effort. Nat Methods. 2016; 13(4):310–8.

    PubMed  PubMed Central  Google Scholar 

  18. Enrico G. Using prior knowledge from cellular pathways and molecular networks for diagnostic specimen classification. Brief Bioinforma. 2016; 17(3):440–52.

    Google Scholar 

  19. Lee E, Chuang H-Y, Kim J-W, Ideker T, Lee D. Inferring pathway activity toward precise disease classification. PLoS Comput Biol. 2008; 4(11):e1000217.

    PubMed  PubMed Central  Google Scholar 

  20. Li C, Li H. Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics. 2008; 24(9):1175–82.

    CAS  PubMed  Google Scholar 

  21. Sun H, Feng R, Lin W, Li H. Network-regularized high-dimensional cox regression for analysis of genomic data. Stat Sin. 2013; 24(3):1433–59.

    Google Scholar 

  22. Chen J, Zhang S. Integrative analysis for identifying joint modular patterns of gene-expression and drug-response data. Bioinformatics. 2016; 32(11):1724–32.

    CAS  PubMed  Google Scholar 

  23. Zhu F, Liu J, Min W. Gene functional module discovery via integrating gene expression and ppi network data. In: International Conference on Intelligent Computing: 2019. p. 116–126. https://doi.org/10.1007/978-3-030-26969-2_11.

  24. Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodological). 1996; 58(1):267–88.

    Google Scholar 

  25. Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc. 2001; 96(456):1348–60.

    Google Scholar 

  26. Cerami EG, Gross BE, et al. Pathway commons, a web resource for biological pathway data. Nucleic Acids Res. 2011; 39(Database Issue):685–90.

    Google Scholar 

  27. Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protocol. 2009; 4(1):44–57.

    CAS  Google Scholar 

  28. Leeksma OC, de Miranda NF, Veelken H. Germline mutations predisposing to diffuse large B-cell lymphoma. Blood Cancer J. 2017; 7(2):532.

    Google Scholar 

  29. Disis ML. Immune regulation of cancer. J Clin Oncol. 2010; 28(29):4531–8.

    CAS  PubMed  PubMed Central  Google Scholar 

  30. Lander ES, Park PJ. The cancer genome atlas pan-cancer analysis project. Nat Genet. 2013; 45(10):1113–20.

    Google Scholar 

  31. Troyanskaya O, Cantor M, et al. Missing value estimation methods for DNA microarrays. Bioinformatics. 2001; 17(6):520–5.

    CAS  PubMed  Google Scholar 

  32. Xie B, Ding Q, Han H, Wu D. miRCancer: a microRNA cancer association database constructed by text mining on literature. Bioinformatics. 2013; 29(5):638–44.

    CAS  PubMed  Google Scholar 

  33. Garzon R, Calin GA, Croce CM. MicroRNAs in cancer. Ann Rev Med. 2009; 60:167–79.

    CAS  PubMed  Google Scholar 

  34. Adams BD, Kasinski AL, Slack FJ. Aberrant regulation and function of microRNAs in cancer. Curr Biol. 2014; 24(16):762–76.

    Google Scholar 

  35. Iorio MV, Croce CM. MicroRNAs in cancer: small molecules with a huge impact. J Clin Oncol. 2009; 27(34):5848.

    CAS  PubMed  PubMed Central  Google Scholar 

  36. Zhang S, Li Q, Liu J, Zhou XJ. A novel computational framework for simultaneous integration of multiple types of genomic data to identify microRNA-gene regulatory modules. Bioinformatics. 2011; 27(13):401–9.

    CAS  Google Scholar 

  37. Zhang S, Liu C-C, Li W, Shen H, Laird PW, Zhou XJ. Discovery of multi-dimensional modules by integrative analysis of cancer genomic data. Nucleic Acids Res. 2012; 40(19):9379–91.

    CAS  PubMed  PubMed Central  Google Scholar 

  38. Bryan K, et al. Discovery and visualization of miRNA-mRNA functional modules within integrated data using bicluster analysis. Nucleic Acids Res. 2013; 42(3):17.

    Google Scholar 

  39. Li Y, Liang C, Wong K-C, Luo J, Zhang Z. Mirsynergy: detecting synergistic miRNA regulatory modules by overlapping neighbourhood expansion. Bioinformatics. 2014; 30(18):2627–35.

    CAS  PubMed  Google Scholar 

  40. Jin D, Lee H. A computational approach to identifying gene-microRNA modules in cancer. PLoS Comput Biol. 2015; 11(1):1004042.

    Google Scholar 

  41. Tesson BM, Breitling R, Jansen RC. DiffCoEx: a simple and sensitive method to find differentially coexpressed gene modules. BMC Bioinformatics. 2010; 11:497.

    PubMed  PubMed Central  Google Scholar 

  42. Ideker T, Krogan NJ. Differential network biology. Mol Syst Biol. 2012; 8:565.

    PubMed  PubMed Central  Google Scholar 

  43. Ha MJ, Baladandayuthapani V, Do K-A. Dingo: differential network analysis in genomics. Bioinformatics. 2015; 31(21):3413–20.

    CAS  PubMed  PubMed Central  Google Scholar 

  44. Zhu L, et al. MetaDCN: meta-analysis framework for differential co-expression network detection with an application in breast cancer. Bioinformatics. 2016; 33(8):1121–9.

    PubMed Central  Google Scholar 

  45. Yang F, Shen Y, Liu ZS. The proximal alternating iterative hard thresholding method for L0 minimization, with complexity \(\mathcal {O}(1/\sqrt {k})\). J Comput Appl Math. 2017; 311:115–29.

    Google Scholar 

  46. Bolte J, Sabach S, Teboulle M. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math Program. 2014; 146(1-2):459–94.

    Google Scholar 

  47. Nesterov Y. Primal-dual subgradient methods for convex problems. Math Program. 2009; 120(1):221–259.

    Google Scholar 

  48. Friedman J, Hastie T, Höfling H, Tibshirani R, et al. Pathwise coordinate optimization. Ann Appl Stat. 2007; 1(2):302–332.

    Google Scholar 

  49. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010; 33(1):1–22.

    PubMed  PubMed Central  Google Scholar 

Download references

Acknowledgements

Wenwen Min would like to thank the support of National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, CAS during his visit.

About this supplement

This article has been published as part of BMC Genomic Data Volume 22 Supplement 1, 2021: Proceedings of the 2019 International Conference on Intelligent Computing (ICIC 2019): genomic data. The full contents of the supplement are available online at https://bmcgenomdata.biomedcentral.com/articles/supplements/volume-22-supplement-1.

Funding

This work was supported by the National Science Foundation of China [No. 61802157] and Natural Science Foundation of Jiangxi Province of China [No. 20192BAB217004]. The funding bodies had no role in the design of the study or collection, analysis and interpretation of the data.

Author information

Authors and Affiliations

Authors

Contributions

The authors wish it to be known that, in their opinion, F.Z. and W.M. should be regarded as Joint First Authors. F.Z., W.M., J.Liu and J.Li developed the methodology and executed experiments. F.Z. and W.M. drafted the manuscript, J.Liu and J.Li revised the manuscript. W.M., J.Liu and J.Li supervised this study. The authors read and approved the final manuscript.

Corresponding authors

Correspondence to Jiang Li, Juan Liu or Wenwen Min.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Zhu, F., Li, J., Liu, J. et al. Network-based cancer genomic data integration for pattern discovery. BMC Genom Data 22, 54 (2021). https://doi.org/10.1186/s12863-021-01004-y

Download citation

  • Published:

  • DOI: https://doi.org/10.1186/s12863-021-01004-y

Keywords

  • Gene co-expression analysis
  • Differentially co-expression analysis
  • Gene interaction network
  • Sparse SVD
  • Structured sparse learning