
Identification of hub genes and pathways in colitis-associated colon cancer by integrated bioinformatic analysis



Colitis-associated colon cancer (CAC) patients have a younger age of onset and more multiple lesions and invasive tumors than sporadic colon cancer patients. Early detection of CAC using endoscopy is challenging, and the incidence of interval colon cancer remains high. Therefore, biomarkers that can predict the tumorigenesis of CAC are urgently needed.


A total of 275 DEGs were identified in CAC. IGF1, BMP4, SPP1, APOB, CCND1, CD44, PTGS2, CFTR, BMP2, KLF4, and TLR2 were identified as hub DEGs, which were significantly enriched in the PI3K-Akt pathway, stem cell pluripotency regulation, focal adhesion, Hippo signaling, and AMPK signaling pathways. The Sankey diagram showed that the genes of both the PI3K-Akt signaling and focal adhesion pathways were upregulated (e.g., SPP1, CD44, TLR2, CCND1, and IGF1), and the upregulated genes were predicted to be regulated by crucial miRNAs, including hsa-miR-16-5p and hsa-miR-1-3p. The hub gene-TF network revealed FOXC1 as a core transcription factor. In ulcerative colitis (UC) patients, KLF4, CFTR, BMP2, and TLR2 showed significantly lower expression in UC-associated cancer (UC-Ca), whereas BMP4 and IGF1 showed higher expression in UC-Ca than in nonneoplastic mucosa. Survival analysis showed that differential expression of SPP1, CFTR, and KLF4 was associated with poor prognosis in colon cancer.


Our study provides novel insights into the mechanism underlying the development of CAC. The hub genes and signaling pathways may contribute to the prevention, diagnosis and treatment of CAC.



Colon cancer is the third leading cause of cancer-associated death worldwide. Sporadic, hereditary, and colitis-associated colon cancer (CAC) are the three categories of this disease based on etiology. CAC is a major complication of inflammatory bowel disease (IBD). Compared with the age- and sex-matched general population, patients with IBD have a twofold increased risk of developing colon cancer [1]. Owing to the rising incidence and duration of IBD, the prevalence of CAC has also increased. Previously published epidemiological data have shown that the incidence of CAC ranges from 0.64% to 0.87% among the general population. However, 8%–16% of these patients die of the disease [2,3,4]. In terms of clinical features, CAC patients have a younger age of onset and more multiple lesions and invasive tumors than sporadic colorectal cancer patients; in addition, the prognosis of these patients is poor [5]. Early detection of CAC using endoscopy is challenging, and the incidence of interval colon cancer remains high. Thus, the discovery of specific molecular markers for CAC is urgently required.

Microarray and RNA sequencing (RNA-Seq) are the two primary techniques used in transcriptome analysis. However, microarray remains the common choice of most researchers, since RNA-Seq is an expensive technique with data storage challenges and complex data analysis [6, 7]. Microarrays have been widely used to explore and identify specific biomarkers for the diagnosis and prognosis of disease [8]. Previously, bioinformatics analyses of CAC were mainly conducted using gene chips of ulcerative colitis and colon adenocarcinoma [9, 10]. However, not all patients with ulcerative colitis develop colon cancer. Meanwhile, some studies have demonstrated significant differences in genome-wide RNA patterns between sporadic colon cancer and CAC patients [11]. Therefore, as the genes involved in the development of CAC and the relationships between those genes are still unclear [12], it is imperative to identify the genes and signaling pathways specific to CAC.

In this study, we downloaded the GSE43338 and GSE44904 datasets from the publicly available Gene Expression Omnibus (GEO) database and normalized the data to identify the differentially expressed genes (DEGs) between CAC and normal adjacent (control) tissues. In addition, this study provides a multi-level bioinformatics analysis strategy for identifying DEGs that consists of modular analysis, functional enrichment analysis, and screening of core genes by constructing a protein–protein interaction (PPI) network and a Sankey diagram of core genes. Gene-related network analyses were performed using NetworkAnalyst. The mRNA expression of hub genes was examined in ulcerative colitis-associated cancer patients. Prognostic analysis of hub genes was conducted based on The Cancer Genome Atlas (TCGA) data. Our findings may contribute to a better understanding of the mechanisms underlying the occurrence and development of CAC.

Material and methods

Acquisition and processing of gene expression set

The GSE44904 and GSE43338 datasets were downloaded from the GEO database (Gene Expression Omnibus). The platform for the dataset GSE44904 is GPL7202 (Agilent-014868 Whole Mouse Genome Microarray 4 × 44 K G4122), which includes the AOM/DSS group (n = 3), DSS group (n = 3), AOM group (n = 3), and control group (n = 3). The platform for the dataset GSE43338 is GPL339 ([MOE430A] Affymetrix Mouse Expression 430A Array). The CAC group (n = 4) and CAC control group (n = 2) were selected as per the needs of the study. The R software limma package (Version 4.0) [13] was used to calibrate the data, the platform annotation file was used to annotate the probes, and probes that did not match a gene symbol were removed. In addition, for multiple probes mapped to the same gene, the average value was calculated as the final expression value.
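The probe-to-gene step described above can be illustrated with a minimal Python sketch. It is not the code used in the study (which relied on R and the platform annotation files); the function name `collapse_probes`, the probe identifiers, and the toy values are hypothetical and serve only to show the logic of removing unannotated probes and averaging probes that map to the same gene.

```python
from collections import defaultdict
from statistics import mean

def collapse_probes(expression, probe_to_gene):
    """Average probe-level values per gene symbol; drop unannotated probes.

    expression:    {probe_id: [value per sample]}
    probe_to_gene: {probe_id: gene symbol, or "" when no symbol is available}
    """
    grouped = defaultdict(list)
    for probe, values in expression.items():
        gene = probe_to_gene.get(probe)
        if not gene:  # probe does not match a gene symbol: removed
            continue
        grouped[gene].append(values)
    # Average sample by sample across all probes of the same gene.
    return {
        gene: [mean(column) for column in zip(*rows)]
        for gene, rows in grouped.items()
    }

# Hypothetical toy data: four probes measured in two samples.
expression = {
    "p1": [2.0, 4.0],
    "p2": [4.0, 6.0],
    "p3": [1.0, 1.0],
    "p4": [9.0, 9.0],
}
probe_to_gene = {"p1": "Spp1", "p2": "Spp1", "p3": "Klf4", "p4": ""}

gene_matrix = collapse_probes(expression, probe_to_gene)
print(gene_matrix)
```

In this toy example the two Spp1 probes are averaged into one row, and probe p4, which has no gene symbol, is discarded.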

Screening and VENN analysis of DEGs

Two or more groups of samples were compared using the limma R package, and genes with adj. P. Val < 0.05 and |log fold change (FC)| > 2 were considered to be DEGs. The upregulated and downregulated gene lists were saved as Excel files, and the TXT files of all gene lists sorted by logFC in each dataset were saved for subsequent analysis. The online bioinformatics tool AIPuFu was used to perform the VENN analysis. The DEGs in the GSE44904 dataset were screened by VENN to identify the genes differentially expressed only in the AOM/DSS group. Then, the intersection of these genes with the upregulated and downregulated DEGs of the GSE43338 dataset was used as the set of target DEGs for follow-up analysis.
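The screening and VENN logic described above amounts to threshold filtering followed by set operations. The Python sketch below illustrates this under stated assumptions: the thresholds are those reported in the text, but the gene names, statistics, and the choice to handle upregulated and downregulated genes separately when excluding AOM-only and DSS-only DEGs are illustrative assumptions, not a reproduction of the original analysis.

```python
def select_degs(table, p_cutoff=0.05, lfc_cutoff=2.0):
    """Split a {gene: (logFC, adjusted p)} table into up- and downregulated sets."""
    up = {g for g, (lfc, p) in table.items() if p < p_cutoff and lfc > lfc_cutoff}
    down = {g for g, (lfc, p) in table.items() if p < p_cutoff and lfc < -lfc_cutoff}
    return up, down

# Hypothetical limma-style results for each comparison against control.
aomdss = {"GeneA": (3.1, 0.001), "GeneB": (2.5, 0.01),
          "GeneC": (-2.8, 0.002), "GeneD": (1.2, 0.001)}
dss = {"GeneB": (2.2, 0.03)}
aom = {}
gse43338 = {"GeneA": (2.4, 0.004), "GeneB": (2.9, 0.01), "GeneC": (-3.0, 0.01)}

aomdss_up, aomdss_down = select_degs(aomdss)
dss_up, dss_down = select_degs(dss)
aom_up, aom_down = select_degs(aom)
other_up, other_down = select_degs(gse43338)

# DEGs found only in the AOM/DSS comparison (assumed to be direction-specific).
only_up = aomdss_up - dss_up - aom_up
only_down = aomdss_down - dss_down - aom_down

# Target DEGs: AOM/DSS-only genes that are also DEGs in the second dataset.
target_up = only_up & other_up
target_down = only_down & other_down
print(sorted(target_up), sorted(target_down))
```

Here GeneB is excluded because it is also a DEG in the DSS-only comparison, and GeneD is excluded because its fold change does not pass the threshold.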

Construction of PPI network and module analysis

The Search Tool for the Retrieval of Interacting Genes (STRING) is an online database that explores functional interactions between proteins encoded by differential genes and visualizes the PPI network of DEGs [14]. We selected the PPI relation pairs with a combined score > 0.4, eliminated the scattered PPI pairs, and mapped them to the network. The PPI network diagram was constructed using the Cytoscape software. The MCODE plugin in the Cytoscape software was used to filter the submodules based on the default parameters "Degree Cutoff = 2", "Node Score Cutoff = 0.2", "K-Core = 2", and "Max. Depth = 100".
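As a rough illustration of the edge-filtering step, the following Python sketch keeps interaction pairs whose combined score exceeds 0.4 and reports proteins left without any retained interaction. The function name `filter_ppi`, the node names, and the scores are hypothetical; the sketch also assumes scores on a 0 to 1 scale and interprets "scattered" pairs as nodes with no retained edge, which may differ from the procedure actually applied in STRING and Cytoscape.

```python
def filter_ppi(edges, nodes, min_score=0.4):
    """Keep edges above the score cutoff and list nodes left unconnected.

    edges: iterable of (protein_a, protein_b, combined_score)
    nodes: iterable of all proteins submitted to the database
    """
    kept = [(a, b, s) for a, b, s in edges if s > min_score]
    connected = {n for a, b, _ in kept for n in (a, b)}
    isolated = set(nodes) - connected
    return kept, connected, isolated

# Hypothetical interaction pairs with combined scores on a 0-1 scale.
edges = [
    ("N1", "N2", 0.92),
    ("N1", "N3", 0.55),
    ("N3", "N4", 0.31),  # below the cutoff, dropped
]
nodes = ["N1", "N2", "N3", "N4", "N5"]

kept, connected, isolated = filter_ppi(edges, nodes)
print(len(kept), sorted(connected), sorted(isolated))
```

Only the retained edges and connected nodes would then be passed on to network visualization and module detection.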

Screening of hub genes for DEGs

The cytoHubba plugin in the Cytoscape software was used to screen hub genes. The top 15 nodes were calculated by the Degree, Closeness, and Radiality methods in cytoHubba. Scores were calculated by the cytoHubba plugin, and the top 11 genes with the most significance in the survival analysis were selected as hub genes according to their score.
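To make the idea of centrality-based ranking concrete, the Python sketch below computes degree and a reciprocal-distance closeness score on a small undirected graph and ranks the nodes. It is an illustration only: the graph is hypothetical, the helper names are ours, and the closeness formula used here (the sum of inverse shortest-path distances) is an assumption that may not match the exact definition implemented in cytoHubba.

```python
from collections import deque

def build_graph(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def degree(graph):
    """Number of direct neighbours of each node."""
    return {node: len(neighbours) for node, neighbours in graph.items()}

def closeness(graph):
    """Sum of 1/distance to every reachable node (breadth-first search)."""
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        scores[source] = sum(1.0 / d for d in dist.values() if d > 0)
    return scores

def top_k(scores, k):
    """Return the k highest-scoring nodes, ties broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [node for node, _ in ranked[:k]]

# Hypothetical network: A is connected to B, C, and D; D is connected to E.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]
graph = build_graph(edges)
deg = degree(graph)
clo = closeness(graph)
print(top_k(deg, 2), top_k(clo, 2))
```

In practice, rankings from several centrality measures can be compared, and nodes that score highly across measures are natural hub candidates.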

Functional enrichment analysis of genes

The Database for Annotation, Visualization, and Integrated Discovery (DAVID) is an online tool that provides a comprehensive set of functional annotation methods for a range of genes or proteins provided by researchers [15]. The identified genes were analyzed for GO annotation and KEGG pathway enrichment using the DAVID tool. P < 0.05 was selected as the threshold for considering genes to be enriched, and the TXT file of the above analysis results was downloaded for further analysis.
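The statistical idea behind over-representation analysis can be sketched with a hypergeometric upper-tail probability. The Python example below is a simplified illustration and not the DAVID implementation: DAVID reports a modified, more conservative Fisher exact statistic, so its p values will generally differ from this sketch. The function name `enrichment_p` and all counts are hypothetical.

```python
from math import comb

def enrichment_p(hits, list_size, pathway_size, background_size):
    """Probability of observing at least `hits` pathway genes in a gene list.

    Assumes the gene list is a random draw of `list_size` genes from a
    background of `background_size` genes, of which `pathway_size` belong
    to the pathway (hypergeometric model).
    """
    total = comb(background_size, list_size)
    upper = min(list_size, pathway_size)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Hypothetical counts: 2 of 2 listed genes fall in a 3-gene pathway,
# drawn from a background of 10 genes.
p_value = enrichment_p(hits=2, list_size=2, pathway_size=3, background_size=10)
print(round(p_value, 4))
```

A pathway would be reported as enriched when this probability falls below the chosen threshold (P < 0.05 in the present study).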

Analysis of transcriptional factors (TFs) and miRNAs of hub genes

NetworkAnalyst 3.0 is a comprehensive network visual analysis platform for gene expression analysis and meta-analysis [16]. The JASPAR database on the platform was used to analyze the TFs related to the hub genes. The gene-miRNA target interaction network was built using miRNet 2.0.

Examination of mRNA expression of hub genes in patients

Microarray mRNA expression data of GSE3629 were taken from GEO. All statistical analyses and plots were conducted using R software. The Shapiro–Wilk test was used to assess normality, and the Wilcoxon rank-sum test was used to compare the expression of hub genes between UC-Ca and UC-NonCa samples [17].
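For readers unfamiliar with the rank-sum comparison, the following Python sketch computes the Mann-Whitney U statistic (equivalent to the Wilcoxon rank-sum statistic) and a two-sided p value from the normal approximation. It is a simplified illustration with hypothetical expression values: it applies no tie or continuity correction, so results from R may differ, particularly for small samples where R uses an exact test.

```python
from math import erfc, sqrt

def rank_sum_test(x, y):
    """Mann-Whitney U for group x and a two-sided normal-approximation p value."""
    n1, n2 = len(x), len(y)
    # U counts the pairs in which the value from x exceeds the value from y
    # (ties count as one half).
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mu = n1 * n2 / 2.0
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    p = erfc(abs(z) / sqrt(2.0))  # two-sided tail of the standard normal
    return u, p

# Hypothetical expression values for one gene in two groups.
uc_ca = [1.0, 2.0, 3.0]
uc_nonca = [4.0, 5.0, 6.0]

u_stat, p_value = rank_sum_test(uc_ca, uc_nonca)
print(u_stat, round(p_value, 4))
```

A U statistic of zero, as in this toy example, means that every value in the first group is lower than every value in the second group.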

Survival analysis of hub genes

The survival analysis of the identified hub genes was carried out using the online software UALCAN, which uses TCGA Level 3 RNA-seq and clinical data from 31 cancer types. UALCAN can estimate the effect of gene expression levels and clinicopathologic features on patient survival [18].


Microarray data normalization and identification of DEGs

The chip expression datasets GSE44904 and GSE43338 were normalized, and the results are shown in Fig. 1. The limma R package (adjusted p < 0.05 and |log fold change (FC)| > 2) was used to screen DEGs. First, different groups in GSE44904 were compared; the volcano plots are shown in Fig. 2a-c. Second, a total of 905 DEGs, comprising 496 upregulated and 409 downregulated genes, were screened from the dataset GSE43338. The DEGs of the GSE43338 dataset are shown in Fig. 2d. Heat maps were drawn for the top 100 DEGs, as shown in Fig. 2e&f. Based on the different groups in the GSE44904 dataset, we further performed Venn analysis to screen out DEGs found solely in CAC. A total of 1063 DEGs were identified, comprising 503 upregulated and 560 downregulated genes (Fig. 2g-h). Based on the DEGs screened from the two datasets, the Venn analysis was repeated, and 275 overlapping genes were found, comprising 103 upregulated and 172 downregulated genes (Fig. 2i-j).

Fig. 1

Normalized gene expression. The normalization of GSE44904 dataset (a and b). The normalization of GSE43338 dataset (c and d). Blue represents data before normalization, and red represents data after normalization

Fig. 2

Identification of DEGs from the two dataset chips. Different groups in the GSE44904 dataset: AOM/DSS vs. control group (a), AOM vs. control group (b), and DSS vs. control group (c); GSE43338 dataset: CAC vs. control group (d). Thresholds: adj. P. Val < 0.05 and |log fold change| > 2. Red dots represent upregulated genes, green dots represent downregulated genes, and black dots represent genes with no significant difference. Heat maps of the top 100 DEGs in the GSE44904 (e) and GSE43338 (f) datasets. Red indicates relative upregulation of gene expression; green indicates relative downregulation of gene expression. VENN diagrams of DEGs identified from the datasets (g&h: DEGs found only in the AOM/DSS group of the GSE44904 dataset; i&j: overlapping DEGs that were upregulated and downregulated in the two datasets)

PPI network construction and functional analysis of DEGs

The STRING online database was used to analyze the 275 intersecting DEGs. A PPI network was constructed as shown in Fig. 3a. To study the functional annotation of the selected DEGs, DAVID analysis was performed to categorize genes by biological process (BP), molecular function (MF), and cellular component (CC). The results were considered statistically significant at p < 0.05; the GO results are shown in Fig. 3c. BP mainly includes positive regulation of transcription from RNA polymerase II promoter; oxidation–reduction process; negative regulation of transcription from RNA polymerase II promoter; negative regulation of cell proliferation; positive regulation of transcription, DNA-templated; cell proliferation; transport; inflammatory response; negative regulation of transcription, DNA-templated; and cell adhesion, among others. CC mainly includes extracellular space, plasma membrane, extracellular exosome, extracellular region, integral component of plasma membrane, endoplasmic reticulum membrane, Golgi apparatus, endoplasmic reticulum, and others. MF mainly includes hormone activity, transporter activity, calcium ion binding, receptor binding, heparin binding, and oxidoreductase activity. We also performed KEGG analysis of the DEGs. As shown in Fig. 3e, the mainly enriched pathways were ovarian steroidogenesis, fat digestion and absorption, metabolism, vitamin digestion and absorption, regulation of pluripotency of stem cells, arachidonic acid metabolism, FoxO signaling pathway, aldosterone-regulated sodium reabsorption, bile secretion, PI3K-Akt pathway, cancer, and ether lipid metabolism.

Fig. 3

Protein–protein network and module analysis of DEGs. The network map of DEGs was constructed using STRING (a). Modular analysis was carried out on the network to screen out the module (b) with the highest score (MCODE score = 9.0). Red represents upregulated genes and blue represents downregulated genes. Gene ontology (GO) enrichment analyses of DEGs and module genes were performed using the DAVID database (c: DEGs, d: module genes); Classification: A: Biological Process (BP), B: Cellular Component (CC), C: Molecular Function (MF). KEGG pathways were visualized using the ggplot2 package in R (e: DEGs, f: module genes). The size of the dot represents the number of enriched genes, and the color of the dot represents the p value

To further understand the DEGs, the MCODE plugin in the Cytoscape software was subsequently used for modular analysis, and the submodule with the highest score (score = 9) was selected. The module genes were SPP1, TGOLN2, APOB, FSTL1, LAMB1, LAMC1, CHGB, BMP4, and CYR61 (Fig. 3b). The GO function analysis results for the submodule genes are shown in Fig. 3d. BP mainly includes extracellular matrix organization, cell adhesion, positive regulation of epithelial cell proliferation, and positive regulation of cell migration. CC mainly includes the extracellular region, extracellular space, and extracellular exosomes. MF mainly includes heparin binding and extracellular matrix binding. KEGG pathway analysis showed that the genes were mainly enriched in ECM-receptor interaction, focal adhesion, the PI3K-Akt signaling pathway, and cancer pathways, such as small cell lung cancer pathways (Fig. 3f).

Hub genes selection and analysis

The scores of DEGs were calculated using the Cytoscape software, and the top 11 genes were selected as hub genes (Fig. 4a). These included IGF1, BMP4, SPP1, APOB, CCND1, CD44, PTGS2, CFTR, BMP2, KLF4, and TLR2. Detailed information on the hub genes is shown in Table 1. The scores calculated by the Radiality and Closeness methods in the cytoHubba plugin are shown in Table S1. To determine the enriched pathway terms for hub genes, KEGG pathway analysis was performed using DAVID. The genes were enriched in signaling pathways regulating many biological functions (Fig. 4b). The Sankey diagram shows the distribution of hub genes in the different signaling pathways (Fig. 4c): signaling pathways regulating pluripotency of stem cells (enriched genes: IGF1, BMP4, BMP2, KLF4; p = 0.0015), pathways in cancer (enriched genes: BMP4, BMP2, CCND1, IGF1, and PTGS2; p = 0.0035), proteoglycans in cancer (enriched genes: CCND1, IGF1, CD44, and TLR2; p = 0.0043), AMPK signaling pathway (enriched genes: CCND1, IGF1, CFTR; p = 0.0186), PI3K-Akt signaling pathway (enriched genes: CCND1, SPP1, IGF1, TLR2; p = 0.0196), Hippo signaling pathway (enriched genes: BMP4, BMP2, CCND1; p = 0.0273), and pathways involved in focal adhesion (enriched genes: CCND1, SPP1, IGF1; p = 0.0483).

Fig. 4

The hub genes were screened and analyzed by KEGG and correlation analysis. The top 11 genes with the most significance were selected as hub genes according to the score (a). KEGG pathway analysis of hub genes was performed using DAVID (b). The distribution relationship between hub genes and pathways (c): red represents upregulated genes and blue represents downregulated genes. Correlation analysis of core TFs and hub genes (d) and the gene-miRNA interaction network (e): circles represent genes, diamonds represent TFs, squares represent miRNAs, and node sizes represent the degree

Table 1 Detailed information about the hub genes

The TF-gene regulatory network was constructed based on the JASPAR database on the NetworkAnalyst platform. Figure 4d depicts the transcription factors that can regulate two or more genes. In addition to hub genes, there were 46 transcription factors in the regulatory network, and 86 relationship pairs were established. Among the predicted transcription factors, FOXC1 is considered to be the core TF that can regulate multiple genes, including SPP1, IGF1, BMP4, TLR2, CD44, KLF4, and CFTR. To further investigate the upregulated hub genes, we constructed a gene-miRNA interaction network using miRNet 2.0. A total of 8 genes, 613 miRNAs, and 823 gene-miRNA pairs were registered in the network (Fig. 4e). The main miRNAs with interactions of more than six genes are listed in Table S2. It was predicted that hsa-miR-16-5p could regulate CCND1, CD44, PTGS2, IGF1, APOB, SPP1, and BMP4, while hsa-miR-1-3p could regulate CCND1, CD44, IGF1, PTGS2, APOB, and BMP4.

mRNA expression of the hub genes in patients

The mRNA expression results of hub genes in the GSE3629 dataset indicated that CFTR (p < 0.01), KLF4 (p < 0.05), BMP2 (p < 0.05), and TLR2 (p < 0.01) were downregulated, whereas BMP4 (p < 0.05) and IGF1 (p < 0.05) were upregulated. These findings were consistent with our analysis results. There were no significant differences in mRNA expression of CD44, PTGS2, CCND1, SPP1, and APOB (Fig. 5).

Fig. 5

The mRNA expression level of hub genes in patients according to the GEO database. UC-NonCa indicates nonneoplastic mucosa tissue of ulcerative colitis patients, and UC-Ca indicates ulcerative colitis-associated cancer tissue. ns, p ≥ 0.05; *, p < 0.05; **, p < 0.01; ***, p < 0.001

Survival analysis of hub genes in colon cancer

Considering CAC as an etiological classification of colon cancer, we used colon cancer data from the TCGA database to analyze the survival of hub genes (Fig. 6). Survival analysis data contained information on high or low expression of target genes, as well as that on the correlation between hub genes and colon cancer. Among the 11 hub genes, the following genes were found to be associated with the prognosis of colon cancer patients: SPP1 (p = 0.019), CFTR (p = 0.031), and KLF4 (p = 0.048).

Fig. 6

Survival analysis of hub genes in colon cancer (P < 0.05). (a) CFTR, (b) KLF4, (c) SPP1


Not all patients with inflammatory bowel disease develop CAC. Therefore, comparing the differentially expressed genes in the CAC model with those in the IBD model may enable us to find genes specific to CAC. In this study, data from the GEO database (GSE44904 and GSE43338) were normalized, and the different groups of the GSE44904 dataset were analyzed. Through Venn analysis, DEGs found only in CAC (AOM/DSS) were screened. Through intersection analysis using gene microarray data from the CAC animal model in the GSE43338 dataset, a total of 275 specific genes (including 103 upregulated and 172 downregulated genes) were found in CAC. GO and KEGG pathway analyses of the selected DEGs indicated that some biological processes and functions were associated with CAC, such as regulation of transcription from RNA polymerase II promoter, oxidation–reduction process, cell proliferation, inflammatory response, cell adhesion, extracellular space, plasma membrane, extracellular exosome, transporter activity, calcium ion binding, and receptor binding. Furthermore, the enrichment results of the genes in the submodule with the highest score also confirmed the importance of these biological processes and functions. In the KEGG pathway analysis, a large number of differential genes were found to be enriched in metabolic pathways, which is consistent with published studies [19]. Lu et al., through metabonomics analysis, found many metabolic pathway changes in colon cancer induced by AOM/DSS [20]. Our study also demonstrated that fat digestion and absorption, ovarian steroidogenesis, vitamin digestion and absorption, arachidonic acid metabolism, ether lipid metabolism, and other metabolic pathways are closely related to the occurrence and development of CAC.

Interestingly, in addition to the metabolic pathways, a large number of DEGs were enriched in pathways in cancer, signaling pathways regulating pluripotency of stem cells, the PI3K-Akt signaling pathway, and the FoxO signaling pathway. Subsequently, KEGG pathway analysis was performed for the genes in the submodule. The pathways obtained were similar to those enriched in DEGs, such as the pathways involved in cancer, the PI3K-Akt signaling pathway, and the focal adhesion pathway. These results suggest that these pathways and their genes play key roles in the occurrence and development of CAC. Focal adhesions are the contact points between cells and the surrounding environment and can drive cell migration. The focal adhesion signaling pathway plays an important role in wound healing and tumor metastasis. It has been found that low expression of miR-4728-3p in ulcerative colitis-associated colorectal cancer can influence the CAV1, THBS2, and COL1A2 genes as well as focal adhesion signaling, which is related to tumor pathogenesis [21]. Li et al. found that activation of focal adhesion kinase prevented the development of ulcerative colitis and CAC [22].

Further, PPI network analysis was conducted on DEGs. According to the degree score value, we identified DEGs with the highest score and significance as hub genes, namely, BMP4, SPP1, APOB, CCND1, CD44, PTGS2, CFTR, BMP2, KLF4, TLR2, and IGF1. To validate the results of bioinformatics analysis, we examined the mRNA expression levels of hub genes in patients by using GEO databases. The results were basically consistent with the observed gene expression trends. There was no significant difference in mRNA expression of some hub genes, which may be due to the small sample size. KEGG pathway analysis for the hub genes revealed that these genes were not only enriched in signaling pathways regulating the pluripotency of stem cells, PI3K-Akt signaling pathway, and focal adhesion pathway, but also were enriched in the Hippo and AMPK signaling pathways. These genes and their enriched pathways are closely related to the occurrence and development of CAC. Pluripotency is a characteristic of stem cells, and a small number of cells in tumors have self-renewal ability and produce heterogeneous tumors [23]. P53 can inhibit the pluripotency of tumor stem cells. In a preclinical animal model of CAC, targeted knockout of stem cell-specific P53 was found to significantly increase tumor size and incidence [24]. Josse et al. also found that PI3K/Akt is the main pathway affected by the AOM/DSS model through miRNA chip experiments [25]. This finding is consistent with our findings. In human colon tissue infiltrated with inflammatory cells, the PI3K/Akt pathway is activated and mediates the progression of colitis and CAC through a positive feedback loop that maintains the recruitment of inflammatory cells [26].

In inflammation-related tumor models, inhibition of IGF1 signaling can reduce the number and size of colon tumors in wild-type mice [27]. IGF-1R knockout can activate the LKB1/AMPK pathway and play a protective role in colitis and CAC [28]. Chen et al. found that the Hippo pathway was involved in the occurrence of intestinal inflammation and progression of CAC in an experimental mouse model [29]. YAP1 is a transcriptional co-activator in the Hippo signaling pathway. PGE2 signaling can increase the expression and transcriptional activity of YAP1, and YAP1 further activates PTGS2 and PTGER4, which in turn can activate PGE2. This positive feedback loop plays an important role in colon regeneration and promotes the development of colitis-related cancer [30]. In a mouse model of CAC, Chou et al. demonstrated that Boswellia serrata mediated the Akt/GSK3β/cyclin D1 signaling pathway and altered the composition of the gut microbiota to alleviate tumor growth [31].

Furthermore, other hub genes were significantly associated with the development of CAC. For example, abnormal expression of BMP proteins is a common feature of cancer. In the colon mucosa, the BMP pathway overlaps with several other colon cancer pathways [32]. Inhibition of the BMP pathway is an early event in inflammation-driven colon tumors in mice [33]. TLR2 is highly expressed in tumor tissues of CRC patients. Gene knockout and knockdown of TLR2 can inhibit the proliferation of inflammation-related colorectal cancer and sporadic colorectal cancer [34]. SPP1 is an important inflammatory mediator. It is upregulated in inflammation-related intestinal tumors and mediates the progression of colon cancer [35]. Yang and Liu found that deletion of KLF4 causes genetic instability, which in turn leads to the progression of CAC [36]. Mutation of the APOB gene in CRC associated with ulcerative colitis was found by whole-exome sequencing, and there was a significant difference between ulcerative colitis-associated CRC and sporadic CRC [37]. CD44 is an adhesion and anti-apoptotic molecule that is highly expressed in colon cancer [38]. However, in a comparative study, CD44 expression was found to be lower in ulcerative colitis-associated dysplasia and cancers than in sporadic colonic tumors [39].

The predicted TF-gene regulatory network showed that FOXC1, FOXL1, NFKB1, STAT3, JUN, E2F1, CREB1, and GATA2 were significantly related to the hub genes. Recent studies have emphasized the important role of the transcription factors nuclear factor kappa B (NF-κB) and signal transducer and activator of transcription 3 (STAT3) in the progression of inflammation-associated cancer [40, 41]. Meanwhile, the transcription factors JUN [42], E2F1 [43], and GATA2 [44] have been reported to be closely related to the occurrence and development of colitis-associated tumors. FOXC1, as a core transcription factor, interacts most closely with the hub genes. FOXC1 belongs to the forkhead box (FOX) transcription factor family. Many studies have confirmed that at least 14 proteins in the FOX transcription factor family are closely related to the pathogenesis of CRC [45]. Currently, as a new cancer marker and therapeutic target, the regulatory role of FOXC1 in many types of cancer has been widely studied [46]. Future studies should examine its role in CAC.


In summary, based on the GSE44904 and GSE43338 datasets, bioinformatics analysis identified 275 DEGs in CAC, including 103 upregulated and 172 downregulated genes. IGF1, BMP4, SPP1, APOB, CCND1, CD44, PTGS2, CFTR, BMP2, KLF4, and TLR2 were hub genes, which were mainly related to the PI3K-Akt signaling pathway, focal adhesion, Hippo signaling pathway, AMPK signaling pathway, and stem cell pluripotency regulation pathway. The expression of hub genes was examined in patient samples. The TF-gene regulatory network of hub genes showed that FOXC1 was the core transcription factor and had the most interactions with hub genes. Additional work is needed to elucidate the underlying mechanisms behind these observations. Survival analysis showed that differential expression of SPP1, CFTR, and KLF4 was associated with poor prognosis in colon cancer. This study helps us further understand the mechanism of CAC progression.

Availability of data and materials

Data are available in the TCGA and GEO databases; GEO accession numbers: GSE44904, GSE43338, and GSE3629.


  1. Lutgens MW, van Oijen MG, van der Heijden GJ, Vleggaar FP, Siersema PD, Oldenburg B. Declining risk of colorectal cancer in inflammatory bowel disease: an updated meta-analysis of population-based cohort studies. Inflamm Bowel Dis. 2013;19(4):789–99.

  2. Chu TPC, Moran GW, Card TR. The pattern of underlying cause of death in patients with inflammatory bowel disease in England: a record linkage study. J Crohns Colitis. 2017;11(5):578–85.

  3. Gong W, Lv N, Wang B, et al. Risk of ulcerative colitis-associated colorectal cancer in China: a multi-center retrospective study. Dig Dis Sci. 2012;57(2):503–7.

  4. Eaden JA, Abrams KR, Mayberry JF. The risk of colorectal cancer in ulcerative colitis: a meta-analysis. Gut. 2001;48(4):526–35.

  5. Dobbins WO 3rd. Dysplasia and malignancy in inflammatory bowel disease. Annu Rev Med. 1984;35:33–48.

  6. Zhao S, Fung-Leung WP, Bittner A, Ngo K, Liu X. Comparison of RNA-Seq and microarray in transcriptome profiling of activated T cells. PLoS One. 2014;9(1):e78644.

  7. Vahlensieck C, Thiel CS, Adelmann J, Lauber BA, Polzer J, Ullrich O. Rapid transient transcriptional adaptation to hypergravity in Jurkat T cells revealed by comparative analysis of microarray and RNA-Seq data. Int J Mol Sci. 2021;22(16):8451.

  8. Fan L, Hui X, Mao Y, Zhou J. Identification of acute pancreatitis-related genes and pathways by integrated bioinformatics analysis. Dig Dis Sci. 2020;65(6):1720–32.

  9. Shi W, Zou R, Yang M, et al. Analysis of genes involved in ulcerative colitis activity and tumorigenesis through systematic mining of gene co-expression networks. Front Physiol. 2019;10:662.

  10. Zhou J, Xie Z, Cui P, et al. SLC1A1, SLC16A9, and CNTN3 are potential biomarkers for the occurrence of colorectal cancer. Biomed Res Int. 2020;2020:1204605.

  11. Colliver DW, Crawford NP, Eichenberger MR, et al. Molecular profiling of ulcerative colitis-associated neoplastic progression. Exp Mol Pathol. 2006;80(1):1–10.

  12. Shawki S, Ashburn J, Signs SA, Huang E. Colon cancer: inflammation-associated cancer. Surg Oncol Clin N Am. 2018;27(2):269–87.

  13. Ritchie ME, Phipson B, Wu D, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7): e47.

  14. Szklarczyk D, Franceschini A, Kuhn M, et al. The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res. 2011;39(Database issue):D561–8.

  15. da Huang W, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009;4(1):44–57.

  16. Zhou G, Soufan O, Ewald J, Hancock REW, Basu N, Xia J. NetworkAnalyst 3.0: a visual analytics platform for comprehensive gene expression profiling and meta-analysis. Nucleic Acids Res. 2019;47(W1):W234–41.

  17. Weir GA, Middleton SJ, Clark AJ, et al. Using an engineered glutamate-gated chloride channel to silence sensory neurons and treat neuropathic pain at the source. Brain. 2017;140(10):2570–85.

  18. Chandrashekar DS, Bashel B, Balasubramanya SAH, et al. UALCAN: a portal for facilitating tumor subgroup gene expression and survival analyses. Neoplasia. 2017;19(8):649–58.

  19. Gao Y, Li X, Yang M, et al. Colitis-accelerated colorectal cancer and metabolic dysregulation in a mouse model. Carcinogenesis. 2013;34(8):1861–9.

  20. Lu Y, Wang J, Ji Y, Chen K. Metabonomic variation of exopolysaccharide from Rhizopus nigricans on AOM/DSS-induced colorectal cancer in mice. Onco Targets Ther. 2019;12:10023–33.

  21. Pekow J, Hutchison AL, Meckel K, et al. miR-4728-3p functions as a tumor suppressor in ulcerative colitis-associated colorectal neoplasia through regulation of focal adhesion signaling. Inflamm Bowel Dis. 2017;23(8):1328–37.

  22. Li J, Lu Y, Wang D, et al. Schisandrin B prevents ulcerative colitis and colitis-associated-cancer by activating focal adhesion kinase and influence on gut microbiota in an in vivo and in vitro model. Eur J Pharmacol. 2019;854:9–21.

  23. Sharif T, Martell E, Dai C, et al. Autophagic homeostasis is required for the pluripotency of cancer stem cells. Autophagy. 2017;13(2):264–84.

  24. Davidson LA, Callaway ES, Kim E, et al. Targeted deletion of p53 in Lgr5-expressing intestinal stem cells promotes colon tumorigenesis in a preclinical model of colitis-associated cancer. Cancer Res. 2015;75(24):5392–7.

  25. Josse C, Bouznad N, Geurts P, et al. Identification of a microRNA landscape targeting the PI3K/Akt signaling pathway in inflammation-induced colorectal carcinogenesis. Am J Physiol Gastrointest Liver Physiol. 2014;306(3):G229–43.

  26. Khan MW, Keshavarzian A, Gounaris E, et al. PI3K/AKT signaling is essential for communication between tissue-infiltrating mast cells, macrophages, and epithelial cells in colitis-induced cancer. Clin Cancer Res. 2013;19(9):2342–54.

  27. Youssif C, Cubillos-Rojas M, Comalada M, et al. Myeloid p38α signaling promotes intestinal IGF-1 production and inflammation-associated tumorigenesis. EMBO Mol Med. 2018;10(7):e8403.

  28. Wang SQ, Yang XY, Cui SX, Gao ZH, Qu XJ. Heterozygous knockout insulin-like growth factor-1 receptor (IGF-1R) regulates mitochondrial functions and prevents colitis and colorectal cancer. Free Radic Biol Med. 2019;134:87–98.

  29. Chen G, Han Y, Feng Y, et al. Extract of Ilex rotunda Thunb alleviates experimental colitis-associated cancer via suppressing inflammation-induced miR-31-5p/YAP overexpression. Phytomedicine. 2019;62:152941.

  30. Kim HB, Kim M, Park YS, et al. Prostaglandin E2 activates YAP and a positive-signaling loop to promote colon regeneration after colitis but also carcinogenesis in mice. Gastroenterology. 2017;152(3):616–30.

  31. Chou YC, Suh JH, Wang Y, Pahwa M, Badmaev V, Ho CT, Pan MH. Boswellia serrata resin extract alleviates azoxymethane (AOM)/dextran sodium sulfate (DSS)-induced colon tumorigenesis. Mol Nutr Food Res. 2017;61(9).

  32. Hardwick JC, Kodach LL, Offerhaus GJ, van den Brink GR. Bone morphogenetic protein signalling in colorectal cancer. Nat Rev Cancer. 2008;8(10):806–12.

  33. Karagiannis GS, Afaloniati H, Karamanavi E, Poutahidis T, Angelopoulou K. BMP pathway suppression is an early event in inflammation-driven colon neoplasmatogenesis of uPA-deficient mice. Tumour Biol. 2016;37(2):2243–55.

  34. Meng S, Li Y, Zang X, Jiang Z, Ning H, Li J. Effect of TLR2 on the proliferation of inflammation-related colorectal cancer and sporadic colorectal cancer. Cancer Cell Int. 2020;20:95.

  35. Bahri R, Pateras IS, D’Orlando O, et al. IL-15 suppresses colitis-associated colon carcinogenesis by inducing antitumor immunity. Oncoimmunology. 2015;4(9):e1002721.

  36. Yang VW, Liu Y, Kim J, Shroyer KR, Bialkowska AB. Increased genetic instability and accelerated progression of colitis-associated colorectal cancer through intestinal epithelium-specific deletion of Klf4. Mol Cancer Res. 2019;17(1):165–76.

  37. Yan P, Wang Y, Meng X, et al. Whole exome sequencing of ulcerative colitis-associated colorectal cancer based on novel somatic mutations identified in Chinese patients. Inflamm Bowel Dis. 2019;25(8):1293–301.

  38. Subramaniam V, Vincent IR, Gardner H, Chan E, Dhamko H, Jothy S. CD44 regulates cell migration in human colon cancer cells via Lyn kinase and AKT phosphorylation. Exp Mol Pathol. 2007;83(2):207–15.

  39. Mikami T, Mitomi H, Hara A, et al. Decreased expression of CD44, alpha-catenin, and deleted colon carcinoma and altered expression of beta-catenin in ulcerative colitis-associated dysplasia and carcinoma, as compared with sporadic colon neoplasms. Cancer. 2000;89(4):733–40.

  40. Zhang HX, Xu ZS, Lin H, et al. TRIM27 mediates STAT3 activation at retromer-positive structures to promote colitis and colitis-associated carcinogenesis. Nat Commun. 2018;9(1):3441.

  41. Callejas BE, Mendoza-Rodríguez MG, Villamar-Cruz O, et al. Helminth-derived molecules inhibit colitis-associated colon cancer development through NF-κB and STAT3 regulation. Int J Cancer. 2019;145(11):3126–39.

  42. Liu ZY, Wu B, Guo YS, et al. Necrostatin-1 reduces intestinal inflammation and colitis-associated tumorigenesis in mice. Am J Cancer Res. 2015;5(10):3174–85.

  43. Kang DW, Choi CY, Cho YH, et al. Targeting phospholipase D1 attenuates intestinal tumorigenesis by controlling β-catenin signaling in cancer-initiating cells. J Exp Med. 2015;212(8):1219–37.

  44. Zhong L, Huot J, Simard MJ. p38 activation induces production of miR-146a and miR-31 to repress E-selectin expression and inhibit transendothelial migration of colon cancer cells. Sci Rep. 2018;8(1):2334.

  45. Laissue P. The forkhead-box family of transcription factors: key molecular players in colorectal cancer pathogenesis. Mol Cancer. 2019;18(1):5.

  46. Han B, Bhowmick N, Qu Y, Chung S, Giuliano AE, Cui X. FOXC1: an emerging marker and therapeutic target for cancer. Oncogene. 2017;36(28):3957–63.



We thank the TCGA and GEO databases for providing their platforms, and the contributors for uploading their datasets.


This work was supported by the Nursery Fund of the Affiliated Hospital of Jining Medical University (No. MP-MS-2020–009 to Yongming Huang) and the Shandong Medical Science and Technology Program (No. 2018WS460 to Xiaoyuan Zhang).

Author information

Authors and Affiliations



This article was completed in collaboration among all of the authors. YJ determined the research theme and formulated the main research plan. HYM and ZXY analyzed the data and wrote the manuscript. WP and LYS helped collect data and references. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jie Yao.

Ethics declarations

Ethics approval and consent to participate

TCGA and GEO are public databases. Ethical approval was obtained for the patients included in these databases. Our study is based on open-source data, so there are no ethical issues or other conflicts of interest. This article involves no human subjects, and informed consent is not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Table S1.

Top 15 genes in the network ranked by the Closeness method and top 15 genes in the network ranked by the Radiality method.

Additional file 2: Table S2.

The main related miRNAs of upregulated genes in the hub genes.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Huang, Y., Zhang, X., Wang, P. et al. Identification of hub genes and pathways in colitis-associated colon cancer by integrated bioinformatic analysis. BMC Genom Data 23, 48 (2022).


  • Colitis-associated colon cancer
  • Differentially expressed genes
  • Signaling pathways
  • Functional enrichment analysis
  • Prognosis