Investigation of candidate biomarkers and prognostic values in endometrial cancer based on bioinformatics analysis

Background: Endometrial cancer is a common gynecological cancer whose incidence is increasing annually worldwide. However, biomarkers that predict the prognosis and progression of endometrial cancer are still lacking. Methods: Differentially expressed mRNAs and miRNAs were screened out using mRNA and miRNA expression data of endometrial cancer from the Gene Expression Omnibus, and then validated in The Cancer Genome Atlas. Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses were conducted using the Database for Annotation, Visualization and Integrated Discovery. A protein–protein interaction network was constructed with STRING and visualized using Cytoscape. OncoLnc was used to study the prognostic effects of the hub genes. In addition, miRecords was used to predict target genes of the differentially expressed miRNAs, and a miRNA-mRNA regulatory network was then constructed. Results: Two eligible human endometrial cancer datasets (GSE17025 and GSE25405) met the requirements. A total of 520 differentially expressed mRNAs and 30 differentially expressed miRNAs were identified. The differentially expressed mRNAs were mainly enriched in cell cycle, skeletal system development, vasculature development, oocyte maturation, and oocyte meiosis signaling pathways. A total of 160 pairs of differentially expressed miRNAs and mRNAs, comprising 22 differentially expressed miRNAs and 71 overlapping differentially expressed mRNAs, were validated in endometrial cancer samples using the starBase v2.0 project. The prognosis analysis found that Cyclin E1 (CCNE1, one of the 82 hub genes, which was correlated with hsa-miR-195) was associated with significantly worse overall survival in endometrial cancer patients. Conclusions: These hub genes and differentially expressed miRNAs might be used as molecular targets for the treatment of endometrial cancer and prognostic biomarkers for


Background
Endometrial cancer (EC), that is, uterine corpus endometrial carcinoma (UCEC), is a malignant tumor derived from the endometrial epithelium. With an increase in obesity and an aging population, the incidence and mortality rates of EC are increasing in developed countries [1]. According to the latest statistics of the American Cancer Society [2], over 61,000 cases of EC were estimated to be diagnosed in 2017. At present, advanced-stage EC still accounts for 20% to 30% of cases, and the prognosis after relapse is very poor.
Currently, efficient biomarkers for the diagnosis and prognosis of EC are still lacking.
For example, cancer antigen 125 (CA125), the most frequently used biomarker for ovarian cancer, has some diagnostic/prognostic value in EC [3]. However, the CA125 level is elevated in a number of physiological and pathological gynecological and nongynecological conditions, such as age [4,5]

Functional and pathway enrichment analysis
The Database for Annotation, Visualization and Integrated Discovery (DAVID, http://david.ncifcrf.gov) enables users to perform biological analysis of collected gene data [15]. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were conducted with DAVID. An FDR < 0.05 was considered statistically significant.
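To make the FDR cut-off concrete, the following is a minimal sketch of the standard Benjamini-Hochberg adjustment applied to a handful of enrichment terms. It assumes that the reported FDR behaves like a Benjamini-Hochberg adjusted p-value, which the text does not state; DAVID computes its own statistics, and this code does not call DAVID. The function name, the placeholder terms, and all p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw p-values for illustration only (not results from the study).
raw_p = {"cell cycle": 0.0001, "oocyte meiosis": 0.004, "term C": 0.03, "term D": 0.2}
names = list(raw_p)
fdr = benjamini_hochberg([raw_p[name] for name in names])

# Keep the terms that pass the FDR < 0.05 threshold used in the text.
significant = [name for name, q in zip(names, fdr) if q < 0.05]
```

The adjustment depends on how many terms are tested together, so the same raw p-value can pass or fail the threshold depending on the size of the annotation category.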
Construction of PPI network and module analysis
The PPI network of DEGs was constructed using the STRING database (version 11.0, https://stringdb.org/) and visualized using Cytoscape (version 3.7.1) [16,17]. The confidence score threshold was set to ≥ 0.7. Module analyses were conducted using the MCODE package of the Cytoscape software with degree cut-off = 2, node score cut-off = 0.2, max depth = 100 and k-core = 2 [18]. The functional enrichment analyses for the DEGs in the modules were conducted with DAVID.
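The confidence filter can be sketched as follows, as an illustration only. The sketch assumes that the interactions have been exported as (gene, gene, score) triples with scores rescaled to the 0 to 1 range; it does not query STRING or Cytoscape. The function names, the placeholder `GENE_X`, and all scores are hypothetical.

```python
from collections import defaultdict


def build_network(scored_edges, min_score=0.7):
    """Keep edges at or above the score cut-off and return an adjacency map."""
    adjacency = defaultdict(set)
    for gene_a, gene_b, score in scored_edges:
        # Skip low-confidence interactions and self-loops.
        if score >= min_score and gene_a != gene_b:
            adjacency[gene_a].add(gene_b)
            adjacency[gene_b].add(gene_a)
    return dict(adjacency)


def node_degrees(adjacency):
    """Degree of a node = number of distinct interaction partners."""
    return {gene: len(partners) for gene, partners in adjacency.items()}


# Made-up scores for illustration; GENE_X is a placeholder, not a study gene.
edges = [
    ("CCNE1", "BUB1", 0.92),
    ("CCNE1", "TOP2A", 0.85),
    ("BUB1", "TOP2A", 0.99),
    ("TOP2A", "GENE_X", 0.45),  # below the cut-off, so it is dropped
]
network = build_network(edges)
degrees = node_degrees(network)
```

Genes whose only interactions fall below the cut-off disappear from the network entirely, which is one reason a filtered network can contain fewer nodes than the input gene list.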

Prediction of the target gene of miRNA
The target genes of miRNAs (TG-miRNAs) were predicted with miRecords (http://c1.accurascience.com/miRecords/), which integrates 11 different miRNA target prediction databases [19]. A gene was accepted as a TG-miRNA only when at least four different prediction databases predicted it to be a target gene.
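The "at least four databases" rule amounts to a simple vote count, which might be sketched as below. The sketch assumes the predictions are already available as one set of (miRNA, gene) pairs per database; the database labels `db_1` to `db_5`, the placeholder `GENE_X`, and the example predictions are hypothetical and are not miRecords output.

```python
def consensus_targets(predictions, min_databases=4):
    """Return the (miRNA, gene) pairs predicted by at least min_databases sources.

    predictions maps a database name to the set of pairs it predicts.
    """
    votes = {}
    for pairs in predictions.values():
        # Each database contributes at most one vote per pair.
        for pair in pairs:
            votes[pair] = votes.get(pair, 0) + 1
    return {pair for pair, count in votes.items() if count >= min_databases}


# Made-up predictions for illustration only.
predictions = {
    "db_1": {("hsa-miR-195", "CCNE1"), ("hsa-miR-195", "GENE_X")},
    "db_2": {("hsa-miR-195", "CCNE1")},
    "db_3": {("hsa-miR-195", "CCNE1"), ("hsa-miR-195", "GENE_X")},
    "db_4": {("hsa-miR-195", "CCNE1")},
    "db_5": {("hsa-miR-195", "GENE_X")},
}
targets = consensus_targets(predictions)
```

Raising `min_databases` trades sensitivity for specificity: fewer pairs survive, but each is supported by more independent algorithms.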

Construction of the miRNA-mRNA regulatory network
The intersection of TG-miRNAs and DEGs was considered to comprise potentially valuable differentially expressed target genes. Pearson correlation analysis was then used in starBase (http://starbase.sysu.edu.cn/) to verify the association between these potentially valuable differentially expressed target genes and DEMs in patients with EC [20,21]. The significantly correlated differentially expressed target genes and their corresponding miRNAs were used to construct a miRNA-mRNA regulatory network using the Cytoscape software. miRNA nodes with a degree of interaction ≥ 5 were defined as hub miRNAs.
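The three steps described here (intersecting predicted targets with DEGs, correlating expression, and picking hub miRNAs by degree) can be illustrated with a small sketch. It is written under assumptions: the correlation is computed locally rather than through starBase, and the miRNA names, gene names, and expression values are hypothetical placeholders.

```python
from collections import Counter
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def hub_mirnas(pairs, min_degree=5):
    """miRNAs connected to at least min_degree distinct genes in the network."""
    degree = Counter(mirna for mirna, _gene in set(pairs))
    return {mirna for mirna, count in degree.items() if count >= min_degree}


# Made-up predicted pairs and DEG list for illustration only.
predicted = {("miR-A", f"GENE_{i}") for i in range(1, 7)}
predicted |= {("miR-B", "GENE_1"), ("miR-B", "GENE_9")}
degs = {"GENE_1", "GENE_2", "GENE_3", "GENE_4", "GENE_5", "GENE_9"}

# Step 1: keep only predicted targets that are also differentially expressed.
candidate_pairs = {(mirna, gene) for mirna, gene in predicted if gene in degs}

# Step 2: correlation between one miRNA and one target across four samples.
r = pearson_r([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0])

# Step 3: hub miRNAs by degree in the resulting bipartite network.
hubs = hub_mirnas(candidate_pairs)
```

In a real analysis, only pairs whose correlation passes a significance test would enter step 3; the sketch skips that filter to stay short.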

Survival analysis of hub genes
The overall survival of patients with EC with regard to hub genes was calculated using Kaplan-Meier analysis in OncoLnc (www.oncolnc.org). The patients were divided into two (high vs. low) groups according to the median value of mRNA expression of the hub gene.
The log-rank test was used to examine the significance of the difference between the two groups.
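The median split and the log-rank comparison can be sketched in plain Python as follows. This is a minimal illustration of the textbook two-group log-rank test, not the OncoLnc implementation; assigning patients exactly at the median to the low group is an assumption, and the expression values, follow-up times, and event flags are made up.

```python
from math import erfc, sqrt
from statistics import median


def split_by_median(expression):
    """True for patients above the median (high group), False otherwise."""
    cut = median(expression)
    return [value > cut for value in expression]


def logrank_test(times, events, in_high_group):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    observed_minus_expected = 0.0
    variance = 0.0
    # Visit each distinct time at which at least one death occurred.
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_high = sum(1 for i in at_risk if in_high_group[i])
        deaths = [i for i in at_risk if times[i] == t and events[i]]
        d = len(deaths)
        d_high = sum(1 for i in deaths if in_high_group[i])
        # Observed minus expected deaths in the high group at this time.
        observed_minus_expected += d_high - d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi_square = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi_square / 2))
    return chi_square, p_value


# Made-up data: expression, follow-up time, and event flag (1 = death).
expression = [9.1, 8.7, 8.2, 7.9, 3.1, 2.8, 2.2, 1.9]
times = [5, 8, 12, 15, 40, 44, 50, 60]
events = [1, 1, 1, 1, 1, 0, 1, 0]

high = split_by_median(expression)
chi_square, p_value = logrank_test(times, events, high)
```

Censored patients (event flag 0) still count as at risk until their last follow-up time, which is what distinguishes this test from a simple comparison of death counts.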

Identification of DEGs and DEMs
A total of 1,961 DEGs and 149 DEMs were identified from GSE17025 and GSE25405, respectively; 2,339 DEGs and 205 DEMs were identified from the mRNA and miRNA data of uterine corpus endometrial carcinoma in TCGA (named TCGA-UCEC and TCGA-UCEC_miRNA, respectively). Of these, 520 common DEGs and 30 common DEMs were screened out with Venny 2.1.0 (http://bioinfogp.cnb.csic.es/tools/venny/index.html) [22] (Fig. 1a, Fig. 1b). There were 212 upregulated and 308 downregulated genes, and 15 upregulated and 15 downregulated miRNAs, in EC tissues compared with NE tissues (Table 1, Table 2).
A PPI network consisting of 287 nodes and 1,840 edges was constructed from the 212 upregulated and 308 downregulated genes (Fig. 2). Then, 82 nodes with a degree of interaction ≥ 10 were screened out as hub genes [23], and there were close correlations among the hub genes (Fig. 3, Additional file 1). After analyzing the network with the MCODE app in Cytoscape, an important module was obtained, including 50 nodes and 1,082 edges (Fig. 4). Functional enrichment analysis of biological processes for this module showed that its genes were enriched in cell cycle, cell division, and DNA replication (Table 4). Three KEGG pathways were enriched: cell cycle, oocyte meiosis, and oocyte maturation (Table 4).

Discussion
In recent years, although clinical medical scientists have made significant progress in the treatment of EC with surgery and chemotherapy, the incidence and mortality rate of EC are still increasing [24]. It is necessary to further understand the etiology and mechanism of EC progression to improve the prognosis of EC.
In this study, by integrating GSE17025 with TCGA-UCEC, 520 common DEGs were screened out in EC tissues compared with NE tissues. These 520 common DEGs were composed of

Conclusion
Based on bioinformatics analyses of EC-related microarray data in the GEO database and clinical data related to EC in TCGA database, we found that 27 hub genes (BUB1, TOP2A,

Figure 2
Protein-protein interaction network of the differentially expressed genes in endometrial cancer tissues compared with normal endometrium tissues. Green and red nodes represent upregulated and downregulated genes, respectively. The edges/lines stand for the regulatory association between nodes.

Figure 3
Protein-protein interaction network of hub genes of the differentially expressed genes in endometrial cancer tissues compared with normal endometrium tissues.
Green and red nodes represent upregulated and downregulated genes, respectively. The edges/lines stand for the regulatory association between nodes.

Figure 4
Demonstration of the important module by Cytoscape. The edges/lines stand for the interaction relationships between nodes.

Figure 5
The miRNA-mRNA regulatory network. Green and red nodes stand for upregulation and downregulation, respectively. The ellipses represent genes and the triangles represent miRNAs.

Figure 6
Overall survival analysis of CCNE1 expression with prognosis of endometrial cancer patients (log-rank p-value = 0.000157). Based on the median expression level of CCNE1, the patients with EC were divided into two (high vs. low) groups.

Supplementary Files
This is a list of supplementary files associated with the primary manuscript.
Additional file 2.pdf