Mitochondria in tumour progression: a network of mtDNA variants in different types of cancer

Mitochondrial participation in tumorigenesis and metastasis has been studied for many years, but several aspects of this mechanism remain unclear, such as the association of mitochondrial DNA (mtDNA) with different cancers. Here, based on two independent datasets, we modelled an mtDNA mutation-cancer network by systematic integrative analysis including 37 cancer types to identify the mitochondrial variants found in common among them. Our network showed mtDNA associations between gastric cancer and other cancer types, particularly kidney, liver, and prostate cancers, which is suggestive of a potential role of such variants in the metastatic processes among these cancer types. A graph-based interactive web tool was made available at www2.lghm.ufpa.br/mtdna. We also highlighted that most shared variants were in the MT-ND4, MT-ND5 and D-loop, and that some of these variants were nonsynonymous, indicating a special importance of these variants and regions regarding cancer progression, involving genomic and epigenomic alterations. This study reinforces the importance of studying mtDNA in cancer and offers new perspectives on the potential involvement of different mitochondrial variants in cancer development and metastasis.


Background
Mitochondria are cytoplasmic organelles responsible for several pathways in cell stability and operation and are mainly known for their role in ATP production during aerobic respiration. Two of the three main steps of this process occur inside these organelles, the tricarboxylic acid cycle (TCA) in the mitochondrial matrix and oxidative phosphorylation (OXPHOS) in the mitochondrial cristae, the latter generating most of the ATP from aerobic respiration [1]. Because of their origin, mitochondria have their own genome, a double-stranded (heavy and light strands) circular molecule called the mitochondrial genome (mtgenome or mitogenome), which is highly specialized for energy metabolism. The human mtgenome is composed of 37 genes, of which 13 encode subunits of the OXPHOS protein complexes, 22 tRNA and 2 rRNA; in addition, more than 1,000 proteins encoded by the nuclear genome are necessary for overall mitochondrial function [2].
Considering the importance of mitochondrial energy metabolism and other functions to cellular homeostasis, imbalances in such processes may lead to the development of diseases. In fact, for the past few decades, mitochondria have been known to be related to cancer, but many aspects of this involvement remain unknown. Notably, the mtgenome has been reported to be altered in tumours, similar to the nuclear genome, and to be more susceptible to damage than the nuclear genome [3,4].
Although many efforts are being made to understand the related mechanisms, it seems that mitochondrial genetics should be taken into consideration more often [5].
In this context, we recently published a study performing whole-genome sequencing for the first time in regard to gastric cancer (GC) in a Brazilian population [6]. Even more recently, a study by [7] was published with the whole mtgenome of multiple types of cancer from different populations -which do not seem to include Brazil -as part of the International Cancer Genome Consortium/The Cancer Genome Atlas Pan-Cancer Analysis of Whole Genomes Consortium (ICGC/TCGA PAWG Consortium), and the database was named The Cancer Mitochondria Atlas (TCMA). Based on these datasets, the present study aimed to analyse the mtDNA mutations that occur in common among GC and different types of cancer by modelling and developing an interactive web tool to explore the generated mtDNA mutationcancer network, suggesting potential biomarkers in the mtgenome that may be involved in cancer development and progression.

General analysis of the merged datasets
Here, we analysed the set of variants that are shared by individuals from two datasets, as described in the Materials and Methods section, and it should be noted that the INDEL file from the TCMA database shared no variants with the GC dataset and was therefore excluded from this study. After dataset merging, we observed 90 individuals and 22 cancer types from TCMA that presented shared variants with GC. This was probably a reflection of different genetic ancestries in both datasets, considering that the GC dataset consisted of individuals from the Brazilian Amazon region, in which Native American mtDNA haplogroups are frequently found [6], while the TCMA dataset does not seem to include the Brazilian population. This emphasizes the importance of considering the genetic background of the studied populations and stresses the potential of the shared mutations observed in this analysis.
In total, 30 mitochondrial SNVs were shared between the GC and TCMA datasets, with 15 being more frequent in GC, 10 being more frequent in TCMA and five being equally frequent in these datasets (

Shared mtDNA variants and their potential impact
Then, we performed a more in-depth analysis of the mutations in common among the different types of cancer by modelling an mtDNA mutation-cancer network, which resulted in 7,020 associations of mitochondrial variants with these 37 cancer types (36 from TCMA and GC as a reference). In Fig. 2, we plotted the mtDNA mutation-cancer network, providing a general overview of the shared variants, while Table 1 shows the statistical analysis of the variant overlaps between GC and the different cancer types, with 11 being statistically significant (from a total of 22). The empirical p-values were obtained by runs of network randomizations between pairs of cancer types regarding the Jaccard index for the number of found variants, based on the complex network strategy proposed by Araújo et al. (2016). The probability of a given Jaccard index was calculated with the equation.
where SNPA is the number of SNPs of Cancer A, SNPB is the number of SNPs of Cancer B and CAB is the JC calculated for the overlap between phenotypes A and B. The low values for the Jaccard index indicate that cancers showed more exclusive variants than shared ones, highlighting the found variants as potential biomarkers. Importantly, the largest number of shared mtDNA mutations per pair was eight, found in each of the following cancer types: Liver-HCC, Kidney-RCC and Prost-Ade-noCA. This result suggests a possible correlation of these types with cancer progression.
Notably, 11 variants in the following regions were shared between GC and at least two other types of cancer with statistical significance: two in MT-ND4 -10810 (Liver-HCC and Kidney-RCC). Curiously, there was only one variant shared with statistical significance between GC and Stomach-AdenoCA, which could be due to the reduced sample number of this type of cancer in both datasets, a general limitation of this study that could lead to a loss of information, so future studies with larger datasets are encouraged. Nevertheless, it is remarkable that all 11 variants with statistical significance in the analysed tumour samples are in MT-ND4, MT-ND5 or the D D-loop, that is, two genes that encode subunits of Complex I and a regulatory region. Among these shared variants, three are nonsynonymous, predicted as pathogenic with a Mutpred probability ranging from 0.51 to 0.63. The amino acid change associations were previously predicted between Prost-AdenoCA and MT-ND5 (13789 T > C, Y485H), Kidney-RCC and MT-ND6 (14178 T > C, I166V), and Kidney-ChRCC and MT-ATP6 (8584 G > A, A20T) [7].

Different approaches in cancer-related SNP networks
Considering our results, as seen in Fig. 2, we stress that much information can be lost if most efforts only focus on the nuclear genome, leaving the mitochondrial genome aside. Indeed, by dissecting the mtDNA network, we identified genetic overlaps suggesting novel events that are not observed when building nuclear SNP-disease networks. In Fig. 3, we show a network of this sort that was modelled using GWAS hits from the DANCE web tool regarding GC, prostate cancer, renal cell carcinoma and hepatocellular carcinoma. This finding adds a layer of complexity in understanding the genetics of several cancers. In this context, our study contributes to the identification of a set of mitochondrial variants associated with GC and other types of cancer.
Taking this into consideration, we developed a graphbased visualization tool of the dataset overlaps analysed here to improve the understanding of the relationship among mtDNA variants and cancer types. The mtDNA mutation-cancer interactions found here were catalogued and reported as seen in Fig. 2, along with raw data for the network, which can be interactively explored. This user-friendly web tool was made freely available at https:// www2. lghm. ufpa. br/ mtdna.

Discussion
In this study, our analyses with multiple types of cancer have highlighted different mtDNA mutations, particularly in the MT-ND4, MT-ND5 and D-loop, suggesting a special importance of such mitochondrial mutations and regions during cancer development and progression. The MT-ND4 and MT-ND5 genes encode core subunits of respiratory chain complex I (also known as NADH dehydrogenase), which receives electrons from carrier NADH (reduced nicotinamide adenine dinucleotide) after the TCA cycle, initiating the process of OXPHOS [1]. Because OXPHOS is responsible for most ATP produced in aerobic cellular respiration and Complex I is the main entry of electrons, defects in this complex could result in impairment of the entire process. It can also increase the production of reactive oxygen species (ROS), culminating in oxidative stress and promoting tumorigenesis [8]. In fact, mutations in MT-ND4 and MT-ND5 leading to complex I dysfunction have been previously associated with procancerous phenotypes and tumorigenesis in different types of cancer, although the particularities of this relationship are still unknown [9,10].
The D-loop, which is part of the major noncoding region (NCR) in the mt genome, is responsible for the start of heavy strand replication, as well as the transcription of both strands, and it is known to be hypervariable [11,12]. Since D-loop and NCR are terms frequently used as synonyms, it should be noted that, here, "D-loop" represents not only the D-loop per se but also the major NCR. Nevertheless, abnormalities in replication and transcription of mtDNA could lead to mitochondrial instability and malfunction, which in turn could play a role in carcinogenesis and metastasis. Independent studies have reported D-loop variants in association with different kinds of cancer, including gastric cancer, but there is still no consensus as to which variants influence this process [12][13][14].
Notably, epigenomic factors may also play a role in D-loop function. For instance, long noncoding RNAs (lncRNAs) generated by heavy strand transcription have been described in the D-loop with an interaction with mitochondrial topoisomerase 1B (TOP1MT), which normally promotes mtDNA replication, suggesting a possible influence of these lncRNAs on the regulation of mtDNA expression [15]. In addition, treatment with miR34a encapsulated in hyaluronic acid nanoparticles has been demonstrated to reduce CpG methylation in the D-loop and to increase mRNA transcripts of different mitochondrial genes in lung cancer cells [16]. Currently, little is known about epigenomic mechanisms such as noncoding RNAs in mitochondria, particularly in D-loops, but it is increasingly clear that this matter should be further explored [1].
Regardless, in our analyses, it was also notable that 11914 (MT-ND4) and 16111 (D-loop) are not only present in the greatest number of cancer types, but these types are the same (hepatocellular carcinoma, oesophageal adenocarcinoma, pancreatic adenocarcinoma, renal cell carcinoma and prostate adenocarcinoma). Additionally, 10810 (MT-ND4), 12705 (MT-ND5) and 16527 (D-loop) are also shared by the same cancer types (hepatocellular carcinoma and renal cell carcinoma). Another interesting finding, as previously mentioned, is that kidney, prostate, and liver cancers (renal cell carcinoma, prostate adenocarcinoma, and hepatocellular carcinoma, respectively) presented more shared variants with GC than the other types of cancer (eight variants). Curiously, no studies were found reporting a positive association with such variants in cancer, except for 16189 (D-loop). This mutation, which is suggested to generate mitochondrial impairment and genome instability, has been related to different complex conditions, such as endometriosis [17], coronary artery disease [18], type 2 diabetes mellitus and metabolic syndrome [19], as well as multiple types of cancer, namely, endometrial [20], breast [21], colorectal [22] and acute myeloid leukaemia [23].
Additionally, a study by [24] suggested that variant 16189 could especially affect iron homeostasis and electron transport chains of OXPHOS, leading to increased oxidative stress. In fact, excess iron has been considered a risk factor for cancer development and progression [25][26][27].
Notably, mitochondria have been associated with tumorigenesis and metastasis, even though the mitochondrial mechanisms and dynamics involved in these processes are not fully understood [28]. A recent review by [29] highlighted the importance of studying mitochondrial dysfunction in different complex diseases, including cancer, in regard to mtDNA alterations as potential biomarkers for the development, metastasis and prognosis of cancer. In this sense, our findings might shed light on this matter, considering the overlap of cancer types, particularly GC with kidney, liver, and prostate cancers, which allowed us to hypothesize that these variants are likely to play a role in their progression with potential involvement in a metastatic process between these types of cancer. In the specialized literature, although considered an uncommon process, it is possible to find various case reports of gastric metastases from renal cell carcinoma [30,31] or prostatic adenocarcinoma [32,33], even several years after the first diagnosis [34]. A secondary tumour might also be so similar to a primary tumour that it could be challenging to diagnose it as a metastatic tumour at first [35]. Recently, a case report has reinforced that it is still unclear how gastric metastasis from prostate cancer occurs -whether it is by lymphatic or haematogenous spread -as well as the importance of an early diagnosis [36]. It is noteworthy that some cases of prostate or kidney metastasis from gastric adenocarcinoma have also been reported [37,38]. Regarding liver metastasis from gastric cancer, it has been pointed out that the liver is one of the most frequently affected organs in metastasis from gastric cancer: metastasis is found in approximately 35% of gastric cancer cases at first diagnosis, of which up to 14% involve this organ [39,40]. On the other hand, hepatocellular carcinoma with gastric metastasis seems to be very rare, occurring in less than 1% of patients with this type of cancer, and thus, it is possible to misdiagnose it as gastric cancer with liver metastasis [41].
In this context, understanding the effects of genetic variants on complex diseases such as cancer may directly impact clinical strategies for personalized patient care. Approaches to building disease networks have been previously developed based on genetic associations, mainly using autosomal DNA, molecular pathways or candidate nuclear genes [42,43]. Genetic comorbidity was shown for intraclass diseases, such as neuropsychiatric disorders, and this was recently found for cancer in a TCGA (The Cancer Genome Atlas) analysis that reported 10 signalling pathways in common among 33 tumours [44]. The comprehension of interactions between phenotype and genotype and between intraclass diseases impacts the understanding of many progressive aspects of a disease such as cancer, including cancer types, metastatic mechanisms and the development of a personalized diagnosis or prognosis, as well as pharmacogenomic prevention strategies.
Importantly, genetic studies have not generally considered mitochondrial variants on a large scale and their impact during investigations of genetic comorbidity. Here, we have shown that many variants of interest could be overlooked, so we highlight the idea of data integration with several layers of genetic associations for modelling disease networks. This approach may contribute to the identification of patterns of genetic similarity and to the discovery of new mtDNA associations in the mechanism of gastric cancer and other types of cancer.

Conclusions
In summary, this work has provided new information on the mitochondrial influence on tumorigenesis and metastasis from a network perspective, indicating some potential mtDNA biomarkers -particularly MT-ND4, MT-ND5 and D-loop -and emphasizing the importance of studying mitochondrial genetics in association with cancer progression. Therefore, considering the sample number to be a limitation of this work (together with the absence of heteroplasmy and haplogroup analyses due to a lack of data), future studies with larger datasets in systems biology, as well as functional studies, are recommended to reinforce the observed interactions of these variants in different types of cancer. We also provide a user-friendly web tool to explore the broad set of shared mitochondrial variants among cancers that may support larger datasets from future studies, encouraging the development of new software for cancer and genetics epidemiology while also improving the transparency and reproducibility of research studies.

Data Collection and Characterization
Data of mitochondrial variants were extracted from two sources: (i) homoplasmic mtDNA mutations in gastric cancer from the study by [6] and (ii) the mutation section (Single Nucleotide Variant -SNV and Insertion/Deletion -INDEL files) of The Cancer Mitochondria Atlas (TCMA) portal (https:// ibl. mdand erson. org/ tcma/), which was made public and freely available by [7]. The first dataset included 13 tumour samples from gastric cancer (GC) patients, while the second included 2,536 tumour samples distributed in 36 cancer types. More detailed characterization of the samples can be found in the mentioned works.

Data Analyses
Datasets from both sources were merged into one, considering each variant/position in common. For that and all other analyses, we used Python 3 programming language and the following libraries in this environment: Pandas [45], Matplotlib [46], Seaborn [47] and Scipy [48].

The mtDNA mutation-cancer network
We constructed a bipartite network of mitochondrial mutations and cancer with datasets of different types of cancer. The mtDNA mutation-cancer network is presented as a graph G(V, E), where V comprises two distinct sets of nodes: mitochondrial mutations (u) and cancers (v). In this analysis, we considered a mitochondrial mutation to be associated with cancer if this association was reported in the works that originated the analysed datasets [6,7].
Next, we assessed the topology of the mtDNA mutation-cancer network and computed an empirical p value of mutation overlap (i.e., mutations in common) between pairs of cancer types. This was done by runs of 100 K randomization of associations between the cancers to generate their empirical distribution. Thus, the probability (P) of a given overlap was calculated with P(NA, NB | NAB), where NA is the number of mitochondrial mutations of cancer "A", NB is the number of mitochondrial mutations of cancer "B" and NAB is the overlap calculated for the pair of cancers "A" and "B". This strategy was inspired by the work of [42], which found significant association overlap for complex diseases based on data from the NHGRI/EBI GWAS Catalog (https:// www. ebi. ac. uk/ gwas/).
We then implemented a web tool to better visualize and explore this mtDNA mutation-cancer network. HTML5 with the bootstrap framework (https:// getbo otstr ap. com/) was used for front-end development. The mtDNA mutation-cancer network was generated using the D3.js library, and raw data were implemented with DataTables (https:// datat ables. net/), with both libraries in JavaScript and JQuery (https:// jquery. com/). The network and raw data were stored in JSON files and accessed via Ajax combined with libraries in Python 3 using the Flask framework for web applications (https:// flask. palle tspro jects. com/ en/1. 1.x/) in the backend.

Nuclear SNP-Disease Network
We retrieved genetic association data using the Disease-Ancestry Network (DANCE -www. ldgh. com. br/ dance), which is a web tool that allows the recovery of associations among complex diseases and nuclear single nucleotide polymorphisms (SNPs) in a bipartite network called SNP-Disease Network [42]. DANCE integrates data from the 1000 Genomes Project (Phase 3) [49] and the NHGRI GWAS Catalog (v. March 2020) [50], comprising data from 3,885 public genome-wide association studies (GWAS). DANCE database stores 149,262 associations among 4,208 phenotypes and 120,748 risk-alleles. We queried risk-alleles associations with GC, renal cell carcinoma, hepatocellular carcinoma, and prostate cancer. Risk-alleles were considered for those variants that reached a statistical significance (P-value ≤ 1e-8) and odds ratio > 1.