RNA-Seq analysis in giant pandas reveals the differential expression of multiple genes involved in cataract formation
BMC Genomic Data volume 22, Article number: 44 (2021)
The giant panda (Ailuropoda melanoleuca) is an endangered mammalian species native to China. Fewer than 2500 giant pandas are known to exist, many of which are bred in captivity as a means to preserve and repopulate the species. Like other captive mammals, giant pandas acquire age-related cataracts, reducing their quality of life. Recent comparative genome-wide methylation analysis revealed 110 differentially methylated genes associated with cataract formation including six also associated with the formation of age-related cataracts in humans.
To investigate the pathological pathway in greater detail, here we used RNA-Seq analysis to investigate the differential expression profiles of genes in three giant pandas with cataracts and three healthy controls. We identified more than 700 differentially expressed genes, 29 of which were selected for further analysis based on their low q-value. We found that many of the genes encoded regulatory and signaling proteins associated with the control of cell growth, migration, differentiation and apoptosis, supporting previous research indicating a key role for apoptosis in cataract formation.
The identification of genes involved in the formation of age-related cataracts could facilitate the development of predictive markers, preventative measures and even new therapies to improve the life of captive animals.
The housing of mammals in zoos and reservations is an efficient strategy to protect endangered species and encourage repopulation, but captive mammals tend to live longer than their wild counterparts and thus experience diseases of ageing, which are uncommon in the wild. One example is the development of age-related cataracts, which are associated with ageing mammals due to the accumulation of oxidative damage in the lens . Cataracts are the main cause of blindness in ageing humans and other primates, as well as companion animals such as dogs and cats [2,3,4]. They are also prevalent in captive giant pandas (Ailuropoda melanoleuca), which live to 15–20 years in the wild but 25–30 years in captivity [5, 6]. Giant pandas 18 or more years old are described as aged because they have reached an equivalent human age of ~ 75. The prevalence of cataracts in the current population of aged giant pandas is ~ 20%.
Age-related cataracts are heritable with significant environmental triggers, including oxidative stress and the resulting accumulation of DNA damage [7,8,9,10]. The genes most strongly associated with cataracts therefore include those related to oxidative stress responses, the production of antioxidant enzymes and metabolites, and various DNA repair pathways [11,12,13]. Previous studies identified changes in DNA methylation associated with cataract formation in several mammals [14,15,16] and we recently reported that 110 genes with functions relevant to cataract pathogenesis are differentially methylated in giant pandas, including six genes known to be associated with age-related cataracts in humans .
Epigenetic modifications such as DNA methylation affect gene expression and therefore control the availability of the corresponding gene products. To gain more insight into the role of differential gene expression in the pathogenesis of age-related cataracts in giant pandas, we took blood samples from three aged giant pandas with cataracts and three healthy controls for RNA-Seq analysis. Following the alignment of reads with the reference genome, we identified expression profiles representing genes expressed exclusively or preferentially in the cataractogenic or healthy samples, and determined the corresponding functional annotations. The identification of genes that are overexpressed or suppressed during the formation of cataracts could lead to the development of new diagnostics, preventative treatments and therapeutic approaches to improve the quality of life for captive giant pandas and other mammals.
Peripheral blood samples were collected from six giant pandas (five females and one male) ranging in age from 19 to 37, with three of the females affected by cataracts and the other specimens defined as healthy based on regular physical examinations (Table 1). The blood samples were assigned to three sample bands: A (affect females), B (unaffected male) and C (unaffected females). These sample bands were subsequently used for comparative transcriptomic analysis.
RNA-Seq data processing and quality analysis
Blood samples from all six giant pandas were used to prepare a de novo RNA-Seq dataset with an average of 52.05 ± 5.63 million reads per specimen and an average length of 144.35 ± 1.86 bp after quality control and trimming. For each specimen, ~ 87% of the reads mapped to unique sequences in the reference genome (Supplementary Table S1). Homogeneity distribution analysis and the gene coverage ratio indicated the anticipated distribution of reads within each gene and across different genes (Supplementary Fig. S1a,b) and also confirmed that ~ 44.72% of the reads mapped to exonic regions of the reference sequence (Supplementary Fig. S1c).
Analysis of gene structure characteristics
Analysis of the RNA-Seq dataset for gene coverage and chromosome distribution revealed differences in mapping density across different chromosomes, with chromosomes GL192341.1 and GL192355.1 showing a particularly low read density and chromosomes GL192348.1 showing the opposite phenomenon (Fig. 1a). The mapped reads were screened for single-nucleotide polymorphisms (SNPs) and insertion/deletion polymorphisms (indels) revealing 5.9–6.7 × 104 SNPs but < 1 × 104 indels in each sample (Fig. 1b). Among the SNPs, the number of transitions vastly exceeded the number of transversions, and the transitions A > G, C > T, G > A and T > C were particularly abundant (Supplementary Fig. S2). We detected more than 2.5 × 104 alternative mRNA processing events, the most abundant of which were alternative transcriptional start sites (8 × 103) followed by alternative transcriptional stop sites (7 × 103) (Fig. 2). We also identified at least eight new transcript regions representing previously unknown isoforms of known genes, with seven of these new regions mapping to four genes: GNAO1, OGFOD1, HERPUD1 and NLRC5 (Supplementary Table S2).
Analysis of gene expression
The number of reads mapping to each gene in the reference genome represents the abundance of the corresponding mRNA when corrected for factors such as gene length, sequencing depth and saturation. We also ensured the accuracy of our results by testing for correlation between samples (Supplementary Table S3). We then determined the number of genes that were co-expressed in our samples, revealing a core of 2711 common genes that were universally expressed and a minimum of 172 and a maximum of 263 genes in each animal that were not part of this common set (Fig. 3). PCA indicated that 36.68% of the variation could be explained by PC1, a further 33.06% by PC2, and 13.65% by PC3, primarily separating the affected and unaffected animals but to a lesser extent separating the unaffected male from the unaffected females. Overall, this resulted in the clustering of the affected females together and the dispersion of the other three samples, suggesting the affected females had more in common than any of the unaffected individuals had in common with each other (Fig. 4a). This outcome was broadly supported by PoCA, which revealed that 50.02% of the variation could be explained by PoC1, a further 13.65% by PoC2 and 5.62% by PoC3, again leading to the formation of an affected cluster with the other samples dispersed (Fig. 4b). The individual two-dimensional plots showing the PCA and PoCA data in more detail are provided in Fig. 5.
Identification and analysis of differentially expressed genes
We observed a much greater number of differentially expressed genes when comparing female pandas with cataracts to the healthy male (A vs B, 705 genes) than when comparing female pandas with cataracts to healthy females (A vs C, 33 genes). However, most of these genes were not differentially expressed when comparing healthy male and female pandas (B vs C, 116 genes), suggesting that the A vs B profile cannot be wholly explained by sex-specific differences in gene expression (Fig. 6). Given that very few genes were differentially expressed when comparing affected and healthy females (A vs C), cataract formation appears to influence a larger number of genes in male than female pandas. Among the 705 differentially expressed genes in the A vs B comparison, 533 were upregulated and 172 downregulated. When visualized as a scatter plot (Fig. 7), it is clear that more genes are upregulated in the healthy male than the affected females and are more likely to show a statistically significant change in expression (Fig. 7b) whereas the 23 upregulated and 10 downregulated genes in the comparison A vs C show limited statistical significance (Fig. 7a). The results are emphasized in the corresponding heat maps (Fig. 8). The comparison of affected females (A) with unaffected pandas of either sex (B + C) revealed 29 genes satisfying q < 0.05 and |log2 fold change (FC)| > 1 (Table 2).
Functional annotation of differentially expressed genes
The biological functions and related pathways of the differentially expressed genes were investigated by screening the sequences against the GO and KEGG databases, as well as the Clusters of Orthologous Groups of proteins (COG) and euKaryotic Ortholog Groups (KOG) maintained by the NCBI. GO annotations revealed that the differentially expressed genes represented a wide range of biological processes (111 genes), cellular components (73 genes) and molecular functions (5 genes). Strongly represented biological process categories (accounting for > 10% of the genes) included cellular process, metabolic process, biological regulation/regulation of a biological process (particularly positive regulation), and response to stimulus. With few exceptions, the functional profile of the differentially expressed genes mirrored that of the total gene catalog, although the differentially expressed genes were overrepresented in the categories immune system process and multi-organism process but underrepresented in the categories biological phase and cell aggregation (Fig. 9). There was little difference in terms of cellular component categories, with the exception of the differentially expressed genes being unrepresented in the category nucleoid. However, one of the most abundant categories among the differentially expressed and complete gene catalog was protein-containing complex, and preliminary protein network analysis revealed that the genes most strongly upregulated in the A vs B comparison also tended to be more likely to interact with other proteins and also tended to have more connections (data not shown). In terms of molecular functions, there was again little difference between the differentially expressed genes and complete gene catalog, with the exception of the differentially expressed genes being unrepresented in the categories involving morphogen, metallochaperone and chemoattractant/chemorepellant activity (Fig. 9). Interestingly, several genes encoding enzymes involved in histidine metabolism were included among the differentially expressed pathways revealed when screening KEGG (Fig. 10). The analysis of GO categories that were enriched among the differentially expressed genes revealed strong hits for immunity and defense-related functions, which showed the highest Rich factors (Table 3, Fig. 11). Visualization of the biological, cellular and molecular functions by means of directed acyclic graphs revealed that 45 genes differed between the healthy and unhealthy pandas (Supplementary Fig. S3).
Ageing mammals in captivity often develop cataracts due to the accumulation of oxidative damage . This has been observed in captive giant pandas, which typically live up to 10 years longer than their wild counterparts [5, 6]. The prevalence of cataracts in the current population of aged giant pandas is ~ 20% and this is associated with a declining quality of life, as the animals find it difficult to feed and negotiate their surroundings. Although genetic factors have been identified that promote age-related cataracts, the most important triggers are environmental, particularly oxidative stress and DNA damage [7,8,9,10]. We previously compared the DNA methylation status of giant pandas with and without cataracts in an attempt to identify epigenetic effects that might influence the expression of genes associated with cataract formation . We identified multiple differentially methylated genes with potential roles in cataract-related pathways, including base excision repair, apoptosis and p53 signaling. Certain genes also showed abnormal methylation profiles specifically in the pandas with cataracts, including the cysteine-aspartate protease gene CASP3, a pro-apoptotic mediator already linked to cataracts in rats [18, 19], and the glutathione S-transferase gene GSTM3, which is expressed in the lens tissues of human patients with age-related cataracts .
DNA methylation is an epigenetic mechanism that regulates gene expression by modifying the structure of chromatin, usually leading to the suppression of gene expression. As a logical extension of our previous study, we were therefore interested in the analysis of differential gene expression between giant pandas with cataracts and controls with healthy eyes. Using the same animals as in our previous study, we carried out RNA-Seq analysis to identify panels of genes either upregulated or downregulated in pandas with cataracts. Following the alignment of reads with the reference genome, we identified expression profiles representing genes expressed exclusively or preferentially in the cataractogenic or healthy samples, and identified the corresponding functional annotations by screening the KEGG database and GO categories. Our RNA-Seq data also revealed abundant alternative splicing and novel transcripts within the dataset, as previously reported during murine lens development . Two genes that were shown in our DNA methylation study  to be methylated specifically in affected pandas were also shown to be downregulated in the affected pandas by RNA-Seq analysis. CASP3, encoding the apoptotic cysteine-aspartate protease was downregulated with a log2FC of − 1.03, whereas GSTM3, encoding the glutathione S-transferase, was downregulated with a log2FC of − 1.28.
A link between the glutathione (GSH) pathway and cataract formation was identified in a transcriptomic study in mice . The authors considered the consequences of GSH deficiency in naïve and buthionine sulfoximine-treated C57Bl/6 LEGSKO (lens GSH-synthesis knockout) mice compared to wild-type controls, thus providing a more relevant model of oxidative damage. Among 24,415 mapped reads, 441 genes showed significantly modulated expression, including genes involved in epithelial-mesenchymal transition (EMT) signaling, the visual cycle, and lipid metabolism. Several detoxification genes were upregulated, including the aldehyde dehydrogenases Aldh1a1 and Aldh3a1, the metallothioneins Mt1 and Mt2, the carboxylesterase Ces1g, and the urea transporter Slc14a1, whereas genes encoding lens crystallins and other vision-related genes were downregulated . The authors concluded that GSH deficiency in the lens leads to the expression of detoxifying genes and the activation of EMT signaling (providing evidence of adaption to the loss of antioxidant capacity) and also revealing a pathogenetic mechanism of cataract formation.
Similarly, our RNA-Seq analysis revealed the differential expression of several genes related to oxidative damage, the visual cycle, developmental functions, and lipid metabolism, suggesting that ageing pandas develop cataracts as the natural capacity for oxidative stress responses begins to diminish. For example, we observed a 7.45-fold downregulation of NCF1, encoding a membrane-bound subunit of NADPH oxidase which is involved in superoxide production and the induction of apoptosis [23, 24]. Examples of differentially regulated genes involved in visual perception included COL2A1, encoding a collagen component of the extracellular matrix (1.2-fold induction) and UNC119, encoding a G-protein-binding factor that is required for protein trafficking in photoreceptor cells (1.1-fold induction) . The differentially expressed genes related to developmental functions included EPCAM, encoding the epithelial cell adhesion molecule involved in cell–cell adhesion, cell signaling, migration, proliferation, and differentiation (downregulated 1.1-fold) [26, 27] as well as a chloride channel that interacts with the cytoskeleton and is known to regulate vascular morphogenesis in the eye (CLIC4, 1.2-fold induction) [28, 29] and another gene involved in vascular remodeling (ENG, 2.1-fold induction) .
Previous transcriptomic studies of cataract formation have focused on the analysis of gene expression in models of human congenital cataracts rather than age-related cataract formation, but it is possible that some of the pathway elements are conserved. For example, mutations in the major intrinsic protein (MIP, also known as aquaporin 0) promote cataract formation in human infants, and several strains of rats and mice with loss-of-function mip mutations also develop fully penetrant cataracts [31,32,33,34,35]. More recently, we reported a novel mutation in the panda Mip gene also associated with cataract formation . Transcriptomic analysis in juvenile mip−/− knockout mice identified 29 genes with > 2-fold changes in expression, including the mitochondrial translocase (Timmdc1), a matrix metallopeptidase (Mmp2), a Rho GTPase-interacting protein (Ubxn11) and a transcription factor (Twist2) which were strongly upregulated, and a proteasome subunit (Psmd8), a ribonuclease (Pop4), and a heat-shock protein (Hspb1) that were strongly downregulated .
The discovery of differentially expressed proteasome subunits and heat shock factors indicates that the regulation of protein turnover may contribute to cataract formation, and likewise we identified several differentially expressed genes with similar functions. These included GLG1, which encodes a negative regulator of protein processing (downregulated 5.7-fold in affected pandas), AZIN1, which encodes a regulator of protein turnover (downregulated 1.2-fold in affected pandas), and UCHL1, which encodes a thiol-dependent ubiquitin-specific protease (downregulated 1.8-fold in affected pandas). Furthermore, we detected the strong (16.4-fold) upregulation of TNFSF12, encoding a regulator of protein turnover linked to angiogenesis and apoptosis , and a 2.1-fold induction of SERPINB10, another regulator of proteases associated with apoptosis . The link with apoptosis was also supported by the 3.7-fold induction of EGR1, which mediates p53-independent apoptosis induced by c-Myc . We also observed the 1.5-fold induction of ARHGAP21, a target of p53 that promotes the degradation of MDM2 to maintain TP53 stability , and the 1.4-fold induction of STEAP3, another target of p53 that regulates cell cycle progression and apoptosis . The IFI27L2 gene, encoding an interferon-induced mitochondrial membrane protein that also promotes apoptosis, was induced 1.24-fold . Recent studies have shown that lens epithelial cells undergo apoptosis as a common cytological basis for all kinds of cataract except congenital lesions.
Several of the most strongly modulated genes we identified are involved in signaling, indicating the activation of cell–cell signaling as a response to lens deterioration. The top-ranking downregulated gene was RCAN1 (16-fold repression), which encodes a calcineurin-binding regulator of CNS development. The relevance of this gene in cataracts is unclear, but may be linked to the vascular remodeling discussed above given its role in the suppression of angiogenesis . Other genes we identified appear to be involved in signaling pathways involving G-protein-coupled receptors (GPCRs). For example, CYSLTR1 (induced 1.3-fold) encodes a leukotriene-specific GPCR typically involved in bronchoconstriction, but it also regulates vascular permeability, cell migration and collagen deposition, which may be the more relevant functions here . The UV irradiation of lens epithelial cells was previously shown to increase the expression of LET-7B, the ligand for another GPCR (LGRT), resulting in the induction of apoptosis . A targeted deletion in LGR4 reduced the resistance of rat lens epithelial cells to oxidative stress and accelerated the development of age-related cataracts . RASD1 (induced 4.3-fold) encodes a member of the Ras family of G-protein regulators acting downstream of GPCRs, and its upregulation in human cell lines has been shown to suppress cell growth and induce apoptosis [48, 49]. Similarly RANBP9 (induced 2.7-fold) encodes a Ras protein that influences cytoskeletal organization and interacts with several regulators of cell growth . PDE4A (induced 1.6-fold) encodes a cAMP-specific 3′,5′-cyclic phosphodiesterase that regulates second messenger signaling downstream of GPCRs by cleaving cAMP . Other modulated genes encoding signaling proteins potentially involved in the regulation of cell growth included PTP4A3, encoding a membrane-associated protein tyrosine phosphatase (induced 2.8-fold) , ANXA3, encoding annexin 3 (downregulated 1.75-fold) , and DUSP22, encoding a dual-specificity protein kinase (induced 1.42-fold) . We also observed the 1.92-fold downregulation of the MYBL1 gene, whose principal function is the regulation of meiosis, so its relevance in the context of cataract formation is unclear.
Mutations in the HSF4 gene have been shown to cause cataracts in humans and other mammals , including the recent discovery of a novel HSF4 mutation in pandas . HSF4 is a transcriptional repressor and we proposed that the novel mutation we discovered in this gene is likely to affect its interactions with upstream signaling components, thus disrupting the genetic control of lens functions. Although HSF4 was not among the differentially expressed genes we detected in this study, we found that both FOS and FOSB were strongly upregulated (4.5-fold and 4.9-fold, respectively). These genes encode two leucine zipper proteins that dimerize with members of the JUN family to form transcriptional regulators . Importantly, the FOS family of transcription factors is involved in the regulation of many of the processes discussed above revealed by the functional analysis of other differentially expressed genes, including cell proliferation, differentiation and survival, oxidative stress and angiogenesis . It is therefore possible that FOS/FOSB play a key role in the regulation of the other genes discussed above, or that the FOS/FOSB genes are targets of the modulated signaling pathways we have identified.
The analysis of GO enrichment provided results that were broadly consistent with the functional annotation of the differentially expressed genes, with cellular process, metabolic process, biological regulation/regulation of a biological process as the most overrepresented categories in the differentially expressed gene catalog. The categories immune system process and multi-organism process were also represented, which may reflect the modulation of signaling proteins that are also known to participate in the regulation of immunity responses, including cytokine release and immune cell recruitment. Another enriched category was protein-containing complex, which agreed with our preliminary protein network analysis and is consistent with the tendency of signaling proteins to form complexes. Finally, functional annotation by screening the differentially expressed genes against the KEGG database generated several hits in the histidine metabolic pathway, which is particularly interesting given that dietary histidine and/or carnitine are known to prevent cataract development in salmon, presumably by countering the effect of oxidative stress [59, 60]. Taken together, our results confirm that age-related cataracts in pandas bear many of the same hallmarks as cataracts in other animals, including the modulation of stress response genes, metabolic adaptations, and signaling pathways regulating cell growth and apoptosis. The identification of genes that are overexpressed or suppressed during the formation of cataracts could lead to the development of markers for early diagnosis, preventative strategies and therapies that improve the quality of life for captive giant pandas and other mammals.
Blood samples from six giant pandas with and without cataracts were used for de novo RNA-Seq analysis. This revealed the differential expression of several genes related to oxidative damage, the visual cycle, developmental functions, and lipid metabolism, suggesting that ageing pandas may develop cataracts as the natural capacity for oxidative stress responses begins to diminish. We found that many of the genes encoded regulatory and signaling proteins associated with the control of cell growth, migration, differentiation and apoptosis, supporting previous research indicating a key role for apoptosis in cataract formation. The identification of genes potentially involved in the formation of age-related cataracts could facilitate the development of predictive markers, preventative measures and even new therapies to improve the life of captive animals.
Routine physical examinations were carried out every month on the living captive animals, including eye, mouth and nose and general physical appearance, abdominal palpation, and general clinical signs. Blood was collected once a month for the analysis of physiological and biochemical indicators in order to exclude risk factors such as injury, diabetes or other diseases that can promote cataract formation .
Peripheral blood samples (2 ml) were collected from all six giant panda specimens (Table 1) because markers for many diseases can be detected in blood using transcriptomics methods  including eye diseases [62,63,64]. JN and LL were from Beijing Zoo, XX and YE were from Chongqing Zoo, and BD and YY were from Strait (Fuzhou) Giant Panda Research and Exchange Center. Three of the females were diagnosed with age-related cataracts and the other three donors were healthy controls. Blood samples were stored at 4 °C and total RNA was extracted within 2 days using the Trizol Total RNA Extractor kit (Sangon Biotech, Shanghai, China) according to the manufacturer’s protocol. The samples were treated with RNase-free DNase I to remove genomic DNA. RNA integrity was evaluated by 1.0% agarose gel electrophoresis, and RNA quality and quantity were assessed using a NanoPhotometer (Implen, Westlake Village, CA, USA) and an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA).
Library preparation and sequencing
High-quality RNA samples were sent to Sangon Biotech for library preparation and sequencing. The libraries were generated from 2 μg RNA using the VAHTS mRNA-seq V2 Library Prep Kit for Illumina (Vazyme Biotech, Nanjing, China) and index codes were added to attribute sequences to each sample. Briefly, mRNA was purified from total RNA using poly-T oligo-attached magnetic beads. Fragmentation was carried out using divalent cations at 94 °C in VAHTS 5× First Strand Synthesis Reaction Buffer, and first-strand cDNA was synthesized using random hexamers and M-MuLV reverse transcriptase RNase H−. Second-strand cDNA was synthesized using DNA polymerase I and RNase H, which also created blunt ends for 3′ polyadenylation. Following adapter ligation, cDNA fragments in the size range 150–200 bp were selected using the AMPure XP system (Beckman Coulter, Brea, CA, USA). The size-selected and adaptor-ligated cDNA was incubated with 3 μL USER enzyme mix (New England Biolabs, Ipswich, MA, USA) for 15 min at 37 °C then 5 min at 95 °C. We then carried out PCR using Phusion high-fidelity DNA polymerase, universal PCR primers i7 (5′-CAA GCA GAA GAC GGC ATA CGA GAT-index primer-GTG ACT GGA GTT CAG ACG TGT GCT C-3′) and i5 (5′-A*A TGA TAC GGC GAC CAC CGA GAT CTA CAC-index primer-ACA CTC TTT CCC TAC ACG ACG CTC TTC CGA T*C-3′) and sample-specific index primers (Table 4). The PCR products were purified (AMPure XP system) and library quality was assessed using the Agilent Bioanalyzer 2100 system. The libraries were then quantified and pooled. Paired-end sequencing was carried out using a HiSeq XTen device (Illumina, San Diego, CA, USA).
Data assessment and quality control
The quality of the sequence data was determined using FastQC v0.11.2. Raw reads were filtered using Trimmomatic v0.36 in five steps. First, the adaptor sequences were removed. Next, low-quality bases (Q < 20) were removed from reads in the 3′ to 5′ direction and then in the 5′ to 3′ direction. Then we used a sliding window method to remove low-quality bases from the read tails (window size = 5 bp). Finally, any reads smaller than 35 nt were removed along with the corresponding paired reads.
Alignment with reference genome
Sequences randomly selected from clean data were used as blastn queries against the NCBI nucleotide database (http://ncbi.nlm.nih.gov/). Hits with an E-value cutoff of ≤1 × 10− 10, similarity > 90% and coverage > 80% of the results were used to calculate the species distribution and pollution detection. The remaining clean reads were mapped to the reference genome using HISAT2 v2.0 with default parameters. RSeQC v2.6.1 was used for the statistical analysis of the alignment results.
Gene structure analysis
The sequence of each chromosome was compared and mapped statistically to show the distribution of sequences using BEDtools v2.26.0. After comparing the reads to the reference genome, the proportion of each gene structure in the reads was counted, including exons, introns and intergenic DNA, using Qualimap v2.2.1. Only 30 chromosomes were displayed. BCFtools v1.5 was used to find SNPs and SnpEff v2.36 was used to determine their effects. The sequences were filtered according to the mass value (> 20) and coverage (> 8). The sequences were assembled using StringTie v1.3.3b, and compared with existing genomic data in the STRING database (http://string-db.org/) using GffCompare v0.10.1 to find new transcription regions. The predicted transcripts were used as gene models to identify variable splicing. ASprofile v1.0.4 is used to classify variable splicing events according to the predicted gene model of each sample.
StringTie v1.3.3b was applied to the transcriptionally assembled reference genome to determine gene expression levels by measuring transcript abundances as transcripts per million reads (TPM) for both the protein-coding genes and lncRNAs in each sample. After quality control, the sequence was compared with the reference genome using HISAT2 v2.1.0 and the results were statistically compared using RSEQC v2.6.1, with the feature count used to homogenize the read count matrix (gene length and sequence depth) after each gene count to determine the TPM. Principal component analysis (PCA) and principal co-ordinates analysis (PCoA) were used to determine the distance and difference between samples in vegan v2.0.10. Data were presented as TPM avoid the influence of gene length or sequencing discrepancies during sample comparison. We used DESeq2 v1.12.4 to identify genes that were differentially expressed between two samples. Differential expression was considered significant if the false discovery rate (q-value) was ≤0.05 and the log2 fold change (|FC| value) was ≥2. If the normalized expression of a gene between two samples was zero, its expression value was adjusted to 0.01 (because 0 cannot be plotted on a log plot). If the normalized expression of a gene in two libraries was < 1, further differential expression analysis was conducted without this gene.
Functional analysis of differentially expressed genes
Given the absence of biological replicate samples, we first standardized all the data with TMM. This removed all the unexpressed genes, identified a sample with average data trends from many samples as the reference sample, calculated the total number of reads for all samples, and divided each sample by its own total number of reads to get the modified number of reads. The Q3 value (the third quartile) of each sample’s modified read number was calculated, averaged, and the sample with the smallest difference from the average Q3 was used as the reference sample. To find the representative gene sets in each sample and calculate the standardization factor of the sample, we referred to the fold change of these representative gene sets. Differentially expressed genes were identified by standardizing the read count data in DESeq2 v1.12.4 as described above. TopGO v2.24.0 was then used for GO enrichment analysis and to prepare the significant GO-directed acyclic graph. ClusterProfiler v3.0.5 was used to analyze KEGG pathways and KOG taxonomic enrichment analysis, allowing the construction of a network diagram. Functional enrichment analysis was carried out to identify differentially expressed genes significantly enriched in GO terms (biological, cellular and molecular functions) or KEGG metabolic pathways. Genes were mapped to the GO database (http://www.geeontology.org) and KEGG database (http://www.kegg.jp) [65,66,67], the number of genes representing each term or pathway was calculated, and hypergeometric tests were performed to identify significantly enriched GO terms or KEGG pathways in the gene list. GO terms and KEGG pathways were considered significant if the q-value was ≤0.05.
The results of differential gene expression analysis were visualized using DESeq2 v1.12.4 and mapped to the STRING protein interaction network database (http://string-db.org/) to construct the protein interaction network. Based on these results, Venn diagrams were prepared with VennDiagram v1.6.17 in the R package. The correlation of gene expression levels between samples was determined in R using the cor.test package to ensure data reliability and rational sample selection. The sample-to-sample and group-to-group distance heat map were prepared in R using the gplots v2.17.0 package, which showed the distance relationship between the samples or the groups directly. Scatter plots were used to show the degree of difference between groups of genes and were prepared in R v3.2.
Availability of data and materials
The RNA sequencing data are available at NCBI GenBank (https://dataview.ncbi.nlm.nih.gov/object/PRJNA720280?reviewer=m87rlugdrccp4b1f1q494083re). All data necessary for confirming the conclusions of the article are present within the article, figures, and tables.
Truscott RJWJEER. Age-related nuclear cataract-oxidation is the key. Exp Eye Res. 2005;80(5):709–25. https://doi.org/10.1016/j.exer.2004.12.007.
Uno HJA. Age-related pathology and biosenescent markers in captive rhesus macaques. Age. 1997;20(1):1–13. https://doi.org/10.1007/s11357-997-0001-5.
Asbell PA, Dualan I, Mindel JS, Brocks D, Ahmad M, Epstein SPJTL. Age-related cataract. Lancet. 2005;365(9459):599–609. https://doi.org/10.1016/S0140-6736(05)70803-5.
Urfer SR, Greer K, Wolf NS. Age-related cataract in dogs: a biomarker for life span and its relation to body size. Age. 2011;33(3):451–60. https://doi.org/10.1007/s11357-010-9158-4.
Jin Y, Lin W, Huang S, Zhang C, Pu T, Ma W, et al. Dental abnormalities in eight captive giant pandas (Ailuropoda melanoleuca) in China. J Comp Pathol. 2012;146(4):357–64. https://doi.org/10.1016/j.jcpa.2011.08.001.
Jin Y, Chen S, Chao Y, Pu T, Xu H, Liu X, et al. Dental abnormalities of eight wild Qinling giant pandas (Ailuropoda melanoleuca qinlingensis), Shaanxi Province. China J Wildlife Dis. 2015;51(4):849–59. https://doi.org/10.7589/2014-12-289.
Hammond CJ, Duncan DD, Snieder H, De Lange M, West SK, Spector TD, et al. The heritability of age-related cortical cataract: the twin eye study. Invest Ophthalmol Vis Sci. 2001;42(3):601–5.
Ottonello S, Foroni C, Carta A, Petrucco S, Maraini GJO. Oxidative stress and age-related cataract. Ophthalmologica. 2000;214(1):78–85. https://doi.org/10.1159/000027474.
Ho M, Peng Y, Chen S, Chiou SJJ. Senile cataracts and oxidative stress. J Clin Gerontol Geriatr. 2010;1(1):17–21. https://doi.org/10.1016/j.jcgg.2010.10.006.
Tinaztepe OE, Ay M, Eser EJCER. Nuclear and mitochondrial DNA of age-related cataract patients are susceptible to oxidative damage. Curr Eye Res. 2017;42(4):1–6. https://doi.org/10.1080/02713683.2016.1200100.
Billingsley G, Santhiya ST, Paterson AD, Ogata K, Wodak SJ, Hosseini SM, et al. CRYBA4, a novel human cataract gene, is also involved in microphthalmia. Am J Hum Genet. 2006;79(4):702–9. https://doi.org/10.1086/507712.
Hasanova N, Kubo E, Kumamoto Y, Takamura Y, YJBJoO A. Age-related cataracts and Prdx6: correlation between severity of lens opacity, age and the level of Prdx 6 expression. Br J Ophthalmol. 2009;93:1081–4.
Zhang Y, Zhang L, Sun D, Li Z, Wang L, Liu PJMV. Genetic polymorphisms of superoxide dismutases, catalase, and glutathione peroxidase in age-related cataract. Mol Vis. 2011;17:2325–32.
Liu X, Luo Y, Zhou P, Lu YJIO, Science V. DNA methylation mediated and oxidative stress related genes CRYAA and GJA3 in nuclear age-related cataract (ARC) and its mechanism. Invest Ophthalmol Vis Sci. 2015;56:5877.
Zhou P, Luo Y, Liu X, Fan L, Lu YJTFJ. Down-regulation and CpG island hypermethylation of CRYAA in age-related nuclear cataract. FASEB J. 2012;26(12):4897–902. https://doi.org/10.1096/fj.12-213702.
Wang Y, Li F, Zhang G, Kang L, Qin B, Guan HJCER. Altered DNA methylation and expression profiles of 8-oxoguanine DNA glycosylase 1 in lens tissue from age-related cataract patients. Curr Eye Res. 2015;40(8):815–21. https://doi.org/10.3109/02713683.2014.957778.
You Y, Bai C, Liu X, Xia M, Jia T, Li X, et al. Genome-wide analysis of methylation in giant pandas with cataract by methylation-dependent restriction-site associated DNA sequencing (MethylRAD). PLoS One. 2019;14(9):e0222292. https://doi.org/10.1371/journal.pone.0222292.
Doshna CW, Fortner JH, Pfohl JC, Aleo TMW, MEJIO V. Science V Investigation of the role of apoptosis in drug-induced cataract formation. Invest Ophthalmol Vis Sci. 2002;43:2377.
Galichanin K, Svedlund J, Soderberg PGJEER. Kinetics of GADD45α, TP53 and CASP3 gene expression in the rat lens in vivo in response to exposure to double threshold dose of UV-B radiation. Exp Eye Res. 2012;97(1):19–23. https://doi.org/10.1016/j.exer.2012.02.003.
Li B, Zhou J, Zhang G, Wang Y, Kang L, Wu J, et al. Relationship between the altered expression and epigenetics of GSTM3 and age-related cataract. Invest Ophthalmol Vis Sci. 2016;57(11):4721–32. https://doi.org/10.1167/iovs.16-19242.
Srivastava R, Budak G, Dash S, Lachke SA, Janga SC. Transcriptome analysis of developing lens reveals abundance of novel transcripts and extensive splicing alterations. Sci Rep. 2107;7:11572.
Whitson JA, Zhang X, Medvedovic M, Chen J, Wei Z, Monnier VM, et al. Transcriptome of the GSH-depleted lens reveals changes in detoxification and EMT signaling genes, transport systems, and lipid homeostasis. Invest Ophthalmol Vis Sci. 2017;58(5):2666–84. https://doi.org/10.1167/iovs.16-21398.
Wientjes FB, Reeves EP, Soskic V, Furthmayr H, Segal AW. The NADPH oxidase components p47(phox) and p40(phox) bind to moesin through their PX domain. Biochem Biophys Res Commun. 2001;289(2):382–8. https://doi.org/10.1006/bbrc.2001.5982.
Gu Y, Xu YC, Wu RF, Nwariaku FE, Souza RF, Flores SC, et al. p47phox participates in activation of RelA in endothelial cells. J Biol Chem. 2003;278(19):17210–7. https://doi.org/10.1074/jbc.M210314200.
Swanson DA, Chang JT, Campochiaro PA, Zack DJ, Valle D. Mammalian orthologs of C. elegans unc-119 highly expressed in photoreceptors. Invest Ophthalmol Vis Sci. 1998;39(11):2085–94.
Balzar M, Winter MJ, de Boer CJ, Litvinov SV. The biology of the 17-1A antigen (ep-CAM). J Mol Med. 1999;77(10):699–712. https://doi.org/10.1007/s001099900038.
Münz M, Kieu C, Mack B, Schmitt B, Zeidler R, Gires O. The carcinoma-associated antigen EpCAM upregulates c-myc and induces cell proliferation. Oncogene. 2004;23(34):5748–58. https://doi.org/10.1038/sj.onc.1207610.
Jilishitz I. Chloride intracellular channels 1 and 4 function in distinct branches of S1P signaling to regulate endothelial cell behavior and vascular development. PhD thesis, Columbia University; 2016.
Ulmasov B, Bruno J, Gordon N, Hartnett ME, Edwards JC. Chloride intracellular channel protein-4 functions in angiogenesis by supporting acidification of vacuoles along the intracellular tubulogenic pathway. Am J Pathol. 2009;174(3):1084–96. https://doi.org/10.2353/ajpath.2009.080625.
Sanz-Rodriguez F, Guerrero-Esteo M, Botella LM, Banville D, Vary CP, Bernabéu C. Endoglin regulates cytoskeletal organization through binding to ZRP-1, a member of the Lim family of proteins. J Biol Chem. 2004;279(31):32858–68. https://doi.org/10.1074/jbc.M400843200.
Shiels A, Bassnett S. Mutations in the founder of the MIP gene family underlie cataract development in the mouse. Nat Genet. 1996;12(2):212–5. https://doi.org/10.1038/ng0296-212.
Shiels A, Mackay D, Bassnett S, al-Ghoul K, Kuszak J. Disruption of lens fiber cell architecture in mice expressing a chimeric AQP0-LTR protein. FASEB J. 2000;14(14):2207–12. https://doi.org/10.1096/fj.99-1071com.
Sidjanin DJ, Parker-Wilson DM, Neuhauser-Klaus A, et al. A 76-bp deletion in the Mip gene causes autosomal dominant cataract in Hfi mice. Genomics. 2001;74(3):313–9. https://doi.org/10.1006/geno.2001.6509.
Okamura T, Miyoshi I, Takahashi K, Mototani Y, Ishigaki S, Kon Y, et al. Bilateral congenital cataracts result from a gain-of-function mutation in the gene for aquaporin-0 in mice. Genomics. 2003;81(4):361–8. https://doi.org/10.1016/S0888-7543(03)00029-6.
Watanabe K, Wada K, Ohashi T, Okubo S, Takekuma K, Hashizume R, et al. A 5-bp insertion in Mip causes recessive congenital cataract in KFRS4/Kyo rats. PLoS One. 2012;7(11):e50737. https://doi.org/10.1371/journal.pone.0050737.
Bai C, You YY, Liu XF, Xia MH, Wang W, Jia T, et al. A novel missense mutation in the gene encoding major intrinsic protein (MIP) in a Giant panda with unilateral cataract formation. BMC Genomics. 2021;22(1):100. https://doi.org/10.1186/s12864-021-07386-8.
Bennett TM, Zhou Y, Shiels A. Lens transcriptome profile during cataract development in Mip-null mice. Biochem Biophys Res Commun. 2016;478(2):988–93. https://doi.org/10.1016/j.bbrc.2016.08.068.
Burkly LC. TWEAK/Fn14 axis: the current paradigm of tissue injury-inducible function in the midst of complexities. Seminars in immunology. The TNF family - challenges ahead. 2014;26:229–36.
Schleef RR, Chuang TL. Protease inhibitor 10 inhibits tumor necrosis factor alpha -induced cell death. Evidence for the formation of intracellular high M(r) protease inhibitor 10-containing complexes. J Biol Chem. 2000;275(34):26385–9. https://doi.org/10.1074/jbc.C000389200.
Boone DN, Qi Y, Li Z, Hann SR. Egr1 mediates p53-independent c-Myc-induced apoptosis via a noncanonical ARF-dependent transcriptional mechanism. Proc Natl Acad Sci U S A. 2011;108(2):632–7. https://doi.org/10.1073/pnas.1008848108.
Rosa LRO, Soares GM, Silveira LR, Boschero AC, Barbosa-Sampaio HCL. ARHGAP21 as a master regulator of multiple cellular processes. J Cell Physiol. 2018;233(11):8477–81. https://doi.org/10.1002/jcp.26829.
Passer BJ, Nancy-Portebois V, Amzallag N, Prieur S, Cans C, Roborel de Climens A, et al. The p53-inducible TSAP6 gene product regulates apoptosis and the cell cycle and interacts with nix and the Myt1 kinase. Proc Natl Acad Sci U S A. 2003;100(5):2284–9. https://doi.org/10.1073/pnas.0530298100.
Gytz H, Hansen MF, Skovbjerg S, Kristensen AC, Hørlyck S, Jensen MB, et al. Apoptotic properties of the type 1 interferon induced family of human mitochondrial membrane ISG12 proteins. Biol Cell. 2017;109(2):94–112. https://doi.org/10.1111/boc.201600034.
Baek KH, Zaslavsky A, Lynch RC, Britt C, Okada Y, Siarey RJ, et al. Down's syndrome suppression of tumour growth and the role of the calcineurin inhibitor DSCR1. Nature. 2009;459(7250):1126–30. https://doi.org/10.1038/nature08062.
Liu M, Yokomizo T. The role of leukotrienes in allergic diseases. Allergol Int. 2015;64(1):17–26. https://doi.org/10.1016/j.alit.2014.09.001.
Dong Y, Zheng Y, Xiao J, Zhu C, Zhao M. MicroRNA let-7b induces lens epithelial cell apoptpsis by targeting leucinerich repeat containing G protein-coupled receptor 4 (Lgr4) in age-related cataract. Exp Eye Res. 2016;147:98–104. https://doi.org/10.1016/j.exer.2016.04.018.
Zhu J, Hou Q, Dong XD, Wang Z, Chen X, Zheng D, et al. Targeted deletion of the murine Lgr4 gene decreases lens epithelial cell resistance to oxidative stress and induces age-related cataract formation. Plos one. 2015;10(3):e0119599.
Vaidyanathan G, Cismowski MJ, Wang G, Vincent TS, Brown KD, Lanier SM. The Ras-related protein AGS1/RASD1 suppresses cell growth. Oncogene. 2004;23(34):5858–63. https://doi.org/10.1038/sj.onc.1207774.
Liu XJ, Li YQ, Chen QY, Xiao SJ, Zeng SE. Up-regulating of RASD1 and apoptosis of DU-145 human prostate cancer cells induced by formononetin in vitro. Asian Pacific J Cancer Prev. 2014;15(6):2835–9. https://doi.org/10.7314/APJCP.2014.15.6.2835.
Nakamura M, Masuda H, Horii J. Kuma Ki, Yokoyama N, Ohba T, Nishitani H, Miyata T, Tanaka M, Nishimoto T. when overexpressed, a novel centrosomal protein, RanBPM, causes ectopic microtubule nucleation similar to γ-tubulin. J Cell Biol. 1998;143(4):1041–52. https://doi.org/10.1083/jcb.143.4.1041.
Wallace DA, Johnston LA, Huston E, MacMaster D, Houslay TM, Cheung YF, et al. Identification and characterization of PDE4A11, a novel, widely expressed long isoform encoded by the human PDE4A cAMP phosphodiesterase gene. Mol Pharmacol. 2005;67(6):1920–34. https://doi.org/10.1124/mol.104.009423.
Zeng Q, Dong JM, Guo K, Li J, Tan HX, Koh V, et al. PRL-3 and PRL-1 promote cell migration, invasion, and metastasis. Cancer Res. 2003;63(11):2716–22.
Park JE, Lee DH, Lee JA, Park SG, Kim NS, Park BC, et al. Annexin A3 is a potential angiogenic mediator. Biochem Biophys Res Commun. 2005;337(4):1283–7. https://doi.org/10.1016/j.bbrc.2005.10.004.
Sekine Y, Ikeda O, Hayakawa Y, Tsuji S, Imoto S, Aoki N, et al. DUSP22/LMW-DSP2 regulates estrogen receptor-alpha-mediated signaling through dephosphorylation of Ser-118. Oncogene. 2007;26(41):6038–49. https://doi.org/10.1038/sj.onc.1210426.
Bu L, Jin Y, Shi Y, Chu R, Ban A, Eiberg H, et al. Mutant DNA-binding domain of HSF4 is associated with autosomal dominant lamellar and Marner cataract. Nature Genet. 2002;31(3):276–8. https://doi.org/10.1038/ng921.
You YY, Bai C, Liu XF, Xia MH, Yin YQ, Chen YC, et al. A novel missense mutation in the HSF4 gene of giant pandas with senile congenital cataracts. Sci Rep. 2021;11:5411.
Halazonetis TD, Georgopoulos K, Greenberg ME, Leder P. C-Jun dimerizes with itself and with c-Fos, forming complexes of different DNA binding affinities. Cell. 1988;55(5):917–24. https://doi.org/10.1016/0092-8674(88)90147-X.
Tulchinsky E. Fos family members: regulation, structure and role in oncogenic transformation. Histol Histopathol. 2000;15(3):921–8. https://doi.org/10.14670/HH-15.921.
Waagbø R, Trösse C, Koppe W, Fontanillas R, Breck O. Dietary histidine supplementation prevents cataract development in adult Atlantic salmon, Salmo salar L., in seawater. Br J Nutr. 2010;104(10):1460–70. https://doi.org/10.1017/S0007114510002485.
Sambraus F, Fjelldal PG, Remø SC, Hevrøy EM, Nilsen TO, Thorsen A, et al. Water temperature and dietary histidine affect cataract formation in Atlantic salmon (Salmo salar L.) diploid and triploid yearling smolt. J Fish Dis. 2017;40(9):1195–212. https://doi.org/10.1111/jfd.12594.
Mohr S, Liew CC. The peripheral-blood transcriptome: new insights into disease and risk assessment. Trends Mol Med. 2007;13(10):422–32. https://doi.org/10.1016/j.molmed.2007.08.003.
Jeoung JW, Ko JH, Kim YJ, Kim YW, Park KH, Oh JY. Microarray-based analysis of gene expression profiles in peripheral blood of patients with acute primary angle closure. Ophthalmic Genet. 2017;38(6):520–6. https://doi.org/10.1080/13816810.2017.1300922.
Rosenbaum JT, Harrington CA, Searles RP, Fei SS, Zaki A, Arepalli S, et al. Revising the diagnosis of idiopathic uveitis by peripheral blood transcriptomics. Am J Ophthalmol. 2021;222:15–23. https://doi.org/10.1016/j.ajo.2020.09.012.
Ye HH, Zhang JM, Lu YF, Qian YY. Expression of nitric oxide synthase in peripheral blood of cataract patients and its clinical significance. Chin J Ophthalmol. 2015;4:351–5.
Kanehisa M, Furumichi M, Sato Y, Ishiguro-Watanabe M, Tanabe M. KEGG: integrating viruses and cellular organisms. Nucleic Acids Res. 2021;49(D1):D545–51. https://doi.org/10.1093/nar/gkaa970.
Kanehisa M, Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1):27–30. https://doi.org/10.1093/nar/28.1.27.
Kanehisa M. Toward understanding the origin and evolution of cellular organisms. Protein Sci. 1947-1951;2019:28.
This study was supported by the National Natural Science Foundation of China (NSFC 31872257). The project funding agency only provided the research funds.
Ethics approval and consent to participate
All samples were authorized by the source institution (Beijing Zoo, Chongqing Zoo, or Strait (Fuzhou) Giant Panda Research and Exchange Center). All three source institutions conform to the guidelines laid down by the Beijing Zoo Academic and Ethics Committee. The study is reported in accordance with ARRIVE guidelines. All blood samples were collected in accordance with the Wildlife Protection Law of the People’s Republic of China (President of the People’s Republic of China No. 16), and the experimental approach and samples were approved by the Beijing Zoo Academic and Ethics Committee.
Consent for publication
The authors declare no conflict of interest.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Alignment statistics. Total reads = number (percentage) of sequences in the clean data (after sequence filtering). Total mapped = number (percentage) of sequences that can be located on the reference sequence. Multiple mapped = number (percentage) of sequences with multiple comparison positions on the reference sequence. Unique mapped = number (percentage) of sequences with unique comparison positions on the reference sequence. Read-1/Read-2 mapped = number (percentage) of Read-1 and Read-2 sequences compared to reference sequence (only the unique mapped sequences were calculated). Reads mapped to +/− = the number (percentage) of positive and negative chains on the reference sequence compared to the sequencing sequence (only the unique mapped sequences were calculated). Non-splice reads = number (percentage) of all sequences compared to exons. Splice reads = number (percentage) of segmented comparisons of sequences (also known as junction reads) on two exons. Reads mapped in proper pairs = number (percentage) of sequences for simultaneous alignment of two terminal reads.
New transcript prediction statistics.
Correlation analysis between samples.
Genome coverage statistics. (a) Homogeneity distribution curves. The x-axis represents the length of a gene, with 0 as the 5′ end and 100 as the 3′ end. The y-axis shows the total number of sequences that mapped to the corresponding gene position. Each color represents one sample. (b) Pie chart showing the gene coverage ratio. The percentage value represents the percentage of the total area of the corresponding gene in the region under which the gene is measured, with the number of genes that can be measured within the interval in parentheses. (c) Distribution of genome coverage by exon, intron and intergenic regions.
Frequency of different types of single-nucleotide polymorphism, with red representing transversions and blue representing transitions.
Directed acyclic graph of significant Gene Ontology molecular functions. Each box represents a GO term, showing the GO term ID, GO description, GO enriched p-value, and the number of differentially expressed/background genes under each GO term. The depth of color represents the degree of enrichment. (a) Acyclic graph of significant GO biological processes. (b) Acyclic graph of significant GO molecular functions. (c) Acyclic graph of significant GO cellular components.
About this article
Cite this article
You, Y., Bai, C., Liu, X. et al. RNA-Seq analysis in giant pandas reveals the differential expression of multiple genes involved in cataract formation. BMC Genom Data 22, 44 (2021). https://doi.org/10.1186/s12863-021-00996-x