Transcriptome profiling of developing testes and spermatogenesis in the Mongolian horse

Background Horse testis development and spermatogenesis are complex physiological processes. Methods To study these processes, three immature and three mature testes were collected from the Mongolian horse, and six libraries were established using high-throughput RNA sequencing technology (RNA-Seq) to screen for genes related to testis development and spermatogenesis. Results A total of 16,237 upregulated genes and 8,641 downregulated genes were detected in the testis of the Mongolian horse. These genes play important roles in different developmental stages of spermatogenesis and testicular development. Five genes with alternative splicing events that may influence spermatogenesis and development of the testis were detected. GO (Gene ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analyses were performed for functional annotation of the differentially expressed genes. Pathways related to “spermatogenesis,” male gamete generation,” “spermatid development” and “oocyte meiosis” were significantly involved in different stages of testis development and spermatogenesis. Conclusion Genes, pathways and alternative splicing events were identified with inferred functions in the process of spermatogenesis in the Mongolian horse. The identification of these differentially expressed genetic signatures improves our understanding of horse testis development and spermatogenesis.


Background
The mammalian testis is an important male reproductive organ that produces sperm and androgens. Spermatogenesis is a complex developmental process with a unique mechanism of cell division-meiosis-as a major defining event. Sperm development consists of three stages: (1) proliferation and differentiation of spermatogonia, (2) meiosis of spermatocytes, and (3) sperm maturation. Spermatogonial stem cells differentiate into spermatogonia and, with replication of their DNA content, give rise to primary spermatocytes. Secondary spermatocytes and spermatids are derived from the primary spermatocytes by DNA replication and tetraploid meiosis. Haploid sperm cells undergo at least four distinct morphological changes into sperm cells, namely chromatin condensation, acrosome formation, flagellar formation, and cytoplasmic reduction.
In addition to spermatogenic cells, spermatogenesis also involves multiple somatic cells in testicular tissues, such as Sertoli and Leydig cells. In mammalian testes, spermatogenesis occurs within seminiferous tubules, where germ cells associate with Sertoli cells, which provide the appropriate microenvironment for spermatogenesis. Related genes in these cells play important roles in specific stages of germ cell development, and the transcription and translation of these genes vary in different stages of spermatogenesis [1,2].
RNA sequencing technology (RNA-Seq) technology, which provides a global view of gene expression profiles, has been developed for mapping and quantifying transcriptomes [3,4]. This methodology has several advantages over other transcriptome technologies, such as higher resolution and sensitivity, a large dynamic range of gene expression levels, and the ability to identify novel transcribed regions and splicing isoforms of known genes [5]. Ramsköld [6] assessed different mammalian tissues by RNA-Seq and found that the vast majority of genes are specifically expressed in the testis. Furthermore, Djureinovic [7] categorized 20,050 putative human genes based on RNA-Seq expression patterns. These genes were specifically expressed in normal human testicular tissues compared to 26 other normal human tissue types in seven individuals. The analysis showed that the testis has the highest number of tissue-specific genes by far. Anand [8] identified differentially expressed genes (DEGs) in the human testis relative to other tissues by microarray analysis, and a total of 2,868 upregulated and 2,011 downregulated transcripts were identified. The testis thus seems to be highly metabolically active relative to other normal tissues as indicated by functional annotation. Currently, most of the studies on the relationship between the testis development and spermatogenesis have focused on humans or mice. However, little is known regarding the spermatogenesis process in the horse.
The Mongolian horse is one of the most famous native breeds in China, and in the world. It is a dual-purpose breed that can adapt to harsh climate and extensive breeding conditions. Normal development of the testis and spermatogenesis is extremely important to ensure high level semen production and protect population continuity, and genes that are important for different stages of testis development and spermatogenesis are likely to have roles in maintaining fertility. Unfortunately, the transcriptional expression of genes during testicular development and spermatogenesis in the horse is not well understood, thereby limiting the understanding of the mechanisms of deformation and maturation in spermatogenesis. This study aimed to study the transcriptome profiles of immature and mature male Mongolian horse testes utilizing high-throughput RNA-Seq technology and bioinformatics analysis. For this purpose, a testis transcript database was obtained, and DEGs and regulatory pathways were analyzed. This study provides new insights into gene expression and the regulatory mechanisms of horse testis development and spermatogenesis.

mRNA expression profiles in Mongolian horse testis development
To screen for genes related to testis development and spermiogenesis, a total of six libraries-three from immature colts (BSM1, BSM2, and BSM3) and three from mature stallions (ASM1, ASM2, and ASM3)-were constructed and sequenced on the Illumina Hiseq platform. An overview of the information for these libraries is listed in  (Table S1). These results verify that the six libraries were of high quality.

Analysis of alternative splicing
Alternative splicing (AS) is a common occurrence in most eukaryotic cells and can result in the translation of different forms of proteins at various time points and environments, which increases the adaptability and physical fitness of the species. In our study, the AS events were classified into five kinds by rMATS (http://rnaseq-mats.sourceforge.net/index.html): (1) SE: Skipped exon, (2) MXE: Mutually exclusive exon, (3) A5SS: Alternative 5′ splice site, (4) A3SS: Alternative 3′ splice site, and (5) RI: Retained intron. The kinds and quantities of AS events were counted, and the expression of each type of AS event was analyzed.
The results indicate that numerous and varied AS events exist in the spermatogenesis process (Table  S3).

Analysis of DEGs
The DEGs between libraries were screened based on DESeq analysis, taking padj < 0.05 as the cut-off. By considering libraries, the 16,237 up-and 8,641 down-regulated genes were identified in total that were up-regulated in ASM versus BSM (Fig. 1a). Venn diagrams were constructed to summarize the data, which show that the ASM and BSM libraries shared 16,327 genes that were expressed in both BSM and ASM (Fig. 1b). Figure 1c shows the results of hierarchical clustering, with the DEGs of the six libraries divided into two clusters. These results show that the ASM and BSM libraries had overall distinct patterns of differential expression but that the replicates within each developmental stage were similar.
For further assessment of DEGs that are transcription factors. A total 922 transcription factors belong to 60 transcription factors families.

GO enrichment analysis of DEGs
Gene ontology (GO) enrichment analysis was used to explore the functions of the DEGs in testis development. A total of 70 GO terms related to "biological processes," "molecular functions," and "cellular components" were significantly enriched across the ASM vs BSM. DEG groups (Fig. 2). Approximately 32 GO terms belong to Fig. 1 a Volcanic map of differentially expressed genes. There were significant differences in the expression of genes indicated by red (up) and green (down) regulation in before sexual maturation (BSM) and after sexual maturation (ASM) samples. No significant differences were observed in the expression of genes indicated by blue. The abscissa represents the fold change of genes in different samples; the ordinate represents the significant statistical difference of gene expression changes. b Venn diagram of gene expression. The sum of the numbers in each large circle represents the total number of genes expressed by each group, and the overlapping part of the circle represents the commonly expressed genes between groups with FPKM > 1. c Differential gene cluster. Overall FPKM hierarchical clustering diagram is based on the log 10 (FPKM + 1) value for the normalization of the conversion (scale number) and clustering. Red represents high expression genes, and blue represents low expression genes "biological processes," and these were mainly focused on "metabolic process" (2,454 genes), "organic substance metabolic process" (2,044 genes), and "cellular metabolic process" (1,948 genes). Approximately 20 GO terms belong to the functional category of "cellular component," and these were mainly concentrated in "intracellular" (1,951 genes), "intracellular part" (1,844 genes), and "organelle" (1,668 genes) classification groups. Approximately 18 GO terms belong to "molecular function," which centered on "binding" (3,037 genes), "protein binding" (1,719 genes), and "catalytic activity" (1,659 genes).

Quantitative real time PCR (qRT-PCR) validation of DEGs
To verify the differential expression of genes in ASM vs. BSM testis, we selected eight DEGs, including MEI1 (two different sites), EHMT2, BTRC, CLOCK, IQCG, ANAPC5 and SKP1, to validate the expression patterns by qRT-PCR. These genes belong to "oocyte meiosis" KEGG pathways and GO terms "spermatogenesis," "male gamete generation," and "spermatid development" (Table S4). There was a relatively high correlation between the mRNA expression levels of the eight genes, as detected by qRT-PCR and RNA-Seq (correlation coefficient = 0.853). This suggests that our RNA-Seq results are reliable. Fig. 2 The most highly enriched GO terms. The ordinate represents the enriched GO terms, and the abscissa represents the number of differential genes. Different colors were used to distinguish "biological processes," "cellular components" and "molecular functions." "*" indicates significant enrichment of GO terms Discussion Currently, with the development of improved detection techniques, a growing number of sperm-specific mRNAs have been identified. RNA-Seq technology has become an efficient and inexpensive method for the discovery of novel transcripts and genes. In our study, three ASM testes and three BSM testes were used as research samples, and 24,878 genes were screened by RNA-Seq. The results led to the identification of 16,237 genes that were upregulated and 8,641 genes that were downregulated during development. Using the next-generation platform, we found that most of the highly expressed genes are differentiation-related with identifiable roles in the development of the testis and spermatogenesis. Our analysis also identified related genes that were upregulated in ASM, such as SPATA3, SPATA18, SPATA19, SPATA45, SPATC1, SPEM1, SPERT, SHBG, TEX35, TEX43 and GSG1. Among these genes, SPATA3, SPATA18, SPATA19, SPATA45 and SPATC1, have known roles in spermatogenesis; TEX35 and TEX43 are associated with testis expression; GSG1 is related to germ cell function; SPEM1 influences spermatid maturation; and SHBG might regulate sex hormone-binding globulin. In addition, some novel genes detected are likely to be related to spermatogenesis.
Alternative splicing is a crucial mechanism for regulating gene expression and generating proteomic diversity. Recent estimates indicate that the expression of nearly 95% of human multi-exon genes involves AS [11,12]. In metazoans, AS plays an important part in generating different protein products that function in diverse cellular processes, including cell growth, differentiation, and death. In our study, five kinds of AS events were detected, most of them involving SE. The SE events and the results of GO and KEGG analyses were comprehensively analyzed to determine the influence of AS events on the function of related genes [13]. In our study, IQ motif-containing G (IQCG) is regulated by 2 SE events as categorized under the GO terms "spermatogenesis," "male gamete generation," and "spermatid development". The causality of the mutation was confirmed with a targeted null allele. Loss of IQCG disrupts spermiogenesis wherein tail formation is either incomplete or breaks apart from the sperm heads. Because IQ motif-containing genes typically regulate calmodulin, which in turn can impact the actin cytoskeleton, these findings suggest a potential role for localized calcium signaling in sperm flagellum morphogenesis, Ca 2+ signaling is Fig. 3 Scatter plot of differentially expressed KEGG genes. Note: The vertical axis represents the name of the pathway; the horizontal axis represents the Rich factor. The differential size of the points represents the number of differential expression genes, and the color of points signifies different ranges of Q values important for regulating sperm motility, most notably in the context of flagellar function and regulation [14][15][16][17][18][19]. 11 SE and 4 MXE events of the meiotic double-stranded break formation protein 1 (MEI1) gene also occurred in ASM. MEI1 was upregulated in ASM and is related to the GO terms "spermatogenesis," "male gamete generation," and "spermatid development." MEI1 is expressed specifically in the testis and the first meiosis-specific mutation identified by forward genetic approaches in mammals. Mutations in the MEI1 gene disrupt meiosis during spermatogenesis, especially in Americans of European descent [20][21][22].
Other enriched genes in this study that belong to the "spermatogenesis," "male gamete generation," and "spermatid development" GO terms include SLC26A8, MEI1, EHMT2, DPY19L2, HIST1H2BA, CAPZA3, TSSK2, TSSK1B, KDM3A, ALMS1, and IQCG. Testis anion transporter 1(Tat1) is a protein that, in humans, is encoded by the SLC26A8 gene. SLC26A8 gene was localized to developing spermatocytes [23][24][25]. The SLC26A8 protein specifically expressed in male germ cells and mature sperm. In the mouse, deletion of Tat1 caused male sterility as a results of lack of sperm motility, impaired sperm capacitation and structural defects of the flagella [26,27]. It acts as a DIDS-sensitive anion exchanger that mediates chloride, sulfate, and oxalate transport. It also has critical anion exchange functions in male germ lines during meiosis, and therefore, may play a role in spermatogenesis [25]. The DPY19L2 protein, which belongs to the dpy-19 family, is highly expressed in testis and is required for sperm head elongation and acrosome formation during spermatogenesis. Mutations in DPY19L2 are associated with an infertility disorder known as spermatogenic failure type 9 (SPGF9) [28][29][30][31][32][33][34][35]. Testis-specific serine kinase 2 (TSSK2), and Testis-specific serine kinase 1B (TSSK1B) are required during spermatid development. During spermatogenesis, TSSK2 is highly expressed in the testis and is required for the transformation of a ring-shaped structure around the base of the flagellum that originates from the chromatoid body [36][37][38][39][40][41]. The above genes have wellestablished functions in spermatogenesis; however, the regulatory mechanisms in the horse have not been verified and thus require further investigation.
In our study, six upregulated KEGG pathways were detected. Though the "oocyte meiosis" pathway is named according to its role in female fertility, genes detected in the pathway also regulated meiosis of sperm. Meiosis plays a critical role in the formation of sperm. A total genes in the oocyte meiosis pathway were detected, with functions that influence meiosis I and II. Cell division cycle 25C (CDC25C), which encode threonine/tyrosine phosphatases related with in meiotic cell cycle regulation, were found to be differentially localized in rat testicular germ cells during spermatogenesis [42]. Aurora kinase a, also known as serine/threonine-protein kinase 6, is an enzyme that, in humans, is encoded by the AURKA gene and is a member of a family of mitotic serine/threonine kinases [43]. This gene is implicated to have an essential role during mitosis and meiosis [44], and its activity peaks during the G2 phase to M phase transition in the cell cycle [45]. The G2/mitotic-specific cyclin-B1 (CCNB1) is encoded by the CCNB1 gene in humans [46]. The CCNE protein belongs to the highly conserved cyclin family, whose activity is required for cell cycle G1/S transition. This protein accumulates during the G1-S phase and is degraded as cells progress through the S-phase [47]. PLK1 is an early trigger for G2/M transition and supports the functional maturation of the centrosome in late G2/early prophase. PLK1 phosphorylates and activates components of the anaphasepromoting complex (APC). The function of APC can maintain cohesion of sister chromatids and anaphase inhibitors [48]. Meiotic recombination protein REC8 homolog is a meiosis-specific component of the cohesin complex that binds sister chromatids in preparation for the two divisions of meiosis [49,50]. F-box only protein 43 (FBXO43), also called endogenous meiotic inhibitor 2 (EMI2), was originally identified in a yeast two-hybrid screen as a novel Xenopus polo-like kinase (Plx-1 ) [51,52]. FBXO43 is also essential for spermatocytes to complete meiotic divisions in mice. In male FBXO43 knockout mice, spermatocytes cannot complete meiotic divisions, and spermatids are absent in the testes [53].
Our study focused on increasing understanding of the process of testis development and spermatogenesis in the Mongolian horse, which ultimately may lead to the development of approaches to improve fertility. Due to the lack of information available in regards to the genetics of horse spermatogenesis, we have inferred the function of certain DEGs according to available information in humans and mice. Furthermore, DEGs that have functions in development may not be directly applicable to many situations of infertility in the horse, such as azoospermia and asthenozoospermia.

Conclusions
In our study, six mRNA libraries were constructed and sequenced from horse testes from ASM and BSM. A total of 16,237 upregulated and 8,641 downregulated genes were identified; most genes had functions within biological process categories and are known to participate in spermatogenesis and testis development. We analyzed a large number of genes related to spermatogenesis and development of the horse testis, including those that play important roles in different developmental stages. We also found that the AS events might influence the function of these genes and indirectly affect the spermatogenesis and development of the horse testis. These findings will help us to further understand how gene expression is regulated during testis development and spermatogenesis. Future research to evaluate the influence of AS events on testis development and spermatogenesis-related genes, or studies that combine other experimental results, such as miRNA and piRNA expression patterns, will provide more fundamental information in understanding the regulatory mechanisms of equine testis development and spermatogenesis at the molecular level.

Ethics statement
This study was approved by and conducted according to the guidelines of the Institutional Animal Ethics Committee and Animal Care Guidelines of the Inner Mongolia Agricultural University. The animals did not unnecessarily suffer at any stage of this study.

Animals and sample collection
We received permission to geld six healthy male Mongolian horses in Xilingol League, Inner Mongolia, China. The ages of the horses were determined based on physical examination of their teeth, as well as information from the owner. Three were sexually immature colts (before sexual maturation; samples BSM1-3), between 11 and 13 months old. The other three were sexually mature stallions, between 3 and 4 years old (after sexual maturation; samples ASM1-3). We surgically collected the testes of all six horses. The excised testes were stored in cryogenic vials with RNA/DNA sample protector (TaKaRa). Small aliquots (~5 g) from the testes of each animal from the parenchyma, including the seminiferous tubules and Leydig cells from the middle area, were immediately frozen in liquid nitrogen for quantitative real-time polymerase chain reaction (qRT-PCR) analysis. All horses were reared by their owner after our study.

RNA quantification and qualification
Total RNA was extracted using TRIzol™ Reagent (Invitrogen) according to the manufacturer's protocols and resuspended in nuclease-free water. Then, the samples were treated with DNase I (Invitrogen) to eliminate any contaminating genomic DNA. RNA degradation and contamination were monitored on 1% agarose gels. RNA purity was assessed using the NanoPhotometer® spectrophotometer (IMPLEN). RNA concentrations were measured using the Qubit® RNA Assay Kit with the Qubit® 2.0 Fluorometer (Life Technologies). RNA integrity was assessed using the RNA Nano 6000 Assay Kit with the Bioanalyzer 2100 system (Agilent Technologies).

Library preparation for strand-specific transcriptome sequencing
Six libraries were constructed. The three immature testes libraries were designated BSM1, BSM2, and BSM3; and the three mature testes libraries were designated ASM1, ASM2, and ASM3. Approximately 3 μg RNA per sample were used as input material for the RNA sample preparations. Sequencing libraries were generated using NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina® (NEB) according to the manufacturer's recommendations, and index codes were added to attribute sequences to each sample. Briefly, mRNA was purified from total RNA using poly-T oligo-conjugated magnetic beads. Fragmentation was conducted using divalent cations under elevated temperature in NEBNext First Strand Synthesis Reaction Buffer (5X). First-strand cDNA was synthesized using random hexamer primer and M-MuLV reverse transcriptase (RNaseH-). Secondstrand cDNA synthesis was subsequently performed using DNA polymerase I and RNase H. In the reaction buffer, dNTPs with dTTP were replaced by dUTP. Remaining overhangs were converted into blunt ends via exonuclease/polymerase activity. After adenylation of the 3′ ends of the DNA fragments, NEBNext Adaptors with hairpin loop structure were ligated for hybridization. To select cDNA fragments of preferentially 150~200 bp in length, the libraries were purified with the AMPure XP system (Beckman Coulter). Then 3 μL of USER Enzyme (NEB) were used with size-selected, adaptor-ligated cDNA at 37°C for 15 min, followed by 5 min at 95°C before PCR. PCR was performed with Phusion High-Fidelity DNA polymerase, Universal PCR primers, and Index (X) Primer. Finally, the products were purified (AMPure XP system), and the library quality was assessed using the Agilent Bioanalyzer 2100 system.

Clustering and sequencing
Clustering of the index-coded samples was performed on a cBot Cluster Generation System using the TruSeq PE Cluster Kit v3-cBot-HS (Illumina) according to the manufacturer's instructions. After cluster generation, the library preparations were sequenced on an Illumina Hiseq platform and 150 bp paired-end reads were generated.

Quality control
In order to guarantee data quality in our analysis, a Perl script was used to filter the Raw Data. The steps were as follows: The clean data were obtained by removing reads containing adapter, reads containing ploy-N and low quality reads from raw data.

Reads mapping to the reference genome
The reference genome and gene model annotation files were directly downloaded from the www.ensembl.org.
The index of the reference genome was built using Bowtie v2.2.3, and paired-end clean reads were aligned to the reference genome using TopHat v2.0.12 [54,55]. TopHat was selected as the mapping tool because it can generate a database of splice junctions based on the gene model annotation file and thus provides a better mapping result than other non-splice mapping tools [56][57][58].

Quantification of gene expression
HTSeq v0.6.1 was used to count the read numbers mapped to each gene [59]. Then, the expected number of Fragments Per Kilobase of transcript sequence per Million base pairs sequenced (FPKM) of each gene was calculated based on the length of the gene and read count mapped to the gene. The FPKM simultaneously considers the effect of the sequencing depth and gene length for the read count, and is currently the most commonly used method for estimating gene expression levels [60].

Differential expression analysis
Differential expression analysis of two stages (three biological replicates per stage) was performed using the DESeq R package (1.18.0). DESeq provides statistical routines for determining differential expression within digital gene expression data using a model based on the negative binomial distribution. The resulting P-values were adjusted using the Benjamin and Hochberg's approach for controlling the false discovery rate. Genes with an adjusted P-value < 0.05 found by DESeq were assigned as differentially expressed [61,62].

GO and KEGG enrichment analyses of DEGs
GO enrichment analysis of DEGs was implemented by the GOseq R package, in which the gene length bias was corrected. GO terms with corrected P-value < 0.05 were considered significantly enriched DEGs [63]. KEGG is a database resource for understanding high-level functions and utilities of biological systems, such as the cell, the organism and the ecosystem, based on molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other highthroughput experimental technologies (http://www.genome.jp/kegg/) [64]. We used KOBAS software to test the statistical enrichment of DEGs in the KEGG pathways [65].

Differential gene transcription factor analysis
Putative transcription factors were analyzed using the animal transcription factor database AnimalTFDB 2.0 [66]. For genes included in the database, the transcription factors were directly screened by Ensembl geneID; Non-Ensembl geneID genes were screened through the horse transcription factor protein sequence database by BLASTX.

Novel transcript prediction and alternative splicing analysis
The Cufflinks v2.1.1 Reference Annotation Based Transcript (RABT) assembly method was used to construct and identify both known and novel transcripts from TopHat alignment results. Alternative splicing events were classified by the software rMATS (http://rnaseqmats.sourceforge.net/index.html). The number of AS events in each sample was estimated separately.

qRT-PCR validation
A piece of testis was excised and frozen immediately in liquid nitrogen for qRT-PCR analysis. Total RNA was extracted using TRIzol™ Reagent (Invitrogen) according to the manufacturer's protocols. About 0.5 μg of total RNA was used as template to synthesize first-strand cDNA using a miScript II RT Kit (QIAGEN) according to the manufacturer's protocols. The resultant cDNA was diluted to 0.1 μg/μL for further analysis by the MX3000P Real-Time PCR System (Agilent Technologies) using SYBR® Premix Ex Taq™ II(TaKaRa). β-actin was chosen as an internal reference gene to eliminate sample-to-sample variations based on evidence of its role as a constitutive house-keeping gene for qRT-PCR and its use in normalizing changes in specific gene expressions [67]. The relative gene expression levels were calculated using the 2 −ΔΔCt method. The DEGs between ASM and BSM testis were analyzed by t-test.