Transcriptome profiling of developing testes and spermatogenesis in Mongolian horse

Introduction: The development of horse testis and spermatogenesis is a complex physiological process. Methods: To study those physiological processes, 3 immature and 3 mature testes of Mongolian horse were collected, and six libraries were established using a high-throughput RNA sequencing technology (RNA-Seq) to screen for genes that were related to Mongolian horse testis development and spermatogenesis. Results & Discussion: A total of 16,237 upregulated genes and 8,641 downregulated genes in the testis of Mongolian horse were detected. These genes play important roles in different developmental stages of spermatogenesis and testicular development. Five alternative splicing (AS) event genes were detected, and different AS events can also influence both spermatogenesis and developing of testis. GO (Gene ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analyses were performed for functional annotation of the differentially expressed genes during testis development and spermatogenesis. For example, oocyte meiosis pathways were significantly involved in different stages of testis development and spermatogenesis. Conclusion: These genes were associated with spermatogenesis, male gamete generation, spermatid development, and oocyte meiosis.The finding that gene is a vital element in horse testis development improves our understanding of horse testis development and spermatogenesis.

Spermatogonial stem cells differentiate into type B spermatogonia and divide into primary spermatocytes. Secondary spermatocytes and spermatids derived from the primary spermatocytes of DNA replication and tetraploid meiosis. Haploid sperm cells undergo at least four distinct morphological changes into sperm cell, namely nuclear pyknosis, acrosome formation, flagellar formation, and cytoplasmic reduction. Spermatogenesis involves not only the spermatogenic cells of testis but also multiple cells in testicular tissues such as Sertoli cells and Leydig cells [1]. Related genes play an important role in the specific stage of germ cell development, and the transcription and translation of these genes have specific expression in different stages of spermatogenesis [2,3].
Spermatogenesis is a complex developmental process with a unique cell division, meiosis, as a major defining event. In mammalian testes, spermatogenesis occurs within seminiferous tubules, where all germ cells associate with one kind of somatic cell, namely, Sertoli cells, which provide the appropriate niche and microenvironment for spermatogenesis [4]. Increasing compelling investigations have attempted to investigate the physiological process of testis development and spermatogenesis in Mongolian horse through the RNA sequencing technology [5]. Ramsköld [6] assessed different mammalian tissues by RNA-Seq and found that a vast majority of genes are specifically expressed in the testis. Anand [7] identified differentially expressed genes (DEGs) in human testis by microarray analysis. A total of 2,868 upregulated and 2,011 downregulated transcripts in the human testis relative to other normal tissues were identified. The testis thus seems to be highly metabolically active relative to other normal tissues as indicated by functional annotation.
Currently, most of the studies on the relationship between testis and spermatogenesis focus on human and mouse. Djureinovic [8] classified 20,050 putative human genes into categories based on expression patterns by RNA-Seq. Those genes were expressed specifically in normal human testicular tissues from seven individuals and compared to 26 other normal human tissue types. The analysis showed that the testis has the highest number of tissue-specific genes by far. Gong [1]  Mongolian horse is one of the most famous native breeds in China and even in the world, it is a dual-purpose type breeding and can adapt to harsh climate and extensive breeding conditions. The normal development of testis and spermatogenesis to protect population continuity and to produce excellent male semen play a very important role, and genes take part in different stages of testis development and spermatogenesis. Unfortunately, the transcriptional expression of genes during testicular development and spermatogenesis is not well understood, thereby limiting the understanding of the nature of gene expression and the mechanism of deformation and maturation in spermatogenesis. In addition, horse testis transcriptome sequencing has not been performed, to date. This study aimed to study the transcriptome profiles of immature and mature male Mongolian horse testes utilizing a high-throughput RNA-Seq technology and bioinformatics analysis. A testis transcript database was obtained, and DEGs and regulatory pathways were analyzed. This study provides new insights into gene expression and the regulation of the horse testis development and spermatogenesis.

Ethics statement
This study was conducted according to the guidelines of the declaration of Institutional Animal Ethics Committee and Animal Care Guidelines of the Inner Mongolia Agricultural University. All procedures involving animal subjects were approved by the Institutional Animal Ethics Committee and Animal Care Guidelines of the Inner Mongolia Agricultural University. The animals did not unnecessarily suffer at any stage of this study.

Animals and sample collection
We received permission from the owner to geld six healthy male Mongolian horses in Xilingol League, Inner Mongolia, China. The ages of the horses were determined based on a physical examination of their teeth, and on information from the owner. Three were sexually immature colts (samples BSM1-3), between 11 and 13 months old. The other three horses were sexually mature stallions, between three and four years old (samples ASM1-3). We surgically collected the testes of all six horses. Removed testes were stored in cryogenic vials with an RNA/DNA sample protector (Takara). Small samples (~1 mg) from the testes of each animal were immediately frozen in liquid nitrogen for quantitative real-time polymerase chain reaction (qRT-PCR) analysis. All horses were reared by their owner after our study.
RNA quantification and qualification RNA degradation and contamination were monitored on 1% agarose gels. RNA purity was assessed using the NanoPhotometer® spectrophotometer (IMPLEN index codes were added to attribute sequences to each sample. Briefly, mRNA was purified from total RNA using poly-T oligo-attached magnetic beads. Fragmentation was conducted using divalent cations under elevated temperature in NEBNext First Strand Synthesis Reaction Buffer (5X). First-strand cDNA was synthesized using random hexamer primer and M-MuLV reverse transcriptase (RNaseH-). Second-strand cDNA synthesis was subsequently performed using DNA polymerase I and RNase H. In the reaction buffer, dNTPs with dTTP were replaced by dUTP. Remaining overhangs were converted into blunt ends via exonuclease/polymerase activities. After adenylation of the 3' ends of the DNA fragments, NEBNext Adaptors with hairpin loop structure were ligated to prepare for hybridization. To select cDNA fragments of preferentially 150~200 bp in length, the library fragments were purified with AMPure XP system (Beckman Coulter). Then 3 µL of USER Enzyme (NEB) were used with size-selected, adaptor-ligated cDNA at 37°C for 15 min, followed by 5 min at 95°C before PCR. Then, PCR was performed with Phusion High-Fidelity DNA polymerase, Universal PCR primers, and Index (X) Primer. Finally, the products were purified (AMPure XP system), and library quality was assessed on the Agilent Bioanalyzer 2100 system.

Clustering and sequencing
The clustering of the index-coded samples was performed on a cBot Cluster Generation System using TruSeq PE Cluster Kit v3-cBot-HS (Illumina) according to the manufacturer's instructions. After cluster generation, the library preparations were sequenced on an Illumina Hiseq platform and 125-bp/150-bp paired-end reads were generated.

Data Analysis Quality control
Raw data (raw reads) of fastq format were first processed through in-house Perl scripts. In this step, clean data (clean reads) were obtained by removing reads containing adapter, reads containing poly-N, and low-quality reads from the raw data. At the same time, the Q20, Q30, and GC content of the clean data were calculated. All the downstream analyses were based on high-quality clean data.

Reads mapping to the reference genome
Reference genome and gene model annotation files were directly downloaded from the genome website. Index of the reference genome was built using Bowtie v2.

Differential expression analysis
Differential expression analysis of two conditions/groups (two biological replicates per condition) was performed using the DESeq R package (1.18.0). DESeq provide statistical routines for determining differential expression in digital gene expression data using a model based on the negative binomial distribution. The resulting P-values were adjusted using the Benjamini and Hochberg's approach for controlling the false discovery rate.
Genes with an adjusted P-value <0.05 found by DESeq were assigned as differentially expressed [17,18].

Gene Ontology and KEGG enrichment analyses of differentially expressed genes
Gene ontology (GO) enrichment analysis of differentially expressed genes was implemented by the GOseq R package, in which gene length bias was corrected. GO terms with corrected P-value < 0.05 were considered significantly enriched by differential expressed genes [19,20]. KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies (http://www.genome.jp/kegg/). We used the KOBAS software to test the statistical enrichment of differential expression genes in the KEGG pathways [21].

Novel transcript prediction and alternative splicing analysis
The Cufflinks v2.1.1 Reference Annotation Based Transcript (RABT) assembly method was used to construct and identify both known and novel transcripts from TopHat alignment results. Alternative splicing events were classified to 12 basic types by the software Asprofile v1.0. The number of AS events in each sample was estimated, separately. PPI analysis of differentially expressed genes PPI analysis of differentially expressed genes was based on the STRING database, which consists of known and predicted proteinprotein interactions. For the species existing in the database, we constructed the networks by extracting the target gene list from the database. Otherwise, Blastx (v2.2.28) was used to align the target gene sequences to the selected reference protein sequences, and then the networks was built according to the known interaction of selected reference species.

Quantitative real time PCR (qRT-PCR) validation
A piece of testis was excised and frozen immediately in liquid nitrogen for subsequent qRT-PCR analysis. Total RNA was extracted using TRIzol (TaKaRa) according to the manufacturer's protocols. The total RNA was resuspended in nuclease-free water, and the concentration was measured using NanoDrop (Thermo Scientific Nanodrop 2000). About 0.5 μg of total RNA was used as template to synthesize first-strand cDNA using a PrimerScript RT reagent kit (TaKaRa) according to the manufacturer's protocols. The resultant cDNA was diluted to 0.1 μg/μL for further analysis by qRT-PCR (Bio-Rad) using a SYBR Green Realtime PCR Master Mix (TaKaRa). β-Actin was chosen as an internal reference gene to eliminate sample-to-sample variations. The primers of tested genes were designed by Primer 5.0 according to alternative splicing events of those genes. The relative gene expression levels were calculated using the 2 − ΔΔCt method. The differentially expressed genes between ASM and BSM testis were analyzed by ANOVA using SAS software 9.0.

mRNA expression profiles in Mongolian horse testis development
To screening for genes related to testis development and spermiogenesis, a total of six libraries, BSM1. BSM2, BSM3, ASM1, ASM2, and ASM3 were constructed and sequenced by Illumina Hiseq platform. The major information of six libraries is listed in Table 1 BSM3, ASM1, ASM2. and ASM3 libraries. Furthermore, more than 40% of clean reads were non-splice reads in every sample. These results showed that the six libraries were of high quality.
The clean reads were aligned to the reference genome of E. caballus using TopHat v2.0.12 (Table 2). In total, 84.85%, 85.05%, 84.25%, 84.38%, 84.14%, and 84.23% of clean reads from these above-mentioned libraries were uniquely mapped to the reference genome. In addition, more than 40% of clean reads were non-splice reads in each sample. However  Table 2.

Analysis on differentially expressed genes
The differentially expressed genes between libraries were screened based on DEGSeq analysis taking |log 2 (fold-change)|> 1 and P < 0.05 as the cut-off. Taking a viewpoint on ad-joining libraries, 16,237 up-and 8,641 downregulated in total genes (Table S1, Fig. 1a).
Venn diagrams were generated using their data, which depicted that these two groups shared 16,327 genes (Fig. 1b). Figure 1c shows the results of hierarchical clustering, with the differentially expressed genes divided into six libraries with two clusters.
Analysis on GO of differentially expressed genes GO and pathway enrichment analyses were used to explore the functions of the differentially expressed genes in testis development. GO enrichment analysis identified a total of 602 GO terms related to biological processes, molecular functions, and cellular components were significantly enriched across the abovementioned three different DEG groups (Table S2, upregulated GO terms; Table S3, downregulated GO terms, Fig. 2a).
KEGG pathway enrichment analysis of differentially expressed genes KEGG pathway enrichment analysis was performed on these differently expressed genes and a cut-off criterion of corrected P-value < 0.05 was also used. A total of 10 enriched pathways were detected (Table S4, upregulated KEGG pathways; Table S5,  Quantitative real time PCR (qRT-PCR) validation of differential expression To verify the differential expression of genes in ASM and BSM testis, we selected eight differentially expressed genes, including MEI1 (two different sites), EHMT2, BTRC, CLOCK, IQCG, ANAPC5, SKIP, and SMC1B to validate the expression patterns by qRT-PCR. Those genes were belonging to oocyte meiosis pathways and GO term spermatogenesis, male gamete generation, and spermatid development. Meanwhile, the AS events were detected in those eight genes. There was a relatively high correlation between the mRNA expression levels of the eight genes tested as detected by qRT-PCR and RNA-Seq (Fig. 4).
For these eight genes, the mRNA expression levels determined by qRT-PCR and RNA-Seq were significantly correlated (correlation coefficient = 0.853; P < 0.05). This suggested that our RNA-Seq results were reliable.

Discussion
Currently, with the development of detection techniques, a growing number of sperm mRNAs have been identified. RNA-Seq technology has become one of the efficient and inexpensive technologies for the discovery of novel transcripts and genes. In our study, the three ASM testes and three BSM testes were used as research subjects, 24,878 genes were screened by RNA-Seq, 16,236 genes were upregulated, and 8,641 genes were downregulated. Using the next-generation platform, we found that most of the highly were upregulated in BSM. However, in our study, the SOX5, SOX6 SOX10 and SOX30 genes were upregulated in ASM. Every family member may have its own function in determining sex, and thus, further investigations are warranted. The testicular hormone Insl3 (insulinlike 3), which is highly expressed during BMS, plays a crucial role in mouse gubernacular development and testis descent [23]. Some genes were upregulated in ASM, such as SPATA3, SPATA45, SPATC1, SPATA18, SPEM1, SPERT, SHBG, TEX35, TEX43, GSG1, and SPATA19. Among those genes, SPATA3, SPATA45, SPATC1, SPATA18, and SPATA19 are defined to be related to spermatogenesis; TEX35 and TEX43 are associated with testis expression; GSG1 is related to germ cell; SPEM influences spermatid maturation; and SHBG controls sex hormone-binding globulin. In addition, most of the novel genes detected were related to spermatogenesis.
AS is a crucial mechanism for gene regulation and in generating proteomic diversity.
Recent estimates indicate that the expression of nearly 95% of human multi-exon genes involves AS [24,25]. In metazoans, AS plays an important part in generating different protein products that function in diverse cellular processes, including cell growth, differentiation, and death [26]. In our study, five kinds of AS events were detected, most of them centered upon the SE event. The SE events and the results of GO and KEGG analyses were comprehensively analyzed to determine the influence of AS events on the function of related genes. In our study IQ motif-containing G (IQCG) belongs to SE events, GO term spermatogenesis, male gamete generation, and spermatic development. The causality of the mutation was confirmed with a targeted null allele. Loss of IQCG disrupts spermiogenesis wherein tail formation is either incomplete or breaks apart from the sperm heads. Because IQ motif-containing genes typically regulate calmodulin (CaM), which in turn can impact the actin cytoskeleton, these findings suggest a potential role for localized calcium signaling in sperm flagellum morphogenesis, Ca 2+ signaling is important for regulating sperm motility, most notably in this context for flagellar function and regulation [6, [27][28][29][30][31][32]. Exon skipping restores the reading frame within a gene. In our study, IQCG incurred SE events from 33,961,411 bp to 33,961,540 in ASM, whereas IQCG was upregulated in ASM. SE event also occurred from 38,149,431 to 38,149,546 of the meiotic double-stranded break formation protein 1 (MeI1) gene in ASM. MeI1 was upregulated in ASM and is related to the GO terms spermatogenesis, male gamete generation, and sperm development. MEI1 is expressed specifically in the testis and the first meiosis-specific mutation identified by forward genetic approaches in mammals.
In this study, the three upregulated GO terms that were significantly enriched were related to spermatogenesis, male gamete generation, and spermatid development. In this study, some genes belonged to the three terms such as SLC26A8, MEI1, EHMT2, DPY19L2, HIST1H2BA, CAPZA3, TSSK2, TSSK1B, KDM3A, ALMS1, and IQCG. Testis anion transporter 1 is a protein that, in humans, is encoded by the SLC26A8 gene [36,37]. This gene is expressed primarily in spermatocytes. Two transcript variants encoding different isoforms have been reported [38-41]. It acts as a DIDS-sensitive anion exchanger that mediates chloride, sulfate, and oxalate transport. It also has critical anion exchange functions in male germ line during meiosis, and hence, may play a role in spermatogenesis [42]. The DPY19L2 protein encoded by this gene belongs to the dpy-19 family. It is highly expressed in testis and is required for sperm head elongation and acrosome formation during spermatogenesis. Mutations in DPY19L2 are associated with an infertility disorder, spermatogenic failure type 9 (SPGF9) [43-50] TSSK2 (testis specific serine kinase 2), and TSSK1B (testis-specific serine/threonine-protein kinase 1) are required during spermatid development. DPY19L2 is also involved in the late stages of spermatogenesis and the reconstruction of the cytoplasm. During spermatogenesis, TSSK2 is highly expressed in the testis, because it is required for the transformation of a ring-shaped structure around the base of the flagellum that originates from the chromatoid body [51-56]. Based on the functions of the genes listed below it is clear that these genes are closely related to spermatogenesis. However, the regulatory mechanism of genes involved in spermatogenesis in horse remains unclear and thus requires further investigation.
In our study, the upregulated six KEGG pathways were detected. One of the pathways, namely, oocyte meiosis, was thought to be related to spermatogenesis. Meiosis plays a critical role in the formation of sperm. In males, meiosis occurs during spermatogenesis in the seminiferous tubules of the testicles. Meiosis during spermatogenesis is specific to a type of cell called spermatocytes that will later mature into spermatozoa. The 47 genes in the oocyte meiosis pathways were detected, their functions influence meioses I and II. genes were upregulated in meiosis II. The G2/mitotic-specific cyclin-B1 (CCNB) gene is a protein encoded by the CCNB1gene in humans [57]. The CCNE protein belongs to the highly conserved cyclin family, whose activity is required for cell cycle G1/S transition. This protein accumulates during the G1-S phase and is degraded as cells progress through the S phase [58]. PLK1 is an early trigger for G2/M transition, and supports the functional maturation of the centrosome in late G2/early prophase. PLK1 phosphorylates and activates components of the anaphase-promoting complex (APC). The function of APC can maintain cohesion of sister chromatids, and anaphase inhibitors. [59]. There is evidence that PLK1 might regulate during human meiosis S [60]. Those genes effected the G1-S, G2/M phase in meiosis II. Otherwise, alternate transcription initiation sites exist in those genes. What changes in function were caused by AS? Our research team plans to conduct further investigations in the near future.
The BTRC, CUL1, AURKA, FBXO43, ANAPC11, CDC27, CDC20, ANAPC5, ESPL1, SMX1B, and REC8 genes influence both meioses I and II. Aurora kinase a, also known as serine/threonine-protein kinase 6, is an enzyme that, in humans, is encoded by the AURKA gene [ 61,62]. Aurora A is a member of a family of mitotic serine/threonine kinases. It is implicated with important processes during mitosis and meiosis. [63], and its activity peaks during the G2 phase to M phase transition in the cell cycle [64]. Meiotic recombination protein REC8 homolog is a protein that, in humans, is encoded by the REC8 gene [65,66]. Rec8 is a meiosis-specific component of the cohesin complex that binds sister chromatids in preparation for the two divisions of meiosis.

Conclusions
In our study, six mRNA libraries were constructed and sequenced with porcine testes from ASM and BSM. A total of 16,237 upregulated and 8,641 downregulated genes were identified; most genes belonged to biological progress and participated in spermatogenesis and testis development of Mongolian horse. We analyzed a large number of genes related to spermatogenesis and somatic development of the horse testis, which play important roles in different developmental stages. We also found that the AS events might influence the function of those genes and indirectly affect the spermatogenesis and development of the horse testis. These findings should help us to further understand how gene expression is regulated during testis development and spermatogenesis. Our research will mainly focus on some related genes through experimental methods, such as the influence of AS events regulate function of testis development and spermatogenesis-related genes, or combine other experimental results, such as miRNA and piRNA identified, expecting to provide more fundamental information in understanding the regulatory mechanisms of equine testis development or spermatogenesis at the molecular level.

Abbreviations
Not applicable Ethics approval and consent to participate All procedures involving animals were approved and authorized by the Inner Mongolia Agricultural University. The animals used in our study were consented by their owners.

Availability of data and material
The accession number for the raw and processed profiling by highthroughput RNA sequencing technology reported in this paper is GEO: GSE101697.

Consent for publication
Not applicable

Competing interests
The authors declare no competing interests.  5. Reads map to '+'Reads map to '-'The comparison of sequence alignment to positive and negative chains in genome.
6. Splice readsmeans reads mapped to the border of exon, also called junction reads.
Non-splice readsmeans reads for the entire sequence are mapped to one exon.     Scatter plot of differential gene KEGG Note: The vertical axisaxis represents the name of the pathway the horizontal axis represents the Rich factor. The differential size of points show number of differential expression genes, and the color of points mean different range of Qvalue.