Hox genes reveal genomic DNA variation in tetraploid hybrids derived from Carassius auratus red var. (female) × Megalobrama amblycephala (male)

Background: Allotetraploid F1 hybrids (4nF1) (AABB, 4n = 148) were generated from the distant hybridization of Carassius auratus red var. (RCC) (AA, 2n = 100) (♀) × Megalobrama amblycephala (BSB) (BB, 2n = 48) (♂). It has been reported that Hox gene clusters are highly conserved among plants and vertebrates. In this study, we examined the genomic organization of Hox gene clusters in the allotetraploid F1 hybrids and their parents to investigate the polyploidization process.
Results: There were three copies of Hox genes in the 4nF1 hybrids, two copies in RCC and one copy in BSB. In addition, obvious variation and pseudogenization were observed in some Hox genes from 4nF1.
Conclusion: Our results reveal the influence of polyploidization on the organization and evolution of Hox gene clusters in fish and also clarify some aspects of vertebrate genome evolution.
Electronic supplementary material: The online version of this article (10.1186/s12863-017-0550-2) contains supplementary material, which is available to authorized users.


Background
Polyploidization is a widespread mechanism for speciation in eukaryotes, especially plants and vertebrates [1][2][3][4][5]. Polyploids with duplicated genomes may originate from a single species (autopolyploidy) or from different species through interspecific hybridization (allopolyploidy) [6]. Allopolyploids are prevalent in nature, suggesting there is an evolutionary advantage to obtaining multiple sets of genetic material for adaptation and development [7]. However, the molecular mechanisms underlying the processes and consequences of allopolyploidy remain unclear [8]. Polyploidy is relatively rare in animals compared with plants, and the influence of polyploidization on intragenomic variation in polyploid animals is poorly understood. In our earlier study, we successfully obtained fertile tetraploid hybrids from Carassius auratus red var. (RCC) (♀) × Megalobrama amblycephala (BSB) (♂) [9,10]. RCC has 100 chromosomes and belongs to the subfamily Cyprininae, while BSB has 48 chromosomes and belongs to the subfamily Cultrinae [11]. These new polyploid hybrids represent unique specimens for studying genomic changes in F1 hybrids and could significantly contribute to our understanding of evolution.
Hox genes, a set of important developmental regulatory genes, are highly conserved and typically organized in clusters [12]. In vertebrates, Hox genes contain two exons, and the highly conserved homeodomain (60 aa) is encoded by the second exon [13]. Recent research has shown that gene duplication, sequence variation, and selective pressure played crucial roles in the origin and evolution of Hox genes [14]. The earliest indications of genome duplication came from the comparative analysis of Hox genes and clusters from different chordate lineages [15][16][17][18]. In general, polyploidization plays an important role in fish evolution [19]. The purpose of this research was to study the effects of allopolyploidization on Hox gene organization and evolution. In this article, three distinct Hox duplicates were observed in the 4nF1 genome, compared with two copies in RCC and one copy in BSB. Our data reveal the genetic variation and evolutionary characteristics of the Hox gene family in 4nF1 and provide new insights into their evolutionary patterns.

Results
Sequence information for RCC, BSB and 4nF1 clones
Using 11 pairs of degenerate primers (Additional file 1: Table S1), we obtained partial sequence information for eight putative Hox genes from RCC, four putative Hox genes from BSB, and 32 putative Hox genes from 4nF1. All these fragments were between 700 and 1500 bp long and included the exon1-intron-exon2 region (Table 1). To avoid biased amplification of only one Hox gene copy, we selected 20 clones of each gene from 4nF1, 20 clones of each gene from RCC and 80 clones from BSB (20 clones for each Hox gene PCR fragment). All fragments from RCC, BSB and 4nF1 were confirmed to be Hox gene sequences, and each included the homeobox. All Hox sequences have been submitted to GenBank; their accession numbers are listed in Table 1.

Molecular organization of the Hox gene sequence
We compared the inferred amino acid sequences of the Hox genes in 4nF1 with those in zebrafish, fugu, medaka, and BSB (Additional file 1: Table S2); the comparison indicated that the 4nF1 sequences were similar to those of the other species. The organization of the Hox clusters in 4nF1 is shown in Fig. 1. The clusters can be summarized as HoxAai, HoxAaii, HoxAaiii, HoxAbi, HoxAbii, HoxAbiii, HoxBai, HoxBaii, HoxBaiii, HoxBbi, HoxBbii, HoxBbiii, HoxCai, HoxCaii, HoxCaiii, HoxCbi, HoxCbii, HoxCbiii, HoxDai, HoxDaii, and HoxDaiii (Table 1). Among these copies, we found that HoxD4aiiiΨ, HoxD9aΨ, and HoxD10aΨ in 4nF1 were pseudogenes (Fig. 2). Two deletions at codons 316 and 317 in the coding region of HoxD4aiiiΨ suggested that it was a pseudogene. The alignment of the putative HoxD4a sequences is shown in Fig. 2a. HoxD9aΨ has become a pseudogene because a stop codon prematurely terminates expression of the full-length functional product (Fig. 2b). An insertion was observed at codon 593 in the HoxD10aΨ coding region; alignment of the putative HoxD10a duplicated sequences is shown in Fig. 2c. HoxD10aΨ had an inserted G nucleotide compared with HoxD10aiii, whereas a T in HoxD4aΨ was replaced by a G compared with HoxD4ai. Thus, non-functionalization is a possible fate for some duplicated Hox genes. The GC levels of the pseudogenes tended to be lower than those of their counterpart genes (Additional file 1: Table S3). For instance, in 4nF1, the exons of the pseudogene HoxD4aiiiΨ exhibited a GC content of 50.1%, which was lower than that of its functional counterparts HoxD4ai and HoxD4aii (51.3% and 52.1%, respectively). As shown in Additional file 1: Table S3, the exon GC content of the pseudogene HoxD10aiΨ was 49.4%, which was lower than those of its putative functional counterparts HoxD10aii and HoxD10aiii (49.6% and 49.9%, respectively) in 4nF1.
Similarly, the exon GC content of the pseudogene HoxD9aΨ (43.3%) was slightly lower than that of its putative functional HoxB1b paralogues (50.1%, 50.2%, and 50.2%). During duplication, one copy typically remains functional, whereas the other copy may lose its function, which generally leads to a decreased GC level for the non-functional gene.
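The GC comparison between pseudogenes and their functional counterparts can be reproduced with a few lines of code. The following is a minimal sketch, not the procedure used in this study: the function name `gc_content` and both toy sequences are hypothetical, and gaps or ambiguous bases are simply ignored.

```python
def gc_content(seq: str) -> float:
    """Return the GC percentage of a nucleotide sequence.

    Only unambiguous bases (A, C, G, T) are counted, so alignment gaps
    and ambiguity codes such as N do not affect the result.
    """
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Toy sequences for illustration only (assumed, not real Hox exons).
functional_copy = "ATGGCCGGCTCCAAGGGC"
pseudogene_copy = "ATGACCGGATCCAAGGAC"

if __name__ == "__main__":
    print(f"functional copy GC: {gc_content(functional_copy):.1f}%")
    print(f"pseudogene copy GC: {gc_content(pseudogene_copy):.1f}%")
```

Applied to the concatenated exon sequence of each clone, such a function would give the per-copy values that are compared in the text.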

Phylogenetic relationships
For most genes, such as HoxA4a, HoxB1b, and HoxD10a, three distinct orthologues of the zebrafish genes were identified in 4nF1. These duplicated genes shared a high identity percentage for the deduced amino acid sequences (Additional file 1: Tables S2 and S3). An identity analysis of the putative amino acid sequences suggested that the duplicated sequences were more closely related to each other than to the reported zebrafish orthologues except for the HoxC4aiii sequences. For instance, the percentage nucleotide identity between the HoxA11bi, HoxA11bii, and HoxA11biii orthologues from 4nF1 and HoxA11b from zebrafish was only 89.9%, 89.9%, and 92.4%, respectively. Conversely, the identity between the paralogues HoxA11bi and HoxA11bii, HoxA11bi and HoxA11biii, and HoxA11bii and HoxA11biii in 4nF1 was 98.6%, 96.4%, and 96.0%, respectively (Additional file 1: Table S2 and Fig. 2a). The identity between HoxB1bi and HoxB1bii, HoxB1bi and HoxB1biii, and HoxB1bii and HoxB1biii was 99.5%, 95.7% and 96.2%, respectively, whereas the similarity to their zebrafish orthologues was 91.0%, 90.6% and 91.5%, respectively (Additional file 1: Table S2 and Fig. 3b). These results showed that HoxA11bi, HoxA11bii, and HoxA11biii as well as HoxB1bi, HoxB1bii and HoxB1biii each share a closely related ancestral cluster and are true orthologues of the zebrafish genes HoxA11b and HoxB1b. Analysis of the sequences obtained for HoxC4a suggested that four distinct copies of this gene exist in 4nF1, which were named HoxC4ai, HoxC4aii, HoxC4aiii and HoxC4a-1.
The putative amino acid sequence of HoxC4a-1 shares approximately 100%, 100% and 99% similarity to those of HoxC4ai, HoxC4aii, and HoxC4aiii, respectively. However, the nucleotide similarity to all three sequences is 100%, which suggests the mutation was synonymous.

Fig. 1 Hox cluster architecture in 4nF1 compared with zebrafish. We identified a total of 32 Hox genes. Nine Hox genes were present in three copies, one Hox gene was present in four copies, and one was present as a single copy in 4nF1. Copies of the HoxD9a, HoxD4a, and HoxD10a genes were pseudogenes. Black boxes represent Hox genes from Danio rerio, and "E" refers to EVX (even-skipped related gene). Aa, Ab, Ba, Bb, Ca, Cb, Da and Db refer to classes of genes.
To evaluate the speciation of 4nF1, the nucleotide identity percentages among all known representatives of the HoxA4a, HoxA9a, HoxA2b, and HoxD4a gene groups in RCC, BSB, and 4nF1 were examined (Table 2, Fig. 4). The identities of orthologous 'i' or 'ii' genes between 4nF1 and RCC were much higher than those between 4nF1 and BSB. For example, the nucleotide identity percentages of the orthologous HoxA4ai, HoxA9ai, HoxA2bi, and HoxD4ai genes between 4nF1 and RCC were 99.5%, 99.4%, 99.6% and 99.6%, respectively. Conversely, the similarity of these genes between 4nF1 and BSB was 97.0%, 92.3%, 97.2%, and 93.7%, respectively. Although similarly high identity was observed, the 'iii' gene in 4nF1 did not exhibit higher similarity to the gene in either RCC or BSB for any of the four Hox sequence groups, suggesting no obvious orthologous relationship with either parental species. Thus, we speculated that the 'iii' genes were variants of RCC or BSB genes. For example, the HoxA4aiii, HoxA9aiii, HoxA2biii, and HoxD4aiii genes from 4nF1 and the HoxA4a, HoxA9a, HoxA2b, and HoxD4a genes from BSB shared 98.0%, 92.2%, 97.7%, and 94.0% identity, respectively (Table 2).
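The logic of assigning each hybrid copy to a parental genome by pairwise identity can be sketched as follows. This is an illustrative sketch under stated assumptions, not the analysis pipeline of this study (identities here were obtained in BioEdit): the function names, the `min_margin` threshold, and the ten-base toy sequences are all hypothetical, and the input sequences are assumed to be already aligned.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def closest_parent(copy_seq, parents, min_margin=1.0):
    """Return (parent_name, scores); parent_name is None when no parent is clearly closer.

    min_margin is an assumed cut-off in percentage points, chosen for illustration.
    """
    scores = {name: percent_identity(copy_seq, seq) for name, seq in parents.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < min_margin:
        return None, scores
    return ranked[0][0], scores


# Toy aligned sequences (assumed for illustration, not real Hox data).
parents = {"RCC": "ATGGCCAAGT", "BSB": "ATGACCAAAT"}
copy_i = "ATGGCCAAGT"    # identical to the RCC toy sequence
copy_iii = "ATGGCCAAAT"  # one mismatch against each parent

if __name__ == "__main__":
    print(closest_parent(copy_i, parents))
    print(closest_parent(copy_iii, parents))
```

In this toy case the 'i'-like copy is assigned to RCC, whereas the 'iii'-like copy is equally distant from both parents and is left unassigned, mirroring the pattern described for the 'iii' genes.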

Discussion
The structure of cloned Hox gene sequences
Prior PCR surveys and genomic library screening have identified interesting variability in Hox gene content among teleosts [12,15,16,20,21]. Luo et al. [22] estimated there were 14-16 Hox gene clusters in goldfish. Our data suggested that 18-21 Hox gene clusters were present in 4nF1, each located on a different acrocentric chromosome. The Hox gene clusters in 4nF1 were approximately the sum of the clusters in RCC and BSB, except that some clusters were lost. The topology of the Hox gene maximum likelihood tree (Fig. 3) further suggested that some of the Hox genes orthologous to zebrafish genes were present in two copies in RCC, one copy in BSB, and three copies in 4nF1. However, the third copy did not exhibit notably higher similarity to the gene in RCC or BSB. We speculated that variation and reorganization of the genome likely occurred during polyploidization, resulting in new copies in 4nF1. This might be evidence that allopolyploidization induces a variety of rapid genomic changes in a 4nF1 population [23,24]. Using sequence alignment in 4nF1, we isolated 32 fragments that can be characterized as HoxA, HoxB, HoxC, and HoxD family genes. However, the fragments amplified from RCC and BSB DNA were characterized only as HoxA and HoxD genes. We speculated that the increase in the number of 4nF1 genes might be related to polyploidization. This situation was also observed in our previous study [25,26]; the number of 4nF1 fragments increased, and some genes from RCC and BSB were lost. At present, although we have no precise data explaining this outcome, we speculate that allotetraploidization might lead to rapid changes in 4nF1 genome diversity. Our study is the first to evaluate the organization of Hox clusters in a 4nF1 population. This theory is also strongly supported by other studies examining Hox genes [22], other gene families [27], and DNA content [28].

The significance of polyploidization
Polyploidization likely increases genomic variation rates, which can result in the formation of new polyploid species [29]. First, the process of polyploidization can itself generate species that are reproductively isolated from their diploid progenitors, increasing the number of species as a by-product. For example, several studies have indicated that a polyploidization event occurred in an ancestor of teleost fish shortly after this lineage diverged from the lineage leading to tetrapods [30][31][32]. Second, an entirely different trait can result in increased rates of polyploidization [6]. Synonymous mutations increase genomic variation. For example, the putative amino acid sequence of HoxC4a-1 shares approximately 100%, 100%, and 99% similarity with those of HoxC4ai, HoxC4aii, and HoxC4aiii, respectively. The identity of their nucleotide sequences is 100%. In the polyploidization process, genome duplication produces abundant genomic DNA, so the organism maintains the dosage balance or rapidly stabilizes the duplicated genomes via retention/exclusion of redundancy. Lynch et al. [33] suggested there are three outcomes in the evolution of duplicate genes: non-functionalization, neo-functionalization and subfunctionalization. Interestingly, we found some pseudogenes in 4nF1, such as HoxD4aiiiΨ, HoxD9aΨ and HoxD10aΨ. Pseudogenes are formed either by random mutations that create stop codons and prematurely terminate full-length functional product expression or by insertions/deletions that shift the reading frame, rendering the translated protein non-functional. We speculate that dosage effects generated selection pressure from the loss of Hox genes or the formation of pseudogenes after whole genome duplication. This pressure is consistent with the expectation that there are Hox clusters in the 4nF1 genome that have lost functional Hox genes due to the reduction of redundancy following the polyploidization event.
However, 4nF1 required genetic recombination, mutation, and pseudogenization to reduce the amount of incompatible genetic material and improve fertility [34]. Thus, we unexpectedly obtained autotetraploids with greater fertility among the 4nF1 progeny, and we successfully established an autotetraploid fish line [35]. Our characterization of the Hox gene clusters in tetraploid hybrids improves our understanding of the evolutionary processes occurring after Hox gene duplication in vertebrates.

Values before slashes (/) denote nucleotide identity; values after slashes denote amino acid identity.
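The two routes to pseudogenization described in the Discussion, a premature stop codon or a frame-shifting insertion/deletion, can be screened for computationally. The sketch below is a simplified illustration rather than the method used here: `classify_cds` and the toy sequences are hypothetical, the standard genetic code is assumed, and a frameshift is inferred only from a length difference relative to a functional paralogue that is not a multiple of three.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_cds(cds: str, reference_len: int) -> str:
    """Classify a coding sequence as 'intact', 'frameshift' or 'premature stop'.

    reference_len is the coding length (in bases) of a functional paralogue.
    """
    cds = cds.upper().replace("-", "")
    # An indel whose size is not a multiple of three shifts the reading frame.
    if (len(cds) - reference_len) % 3 != 0:
        return "frameshift"
    protein = "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3)
    )
    # A stop codon anywhere before the final codon truncates the product.
    first_stop = protein.find("*")
    if first_stop != -1 and first_stop < len(protein) - 1:
        return "premature stop"
    return "intact"


# Toy coding sequences (assumed for illustration, not real Hox data).
reference = "ATGGCCAAGTACTGA"       # M A K Y stop
with_insertion = "ATGGCCGAAGTACTGA"  # one inserted G
with_nonsense = "ATGGCCTAGTACTGA"    # AAG -> TAG

if __name__ == "__main__":
    for name, seq in [("reference", reference),
                      ("insertion", with_insertion),
                      ("nonsense", with_nonsense)]:
        print(name, classify_cds(seq, len(reference)))
```

A single-base insertion, as reported for HoxD10aΨ, would fall into the frameshift category, whereas a nonsense substitution, as reported for HoxD9aΨ, would be flagged as a premature stop.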

Conclusions
We identified three copies of Hox genes in 4nF1, two copies in RCC and one copy in BSB. In addition, obvious variation and pseudogene generation were observed in some 4nF1 Hox genes. These results reveal the effects of polyploidization on the organization and evolution of Hox gene clusters in fish and also help to clarify aspects of vertebrate genome evolution.

Methods
Fish treatments were carried out according to the regulations for protected wildlife and the Administration of Affairs Concerning Animal Experimentation, and approved by the Science and Technology Bureau of China. Approval from the Department of Wildlife Administration was not required for the experiments conducted in this paper. The fish were deeply anesthetized with 100 mg/L MS-222 (Sigma-Aldrich, St Louis, MO, USA) before dissection. Narcotic drugs were fed before blood sampling. Total genomic DNA was isolated from peripheral blood cells using the standard phenol-chloroform extraction procedures described by Sambrook et al. [36].

Cloning and sequencing of Hox genes
We amplified fragments of Hox genes from genomic DNA by PCR amplification using several combinations of degenerate primers (

Sequence comparison and analysis
Sequence homology and variation among the fragments amplified from RCC, BSB and 4nF1 were analysed in BioEdit [37,38]. Partial DNA sequences for each gene were verified using a BLASTx search. To increase the probability of detecting duplicated paralogues and to circumvent PCR errors, we sequenced 20 clones for each gene from 4nF1, RCC and BSB. The obtained sequences were screened for Hox gene fragments using different BLAST searches (BLASTn, BLASTp, and BLASTx) against GenBank (http://www.ncbi.nlm.gov/Blast.cgi). Then, we evaluated the organization of the 4nF1 Hox clusters compared to RCC and BSB to characterize the Hox genes.
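The rationale for sequencing 20 clones per gene can be illustrated with a simple probability estimate. The sketch below rests on an assumption that is not made in the text: all copies of a gene are amplified and cloned with equal efficiency, so that each clone is an independent, uniform draw. Under that assumption the chance of sampling every copy at least once follows from inclusion-exclusion; the function name is hypothetical.

```python
from math import comb


def prob_all_copies_sampled(n_copies: int, n_clones: int) -> float:
    """Probability that every one of n_copies appears at least once among n_clones.

    Assumes each clone is an independent draw with equal probability for
    every copy (equal PCR and cloning efficiency), which real degenerate
    primers may not satisfy.
    """
    if n_copies < 1 or n_clones < 0:
        raise ValueError("n_copies must be >= 1 and n_clones must be >= 0")
    return sum(
        (-1) ** j * comb(n_copies, j) * ((n_copies - j) / n_copies) ** n_clones
        for j in range(n_copies + 1)
    )


if __name__ == "__main__":
    for copies in (1, 2, 3, 4):
        p = prob_all_copies_sampled(copies, 20)
        print(f"{copies} copies, 20 clones: P(all detected) = {p:.4f}")
```

Unequal amplification would lower these probabilities, so the values should be read as an optimistic bound rather than as a property of the actual experiment.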

Phylogenetic analysis
Using Clustal X 1.81, the derived amino acid sequences of these fragments were aligned with the Hox genes from BSB, zebrafish, fugu, medaka and other teleosts retrieved from GenBank [38]. Regions of sequences that were difficult to align were removed from the alignment. Gaps were also removed from the alignment. The maximum likelihood method implemented in the online software RAxML was used to construct a phylogenetic tree [39].
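The gap-removal step applied to the alignment before tree construction can be sketched as follows. This is a minimal illustration, assuming that "gaps were removed" means deleting every alignment column in which at least one sequence carries a gap; the function name and the toy alignment are hypothetical, and the actual trimming tool used is not stated in the text.

```python
def strip_gap_columns(alignment: dict[str, str], gap: str = "-") -> dict[str, str]:
    """Remove every column that contains a gap in at least one aligned sequence."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    # Indices of columns in which no sequence has a gap character.
    keep = [i for i, column in enumerate(zip(*alignment.values())) if gap not in column]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy amino acid alignment (assumed for illustration only).
toy_alignment = {
    "copy_i": "MK-LV",
    "copy_ii": "MKALV",
    "zebrafish": "M-ALV",
}

if __name__ == "__main__":
    for name, seq in strip_gap_columns(toy_alignment).items():
        print(name, seq)
```

Complete-deletion trimming of this kind keeps only positions that are comparable across all taxa, at the cost of discarding some informative sites.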

Additional file
Additional file 1: Table S1. PCR gene-specific degenerate primers. Gene-specific degenerate primers were designed based on the alignment and identification of consensus orthologous Hox gene sequences from zebrafish (Danio rerio), medaka (Oryzias latipes), pufferfish (Fugu rubripes), mouse (Mus musculus), cichlids, and humans (Homo sapiens). Table S2. Percentage amino acid identity. Percentage amino acid identity between paralogous Hox sequences obtained from 4nF1 and reported orthologues from zebrafish, fugu, and medaka.