Genome-wide characterization of copy number variations in genetic resistance to Marek’s disease using next generation sequencing


 Background: Marek’s disease (MD) is a highly neoplastic disease primarily affecting chickens, and remains as a chronic infectious disease that threatens the poultry industry. Copy number variation (CNV) has been examined in many species and is recognized as a major source of genetic variation that directly contributes to phenotypic variation such as resistance to infectious diseases. Two highly inbred chicken lines 63 (MD-resistant) and 72 (MD-susceptible), as well as their F1 generation and six recombinant congenic strains (RCSs) with varied susceptibility to MD, are considered as ideal models to identify the complex mechanisms of genetic and molecular resistance to MD.
Results: In the present study, to unravel the potential genetic mechanisms underlying MD, we performed a genome-wide CNV detection using next generation sequencing on the inbred chicken lines with the assistance of CNVnator. As a result, a total of 1,649 CNV regions (CNVRs) were successfully identified after merging all the nine datasets, of which 90 CNVRs were overlapped across all the chicken lines. Within these shared regions, 1,360 harbored genes were identified. In addition, 55 and 44 CNVRs with 62 and 57 harbored genes were specifically identified in line 63 and 72, respectively. Bioinformatics analysis showed that the nearby genes were significantly enriched in 36 GO terms and 6 KEGG pathways including JAK/STAT signaling pathway. Ten CNVRs (nine deletions and one duplication) involved in 10 disease-related genes were selected for validation by using qRT-PCR, all of which were successfully confirmed. Finally, qRT-PCR was also used to validate two deletion events in line 72 that were definitely normal in line 63. One high-confidence gene, IRF2 was identified as the most promising candidate gene underlying MD in view of its function and overlaps with previous study.
Conclusions: Our findings provide valuable insights for understanding the genetic mechanism of resistance to MD and the identified gene and pathway could be considered as the subject of further functional characterization.

previous study. Conclusions: Our findings provide valuable insights for understanding the genetic mechanism of resistance to MD and the identified gene and pathway could be considered as the subject of further functional characterization.

Background
Marek's disease (MD) is a T cell lymphoma induced by the highly oncogenic αherpesvirus II disease virus [1], which goes through a complex life with four main phases [2,3]: an early cytolytic phase at 2-7 days post infection (dpi), a latent phase around 7-10 dpi, a late cytolytic phase with the presence of tumors that is triggered between 14-21 dpi and finally a proliferation phase after 28 dpi. Currently, MD is a commercially important neoplastic disease of chickens and also one of the main chronic infectious diseases concern threatening the global poultry industry.
The control of MD mainly relies on vaccination, however, the vaccination efficacy has been experiencing erosion because of new emerging strains of Marek's disease virus (MDV) with escalated virulence. Enhancing genetic resistance to MD in poultry is an important long-term goal to control MD. Therefore, understanding of genetic basis of MD and improving MD resistance in chickens are of great interest for the poultry industry and animal welfare. In order to optimally implement this control strategy through marker assisted selection (MAS) and to understand the etiology and mechanisms of MD, it is necessary to identify more specific variants and genes with respect to MD latency.
Genetic variations play crucial roles in phenotypic diversity [4], some of which may underlie major mechanisms that account for variations in disease resistance. The identification of structural variations and potential genetic markers is very important for better understanding of disease resistance, as well as genomic predication and MAS. There are several types of genetic variations, where copy number variation (CNV) is one of the important genetic variants. To our knowledge, CNV is a type of important genomic structural variation which includes intermediately sized DNA segments that have undergone submicroscopic insertion, deletion, segmental duplication, and complex changes of greater than 1 kilobases (Kb) to several megabases (Mb) in size [5]. It is also a major source of genetic variation underlying phenotypic diversity [6]. Since the first two genome-wide scans for CNVs in human [7,8], an large amount of CNV detection studies have been performed, which revealed that CNVs are ubiquitously distributed in the genome and can influence the phenotype via regulations of gene expression and gene dosage [9][10][11]. Besides, numerous studies in other species have also shown that CNVs contributed to phenotypic variation of complex diseases and traits [12][13][14][15][16][17][18][19], including MD studies [20][21][22]. Two major platforms based on SNP chip, comparative genomic hybridization (CGH) array and SNP genotyping array, are traditionally used for CNVs detection. However, due to the limitation in resolution and sensitivity, it is difficult to detect small CNVs shorter than 10 kb in length and to identify the precise boundaries of CNVs. In recent years, a variety of CNV detection approaches based on next-generation sequencing (NGS) were proposed and offer promising alternatives as they have higher effective resolution and sensitivity to discover CNVs with more types and wider size ranges. Advances in NGS have provided a new platform for more detailed characterization of CNVs in genomes [23-25].
The primary aim of the present study is to perform a genome-wide CNV analysis to detect CNVs contribute to MD resistance or susceptibility. We applied deep sequencing on nine different genetic chicken lines that significantly vary in genetic resistance to MD. CNVnator [26] was employed to identify a comprehensive map of CNVRs and genes. qRT-PCR was used to verify the detected CNVRs. Our analysis provides valuable insights for understanding the genetic mechanism of resistance to MD and the identified gene and pathway could be considered as the subject of further functional characterization.

Mapping statistics and CNV detection
In this study, we performed whole genome sequencing in nine different lines ( Fig. 1 Table S2. A total of 1,649 CNV regions (CNVRs) ( Table 1) allowing for CNV overlaps of 1 bp or greater were obtained across all the chicken lines after merging, covering autosomes 1-28, and sex chromosomes Z and W. The chicken CNV map across the genome is shown in Fig. 2 (Fig. 3B). In addition, the CNVRs belonging to gain and loss account for 47.1% (776 events) and 52.9% (873 events), respectively.

Gene annotation and functional analysis
The genes harbored in the inferred CNVRs were extracted using custom Perl scripts.
As a result, a total of 2,588 RefSeq genes within the regions of the 1,649 CNVRs were obtained, where a majority of these genes were involved in immune, tumor and diseases. The identified genes were submitted to DAVID (version 6.8) for GO and pathway enrichment analyses. Using functional annotation clustering, at the highest classification stringency, 145 clusters were formed, where only 9 clusters were chosen after using an enrichment cutoff at > 1.0 (Additional file 3: Table S3).
GO terms and KEGG pathways analyses invoked in DAVID yielded 36 significant enriched (28 terms of biological process, 2 terms of cellular component, and 6 terms of molecular function) functional terms (P < 0.05, Fig. 4), and 6 significant enriched pathways (P < 0.05, Table 2), including the JAK/STAT signaling pathway (gga04630, Additional file 4: Figure S1). The detailed information of all the GO terms and pathways are shown in Additional file 5: Table S4.

PCA analysis and cluster
To investigate genetic structure in nine inbred chicken lines, we performed a principal component analysis (PCA) using the CNV segments by custom R scripts.
Nine principal components were calculated and the first two PCs were used in the plot (Fig. 5A). The nine lines were clustered to approximate four groups with the similar patterns, as indicated by dashed circles (Fig. 5A), which were consistent with their susceptibility to MD (Fig. 5B). Lines RCS-A, M and 7 2 were well clustered together with high MD incidence. Lines RCS-D, J, L and X were clustered together with high resistance to MD. Interestingly enough, as expected, F 1 individuals with a medium MD incidence were in an intermediary position between line 7 2 and line 6 3 , which provided the theoretical basis of heterosis and identification of imprinting genes for disease resistance.

Shared versus line-specific CNVRs
To investigate how frequently CNVRs were shared or lineage-specific across different lines, we calculated the CNVR numbers among the nine inbred chicken lines (Table 1). In total, 90 CNVRs were detected across all the individuals, which represented the commonly shared CNVRs. A total of 55, 44, 82, 15, 14, 72, 190, 18, and 5 CNVRs were detected as line-specific CNVRs in line 6 3 , 7 2 , F 1 , RCS-A, D, J, L, M, and X, respectively, as compared to other lines (Table1, Additional file 6: Table   S5). Importantly, the line 6 3 and 7 2 lineage-specific CNVRs could potentially offer certain clues to explore the genetic mechanisms of MD resistance or susceptibility.
So, a total of 62 and 57 harbored genes were identified in line 6 3 and 7 2 , respectively, including several immune-, tumor-and disease-related genes, such as interferon regulatory factor 2 ( IRF2), which was a line 7 2 line-specific gene in this study (Additional file 7: Table S6). Interestingly, our lab also found a MD-resistant associated differentially methylated region (DMR, chr4: 38,999,001-39,000,000), which was hypermethylated in line 6 3 than 7 2 , in the previous DNA methylation study (not published). The harbored region was also IRF2, which is involved in immune response IFN alpha/beta signaling pathway. This gene could be a candidate gene associated with MD susceptibility.

CNVRs validation by qRT-PCR
To confirm the identified CNVRs, 10 CNVRs containing gains (duplications) and losses (deletions) detected here were validated by qRT-PCR using two reference genes (THRSP and β-actin). We found that all of the 10 CNVRs were confirmed in agreement with the CNVnator results ( . For CNVR6, a total deletion was detected in line 7 2 , while line 6 3 had a normal status. For CNVR7, line 7 2 had two third of the normal copy numbers, while line 6 3 also had a normal status (Fig. 6B). Therefore, the copy numbers of these two loci were found significantly lower in line 7 2 as compared to line 6 3 , again supporting our CNV calls and suggesting that they are potentially linked to MD susceptibility.

Comparison with other studies on CNV in chickens
Considering that most of the previous CNV detection studies based on the galGal2 and galGal3 genome assembly, coordinates of the CNVRs were converted using the UCSC liftOver tool (http://genome.ucsc.edu/cgi-bin/hgLiftOver). We migrated all chromosomal CNVRs from galGal2 and galGal3 (used in previous studies) to galGal4.
We eventually obtained 585 CNVRs in the present study for comparison. Our results were then compared to 12 previous reports on chicken genomic CNV (  [22] results that also involved in MD were validated in our study. Taken together, 42.2% of our CNVRs overlapped with these three MD studies. The detailed information of CNVRs identified in this study and previous studies is provided in Additional file 8: Table S7. Discussion MD, a complicated tumor disease, has been used as a model for human tumor study [29]. The genetic mechanism underlying MD is likely to be very complex and remains incompletely understood. Thus, it is important to understand the genetic basis of MD-resistant or MD-susceptible for poultry, which can provide crucial clues for human diseases. In the present study, based on the high throughput sequencing platform, some bioinformatics analyses were conducted to identify CNVs, genes and enriched pathways taking full advantage of identical genetic background in nine inbred chicken lines. Copy number variations in the chicken genome have been explored by many research groups in the past decade. However, most of the previous studies focused on the CNV discoveries using low-density SNP arrays [30,31]. With the development of high-throughput genotyping technology, the NGS data has been used to detect the complex diseases and traits-related CNVs. CNV detections based on NGS data, which has much higher density compared to SNP chip data, have been developed and implemented in different tools [32]. There are four main methods for detecting CNVs with NGS data: Read-Pair (RP), Spit Read (SR), Read Depth (RD), and assembly (AS) based methods, including CNVnator [26] used here, Pindel [33], ReadDepth [34], PEMer [35], and some other useful methods. However, each of the methods have different advantages and limitations in their applicability and suitability for NGS data. CNVnator based on RD method was the only software employed in this study. To our knowledge, it uses the established mean-shift approach with additional corrections for multiple-bandwidth partitioning and GC correction for more accurate CNV detection. Previous approaches using RD were limited to only unique regions of the genome, discovered only large CNVs with poor breakpoint resolution, or could not perform genotyping. CNVnator is able to discover CNVs in a vast range of sizes, from a few hundred bp to several Mb in length, in the whole genome. Therefore, our results here could reveal additional novel genetic variations underlying MD than those revealed by SNP arrays alone.
In the present study, we performed comparisons with the previous CNV studies, especially three researches also involved in MD. We found 247 CNVRs covering 93.9 Mb length overlapped with these three MD studies, where only 7 (1.2%) and 15 (2.6%) CNVRs shared with Luo et al.'s [20] and Xu et al.'s [22] results using SNP chip data, which may be, in part, related to limited sample sizes, different platforms, different analysis methods, and different chicken genome references (although we converted the genome positions from two previous genome assemblies (galGal2 and galGal3) to a newer one (galGal4) with the help of LiftOver based on UCSC, some information may still be missed). More importantly, we also used different chicken lines, especially the selected RCSs from a total of 19 RCSs ( Figure   1). We also compared with the CNV identified in lines 6 3 and 7 2 using NGS data for MD study [21], and found 228 (39.0%) CNVRs overlapped, which provided more effective information for our study. Moreover, our study explored the genetic structure based on CNV in different inbred chicken lines. The PCA analysis showed clearly that the first two PCs can divide all chickens into four unique groups, which is similar to the results of Xu et al. [22]. Therefore, our study further confirms that CNV markers can be used to study the genetic variability in diverse chicken lines, which could possibly contribute to lineage-specific phenotypes.
The genetic mechanism underlying MD is likely to be very complex and not clear yet. It may be determined by some specific structural variations but not a single gene or a SNP mutation, though several candidate genes have been reported in previous studies described above. In the current study, we investigated CNVs among diverse inbred lines and found 55 and 44 unique CNVRs in line 6 3 and 7 2 , respectively, which could be associated with MD. Notably, we successfully identified a CNVR, which was a deletion and a normal copy number in all individuals from line 7 2 and line 6 3 , respectively, including a nearby gene IFR2. Fortunately, the IFR2 gene was also highlighted in our former DNA methylation study (not published) involved in a critical DMR identified by methyl-CpG binding domain protein enriched genome sequencing (MBD-seq) with a false discover rate (FDR) < 0.1 and validated by bisulfite cloning sequencing, which was hypermethylated in line 6 3 than 7 2 . The region of the DMR identified in previous study and the CNVR identified here was almost completely the same, which has the same start site with 1000 bp overlaps.
The nearby gene IRF2 is a disease-and virus-related gene involved in interferon gamma signaling pathway and immune response IFN alpha/beta signaling pathway [36]. This gene is conserved in human and some other species like chimpanzee, Rhesus monkey, dog, cow, mouse, rat, zebrafish, and frog. Thus, some mutations or structural variations of this gene could be key factors that related to disorders or diseases. It was reported that IRF2 gene was associated with several diseases in chickens like necrotic enteritis [37], pancreatic cancer [38], and atopic dermatitis and eczema herpeticum [39]. More interestingly, this gene can specifically bind to the upstream regulatory region of type I IFN and IFN-inducible MHC class I genes, which could be an important clue to explore the genetic mechanisms of MD resistance because, to our knowledge, MHC plays an important role in the determination of resistance to MD [1]. Therefore, IRF2 may be a very important gene with structural variation identified here related to MD according to the known functions and our former studies. Another useful information obtained in this study is the JAK/STAT signaling pathway, which was also considered as a potential pathway responding to MDV infection reported by Perumbakkam et al. [40]. The JAK/STAT signaling pathway is one of a handful of pleiotropic cascades used to transduce a multitude of signals for development and homeostasis in animals [41].
JAK activation stimulates cell proliferation, differentiation, cell migration and apoptosis. These cellular events are critical to immune development and some other processes. Importantly, mutations that constitutively activate or fail to regulate JAK signaling properly cause inflammatory disease, including several chicken diseases [42,43]. Additionally, a previous study reported that IRF2 can regulate macrophage apoptosis through a STAT1/3 [44], which provides valuable and potential interaction of IFR2 and JAK/STAT pathway who might jointly contribute to the genetic resistance to MD. Therefore, we hypothesize a probable mechanism of complex disease: the deletions in CNV could associate with different epigenetic effects, which further regulate an interacting pathway leading to occurrence of diseases.

Conclusion
In summary, we investigated copy number variations in inbred chicken lines using next generation sequencing. We have successfully identified a number of linespecific CNVRs, as well as revealed genes and pathways that may be involved in genetic resistance to MD. Combining with our previous study and due to the complexity of MD, we ultimately found a high-confidence candidate gene IRF2, and an immune-or disease-related pathway, JAK/STAT signaling pathway, which could jointly play potentially important roles in response to MD resistance. Overall, our findings in the present study will provide valuable insights for understanding the genetic mechanism of resistance to MD and will be worthy of further functional characterization.

Experimental population
A total of 26 chickens without treatments were used for blood collection in this study, including three chickens from each of the line 6 3 (MD-resistant), line 7 2 (MDsusceptible) and six recombinant congenic strains (RCSs, RCS-A, D, J, L, M, and X), and two chickens from reciprocal cross F 1 hybrid 6 3 ×7 2 (USDA-ARS ADOL, East Lansing, Michigan, USA) [45]. RCSs were developed using line 6 3 as the parental strain mated to line 7 2 and then backcrossed to line 6 3 twice followed by full-sib mating for more than 20 generations (Fig. 1). Eventually, diverse RCSs were generated and they contain 87.5% of line 6 3 and 12.5% of line 7 2 in the genetic background but differ in MD resistance/susceptibility [46].

Library construction and sequencing
Blood samples were collected from the brachial vein by venipuncture. Genomic DNA

Read alignment and CNV calling
Chicken genome assembly (galGal4) was retrieved from the UCSC Genome Browser website (http://hgdownload.soe.ucsc.edu/goldenPath/galGal4/bigZips/) [47]. In order to minimize the mapping errors, quality control was performed by FastQC [48] and low quality reads were removed with the help of FastX Toolkit [49] and Trimmomatic [50] with default parameters. The resulting FastQ files of mapping reads of each sample were aligned to the reference genome individually using Burrows-Wheeler Aligner (BWA-MEM) (v0.7.15) [51] with mainly default parameters. SAMtools (v1.3) [52] was then used to convert the alignment results (SAM format) to BAM format and all converted BAM files were sorted with command SAMtools. Duplicate reads were removed from individual sample alignments using MarkDuplicates in the Picard package (http://broadinstitute.github.io/picard/) to avoid any influence on variant detection, and reads were merged using MergeSam-Files. We additionally performed local realignment using Genome Analysis Toolkit (GATK, v3.5) [53] to enhance the alignments in regions of indel polymorphisms, which can greatly improve the sensitivity and specificity in CNV calling [54].

.3) software [26]
based on read depth (RD) method to predict genomic CNVs between the nine chicken lines and the reference. CNVnator firstly calculated the counts of mapped reads within user specified non-overlapping bins of equal size as the RD signal, and then adjusted the signal in consideration of the potential correlation between RD signal and GC content of the underlying genomic sequence. The mean-shift algorithm was employed to segment the signal with presumably different underlying CN. Then CNVs were predicted by applying statistical significance tests to the segments. We then ran CNVnator with a bin size of 100 bp for our data. CNV calls were filtered using stringent criteria including P-value < 0.05 and minimum size > 1 Kb, and calls with > 50% of q0 (zero mapping quality) reads within the CNV regions were removed (q0 filter). All CNV calls overlapping with gaps in the reference genome were excluded from consideration. CNVs located on random contigs (chrN_random), unlocalized chromosomes (chrUn), or in overlapping gaps were discarded for further analysis due to the shorter length of the chrUn contigs and mapping ambiguity of chrUn sequence reads. In order to compare our results with previous studies, we converted all chromosomal CNVRs from galGal2 and galGal3 (used in previous studies) to galGal4 with the assistance of LiftOver based on UCSC (http://genome.ucsc.edu/cgi-bin/hgLiftOver).

Gene detection and functional analysis
Results from CNVnator were combined to obtain a collective set of unique CNVs with different start or end coordinates. These CNVs were then merged into non-

Validation of CNVRs by qRT-PCR
To experimentally validate the detected CNV calls by CNVnator, we performed qRT-PCR confirmation of ten CNVRs randomly selected from line 6 3 and line 7 2 , respectively. All the primers were designed based on the GenBank reference sequences using the Primer 3.0 webtool (http://frodo.wi.mit.edu/primer3/) ( Table   S8). The β-actin gene and thyroid hormone responsive (THRSP) gene served as reference genes. For each chicken line, at least three individuals were used to do the validation. qRT-PCR using SYBR Green PCR Kit was performed in triplicate based on iCycler iQ PCR System (Bio-Rad). qRT-PCR reaction program was run as follow: pre-incubation (95 °C for 10 min), 40 cycles of amplification (95 °C for 10 s, 60 °C for 10 s, and 72 °C for 10 s), melting curves using a heat ramp and cool down. Cycle threshold values (Ct values) were obtained from iCycler iQ PCR software. The 2 -ΔΔCT method was used to calculate the copy number [56][57][58]. The corresponding equation

was (see Equation in Supplementary Materials)
,where CT is the threshold cycle, sample A is the tested individual, and sample B is the control individual with single copy or no variation in copy number. Samples with Normal Ratio (NR) about 1 denote normal individuals (two copies), samples with NR about 0.5 denote one copy loss individuals, and samples with NR about 1.5 or more denote copy number gain individuals [59,60]. (http://www.nap.edu/openbook.php?record_id=5140) and University of Maryland (R-08-62). All efforts were made to minimize discomfort and suffering.

Consent for publication
Not applicable.

Availability of data and material
The sequencing data have been submitted to the NCBI Sequence Read Archive    Figure 1 The chicken population used in this study. Chicken lines labeled blue were selected for CNV d Figure 2 Circos plot illustrating CNV regions in nine chicken lines. Regions with copy number events a