Description of association population
We used a population consisting of 125 unrelated C. fargesii individuals that collected from all of the provenances throughout its whole natural distribution range in China, for the initial SNP association mapping. The distribution zone from which these individuals were collected, could be divided into four geographic regions: Fenhe River Valley, Jinghe River Valley, Jialingjiang River Valley, and Yellow River Valley. In the year 2009, branch segments of 125 native C. fargesii individuals were collected from eight cities in four provinces and grafted, and the population was grown in Xiaolongshan National Nature Reserve, Gansu Province, China (33°40′N, 106°23′E). The clonal plantation was established using a randomised complete block design with two plants per clone in each block (row spacing is 2 m and plant spacing is 2 m) and totally six replicates. Within the association population, 93 individuals were randomly selected (for each location, at least one C. fargesii individual were selected) to identify SNPs within the gene via PCR amplification and sequencing.
Phenotypic data
The 125 individuals of the association population were sampled in 2012 to characterise their wood quality traits. Cores from bark to pith were collected in the south-facing direction of the original stems to measure wood density and other wood properties with an increment borer (7 mm) at breast height (1.3 m above the ground). The wood samples were fixed in formalin–acetic acid–alcohol (FAA) after collection, and nine wood property traits were measured in 2013: wood basic density (WBD), pore rate, cell wall percentage (the percentage of cell wall in whole cells), cell wall thickness, radial lumen diameter, chordwise lumen diameter, radial fibre central cavity diameter, chordwise fibre central cavity diameter, and average fibre central cavity diameter.
WBD was measured according to the formula WBD = W2/(W1 − W2 + W2/ρcw), where W1, W2 and ρcw represent the water-saturated weight, oven dry weight, and wood cell wall component density, respectively. We used the constant 1.53 g/cm3 for ρcw [11, 12]. The other wood properties were detected according to Li et al. [13]: Cores from the xylem to pith were split in 3-cm segments and cross-sections (10–15-μm-thick) were prepared using a sliding microtome (Leica, Heidelberg, Germany), stained with 1% safranin, and fixed with Eukitt (Bio-Optica, Milan, Italy) on an object glass. To measure the wood microstructure characteristic parameters, a digital image processing system combining a light microscope (80i, Nikon), video camera sensor (Penguin 600CL; Pixera Corp., Santa Clara, CA, USA), and TDY-5.2 colour image analysis system was used [14]. The phenotype data are listed in Additional file 1: Table S1 and the frequency distributions of all traits could be found in Additional file 2: Figure S1. We selected these nine wood property traits because these traits may influence the final mechanical properties of wood productions [13].The mean, maximum, minimum values and coefficient of variation of the nine phenotypic traits were calculated using SPSS software (ver. 18.0; SPSS Inc., Chicago, IL, USA) and are listed in Additional file 3: Table S2.
cDNA isolation and genomic DNA amplification of CfSUS
Xylem were collected by scraping the thin and partially lignified layer on the exposed xylem surface from the branches of a 1-year-old “Xianhuiqiu” clone. The tissue was frozen immediately in liquid nitrogen and stored in the laboratory at − 80 °C for later RNA extraction. Total RNA was extracted using a RNeasy Plant Kit (Qiagen, Shanghai, China) according to the manufacturer’s instructions. First-strand cDNA was synthesised from 2 g RNA using PrimeScript™ 1st Strand cDNA Synthesis Kit (TaKaRa, Tokyo, Japan). We obtained the 2887-bp complete coding sequence(CDS) of CfSUS, including a 2418-bp open reading frame (ORF) from a previous RNA sequencing data. We amplified CfSUS cDNA using SUS-CDS-specific primers (Additional file 4: Table S3).
Total genomic DNA was extracted from young leaves of a 1-year-old “Xianhuiqiu” clone with the DNeasy Plant Kit (Qiagen). Four specific primers (SUS-a, SUS-b, SUS-c, and SUS-d) were designed to sequence the introns in CfSUS based on the cDNA sequence on the conserved domain (Additional file 3: Table S2). After PCR amplification, four fragments were linked to the clone vector T-Vector pMD19 (TaKaRa) and sequenced, and the entire DNA sequence of CfSUS was obtained according to the assemblage results of the four sequenced fragments with DNAMAN software (Lynnon Biosoft, Vaudreuil, Quebec, Canada). Finally, 5067 bp genomic DNA sequences of CfSUS, including a 1394-bp 5′UTR, 3406-bp coding region, and 267-bp 3′UTR were obtained. The entire DNA sequence of CfSUS was verified using the SUS-e primers (Additional file 3: Table S2). The CfSUS sequence is deposited in GenBank under the accession number MH394454.
CfSUS and phylogenetic analyses
Amino acid sequences of CfSUS was used for BLAST in the GenBank database and multiple SUS proteins from other species were selected for phylogenetic tree construction using MEGA software and alignment with DNAMAN software. We analysed the phylogenetic relationship of CfSUS with the amino acid sequences of SUS from other species identified from NCBI (http://www.ncbi.nlm.nih.gov) using BLAST (Altschul et al. 1997): Orobanche ramosa (AEN79500.1), Nicotiana tabacum (AHL84158.1), Solanum tuberosum (NP_001274911.1 and NP_001275286.1), Solanum lycopersicum (CAA09681.1 and ADM47608.1), Cichorium intybus (ABD61653.1), Actinidia chinensis (AFO84090.1), Camellia sinensis (AHL29281.1), Gossypium hirsutum (ADY68848.1), Gossypium barbadense (ADY68844.1), Arachis hypogaea var. vulgaris (AEF56625.1), Jatropha curcas (AGH29112.1), Gossypium aridum (AEN71079.1), Gossypium tomentosum (AEN71067.1), Eucalyptus grandis (ABB53602.1), Populus tomentosa (ADW80558.1), Manihot esculenta (ABD96570.1), Hevea brasiliensis (AGQ57012.1, AGM14948.1, and AGM14949.1), Hordeum rulgare (CAA46701.1 and CAA49551.1), Lolium perenne (BAE79815.1), Triticum aestivum (CAA04543.1 and CAA03935.1), Bambusa oldhamii (AAV64256.2, AAL50571.1, AAL50570.1, and AAL50572.2), Oryza sativa (CAA46017.1, CAA41774.1, and AAC41682.1), Saccharum officinarum (AAF85966.1), Zea mays (AAA33514.1), and Tulipa gesneriana (CAA65639.1). Phylogenetic tree was conducted using MEGA ver. 5(maximum likelihood method) [15]. The statistical confidence of the nodes of the tree was based on 1000 bootstrap replicates.
CfSUS expression in different tissues
Total RNA was extracted from six different tissues, including leaf, bark, phloem, xylem, flower, and young branch, from three 11-year-old “Xianhuiqiu” clone (as three times repetition) using RNeasy Kits (Qiagen, Duesseldorf, Germany) and reverse-transcribed into cDNA using PrimeScript™ 1st Strand cDNA Synthesis Kit (TaKaRa, Tokyo, Japan). The cDNA samples were used to analysis the expression of CfSUS in different tissues using RT-qPCR.
RT-qPCR was performed with a LightCycler 480 System (Roche, Basel, Switzerland) using the SYBR Premix Ex Taq Kit (TaKaRa, Tokyo, Japan) with the recommended amplification system by the manual. The primers for amplification (SUS-q; Additional file 3: Table S2) were designed using Primer Express 5.0 software (Applied Biosystems, Life Technologies, New York, NY, USA), and a primer pair of an actin gene (Additional file 3: Table S2) was selected as internal control according to Jing et al. [16]. The PCR program was performed according to the recommended program in the LightCycler 480 System manual, as follows: initial denaturation at 95 °C for 30 s; 40 cycles of 5 s at 95 °C and 30 s at 60 °C; and then one cycle of 5 s at 95 °C, 60 s at 60 °C, and 95 °C (acquisition mode, continuous; acquisitions, five per degree Celsius). Four technical replicates and three biological replicates were performed for all experiments and the results obtained for the different tissues were standardised to the levels of actin using the 2−ΔΔCT method.
SNP discovery and genotyping
To identify SNPs in the CfSUS gene (without considering insertions/deletions), DNA sequences including 53-bp 3′UTR and 3406-bp coding region, was cloned, sequenced, and analysed in 93 randomly selected individuals from the C. fargesii association population. To ensure the sequencing accuracy, four pairs of primers (SUS-1, SUS-2, SUS-3, and SUS-4) were used to amplify four fragments (800–1500 bp) of the whole sequence (Additional file 3: Table S2) by PCR using Takara Ex Taq (TaKaRa, Tokyo, Japan) and ligated to pMD™18-T vector (TaKaRa, Tokyo, Japan). Eight clones for each fragment were randomly selected for sequencing. The four fragments were used to assemble the complete CfSUS sequence. DNAMAN and ClustalX2 were used for the sequence alignment. The 93 genomic clones were aligned and compared using MEGA ver. 5.0 and DnaSP v5 to identify SNPs and analyse nucleotide polymorphisms [17]. Subsequently, common SNPs (minor allele frequencies ≥0.05) were genotyped across all 125 DNA samples of the association population.
Nucleotide diversity and linkage disequilibrium analysis
We used Phase v2.1 software to disambiguate the DNA sequences into haplotypes (10,000 iterations applying the Bayesian Markov Chain Monte Carlo approach) [18] and DnaSP v5 software to calculate the summary statistics of the SNPs. Nucleotide diversity was evaluated using π value [19] and θw value [20], which represent the average number of pairwise differences per site between sequences and the average number of segregating sites, respectively [6].
The decay of LD with the increase of physical distance between SNPs within the candidate region of CfSUS was estimated by linear regression analysis of LD using DnaSP v4.90.1. The LD level between the 47 common SNP markers were valued as r2 (squared correlation of allele frequencies) using the HAPLOVIEW (https://www.broadinstitute.org/haploview/haploview), where the interval of the parameter varies from 0 to 1. The significance of r2 (P-values) for all the SNP sites were calculated using 100,000 permutations. Genotypic data of CfSUS identified in this population were showed in Additional file 5: Table S4.
Association mapping
The associated mapping was carried on according to the method of Wang et al. [6]. In the association population (n = 125), 47 common SNPs and nine wood properties associated traits were considered, and a mixed linear model (MLM) was selected to fit each SNP-phenotype combination using TASSEL v2.0.1 [21]. We used a mixed linear model described as the following formula: y = μ + Qυ + Zu + e. In this formula, “y”, “μ”, “υ”, “u” and “e” respectively represent a vector of phenotype observation, (a vector of) intercepts, (a vector of) population effects, (a vector of) random polygene background effects and (a vector of) random experimental errors. Q is a matrix used to define the population structure by STRUCTURE and Z is a matrix relating y to u. Var(u) = G = σa2κ with σa2 is the unknown additive genetic variance and κ is the kinship matrix. In this model, the Q and K represent the estimated membership probability and pairwise kinship, respectively. The Q matrix was identified based on the population structure pattern (K = 3) within the association population (125 unrelated individuals), which was assessed by STRUCTURE v2.3.1 [22]. The K matrix was obtained by the SPAGeDi ver. 1.2. Positive false discovery rate (FDR) was calculated by QVALUE software [23] and used to correct the multiple testing. The percentage of phenotypic variation (R2) explained by each SNP was calculated use the formula R2 = SSt/SST, where SSt and SST represented the variance between genotypes and the total variance, respectively [3].
Haplotype frequencies and haplotype association tests were evaluated and performed using Haplotype Trend Regression software (Golden Helix, Inc., Bozeman, MT, USA) on a three-marker sliding window. The significance of the associations was tested using 1000 permutation. Only haplotypes with a frequency not less than 1% were selected and positive FDR (Q ≤ 0.1) was used to correct the multiple test.
The modes of gene action were defined as the ratio of dominant (d) to additive(a) effects(|d/a|) estimated from the least-square means for each genotypic class. The algorithm and formulas used for calculating dominance (d) and additive(a) effects were described by Du et al. [24]. The values of |d/a| ≤ 0.5 was defined as additive effects, whereas partial or complete dominance was defined as the values within the range 0.50 < |d/a| < 1.25, and values of |d/a| ≥ 1.25 were regarded as overdominance.