Animals, samples and racing results
The analysis was performed on 254 purebred Arabian horses (185 mares and 69 stallions). The analysed horses were the offspring of 95 stallions (an average of 2.67 offspring per sire) and 208 mares (an average of 1.22 offspring per dam). Additionally, the inbreeding coefficient FIS (an excess homozygosity-based inbreeding estimate) was calculated from SNP microarray data obtained for the same population in another project. Its value of 0.049472 indicated low relatedness, although an FIS above 0 may point to a slight increase in relatedness among animals in the analysed population. All horses are registered as purebred in the Polish Arabian Stud Book (PASB), which operates under the World Arabian Horse Organization. The horses were 3 to 5 years old, and all had taken part in flat races at distances ranging from 1400 to 3000 m. For each animal, several racing results were collected: the number of wins, the rate of placing first, second, third, fourth and fifth, the number of races run and the distances of the races in which the horse participated. Total money winnings and the dam line effect were also taken into consideration. All horses that participated in flat races were genotyped regardless of their racing results, so the analysed population included horses with excellent or good results as well as horses that raced but did not win any race or prize money. The horses competed in flat races managed by the Horse Racing Authority in Poland, which is affiliated with the International Federation of Arabian Horse Racing Authorities and the International Federation of Horseracing Authorities. Wins were classified according to the rules of these organizations.
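The excess homozygosity-based estimate can be illustrated with a short sketch. It assumes the method-of-moments form used, for example, by PLINK's `--het` option, F = (O − E)/(N − E), where O is the observed number of homozygous SNPs in a horse, E the number expected under Hardy-Weinberg proportions and N the number of non-missing SNPs. Whether this exact estimator, and a mean across horses as the population summary, produced the reported value is an assumption. Genotypes are coded as 0/1/2 allele dosages; the function names and toy data are hypothetical.

```python
from statistics import mean


def allele_frequencies(genotypes):
    """Frequency of the counted allele at each SNP.

    genotypes: one list per individual of 0/1/2 dosages (None = missing call).
    """
    freqs = []
    for snp in range(len(genotypes[0])):
        calls = [ind[snp] for ind in genotypes if ind[snp] is not None]
        freqs.append(sum(calls) / (2 * len(calls)) if calls else 0.0)
    return freqs


def inbreeding_f(individual, freqs):
    """Method-of-moments F: (observed hom - expected hom) / (N - expected hom)."""
    observed_hom, expected_hom, n_used = 0, 0.0, 0
    for dosage, p in zip(individual, freqs):
        if dosage is None or p in (0.0, 1.0):
            continue  # skip missing calls and monomorphic SNPs
        n_used += 1
        if dosage != 1:
            observed_hom += 1
        expected_hom += 1.0 - 2.0 * p * (1.0 - p)  # Hardy-Weinberg P(homozygote)
    return (observed_hom - expected_hom) / (n_used - expected_hom)


# Hypothetical toy data: three horses genotyped at two SNPs (not study data).
horses = [[0, 1], [1, 1], [2, 2]]
freqs = allele_frequencies(horses)
f_values = [inbreeding_f(h, freqs) for h in horses]
population_f = mean(f_values)  # assumption: population value summarised as the mean
```

Positive values indicate more homozygosity than expected from the allele frequencies; fully heterozygous individuals get negative values.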
Whole blood or hair follicles were collected from the horses. The protocol was approved by the Animal Care and Use Committee of the Institute of Pharmacology, Polish Academy of Sciences in Kraków (no. 1173/2015).
SNP identification and genotyping
All SNPs within SLC16A1 were detected from RNA-seq data previously obtained from Arabian horses, with reference to the EquCab2.0 assembly [22, 29]. DNA was isolated from blood and hair follicles using the Sherlock AX DNA Isolation kit (A&A Biotechnology, Gdynia, Poland) according to the manufacturer's protocol. A fast, low-cost PCR-RFLP method was then designed to genotype the polymorphism in exon 5 (ss#3021042926), and PCR-HRM was used to genotype the mutation in the 5’UTR (ss#3021042925). Details of the methods are presented in Table 1. Both polymorphisms were genotyped in all horses. To confirm the results, both amplicons were Sanger sequenced in 48 randomly selected samples. Sequencing was performed using the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, Thermo Fisher Scientific); the cycle-sequencing products were purified using the BigDye XTerminator Purification Kit (Applied Biosystems) and then run on a 3500xl Genetic Analyzer (Applied Biosystems).
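Confirming the PCR-RFLP and PCR-HRM calls against Sanger sequencing in the 48 validation samples amounts to a concordance count. The sketch below assumes genotypes are stored as two-letter strings keyed by a hypothetical sample ID and ignores allele order, so that "TG" and "GT" match; names and data are illustrative only.

```python
def normalise(genotype):
    """Make 'GT' and 'TG' compare equal by sorting the two allele letters."""
    return "".join(sorted(genotype.upper()))


def concordance(method_calls, sanger_calls):
    """Return (concordance rate, sorted IDs of discordant samples) over shared samples."""
    shared = sorted(set(method_calls) & set(sanger_calls))
    if not shared:
        raise ValueError("no samples genotyped by both methods")
    discordant = [s for s in shared
                  if normalise(method_calls[s]) != normalise(sanger_calls[s])]
    return 1.0 - len(discordant) / len(shared), discordant


# Hypothetical sample IDs and genotypes, for illustration only.
rflp = {"H01": "TT", "H02": "TG", "H03": "GG", "H04": "TG"}
sanger = {"H01": "TT", "H02": "GT", "H03": "GG", "H04": "GG"}
rate, mismatches = concordance(rflp, sanger)
```

Listing the discordant IDs, rather than only the rate, makes it easy to re-inspect the corresponding gels or melting curves.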
Statistical analysis
The races in which the analysed horses participated and won first place were assigned to one of three distance groups:
- distance 1 (short): 1400 m
- distance 2 (middle): 1600–2000 m
- distance 3 (long): 2200–3000 m
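The grouping rule can be written as a small function. Distances falling between the stated ranges (for example 1500 m or 2100 m) are not covered by the text; here they return `None`, which is an assumption rather than part of the original procedure. The function name and example distances are hypothetical.

```python
from collections import Counter


def distance_group(distance_m):
    """Map a race distance in metres to group 1 (short), 2 (middle) or 3 (long)."""
    if distance_m == 1400:
        return 1
    if 1600 <= distance_m <= 2000:
        return 2
    if 2200 <= distance_m <= 3000:
        return 3
    return None  # assumption: distances outside the defined groups stay unassigned


# Hypothetical distances of races won by one horse.
wins = [1400, 1600, 2000, 2400, 3000]
counts = Counter(distance_group(d) for d in wins)
```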
Differences in genotype frequencies between distance groups were tested with a chi-square test in R. The normality of the racing data distributions was tested with the Shapiro-Wilk test. Associations between the two identified SNPs and the racing results were estimated using the GLM procedure. In its most complete form, the GLM model included all factors of interest:
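The chi-square test of independence compares observed genotype counts in each distance group with the counts expected from the row and column totals. The pure-Python sketch below computes the uncorrected Pearson statistic; R's `chisq.test` applies a continuity correction only to 2 × 2 tables, so for a 3 × 3 genotype-by-distance table the two should agree. The p-value uses closed-form chi-square tail probabilities for integer degrees of freedom. Function names and the example counts are hypothetical and are not results of this study.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus sum over r of sqrt(x)^(2r-1) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    series, term = 0.0, root
    for r in range(1, (df - 1) // 2 + 1):
        series += term
        term *= x / (2 * r + 1)
    tail = math.erfc(root / math.sqrt(2.0))
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-half) * series


def chi_square_independence(table):
    """Pearson chi-square test (no continuity correction) for a rows x columns table.

    Rows or columns with zero totals must be removed first (their expected counts are 0).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts: rows = TT, TG, GG; columns = distance groups 1, 2, 3.
counts = [[12, 30, 25],
          [20, 41, 38],
          [9, 22, 19]]
stat, df, p_value = chi_square_independence(counts)
```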
$$ {\mathrm{Y}}_{\mathrm{i}\mathrm{jklm}}=\upmu +{\mathrm{AGE}}_{\mathrm{i}}+{\mathrm{GEN}}_{\mathrm{j}}+{\mathrm{SEX}}_{\mathrm{k}}+{\mathrm{MATL}}_{\mathrm{l}}+{\mathrm{RSD}}_{\mathrm{m}}+{\mathrm{e}}_{\mathrm{i}\mathrm{jklm}} $$
where:
Yijklm – the trait measured (number of 1st places; 2nd places; 3rd places; total of 1st to 3rd places; average number of starts per horse; ratio of 1st to 3rd places to the average number of starts; total money winnings),
μ – the overall mean for the trait measured,
AGEi – the fixed effect of the ith age of the horse (from 3.5 to 5 years),
GENj – the fixed effect of the jth genotype group of the SLC16A1 gene (TT, TG, GG for g.55589063T>G or CC, CT, TT for g.55601543C>T),
SEXk – the fixed effect of the kth sex,
MATLl – the fixed effect of the lth maternal line (from 1 to 23)*,
RSDm – the random effect of the mth rider, sire or dam (the rider, sire and dam effects were each included in the model separately),
eijklm – random error.
Next, factors with p-values greater than 0.05 were removed, giving the final model:
$$ {\mathrm{Y}}_{\mathrm{jkl}}=\upmu +{\mathrm{GEN}}_{\mathrm{j}}+{\mathrm{SEX}}_{\mathrm{k}}+{\mathrm{MATL}}_{\mathrm{l}}+{\mathrm{e}}_{\mathrm{j}\mathrm{kl}} $$
where:
Yjkl – the trait measured,
μ – the overall mean for the trait,
GENj – the fixed effect of the jth genotype group of the SLC16A1 gene (TT, TG, GG for g.55589063T>G or CC, CT, TT for g.55601543C>T),
SEXk – the fixed effect of the kth sex (significant for the number of race starts per horse),
MATLl – the fixed effect of the lth maternal line (from 1 to 23)*,
ejkl – random error.
*Maternal lines were classified based on the pedigrees of the horses (http://www.janow.arabians.pl/pl/rodowod-form.php). First, all horses were divided into groups according to their matrilineal founders. Second, sublines were distinguished where the founder's progeny spanned at least 6 generations and had established its own line with living and approved successors. Detailed information about the dam lines is presented in Additional file 2: Table S1.
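The final fixed-effects model can be fitted by ordinary least squares once genotype, sex and maternal line are dummy coded. The sketch below is not the authors' software, and the GLM procedure named in the text may use a different parameterisation; it assumes NumPy, treatment coding with the first sorted level as reference, and tests the genotype effect with a partial F statistic comparing the models with and without GEN. The rider, sire and dam random effects of the expanded model are not handled. Function names and data are hypothetical.

```python
import numpy as np


def dummy_columns(levels):
    """Treatment coding: one 0/1 column per level except the first (reference) level."""
    categories = sorted(set(levels))
    rows = [[1.0 if value == c else 0.0 for c in categories[1:]] for value in levels]
    return np.array(rows, dtype=float).reshape(len(levels), len(categories) - 1)


def design_matrix(*factors):
    """Intercept (mu) followed by the dummy columns of each factor."""
    n = len(factors[0])
    return np.hstack([np.ones((n, 1))] + [dummy_columns(f) for f in factors])


def residual_ss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def genotype_f_test(y, genotype, sex, maternal_line):
    """Partial F test for GEN in Y = mu + GEN + SEX + MATL + e."""
    y = np.asarray(y, dtype=float)
    full = design_matrix(genotype, sex, maternal_line)
    reduced = design_matrix(sex, maternal_line)
    df1 = np.linalg.matrix_rank(full) - np.linalg.matrix_rank(reduced)
    df2 = len(y) - np.linalg.matrix_rank(full)
    rss_full, rss_reduced = residual_ss(full, y), residual_ss(reduced, y)
    f_stat = ((rss_reduced - rss_full) / df1) / (rss_full / df2)
    return f_stat, int(df1), int(df2)


# Hypothetical data for 12 horses (not from the study).
genotype = ["TT", "TG", "GG"] * 4
sex = ["F", "M"] * 6
maternal_line = [1, 1, 2, 2, 3, 3] * 2
genotype_effect = {"TT": 0.0, "TG": 1.0, "GG": 2.0}
noise = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15, 0.2, -0.05, -0.15, 0.1, 0.0, -0.1]
y = [genotype_effect[g] + e for g, e in zip(genotype, noise)]
f_stat, df1, df2 = genotype_f_test(y, genotype, sex, maternal_line)
```

A p-value could then be taken from the F distribution with (df1, df2) degrees of freedom, for example with `scipy.stats.f.sf`.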
Duncan’s new multiple range test was used as the post-hoc test. The analysis was performed for the whole population and separately for horses winning at middle and long distances.
Linkage disequilibrium between the SNPs was calculated using the R/Bioconductor package Chopsticks v1.46.0 [30].
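Linkage disequilibrium between two SNPs can be approximated from unphased genotypes by the squared Pearson correlation between allele dosages (0, 1, 2). This is a simplification swapped in for illustration and is not necessarily the estimator implemented in Chopsticks, which may work from estimated haplotype frequencies. The function name and data are hypothetical.

```python
import math


def dosage_r2(snp_a, snp_b):
    """Squared Pearson correlation of 0/1/2 dosages; None marks a missing call."""
    pairs = [(a, b) for a, b in zip(snp_a, snp_b) if a is not None and b is not None]
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return math.nan  # r2 is undefined when a SNP is monomorphic in the sample
    return cov * cov / (var_a * var_b)


# Hypothetical dosages for six horses at three SNPs.
snp1 = [0, 1, 2, 0, 1, 2]
snp2 = [0, 0, 0, 2, 2, 2]
snp3 = [2, 1, 0, 2, 1, 0]
r2_unlinked = dosage_r2(snp1, snp2)
r2_complete = dosage_r2(snp1, snp3)
```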