Whole-genome linkage analysis in mapping alcoholism genes using single-nucleotide polymorphisms and microsatellites

There is currently great interest in using single-nucleotide polymorphisms (SNPs) in genetic linkage and association studies because SNPs are abundant and high-throughput genotyping technologies are available. In this study, we compared the performance of whole-genome scans using SNPs with that of scans using microsatellites on 143 pedigrees from the Collaborative Studies on Genetics of Alcoholism, provided by Genetic Analysis Workshop 14. A total of 315 microsatellites and 10,081 Affymetrix SNPs on the 22 autosomal chromosomes were used in our analyses. The results from the two scans had good overall concordance. One region on chromosome 2 and two regions on chromosome 7 showed significant linkage signals (i.e., NPL ≥ 2) for alcoholism in both the SNP and the microsatellite scans. The differences between the two scans may be explained by the difference in information content between the SNPs and the microsatellites.


Background
There is currently great interest in using SNPs in genetic linkage and association studies because SNPs are abundant and high-throughput genotyping technologies are available. In a theoretical study, Kruglyak [1] predicted that maps of SNPs with a heterozygosity of 0.5, at approximately two to three times the density of current microsatellite maps, would be equivalent to those microsatellite maps. With current high-throughput SNP genotyping technologies, it is now feasible and affordable to genotype tens of thousands of SNPs. John et al. [2] described the first whole-genome linkage scans of a complex disease, rheumatoid arthritis, that compared SNPs with microsatellites directly. In this paper, using the Collaborative Studies on Genetics of Alcoholism (COGA) data provided by Genetic Analysis Workshop 14 (GAW14), we compared the results of whole-genome scans of 143 pedigrees using 315 microsatellites and 10,081 Affymetrix SNPs across the 22 autosomal chromosomes.
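Kruglyak's figure of 0.5 is the ceiling for a biallelic marker: under Hardy-Weinberg equilibrium the expected heterozygosity is H = 1 - Σ p_i², which for two alleles peaks at 0.5 when both alleles are equally common, whereas multi-allelic microsatellites can exceed it. The short Python sketch below computes H; the five-allele microsatellite is a hypothetical illustration, not a marker from the COGA panel.

```python
def heterozygosity(freqs):
    """Expected heterozygosity H = 1 - sum(p_i^2) under Hardy-Weinberg equilibrium."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


# A biallelic SNP cannot exceed H = 0.5, reached when both alleles have frequency 0.5.
snp_het = heterozygosity([0.5, 0.5])

# Hypothetical microsatellite with five equally common alleles: H = 1 - 5 * 0.2^2 = 0.8.
msat_het = heterozygosity([0.2] * 5)

print(snp_het, msat_het)
```

The gap between these two values is why several SNPs are needed to match the information carried by one microsatellite.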

Methods
Nonparametric linkage analysis
The COGA data provided by GAW14 include 143 pedigrees with 1,614 individuals genotyped for both microsatellites and SNPs, together with genetic maps for both marker sets. Linkage analysis was performed with the nonparametric linkage (NPL) analysis implemented in MERLIN [3]. Individuals were classified as unaffected if they had never drunk alcohol, or if they showed some alcohol-related symptoms but did not meet the criteria for alcoholism [4]. Allele frequencies were estimated from all genotyped individuals, and the Whittemore and Halpern "ALL" statistic [5], which computes NPL scores from allele sharing among all affected pedigree members, was used for the scan. For both the SNP and the microsatellite scans, NPL scores were evaluated at each marker locus.
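To make the "ALL" statistic concrete, the Python sketch below implements the Whittemore and Halpern S_all score for one fully known inheritance pattern and standardizes it for a single affected sib pair whose four parental alleles (labelled 1 to 4 here, an assumption of the example) are all distinguishable. It is a textbook illustration under these simplifying assumptions, not MERLIN's implementation, which averages over uncertain inheritance patterns and combines scores across pedigrees.

```python
from collections import Counter
from itertools import product
from math import factorial, prod, sqrt


def s_all(affected_alleles):
    """Whittemore-Halpern S_all for one inheritance pattern.

    affected_alleles: one (allele, allele) pair of founder-allele labels per
    affected relative. S_all = 2^-n * sum, over every way of picking one allele
    from each affected person, of prod_i (b_i!), where b_i counts how often
    founder allele i was picked.
    """
    n = len(affected_alleles)
    total = 0
    for picks in product(*affected_alleles):
        total += prod(factorial(c) for c in Counter(picks).values())
    return total / 2 ** n


def npl_z_sib_pair(sib1, sib2):
    """Standardized NPL score for one affected sib pair at a fully informative marker."""
    # Null distribution: each sib independently receives allele 1 or 2 from one
    # parent and allele 3 or 4 from the other, all 16 combinations equally likely.
    genotypes = [(a, b) for a in (1, 2) for b in (3, 4)]
    null = [s_all([g1, g2]) for g1 in genotypes for g2 in genotypes]
    mean = sum(null) / len(null)
    sd = sqrt(sum((s - mean) ** 2 for s in null) / len(null))
    return (s_all([sib1, sib2]) - mean) / sd


z_ibd2 = npl_z_sib_pair((1, 3), (1, 3))  # sibs share both parental alleles IBD
z_ibd0 = npl_z_sib_pair((1, 3), (2, 4))  # sibs share no parental alleles IBD
```

In this toy case a pair sharing both alleles IBD scores Z = √2 (about 1.41) and a pair sharing none scores -√2, so excess sharing among affected relatives pushes the NPL score upward.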

Genotyping error detection
To avoid bias in the linkage signals caused by genotyping errors, we applied the error-checking algorithm implemented in MERLIN. This algorithm identifies unlikely genotypes by exploiting the fact that erroneous genotypes can imply excessive, unlikely recombination events (such as apparent double recombinants) between tightly linked markers [3]. We used MERLIN's default threshold, under which genotypes flagged as unlikely at p ≤ 0.025 were excluded [2]. Both whole-genome scans were carried out with and without the erroneous genotypes to examine the effect of genotyping error on the scan results.
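The idea behind the error check can be illustrated with a deliberately simplified rule: if a child's allele at one marker comes from a different grandparent than at both flanking markers, the data imply two crossovers in a short interval. The Python sketch below converts map distances to recombination fractions with Haldane's map function and flags such markers when the double-crossover probability falls below a hypothetical threshold `max_prob`. It assumes phase-known grandparental origins and no interference, and it is not MERLIN's likelihood-based algorithm.

```python
import math


def haldane_theta(d_cm):
    """Recombination fraction for a map distance in cM (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def flag_double_recombinants(positions_cm, origins, max_prob=1e-3):
    """Return indices of markers whose origin differs from both neighbours.

    positions_cm: sorted marker positions (cM).
    origins: 0/1 grandparental origin of the allele a child received at each
             marker (phase assumed known, a simplification).
    A marker is flagged when the implied double crossover has probability
    theta_left * theta_right below max_prob (no interference assumed).
    """
    flagged = []
    for i in range(1, len(origins) - 1):
        if origins[i - 1] == origins[i + 1] != origins[i]:
            p_double = (haldane_theta(positions_cm[i] - positions_cm[i - 1])
                        * haldane_theta(positions_cm[i + 1] - positions_cm[i]))
            if p_double < max_prob:
                flagged.append(i)
    return flagged


# Dense map (0.35 cM spacing): a single-marker switch is implausible and is flagged.
dense = flag_double_recombinants([0.0, 0.35, 0.70, 1.05], [0, 0, 1, 0])
# Sparse map (50 cM spacing): two crossovers are plausible, so nothing is flagged.
sparse = flag_double_recombinants([0.0, 50.0, 100.0], [0, 1, 0])
```

The contrast between the two calls shows why error detection is far more powerful with densely spaced SNPs than with microsatellites 13 cM apart.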

Information content (IC)
A major advantage of high-density SNPs over microsatellites is their higher information content (IC). To investigate factors contributing to the differences between the two scans, we used MERLIN to calculate IC for both the microsatellites and the SNPs. The microsatellites were spaced an average of 13 cM apart, whereas the SNPs were spaced an average of 0.35 cM apart. To assess how reduced IC affects the SNP scan, a 3,360-SNP map with an average spacing of 1.0 cM was randomly extracted from the full SNP set and scanned separately.
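The text does not say exactly how the 1-cM subset was drawn. One plausible scheme, shown here purely as an assumption, is to keep one randomly chosen SNP per 1-cM bin; the hypothetical helpers `thin_map` and `mean_spacing` below implement that scheme and the average-spacing summary used to describe both maps.

```python
import random


def thin_map(positions_cm, bin_cm=1.0, seed=None):
    """Keep one randomly chosen marker per bin_cm-wide window (hypothetical scheme).

    Returns the sorted indices of the retained markers.
    """
    rng = random.Random(seed)
    bins = {}
    for idx, pos in enumerate(positions_cm):
        bins.setdefault(int(pos // bin_cm), []).append(idx)
    return sorted(rng.choice(members) for members in bins.values())


def mean_spacing(positions_cm):
    """Average distance between adjacent markers on one chromosome (cM)."""
    pos = sorted(positions_cm)
    return (pos[-1] - pos[0]) / (len(pos) - 1)


# Toy chromosome with SNPs every 0.35 cM, thinned to roughly 1-cM spacing.
snp_positions = [i * 0.35 for i in range(30)]
subset = thin_map(snp_positions, bin_cm=1.0, seed=1)
print(len(subset), mean_spacing([snp_positions[i] for i in subset]))
```

Binning guarantees that no stretch of the map is left empty, which a simple random sample of SNPs would not.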

Results
Nonparametric linkage analysis
The results from the whole-genome scans using the microsatellites and the SNPs had good overall concordance.
Six regions showed some evidence of increased allele sharing (NPL ≥ 2) in the SNP scan, the microsatellite scan, or both. The results are summarized in Table 1, which also includes analyses with the erroneous genotypes retained. Overall, the SNP scan gave stronger linkage signals than the microsatellite scan. The exceptions were two regions, on chromosomes 2 and 13, that showed significant linkage evidence with the microsatellites but not with the SNPs (no SNPs were genotyped in the chromosome 13 region). Four regions, on chromosomes 1, 2, 12, and 13, showed significant linkage evidence with the SNPs but not with the microsatellites. Both scans indicated strong linkage signals on chromosome 7 and relatively strong linkage signals on chromosome 2. Results for these two chromosomes (excluding the erroneous genotypes) and the one-LOD confidence intervals of their peaks are shown in Figure 1. In general, the peaks were better defined by the SNP scan, which gave narrower 1-LOD intervals than the microsatellite scan. For the chromosome 7 peak around 100 cM, the 1-LOD interval was 20 cM with the SNPs compared with 40 cM with the microsatellites; for the chromosome 7 peak around 60 cM, it was 9 cM compared with 16 cM. The 1-LOD intervals for the chromosome 2 peak around 10 cM had similar widths for the SNPs and the microsatellites. Compared with the full SNP scan, the 1-cM SNP subset scan gave lower NPL scores in every region except one on chromosome 2 at about 18 cM. With the NPL cutoff of 2, several regions on chromosomes 2, 7, and 12 that were significant in the full SNP scan were no longer significant in the SNP subset scan.
Figure 1. Multipoint nonparametric linkage scores and IC from the 1-cM SNP scan for chromosomes 2 and 7.
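One-LOD support intervals like those quoted above can be read off a LOD curve evaluated on a grid of positions: start at the highest LOD and extend in each direction while the LOD stays within one unit of the maximum. The Python sketch below assumes LOD scores (not NPL Z scores) are available at each position and does not interpolate between grid points, so its endpoints are only as precise as the grid.

```python
def one_lod_interval(positions_cm, lods, drop=1.0):
    """Return (left_cM, right_cM) of the contiguous region around the highest
    LOD in which every grid point has LOD >= max LOD - drop."""
    peak = max(range(len(lods)), key=lods.__getitem__)
    threshold = lods[peak] - drop
    left = peak
    while left > 0 and lods[left - 1] >= threshold:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= threshold:
        right += 1
    return positions_cm[left], positions_cm[right]


# Made-up LOD curve on a 10-cM grid, for illustration only.
left, right = one_lod_interval([0, 10, 20, 30, 40, 50],
                               [0.5, 1.2, 2.0, 2.5, 1.8, 0.9])
print(left, right, right - left)  # 20 40 20
```

Because the interval can only end at grid points, a coarser map (such as 13-cM microsatellite spacing) tends to produce wider, less precise intervals, consistent with the pattern reported above.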
We also noted that genotyping errors had little effect on the linkage results for this data set, although retaining the likely erroneous genotypes seemed to increase the linkage signals slightly. This contrasts with the finding of John et al. [2] that removing unlikely genotypes could increase the significance of nominal loci; the discrepancy may be due to different genotyping error rates in the two data sets. MERLIN's error-checking algorithm flagged 1,295 microsatellite genotypes as likely errors and set them to missing. Among the 1,614 individuals and 315 microsatellites there were 353,015 genotypes in total, so the microsatellite error rate was estimated at 0.367%. Similarly, 27,338 SNP genotypes were flagged as likely errors and set to missing; with 13,395,832 genotypes in total among the 1,614 individuals and 10,081 SNPs, the SNP error rate was estimated at 0.204%.
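The error rates above are simply the number of flagged genotypes divided by the number of genotypes available for each marker type. The snippet below reproduces that arithmetic from the counts reported in the text.

```python
# (flagged genotypes, total genotypes) as reported in the text.
counts = {
    "microsatellites": (1_295, 353_015),
    "SNPs": (27_338, 13_395_832),
}


def error_rate_percent(flagged, total):
    """Share of genotypes flagged as likely errors, expressed as a percentage."""
    return 100.0 * flagged / total


rates = {name: round(error_rate_percent(*c), 3) for name, c in counts.items()}
print(rates)  # {'microsatellites': 0.367, 'SNPs': 0.204}
```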

Information content (IC)
The mean IC for each autosomal chromosome, for the full SNP set, the SNP subset, and the microsatellites, with the erroneous genotypes either excluded or included, is summarized in Table 2.
The IC for the full SNP set was substantially and uniformly higher than that for the microsatellites. With the erroneous genotypes excluded, the mean genome-wide IC was 0.783 (interquartile range 0.134) for the microsatellites and 0.950 (interquartile range 0.017) for the SNPs. The mean IC for the SNP subset was 0.905 (interquartile range 0.044) with the erroneous genotypes excluded and 0.910 (interquartile range 0.042) with them included. Although genotyping errors were expected to reduce IC slightly, their impact was quite small, possibly because of the low genotyping error rate.
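IC values near 1 indicate that inheritance is almost fully determined at a position. The sketch below uses the entropy-based definition IC = 1 - H(posterior)/H(prior) associated with Kruglyak and colleagues, simplified to the identity-by-descent (IBD) distribution of a single sib pair with prior (1/4, 1/2, 1/4); MERLIN's exact computation over whole pedigrees may differ. A helper then summarizes a list of IC values (for instance one per map position) by its mean and interquartile range, the two statistics reported above.

```python
import math
import statistics

# Prior probabilities that a sib pair shares 0, 1 or 2 alleles IBD.
PRIOR_SIB_IBD = (0.25, 0.5, 0.25)


def entropy_bits(probs):
    """Shannon entropy in bits, ignoring zero-probability states."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_content(posterior, prior=PRIOR_SIB_IBD):
    """Entropy-based IC: 1 when IBD is known exactly, 0 when the marker adds nothing."""
    return 1.0 - entropy_bits(posterior) / entropy_bits(prior)


def summarise(ic_values):
    """Mean and interquartile range of a list of IC values."""
    q1, _, q3 = statistics.quantiles(ic_values, n=4)
    return statistics.mean(ic_values), q3 - q1


print(information_content((0.1, 0.8, 0.1)))  # partially informative marker
```

A small interquartile range, as for the full SNP set, means IC is nearly constant along the genome, whereas the larger microsatellite range reflects dips between widely spaced markers.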

Discussion
We have compared genome-wide linkage analyses based on microsatellites and SNPs, using MERLIN to conduct nonparametric linkage analysis to map regions associated with alcoholism on the 22 autosomal chromosomes. The results from the two scans had good concordance in general, although more significant signals were obtained with the SNPs than with the microsatellites. Both scans suggested strong linkage evidence on chromosomes 2 and 7, where the two scans agreed especially well. On chromosome 7, the microsatellite scan peaked at marker D7S820 (107.5 cM) with an NPL score of 2.56, and the SNP scan peaked at marker tsc0046246 (100.9 cM) with an NPL score of 2.81. On chromosome 2, the microsatellite scan peaked at marker D2S1329 (4.9 cM) with an NPL score of 2.13, and the SNP scan peaked at marker tsc0056805 (243.6 cM) with an NPL score of 2.80. The differences between the two scans are likely explained by the difference in IC between the microsatellites and the SNPs. Indeed, higher IC is one major advantage of high-density SNPs over conventional microsatellite sets, and the IC across the genome was uniformly higher for the SNPs than for the microsatellites.
As expected, the SNP subset analysis showed lower IC and weaker linkage signals than the full SNP set, suggesting that the difference in IC may be one key factor behind the observed differences between the two scans. This is consistent with the conclusion of John et al. [2], who comprehensively examined possible reasons for the differences between SNP and microsatellite scans, including genotyping errors in the SNPs and the microsatellites, possible errors in the two maps, the presence of linkage disequilibrium (LD), and differences in IC. We also investigated the possible effect of genotyping errors on the linkage results. Our results suggested that the impact of genotyping errors was quite small for the COGA data set, possibly because of the low genotyping error rates (0.37% for the microsatellites and 0.20% for the SNPs); this finding may not generalize to other data sets. It is worth noting that with an average spacing of 0.35 cM in the full SNP set, LD between SNPs is highly likely, and it may influence the linkage results from MERLIN, since MERLIN assumes linkage equilibrium between all markers. John et al. [2] explored the possible effect of LD on the two scans either by keeping one SNP from each group of SNPs in LD, or by assigning haplotypes to individuals for clusters of SNPs in LD and treating them as multi-allelic markers. In both cases they observed losses in IC, which made it difficult to assess whether the difference between the two scans was due to LD or to the loss of IC. They concluded that, overall, the results were qualitatively similar whether SNPs in LD were included or excluded.
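The first LD strategy described above, keeping one SNP from each group of correlated SNPs, is often done by greedy pruning. The Python sketch below is a hypothetical version, not John et al.'s exact procedure: it measures LD as the squared correlation of 0/1/2 genotype codes, and the thresholds `r2_max` and `window_cm` are illustrative choices. In pedigree data this correlation would ideally be computed in unrelated founders, because related individuals are not independent observations.

```python
def r_squared(g1, g2):
    """Squared Pearson correlation of 0/1/2 genotype codes (a common LD proxy)."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treat as no LD
    return cov * cov / (v1 * v2)


def prune_ld(positions_cm, genotypes, r2_max=0.8, window_cm=1.0):
    """Greedy pruning in map order: keep a SNP unless it has r^2 >= r2_max
    with an already-kept SNP lying within window_cm of it."""
    kept = []
    for i, pos in enumerate(positions_cm):
        if all(abs(pos - positions_cm[k]) > window_cm
               or r_squared(genotypes[i], genotypes[k]) < r2_max
               for k in kept):
            kept.append(i)
    return kept


# Toy data: SNP 1 duplicates SNP 0 and is dropped; SNP 2 is only weakly correlated.
g0 = [0, 1, 2, 1, 0, 2]
g2 = [2, 0, 1, 0, 1, 1]
print(prune_ld([0.0, 0.1, 0.2], [g0, list(g0), g2]))  # [0, 2]
```

As John et al. observed, any such pruning discards markers and therefore lowers IC, so its effect on the scan mixes the removal of LD with a loss of information.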
Finally, we noted that the SNP subset scan still detected some of the regions detected by the full SNP scan, and that the SNP subset had an average IC of 0.910 compared with 0.950 for the full SNP set. With the NPL cutoff of 2, however, several regions on chromosomes 2, 7, and 12 lost significance in the SNP subset scan.

Conclusion
We identified two regions, on chromosome 2 and chromosome 7, that showed some evidence of linkage with alcoholism in both the microsatellite and the SNP scans. For these regions, the linkage signals were stronger with the SNPs than with the microsatellites. Although the results from the two scans had good overall concordance, three regions of significant linkage were detected in the SNP scan but not in the microsatellite scan. Finally, the difference in IC between the SNPs and the microsatellites might explain the differing results of the two scans.