### Participants

Participants were part of the Twins Early Development Study (TEDS), a longitudinal study involving a representative sample of over 11,000 sets of twins born in England and Wales between 1994 and 1996 [35, 36]. The TEDS project has received ethical approval from the Joint South London and Maudsley and the Institute of Psychiatry Research Ethics Committee (approval number: 05/Q0706/228), and the study of the genetics of mathematical cognition within the TEDS sample was approved by the King's College London Research Ethics Committee (PNM/07/08-47). Informed parental consent was obtained before all tests were conducted. Comparisons to UK census data show that the TEDS sample continues to be representative of the UK population in terms of demographic characteristics [37]. We excluded children with specific medical syndromes (such as Down's syndrome and other chromosomal anomalies, cystic fibrosis, cerebral palsy, hearing loss, autism spectrum disorder and organic brain damage), as well as extreme outliers for birth weight, gestational age, maternal alcohol consumption during pregnancy or special care after birth. We also excluded children of non-white ethnic origin (to mitigate population stratification), those with English spoken as a second language at home (to facilitate a fair comparison of test performance scores), and those without DNA samples available. Following these exclusions, the sampling frame consisted of 4517, 4555 and 4562 children with genotypes available for rs11225308, rs363449 and rs17278234, respectively. The sizes of these sampling frames differ slightly because the genotype data in our sample are incomplete, i.e. some genotypes are missing. For the SNP-set analyses, the sampling frame consisted of 3919 children with complete genotype data on the 10 SNPs in the 10-SNP set, and 2895 children with complete genotype data on the 43 SNPs in the 43-SNP set.
However, as described below, for 1024 children with genotype data available on at least 40 of the 43 SNPs, we substituted the population mean for missing data, resulting in a total of 3929 children with 43-SNP-set data. From these sampling frames, all individuals possessing the relevant cognitive data were used in tests of association. As the amount of available data varies between the different cognitive measures assessed here - especially across ages - the N involved in each test of association also varies widely.

### SNP genotyping

As part of a recent genome-wide association study of mathematical ability [18], the following 46 SNPs were selected from a high vs. low mathematical ability scan of pooled DNA in 10-year-old children, because they showed the most significant between-group differences: rs11225308, rs363449, rs17278234, rs11154532, rs12199332, rs12613365, rs6588923, rs2300052, rs6947045, rs1215603, rs40941, rs1881396, rs4649372, rs2593170, rs9300810, rs4314720, rs39118, rs694598, rs11778957, rs7085203, rs9670398, rs4956093, rs2278677, rs16964420, rs16907131, rs7932127, rs4236383, rs4144132, rs10098370, rs6502244, rs6701879, rs1502885, rs4771280, rs7791660, rs8043884, rs17085111, rs700965, rs7115849, rs1369458, rs7745469, rs952312, rs12962177, rs2059357, rs10501162, rs12601191 and rs696244. These SNPs progressed to the individual-genotyping stage of the study, where they were genotyped in a sample of 5000 TEDS subjects containing only one member of each twin pair. At the time, only 2356 of these 5000 individuals possessed the relevant data for the original study of mathematics; the current study, which examines a number of traits, is able to make use of the full sample. Forty-one SNPs were genotyped using the Sequenom MassARRAY iPlex Gold® system (Sequenom, San Diego, USA), and 5 were genotyped using the Applied Biosystems TaqMan® assay (Applied Biosystems, California, USA). The medium-throughput Sequenom MassARRAY iPlex Gold® system processes 'plexes' of up to 40 SNP assays simultaneously. Only compatible assays may be combined into a single plex. Because of this, and to economise on cost and man-hours, the 41 SNPs investigated using the Sequenom iPlex Gold® system were coupled with SNP assays from other studies and spread across three plexes of 26, 33 and 36 SNPs. Individuals with genotype calls on fewer than 70% of the SNPs within each plex, and also within the TaqMan®-genotyped samples, were re-typed, as were SNPs with a call rate lower than 95%.
Seventy-three individuals with persistently low call rates were removed entirely from the present study, leaving a sampling frame of 4927 individuals. However, on a within-plex basis, 478, 434, 599 and 185 individuals were removed from the analysis of SNPs within the 26-plex, 33-plex, 36-plex and TaqMan®-genotyped SNPs, respectively. Three SNPs with persistently low call rates were also removed: rs10501162, rs12601191 and rs696244. The 43 remaining SNPs were in Hardy-Weinberg equilibrium at the p > 0.01 level.
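
A Hardy-Weinberg check of this kind can be sketched as follows. The text does not state which test was used, so the 1-df chi-square test here is an assumption, and the function name `hwe_chi_square` and the genotype counts in the usage note are purely illustrative.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium from genotype counts.

    Assumption: a Pearson chi-square test; the study may have used another test.
    Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        # Monomorphic SNP: no deviation can be tested.
        return 0.0, 1.0
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Under this sketch, a SNP would be retained when the returned p-value exceeds 0.01, for example `hwe_chi_square(100, 200, 100)` for a SNP whose counts match the expected proportions exactly.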

### SNP set

SNP-set scores have been used in a number of studies to aggregate the small effects of groups of SNPs [21, 22, 38, 39]. This is especially useful in samples which may be underpowered to detect the effects of the SNPs when analysed separately. Two SNP-set scores were created for the current analysis. The first combines all 43 genotyped SNPs, and the second combines only the 10 SNPs whose associations replicated (p < 0.05) in the original study of mathematical ability (rs11225308, rs363449, rs17278234, rs11154532, rs12199332, rs12613365, rs6588923, rs2300052, rs6947045 and rs1215603). Using the direction of association observed in the original pooling stages, genotypes at each SNP were additively coded 0, 1 and 2, with 0 representing the homozygote genotype associated with lower mathematical ability, 1 representing the heterozygote genotype, and 2 representing the homozygote genotype associated with higher mathematical ability. As none of the 43 SNPs are in linkage disequilibrium with one another, the additive genotypic scores should be independent. These genotypic scores were then summed to create a 10-SNP-set score of 0-20, and a 43-SNP-set score of 0-86. For the 10-SNP set, only the 3919 individuals with complete genotyping data were included in analyses. To increase the size and power of the sample to assess the influence of the 43-SNP set, 1024 individuals with at least 40 of the 43 genotypes available were included alongside the 2895 individuals with complete genotype data across all 43 SNPs, with the sample mean substituted for each missing genotype, giving a total of 3929 individuals with the 43-SNP-set score. As none of our SNPs were in linkage disequilibrium, and as estimates based upon mathematics scores or other genotypes may have biased our results, we could not impute missing data.
By substituting each missing genotype with the mean genotype for that SNP within our sample, we were able to analyse the available genotype data on at least 40 SNPs each for 1024 extra individuals, without affecting the mean genotype score of any of our SNPs.
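
The scoring and mean-substitution procedure can be illustrated with a minimal sketch. The function names, the `None` encoding of missing genotypes and the toy data are assumptions made for illustration; they are not taken from the study's own code.

```python
def snp_means(sample):
    """Mean additive genotype (0, 1, 2) per SNP, ignoring missing calls (None)."""
    n_snps = len(sample[0])
    means = []
    for j in range(n_snps):
        called = [row[j] for row in sample if row[j] is not None]
        means.append(sum(called) / len(called))
    return means


def snp_set_score(genotypes, means, min_called):
    """Sum additive genotypes, substituting the sample mean for missing calls.

    Returns None when fewer than `min_called` genotypes are available.
    """
    n_called = sum(g is not None for g in genotypes)
    if n_called < min_called:
        return None
    return sum(m if g is None else g for g, m in zip(genotypes, means))


# Toy data: 3 individuals x 3 SNPs (hypothetical values, not study data).
sample = [[0, 1, 2], [2, None, 2], [1, 1, None]]
means = snp_means(sample)
scores = [snp_set_score(row, means, min_called=2) for row in sample]
```

In terms of the description above, the 43-SNP-set score would correspond to `min_called=40` over 43 SNPs, and the 10-SNP-set score to `min_called=10` over 10 SNPs, so that only complete data are scored.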

### Measures

The validity and reliability of the composite measures of ability used in the present study, along with the testing procedures involved, have been described in detail previously (see [9, 40]). All web-based tests were undertaken in the participants' homes. Where internet access was unavailable in the home, participants were encouraged to use a school or library computer.

### Mathematics

#### National Curriculum teacher ratings of mathematics at ages 7, 9 and 10

Mathematical ability was measured by teachers' assessments on UK National Curriculum (NC) criteria for mathematical attainment at ages 7, 9 and 10 [41]. The National Curriculum is a framework used by all government-maintained schools across the UK to ensure that teaching and learning are balanced and consistent. NC-based ratings therefore provide a reliable and uniform measure of mathematical ability across our sample. Teacher assessments have been shown to be valid measures of academic achievement, particularly for mathematics, reading and language [42]. The teachers assessed three aspects of mathematical ability: Using and applying mathematics; Numbers and algebra; and Shapes, space and measures (see [9] for further details). In addition to using these components individually, we created a composite mean score by summing standardised scores for the three ratings.

#### Web-based tests of mathematics at ages 10 and 12

The merits of web-based approaches have been well documented and findings appear consistent with traditional methods of data collection [43]. The battery used at ages 10 and 12 in this study included questions from three components of mathematics: 'Understanding Number', 'Computation and Knowledge' and 'Non-Numerical Processes' [12] (see Supplementary Materials for a more detailed description). These components correspond to the UK National Curriculum and thus increase the relevance of the study to education. Battery items were based on the National Foundation for Educational Research 5-14 Mathematics Series, which is linked closely to curriculum requirements in the UK and the English Numeracy Strategy [44]. In addition to analysis of each component, the results across the three components were standardised and then combined to generate a composite score.

#### Composite score of mathematics at age 10

A composite mathematics score was generated from teacher ratings and web-test results at 10 years of age, following the same method used in the recent molecular genetic study of the trait [18]. Each measure was standardised to a mean of zero and standard deviation of one. For the 2976 TEDS children with data available on both measures, the mean of the two measures was then standardised to form the composite score. For an additional 1106 children only teacher ratings were available, and for 942 children only web-based measures were available. To increase power, these children were also included in the original study, with their one available score standardised to a mean of zero and standard deviation of one.
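
The composite construction can be sketched as below. This is a minimal illustration, assuming missing scores are encoded as `None` and that the sample standard deviation is used for standardisation; the function names and toy values are hypothetical.

```python
from statistics import mean, stdev


def standardise(values):
    """Scale the non-missing values to mean 0 and SD 1; None stays None."""
    present = [v for v in values if v is not None]
    m, s = mean(present), stdev(present)
    return [None if v is None else (v - m) / s for v in values]


def composite(teacher, web):
    """Composite of two measures, falling back to the single available score."""
    t, w = standardise(teacher), standardise(web)
    # Mean of the two standardised measures, only where both are present.
    both = [None if a is None or b is None else (a + b) / 2
            for a, b in zip(t, w)]
    both_z = standardise(both)
    # Children with one measure keep that measure's standardised score.
    return [bz if bz is not None else (a if a is not None else b)
            for bz, a, b in zip(both_z, t, w)]


# Toy data (hypothetical): the fourth child lacks a teacher rating,
# the fifth lacks a web-test score.
teacher = [1, 2, 3, None, 4]
web = [2, 4, 6, 8, None]
scores = composite(teacher, web)
```

A child missing both measures would receive `None` under this sketch and would drop out of any analysis of the composite.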

### Reading at age 10

Reading performance was assessed at 10 years using both teacher assessments of achievement based on National Curriculum criteria, and the results of a web-based adaptation of the reading comprehension portion of the Peabody Individual Achievement Test (PIAT) [45] (see [9] for full details). The mean of the two standardised scores was also used as a composite reading measure.

### General cognitive ability at age 10

General cognitive ability was assessed at 10 years using web-based adaptations of two verbal tests, WISC-III Multiple Choice Information (General Knowledge) and WISC-III Vocabulary Multiple Choice [46], and two non-verbal reasoning tests, WISC-III-UK Picture Completion [46] and Raven's Standard Progressive Matrices [47]. Principal components analysis has revealed that the first principal component accounts for 55% of the variance in these measures at age 10 [9]. Thus, where results on all four tests were available, the four standardised scores were summed, and the mean standardised score was used as a composite measure of general cognitive ability.

### Statistical Analyses

All measures were standardised to a mean of 0 and a standard deviation of 1 and corrected for sex and age at time of testing. All analyses involved the estimation of Pearson product-moment correlations in R [48]. These were used to assess i) the phenotypic correlations of each of the cognitive measures with the composite measure of mathematical ability at age 10, ii) the association between each of the cognitive measures and the 10- and 43-SNP sets, and iii) the association between each of the cognitive measures and the three SNPs (included in both the 10- and 43-SNP sets) showing the strongest associations with the mathematics composite measure: rs11225308, rs363449 and rs17278234. In the original genome-wide association study of mathematical ability, the associations of these three SNPs remained after Bonferroni correction for multiple testing. Association was tested under the additive model by additively coding the genotypes of the individual SNPs and the SNP-set scores. One-tailed p-values are reported for all SNP analyses because we expect SNP associations in the same direction as previously observed with the composite measure of mathematics. P-values were also subject to Bonferroni correction for the five tests of association (with the two SNP sets and three individual SNPs) conducted simultaneously for each measure.
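
The association test can be sketched in a few lines. The analyses themselves were run in R; the Python below is only an illustration, and it approximates the t distribution with a normal distribution, which is an assumption that is reasonable only for samples as large as those used here. The function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def one_tailed_p(r, n):
    """One-tailed p-value for r > 0, the direction expected from prior work.

    Uses the t statistic with n - 2 df and a normal approximation to its
    distribution (an assumption; requires |r| < 1).
    """
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return 0.5 * math.erfc(t / math.sqrt(2))


def bonferroni(p, n_tests=5):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p * n_tests)
```

Because genotypes are coded so that 2 marks the genotype associated with higher mathematical ability, the expected direction corresponds to a positive correlation, and a negative observed correlation yields a one-tailed p-value above 0.5.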

### Power

Power was estimated post-hoc using the Genetic Power Calculator [49]. The sample sizes involved in the present study varied greatly across measures. The mean N across all analyses was 2112. A sample of this size has 80% power to detect an association with an effect size of 0.45% (i.e. a correlation of 0.067) for a causal QTL of 20% minor allele frequency. The smallest sample of 1431 individuals with data available on both the 10-SNP set and a measure of g at 10 has 80% power to detect an effect size of 0.7% (i.e. a correlation of 0.084). The largest sample of 3891 individuals with available data on both rs17278234 genotype and mathematics ratings at 7 years has 80% power to detect an effect size of 0.25% (i.e. a correlation of 0.05).
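
Assuming that the effect sizes denote the proportion of trait variance explained ($R^2$), the correlations given in parentheses follow by taking the square root:

$$
r = \sqrt{R^2}: \qquad \sqrt{0.0045} \approx 0.067, \qquad \sqrt{0.007} \approx 0.084, \qquad \sqrt{0.0025} = 0.05
$$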