Participants
We enrolled healthy Chinese college students from Beijing Normal University in Beijing, China, and Southwest University in Chongqing, China. One thousand three hundred ninety-one participants (529 male and 862 female, age = 20.2+/− 1.8 years) completed the behavioral tests. One thousand two of them (401 male and 601 female, age = 20.4+/− 1.9 years) were genotyped. All participants were Han Chinese and reported no history of psychiatric diseases, head injuries, or stroke/seizure. Participants were paid for their participation.
Behavioral measures
Parental warmth (PW)
Parental Warmth and Acceptance Scale [41] measures perceived parental warmth with 11 items, such as “My parents really understand me” and “My parents like me the way I am; they don’t try to ‘make me over’ into someone else”. Participants rated each item on a 6-point scale, 1 = “Disagree strongly” to 6 = “Agree strongly”. The total score of all items was used for analysis.
Executive function (EF)
Following the model of Miyake et al. [1, 21, 42], the current study used 9 EF tasks that were used in one of their original studies [21], with some modification in terms of materials or presentation parameters. These nine tasks measure the three latent components of EF: the keep track, letter 3-back, and spatial 2-back tasks were used to assess updating; the number-letter, color-shape, and category switch tasks were used to assess shifting, and all nine tasks (the above six plus the antisaccade, stop signal, and Stroop tasks) were used to assess the common EF component [1] (Fig. 1). We also used the same outcome index for each task as in Miyake’s model. A brief description of each task appears below.
Keep track
A list of 15 words were presented in the center of the screen one by one for 1.5 s each. The words belong to several categories and were presented in random order. Subjects had to memorize the last presented word of each category and write them down after the presentation. Six categories were used, including animals, colors, countries, distances, metals, and relatives. Twelve word lists were tested, with 4 lists containing words of 2 categories, 4 containing 3 categories, and 4 containing 4 categories. The total number of words correctly written down was used.
Letter 3-back
A sequence of 13 single letters were presented on the screen one by one, shown for 750 ms, followed by a blank screen of 2250 ms. Subjects had to memorize the latest 3 letters and judge if the current letter was the same as the one presented 3 items before. Subjects had to make response within 3 s as accurately and fast as possible. The whole task included 6 sequences. The overall accuracy of the 6 sequences was used. Before the formal test, subjects were given a practice session to familiarize themselves with the task. Practice ended if the subjects achieved an accuracy higher than 0.7 or practiced for 3 sequences, whichever came first, to avoid over-training.
Spatial 2-back
A sequence of 12 squares were presented one by one on the screen at random positions. Each square was presented for 0.5 s with an interval of 1.5 s between 2 squares. Subjects had to memorize the positions of the last two squares and judge if the current square was at the same position as the one 2 items before, and a response was required within 2 s. The whole task included 4 sequences. The overall accuracy was used. Before the formal test, subjects were given a practice session. Practice ended if the subjects achieved an accuracy higher than 0.7 or practiced for 3 sequences, whichever came first, to avoid over-training.
Number-letter
A number-letter pair (e.g. 7G) was shown in a rectangle on the screen. If the pair appeared on the upper side of the screen, participants judged if the number was odd or even by key pressing. If the pair appeared on the lower side, participants judged if the letter was a consonant or a vowel. Subjects had to respond as fast and accurately as possible. The stimuli disappeared immediately after the response. Stimuli were presented in pseudorandom order, resulting in equal numbers of trials with the same judgment task as the trial before it (the repeat condition) or with a different judgment task (the switch condition). Response time of the switch condition minus that of the repeat condition was used.
Color-shape
This task was similar to the number-letter task, but with different stimuli. A red or green circle or triangle was presented in the center of the screen, with a cue on top indicating whether participants should make judgment by color or by shape. Stimuli were also presented in pseudorandom order to ensure an equal number of repeat and switch trials. The response time difference between the two conditions was used.
Category switch
Similar to the number-letter task, a two-character Chinese word (i.e., 钥匙, key) was presented in the center of the screen, with a cue on top it. Participants had to judge according to the cue if the word describes a living or nonliving object, or if it is larger or smaller than a shoe case. Trials were presented in pseudorandom order to ensure an equal number of repeat and switch trials. The response time difference was used.
Antisaccade
A fixation “+” was presented for a duration randomly drawn from nine durations between 1.5 and 3.5 s in 0.25 s interval, followed by a 0.32 cm black square cue presented for 0.15 s on one side of the screen. Then a target of a 0.79 cm arrow within a 1.11 cm square was presented on the other side of the screen for 0.175 s and then masked by a grey square. Subjects had to control their attention not to the cue but to the target to identify the direction of the arrow by key pressing (left, up, right). Subjects practiced on 22 trials to learn the task, followed by 90 test trials. Accuracy of these 90 trials was used.
Stop signal
Participants were asked to press left or right button according to an arrow presented in the center of the screen for 1 s as accurately and quickly as possible (go trials). On 25% of trials, a red circle appeared around the arrow following the presentation of the arrow, participants had to withhold their response (nogo trials). This delay between the onset of the red circle and the arrow was adaptive based on task performance with an aim of 50% success at inhibiting responses during the nogo trials. The task consisted of 4 blocks, each with 64 trials. Stop-signal reaction time (SSRT) was calculated as the median response time of the go trials minus the mean delay of the nogo trials, using only trials with correct responses within the last two blocks.
Stroop
We adopted the classical Stroop task with 4 Chinese color words 红 (red), 绿 (green), 黄 (yellow), and 蓝 (blue). Each word was presented either in the color of the word’s meaning (i.e. the word “red” presented in red) under the congruent condition or in one of the other 3 colors (e.g., the word “red” presented in green) under the incongruent condition. Each condition had 12 trials, resulting in (12 congruent + 12 incongruent)*4 words = 96 trials, which were presented in random order. For each trial, a word was presented in the center of the screen, and participants had to respond to the printed color by pressing one of four keys as fast and accurately as possible. The response time difference between the two conditions was used as an index of inhibition. To make sure participants were familiar with the color-key association, they were first given a practice session, in which a color square was presented at the center of the screen and participants had to press a corresponding key quickly. The practice session ended when participants obtained an accuracy rate higher than 70%.
Genotyping
1-2μg genome DNA (gDNA) was extracted from 250ul blood using Axypre Blood Genomic DNA Kit (Corning Life Sciences cat.no.11313KC3). The concentration of all gDNA was quantified with the Qubit2.0 Fluorometer (Life Technologies, cat. no. Q32866) and the Qubit dsDNA HS Assay Kit (Life Technologies, cat. no. Q32854). Six hundred twenty-nine samples were genotyped on the Infinium Human Omni-Zhonghua-8 chips, 239 samples were genotyped on the Infinium Human Omni2.5–8 exome chips, and 134 were genotyped on Infinium OminiExpress-12 chips (Illumina, San Diego, CA, USA), all according to the manufacturer’s specifications. Genotyping module of Genome studio v3.0 (Illumina, San Diego, CA, USA) was used to call the genotypes based on the fluorescent signal with standard cluster algorithm. Samples with call rate less than 98% (4 samples, with genotype call threshold of 0.15) were re-genotyped thus all passed this threshold in the final dataset. Further data cleaning was performed separately for each kind of chips, using PLINK2 (https://www.cog-genomics.org/plink/1.9/) [43, 44]: SNPs with missing data on more than 5% samples, or HWE p < 1E-6, or MAF < 0.01, were excluded, and subjects missing more than 5% SNPs were discarded too (no subjects were excluded by this threshold).
Autosome genotypes of 3 chips were then imputed separately using Michigan Imputation Server (https://imputationserer.sph.umich.edu/index.html) following their protocol: (1) HRC tools (www.well.ox.ac.uk/~wrayner/tools) were used to check strand and to flip to forward strand when necessary; (2) data were transformed to VCF files and sorted for each chromosome; (3) data were uploaded to the server, and imputed using 1000G Phase 3 EAS population as reference. Imputed data were cleaned using home-made codes, only SNPs with imputation quality r2 > 0.8 and MAF > 0.05 were retained. Then these datasets were merged and cleaned again (MAF > 0.05, HWE > 1E-6), retaining 4,856,474 SNPs. No duplicated or related subjects were identified (maximum PI_HAT = 0.0537, calculated with PLINK2).
To estimate the ancestry of our sample and determine whether we had a potential population stratification problem, we ran a principal component analysis on the 1000 Genomes Project’s data and projected our sample to the first two principal components using EIGENSTRAT software [45]. The 1000 Genomes Project’s phase3 data were downloaded from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/, converted into Plink format, combined with the current dataset. Overlapping SNPs were retained, cleaned with standard criteria (−-geno 0.05 --maf 0.02 --hwe 1e-6), pruned considering the EAS population (plink pruning: --indep-pairwise 200 5 0.25). Ambiguous strand SNPs were removed (AT or CG). A total of 105,831 SNPs remained for this analysis.
To further explore possible biological mechanism of the SNPs that showed the most significant results, we searched the BrainSeq Consortium database (http://eqtl.brainseq.org/phase1/eqtl/) for gene expression information. BrainSeq provides information about the associations between genotypes and RNA sequencing data collected from postmortem DLPFC tissues of 175 schizophrenia patients and 237 controls.
Experimental procedure
The current study is part of a larger project that included extensive measures of executive function, decision making, memory, personality, and wellbeing. Tasks of different domains were interleaved and participants finished all tasks in the same order. It took about 4 h (2 h in the morning and 2 h in the afternoon) to complete all the tasks. To reduce the habituation and fatigue effects, each task was designed to be about 10 min long and if they would like to, participants were allowed to take a break after each task. A mandatory break was enforced after each hour. Blood sample were collected after the morning session and before lunch.
Statistical analysis
Behavioral indices of the EF tasks were calculated. Values outside of 3 standard deviations for each index were treated as missing data. Latent EF components were modeled with IBM SPSS AMOS 22 using a nested factors model [21, 23]. That is, all nine tasks were used to define a common EF component; the keep tract, letter 3-back, and spatial 2-back tasks were used to extract the updating component; and the number-letter, color-shape, and category switch tasks were used to extract the shifting component.
Genome-wide environment interaction analysis was run using Plink2 linear regression, using each EF component as the dependent variable; PW, genotype and their interaction as independent variables; and age, gender, and first two principal components of the genome as co-variates. Interaction p values from Plink2 were inputted to MAGMA [46] for gene-set enrichment analysis. Gene definition was downloaded from the MAGMA website (https://ctg.cncr.nl/software/magma), using the NCBI37.3 version, resulting in 17,287 genes. The sum of –log(p) within a gene was calculated as the gene-level statistics (MAGMA default model). Seventeen thousand seven hundred seventy-nine gene sets from msigdb.v6.0 (http://software.broadinstitute.org/gsea/msigdb/) were used for pathway enrichment analysis. FDR correction was applied to the selection of significant genes and pathways.