Genome-wide association study of morbid obesity in Han Chinese

Background: As obesity becomes pandemic, morbid obesity (MO), an extreme form of obesity, is an emerging issue worldwide. It is imperative to understand the factors responsible for extreme weight gain in certain populations in modern society. Very few genome-wide association studies (GWAS) have been conducted on MO patients. This study is the first MO-GWAS in the Han-Chinese population in Asia.
Methods: We conducted a two-stage GWAS with 1110 MO bariatric patients (body mass index [BMI] ≥ 35 kg/m²) from Min-Sheng General Hospital, Taiwan. The first stage involved 575 patients and 1729 sex- and age-matched controls from the Taiwan Han Chinese Cell and Genome Bank. In the second stage, another 535 patients from the same hospital were genotyped for 52 single nucleotide polymorphisms (SNPs) discovered in the first stage, and 9145 matched controls from the Taiwan Biobank were used for confirmation analysis.
Results: The joint analysis for the second stage revealed six top-ranking SNPs, rs8050136 (p-value = 7.80 × 10⁻¹⁰), rs9939609 (p-value = 1.32 × 10⁻⁹), rs1421085 (p-value = 1.54 × 10⁻⁸), rs9941349 (p-value = 9.05 × 10⁻⁸), rs1121980 (p-value = 7.27 × 10⁻⁷), and rs9937354 (p-value = 6.65 × 10⁻⁷), all located in the FTO gene. Significant associations were also observed between MO and RBFOX1, RP11-638L3.1, TMTC1, CBLN4, CSMD3, and ERBB4, using the Bonferroni correction criterion for 52 SNPs (p < 9.6 × 10⁻⁴).
Conclusion: The locus most significantly associated with MO in the Han-Chinese population was the well-known FTO gene. These SNPs, located in intron 1, may include the leptin receptor modulator. Other significant loci, showing weaker associations with MO, suggest potential mechanisms involving disorders of eating behavior or brain/neural development.


Background
Obesity is a chronic state of positive energy balance that leads to long-term, excessive accumulation of body fat. Epidemiological studies have revealed a substantial increase in the risk of non-communicable diseases (NCDs) in people with morbid obesity (MO) [1].
The latest evidence indicates a sharp rise in the prevalence of MO worldwide in both men and women [2]. In the US, the prevalence of MO has increased more than four-fold (1.4 to 6.3%) within the last three decades [3]. Notably, the prevalence of MO (body mass index [BMI] ≥ 35 kg/m²) [4,5] in Taiwan has also increased from almost null to 1.3% during the past two decades, according to data collected by the Nutrition and Health Survey in Taiwan (NAHSIT) from 1993-1996 to 2013-2016 [4]. As MO is accompanied by multiple comorbidities [6,7], as well as a shorter life expectancy and a higher all-cause mortality rate [7,8] than in the general public, the associated medical costs and socioeconomic burden are tremendous [9]. Lifestyle interventions are less effective for MO cases, and bariatric surgery is expensive and can induce complications [10].
The Global Burden of Disease study has pointed to poor diet (western or super-processed) in combination with physical inactivity or a sedentary lifestyle as the main risk factors for non-communicable diseases, including obesity, diabetes [11][12][13][14], and associated cardio-metabolic diseases. However, the BMI distribution is very wide, indicating that individuals respond differently to the same obesogenic environment. It is therefore worthwhile to investigate the genetic mechanisms underlying the development of extreme cases of obesity [15][16][17][18].
According to twin, family, and adoption studies, the heritability of BMI is estimated to be around 40-70% [19][20][21][22], and approximately 27% of BMI heritability in adults may be attributed to common single nucleotide polymorphisms (SNPs) [23]. A review of genome-wide association studies (GWAS) has documented at least 741 BMI- or obesity-related SNPs and numerous biological pathways [24]. MO, as the extreme type of obesity, may be highly associated with the common BMI-raising variants [25,26].
Several GWAS have been performed on severe obesity and MO [27][28][29][30][31][32][33]. However, some of these MO-GWAS involved children or adolescents with high BMI percentile values, and all included European populations. Our study is the first MO-GWAS conducted in a Chinese population in the Asian region.

Discussion
This is the first MO-GWAS conducted in the Han-Chinese population in Asia. This GWAS, with 1110 MO patients and 10,852 matched controls in the Han-Chinese population, established that the top six SNPs (rs8050136, rs9939609, rs1421085, rs9941349, rs1121980, and rs9937354) were all located in the most consistently replicated obesity gene, FTO.
In 2007, the well-known obesity gene, FTO, was first identified in a European ancestry population [34]. Since then, FTO has been replicated and validated in many other ethnic populations, including African [35] and Asian [36] populations. The association between FTO and severe obesity or MO is also reported in the European [37] and Japanese [38] populations. However, the evidence has been very limited for Han-Chinese, the largest population in the world.
In this two-stage GWAS, we found that six SNPs in FTO ranked above all other SNPs in association with morbid obesity in Han-Chinese (rs8050136, rs9939609, rs1421085, rs9941349, rs1121980, and rs9937354), with rs8050136, rs9939609, and rs1421085 reaching p ≤ 5 × 10⁻⁸. According to our data and HapMap data, these six SNPs lie within the same LD block in intron 1 of the FTO gene (Additional file 1: Figure S1). Of these, rs9941349 was found to be associated with obesity for the first time.
Claussnitzer et al. [47] suggested that rs1421085, rather than rs9939609, may be the causal variant in the FTO gene, as a single-nucleotide alteration at rs1421085 (T-to-C) may disrupt the ARID5B-mediated suppression of IRX3 and IRX5, leading to a developmental shift in adipocytes from browning (energy expenditure) to whitening (energy storage) and to suppression of mitochondrial thermogenesis.
The SNPs rs8050136, rs9937354, rs1421085, and rs1121980, in the first intron of FTO, are located in an enhancer region. Recent studies have indicated that the links between intronic variants within FTO and body composition are mediated through functional interactions with neighboring genes. The first intron of FTO carries a binding site for the transcription factor CUX1, which modulates leptin receptor localization within neurons through the regulation of RPGRIP1L expression. This intron also contains an enhancer sequence that directly binds to the promoter of IRX3 [48,49]. Therefore, the mechanisms underlying the contribution of FTO to the risk of obesity are apparently more complex than expected.
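The two significance levels referred to in this study can be illustrated with a short Python sketch that sorts the joint-analysis p-values reported in the abstract into tiers. The family-wise alpha of 0.05 is an assumption (the paper states only the resulting cutoff of 9.6 × 10⁻⁴ for 52 SNPs), and the names `GENOME_WIDE`, `BONFERRONI_52`, and `tier` are illustrative rather than part of the authors' pipeline.

```python
# Conventional genome-wide significance level cited in the text.
GENOME_WIDE = 5e-8
# Bonferroni cutoff for the 52 second-stage SNPs; alpha = 0.05 is assumed.
BONFERRONI_52 = 0.05 / 52

# Joint-analysis p-values for the six FTO SNPs, as reported in the abstract.
joint_p = {
    "rs8050136": 7.80e-10,
    "rs9939609": 1.32e-9,
    "rs1421085": 1.54e-8,
    "rs9941349": 9.05e-8,
    "rs1121980": 7.27e-7,
    "rs9937354": 6.65e-7,
}

def tier(p: float) -> str:
    """Classify a p-value against the two thresholds."""
    if p <= GENOME_WIDE:
        return "genome-wide"
    if p < BONFERRONI_52:
        return "bonferroni"
    return "not significant"

tiers = {snp: tier(p) for snp, p in joint_p.items()}
for snp, label in tiers.items():
    print(f"{snp}: p = {joint_p[snp]:.2e} -> {label}")
```

Under these assumptions, the first three SNPs fall in the genome-wide tier and the remaining three pass only the Bonferroni cutoff for 52 tests.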
Two significant SNPs in the RBFOX1 gene (RNA-binding fox-1 homolog 1), rs12925846 and rs17235335, were discovered in this study. This gene has been associated with several complex diseases, including schizophrenia, autism, mental retardation in epilepsy, attention deficit disorder, and obesity [50]. RBFOX1 is thought to affect adiposity through the hypothalamic melanocortin 4 receptor (MC4R) pathway [51]. Mutations of MC4R are known to cause a monogenic form of obesity in humans [52] via leptin. In the brain, the hypothalamus is known as the control center for satiety/hunger and social defeat. The RBFOX1 gene, also known as the ataxin-2-binding protein 1 gene (A2BP1), could regulate neuron-specific splicing by binding to the pentanucleotide (U)GCAUG sequences upstream of the regulated exon [53]. The involvement of RBFOX1 in obesity development remains uncertain and warrants further investigation.
One MO-associated SNP, rs2126015, is located in the RP11-638L3.1 gene, which encodes a long noncoding RNA (lncRNA). Previous studies indicated an association of this SNP with neurological disorders such as attention deficit hyperactivity disorder (ADHD) and early-onset recurrent major depressive disorder (MDD) [54]. This gene is also highly expressed in adipose tissue. lncRNAs are known to play important epigenetic regulatory roles in molecular processes such as gene expression, genetic imprinting, histone modification, and chromatin dynamics, as well as in other activities, including the formation of specific structures and interactions with many kinds of molecules [55]. The involvement of epigenetic modifications in the development of obesity is becoming increasingly evident [56,57]. Obesity is associated with environmental pollutants (obesogens) [58], gut microbiota [59], and unbalanced food intake, all of which may result in weight gain and altered metabolic consequences through epigenetic mechanisms. Further studies with a larger sample size are warranted to examine the interactions between genes and environmental factors, particularly dietary factors.
The gene TMTC1 (rs159702) has been associated with heart failure in an African ancestry population [60]. Moreover, the interaction of TMTC1 with abdominal obesity may contribute to phenotypic variation of left ventricular mass (LVM) [61]. However, the mechanism of TMTC1 involvement in MO remains unclear.
The protein encoded by the CBLN4 gene (rs6069477) is involved in the regulation of neurexin signaling during synapse development. Agouti-related protein (AGRP)-expressing neurons are a key starvation-sensitive hypothalamic population; they are activated during energy deficit and increase appetite and weight gain. An animal study has shown that CBLN4 is downregulated in AGRP neurons after food deprivation [62]. The mechanism linking this gene to MO is worth further investigation.
The SNP rs16883931 is located in CSMD3 (CUB and Sushi Multiple Domains 3). This gene encodes a large protein that is expressed in the fetal and adult brain and is involved in dendrite development. Mutations of the CSMD3 gene have been identified in schizophrenic and autistic patients. However, the biochemical properties and functions of the CSMD3 protein remain unknown [63].
Another MO-associated gene, ERBB4 (rs29944391), is a member of the EGF receptor family. Genetic studies have indicated a link between ERBB4 and both type 2 diabetes and obesity. Neuregulin 4 (NRG4), a ligand that specifically binds to ERBB4, has been reported to promote browning of white fat and fuel oxidation, prevent high-fat diet-induced obesity, and improve insulin sensitivity [64].
The SNP rs116917414 was the most significant SNP in the first-stage GWAS (p-value = 1.15 × 10⁻¹²). However, this SNP was not included in the second stage owing to a failure in probe design. When we searched for a proxy SNP for rs116917414 using the 1000 Genomes database, we were unable to detect any SNP in strong LD (r² > 0.8) with rs116917414. Hence, we used the next-generation sequencing data (N = 1445) collected from the Taiwan Biobank to investigate the association between rs116917414 and BMI. No significant association was found between this SNP and BMI (p = 0.6 for GA vs. GG; p = 0.5 for AA vs. GG) (Additional file 1: Table S4), indicating that a larger sample size is needed to confirm its effects. This SNP resides in a conserved noncoding region close to the 5′ promoter region of the RP11-380P13.1 (ENSG00000250137) pseudogene. Notably, a study using Framingham data has reported that rs2130928 is located in RP11-380P13.1 and is associated with BMI (p = 0.0012) [65]. As little is known about RP11-380P13.1, it is worthy of further research.
A recent GWAS for BMI in the Japanese population identified 85 SNPs [66]. We investigated the association of these SNPs in our Han-Chinese population. Only six of these SNPs could be replicated in our study population (p < 0.05) (Additional file 1: Table S3), probably owing to differences in the studied traits, designs, and populations: one study was a cross-sectional GWAS with BMI as a quantitative trait in the Japanese general population, whereas the other was a case-control GWAS of MO in Chinese.
As this is the first large-scale MO-GWAS performed in the Han-Chinese population, the biological mechanisms or pathways known for some of the discovered genes are limited. Validation and mechanistic studies of these genes are crucial. Patients with MO are those at the extreme tail of the BMI distribution in a population sharing the same obesogenic environment. These patients show a much greater increase in mean BMI in obesogenic environments, owing to genetic susceptibility [15][16][17][18]. A recent view of the genetic underpinnings of common complex traits is that "genes load the gun, but the environment pulls the trigger" [67]. There were no obese individuals during famines, and the prevalence of obesity increased with increasing food supply. Individuals with greater genetic susceptibility to obesity are likely to gain more weight or fat in obesogenic environments. Individuals who carry the risk allele of the FTO gene tend to have a higher protein [68] and calorie [69] intake. Interactions of genetic risk scores (from known obesity-related variants) with total fried food consumption and with physical activity have been reported in the NHS, HPFS, and Women's Genome Health Study [70]. Moreover, the behavioral susceptibility theory suggests that genes control the response to food cues (smell, sight, and taste) and determine sensitivity to satiety in obesogenic environments [67].

Conclusions
In summary, this is the first study illustrating the genetic characteristics of MO in the Han-Chinese population. The locus most significantly associated with MO in the Han-Chinese population was the well-known FTO gene. These SNPs, located in intron 1, may include the leptin receptor modulator. In addition, other significant loci, including RBFOX1, RP11-638L3.1, TMTC1, CBLN4, CSMD3, and ERBB4, showed weaker associations with MO and suggest potential mechanisms involving disorders of eating behavior or brain/neural development, warranting further study of satiety control. Our results highlight the complexity of genetic involvement in the development of MO in humans.

Study design and sample size
We conducted a two-stage GWAS in the Taiwan Han-Chinese population with 1110 patients with MO aged 19 to 55 years. In total, 575 patients were included in the first stage and 535 patients in the second stage. Finally, we carried out a joint analysis for the SNPs that showed a tendency toward significance in the first stage.
The study flow chart is provided in Fig. 2. MO cases, defined by BMI ≥ 35 kg/m² [4,5], were recruited from the Minimally Invasive Surgery Center of Min-Sheng General Hospital, Taoyuan city, Taiwan. Patients diagnosed with psychosis, developmental diseases, or cancer were excluded. In western countries, MO is defined as BMI ≥ 40 kg/m². Bariatric surgery is an optional treatment for people with MO who meet the following criteria: BMI ≥ 40 kg/m², or BMI between 35 and 40 kg/m² with other significant diseases (for example, type 2 diabetes or high blood pressure). However, it is generally accepted that the BMI cut-off points for defining obesity should be lower for Asians [71]. In 2011, the Asian Pacific Metabolic and Bariatric Surgery Society suggested [5] that bariatric surgery should be considered as a treatment option for obesity in people of Asian ethnicity when (1) BMI > 35 kg/m² with or without co-morbidities, or (2) BMI is between 32 and 35 kg/m² with co-morbidities. We used the definition of the Asian Pacific Metabolic and Bariatric Surgery Society to recruit patients with MO.
Controls were drawn from two sources. In the first (discovery) stage, 1707 age- (± 3 years) and sex-matched controls (BMI < 35 kg/m²) were included from the Han-Chinese Cell and Genome Bank in Taiwan (HanBKT), established from October 1, 2002, to January 14, 2004. The recruitment procedure and data collection have been previously reported [72]. In brief, the bank aimed to collect representative genetic samples to document genetic diversity in Taiwan Han-Chinese and to serve as controls in disease association studies. In the second (confirmatory) stage, another independent set of 9145 age- (± 5 years) and sex-matched controls (BMI < 35 kg/m²) was included from the Taiwan Biobank (TWB) [73]. Details on the TWB can be found on its official website (https://taiwanview.twbiobank.org.tw/index). Altogether, 10,852 subjects (1110 MO cases and 9742 matched controls) were included in the joint GWAS.
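The BMI-based definitions above can be expressed as a small Python sketch. It encodes the study's MO case definition (BMI ≥ 35 kg/m²) and the two Asian Pacific Metabolic and Bariatric Surgery Society conditions as quoted in the text. The function names are hypothetical, and treating the 32 to 35 kg/m² range as inclusive at both ends is an assumption, since the text does not specify the boundaries.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2

def is_mo_case(bmi_value: float) -> bool:
    """MO case definition used for recruitment in this study."""
    return bmi_value >= 35.0

def meets_surgery_criteria(bmi_value: float, has_comorbidity: bool) -> bool:
    """Criteria quoted in the text for people of Asian ethnicity.

    (1) BMI > 35 with or without co-morbidities, or
    (2) BMI between 32 and 35 with co-morbidities
        (inclusive bounds are assumed here).
    """
    if bmi_value > 35.0:
        return True
    return 32.0 <= bmi_value <= 35.0 and has_comorbidity

# Hypothetical patient: 100 kg, 1.65 m tall, no co-morbidities.
patient_bmi = bmi(100.0, 1.65)
print(round(patient_bmi, 1), is_mo_case(patient_bmi),
      meets_surgery_criteria(patient_bmi, has_comorbidity=False))
```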

DNA extraction and genotyping
DNA from MO cases was extracted from buffy coats of whole blood using the phenol-chloroform method [74]. Genomic DNA of controls collected by HanBKT and TWB was isolated from leukocytes using the Puregene® DNA purification kit (Gentra Systems, Minneapolis, MN, USA) [72,73,75], and its quality was assessed from the ratio of absorbance recorded at 260 and 280 nm using a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, DE, USA) [72][73][74][75].
Fig. 2 The study flow chart of the two-stage GWAS
Genotyping was carried out by the National Center for Genome Medicine (NCGM) in IBMS, AS (http://ncgm.sinica.edu.tw/ncgm_02/index.html).
In the first-stage GWAS, the Affymetrix Axiom™ Genome-Wide CHB 1 Array (Thermo Fisher Scientific Inc., US) was used as the genotyping platform for both MO cases and controls. The array had 640,674 markers. The quality of genotyping was evaluated by genotype calling rate (CR), minor allele frequency (MAF), and Hardy-Weinberg equilibrium (HWE). SNPs that failed quality control (CR < 97%, MAF < 5%, or HWE p < 0.001) were excluded. The remaining 562,523 SNPs were used in the first-stage GWAS.
In the second stage, the top SNPs selected from the first stage were validated using an independent sample set, as previously described (535 MO cases and 6242 controls). For MO subjects, the SNPs were genotyped using the MassARRAY® iPLEX Gold array from the SEQUENOM MassARRAY® System. For the TWB controls, SNPs were genotyped using the Axiom™ Genome-Wide TWB Array.
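The marker-level quality-control rule described above can be sketched in Python as follows. The thresholds come from the text; the `SnpQc` record, the `passes_qc` helper, and the three example markers are hypothetical and are not taken from the study data or from any specific genotyping software.

```python
from dataclasses import dataclass

@dataclass
class SnpQc:
    snp_id: str
    call_rate: float  # fraction of samples with a genotype call
    maf: float        # minor allele frequency
    hwe_p: float      # Hardy-Weinberg equilibrium test p-value

def passes_qc(snp: SnpQc,
              min_call_rate: float = 0.97,
              min_maf: float = 0.05,
              min_hwe_p: float = 0.001) -> bool:
    """Keep a SNP unless CR < 97%, MAF < 5%, or HWE p < 0.001."""
    return (snp.call_rate >= min_call_rate
            and snp.maf >= min_maf
            and snp.hwe_p >= min_hwe_p)

# Hypothetical markers, one failing each criterion and one passing all.
markers = [
    SnpQc("snp_ok", call_rate=0.995, maf=0.21, hwe_p=0.43),
    SnpQc("snp_low_call", call_rate=0.95, maf=0.30, hwe_p=0.50),
    SnpQc("snp_rare", call_rate=0.99, maf=0.01, hwe_p=0.80),
    SnpQc("snp_hwe_fail", call_rate=0.99, maf=0.25, hwe_p=0.0001),
]

kept = [m.snp_id for m in markers if passes_qc(m)]
print(kept)
```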

Statistics
To search for SNPs associated with MO, logistic regression analysis (with dichotomous MO status as the outcome) was performed at both stages, and a joint analysis was conducted after adjustment for sex and age. To adjust for population stratification and batch effects, principal components (PCs) 1 to 10 derived from principal component analysis (PCA) were included in the regression model. We adopted an ordinal genotype coding system (number of minor alleles: 0, 1, or 2). Haploview software [76] was used to analyze the linkage disequilibrium (LD) structure of the identified SNPs. Data were analyzed with PLINK and SAS 9.4 (SAS Inc., NC, USA).
Additional file 1:
Table S1. Comparison of the basic characteristics of the MO cases and controls between stage 1 and stage 2.
Table S2. The top 80 significantly associated SNPs in the first stage of the GWAS.
Table S3. Replication study of the 85 loci associated with BMI in the Japanese population.
Table S4. The association between rs116917414 and BMI.
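As a simplified illustration of the additive (ordinal) genotype coding and the logistic model, the following Python sketch fits a single-SNP logistic regression by Newton-Raphson. It is not the authors' PLINK/SAS analysis: the covariates (sex, age, and the ten PCs) are omitted, the genotype counts are invented toy values, and the helper names are hypothetical.

```python
import math

def minor_allele_count(genotype: str, minor: str) -> int:
    """Additive coding: number of minor alleles (0, 1, or 2)."""
    return sum(1 for allele in genotype if allele == minor)

def fit_logistic(x, y, iterations=25):
    """Newton-Raphson fit of logit P(y = 1) = b0 + b1 * x."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Toy data (invented): genotype -> (number of cases, number of controls).
genotype_counts = {"TT": (1, 4), "TA": (2, 4), "AA": (4, 4)}

x, y = [], []
for genotype, (cases, controls) in genotype_counts.items():
    dose = minor_allele_count(genotype, minor="A")
    x += [dose] * (cases + controls)
    y += [1] * cases + [0] * controls

b0, b1 = fit_logistic(x, y)
print(f"per-allele odds ratio: {math.exp(b1):.2f}")
```

In this coding, exp(b1) is the odds ratio per additional copy of the minor allele; in the full model described above, the same coefficient would be estimated with the covariates included.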