CYP19A1 fine-mapping and Mendelian randomization: estradiol is causal for endometrial cancer

Candidate gene studies have reported CYP19A1 variants to be associated with endometrial cancer and with estradiol (E2) concentrations. We analyzed 2937 single nucleotide polymorphisms (SNPs) in 6608 endometrial cancer cases and 37 925 controls and report the first genome-wide significant association between endometrial cancer and a CYP19A1 SNP (rs727479 in intron 2, P=4.8×10−11). SNP rs727479 was also among those most strongly associated with circulating E2 concentrations in 2767 post-menopausal controls (P=7.4×10−8). The observed endometrial cancer odds ratio per rs727479 A-allele (1.15, CI=1.11–1.21) is compatible with that predicted by the observed effect on E2 concentrations (1.09, CI=1.03–1.21), consistent with the hypothesis that endometrial cancer risk is driven by E2. Of 28 candidate-causal SNPs, 12 co-located with three putative gene-regulatory elements, and their risk alleles were associated with higher CYP19A1 expression in bioinformatic analyses. For both phenotypes, the associations with rs727479 were stronger among women with a higher BMI (Pinteraction=0.034 and 0.066 respectively), suggesting a biologically plausible gene-environment interaction.


Introduction
Established risk factors for endometrial cancer include high BMI (IARC 2002), early menarche, late menopause, nulliparity, estrogen-only hormone replacement therapy (HRT) (Beral et al. 2005) and tamoxifen use (Cuzick et al. 2003), while cigarette smoking and the use of oral contraceptives or combined HRT (Beral et al. 2005) are associated with lower risks. It has been hypothesized that these factors alter endometrial cancer risk by increasing exposure to estrogens (Key & Pike 1988); indeed, higher concentrations of circulating estradiol (E2) in postmenopausal women have been associated with an increased risk of endometrial cancer (Zeleniuch-Jacquotte et al. 2001, Lukanova et al. 2004, Allen et al. 2008). After the cessation of ovarian estrogen production at menopause, endogenous estrogens are primarily synthesized from testosterone (T) in adipose tissue via aromatase, encoded by CYP19A1. Candidate gene studies have found levels of E2 in pre-menopausal and postmenopausal women, and also in men, to be associated with genetic variants within or close to CYP19A1 (Dunning et al. 2004, Paynter et al. 2005, Haiman et al. 2007, Ahn et al. 2009, Eriksson et al. 2009, Kidokoro et al. 2009, Travis et al. 2009, Beckmann et al. 2011, Lundin et al. 2012, Prescott et al. 2012, Flote et al. 2014). Candidate studies have also identified associations between several different CYP19A1 variants and endometrial cancer (Paynter et al. 2005, Tao et al. 2007, Setiawan et al. 2009, Low et al. 2010), with some evidence of stronger associations in women with higher BMI (Setiawan et al. 2009).
None of the published studies have attempted a systematic assessment of all common CYP19A1 variants in order to determine i) which are most likely to be causal for endometrial cancer and/or E2 concentration, ii) whether multiple independent causal variants exist at this locus for either trait, and iii) whether the same variant or variants are responsible for both traits. The last of these would help to address the question as to whether the reported association between E2 and endometrial cancer seen in epidemiological studies is causal or a consequence of confounding (Fig. 1). If the association is causal, then variants causally associated with E2 levels should also be associated with endometrial cancer, with a magnitude that can be predicted using a Mendelian randomization methodology (C-Reactive Protein Coronary Heart Disease Genetics Collaboration et al. 2011, Interleukin-6 Receptor Mendelian Randomisation Analysis Consortium 2012), a form of instrumental variable analysis in which the instrument is a genetic variant(s) known to be associated with the biomarker in a particular direction.

Figure 1
Factors potentially involved in the reported association between circulating post-menopausal E2 and endometrial cancer risk. Question marks highlight the issues to be addressed in this study.

To address the question of whether the same CYP19A1 variant(s) are associated with E2 levels and endometrial cancer with compatible effect sizes and directions, we used genotype information for 2937 single nucleotide polymorphisms (SNPs) across a 1.2 Mb region encompassing CYP19A1 in 6608 endometrial cancer cases and 37 925 controls of European ancestry, 1733 of whom (all controls) were post-menopausal and had measured E2 and T concentrations.

Endometrial cancer case-control studies
The association between SNPs at the CYP19A1 locus and endometrial cancer was tested using data from four separate case-control studies.
The ANECS, SEARCH, and NSECG genome-wide association studies
The results presented here are based on the ANECS, SEARCH, and NSECG genome-wide association studies (GWAS) and country-matched datasets (McGregor et al. 1999, WTCCC 2007, Houlston et al. 2010, McEvoy et al. 2010, Painter et al. 2011, Spurdle et al. 2011), as shown in Supplementary Table 1A and B (see section on supplementary data given at the end of this article) and described in detail in Painter et al. (2015). All cases and controls were of European ancestry.
The ECAC iCOGS study
The fourth dataset comprised 4402 women of European ancestry with a confirmed diagnosis of endometrial cancer (3535 with confirmed endometrioid histology), recruited via 11 separate studies in seven countries in the Endometrial Cancer Association Consortium (ECAC; Painter et al. 2015) and 28 758 healthy female controls from the same countries, all participating in the Breast Cancer Association Consortium (BCAC; Michailidou et al. 2013). Samples were genotyped using the iCOGS array (Michailidou et al. 2013, Pharoah et al. 2013, Painter et al. 2015), designed by the Collaborative Oncological Gene-environment Study ('COGS'). The iCOGS array includes 134 SNPs located within the ~1.2 Mb region of chromosome 15 between 50 899 000 and 52 095 000 surrounding CYP19A1, 22 of which had been specifically selected for the study of post-menopausal E2 levels.
Post-genotyping quality control for all four studies was as described in Spurdle et al. (2011) and Painter et al. (2015). Individuals with <85% estimated European ancestry based on Identity-By-State (IBS) scores between study individuals and individuals in HapMap (http://hapmap.ncbi.nlm.nih.gov/) were excluded.

E2 datasets
The EPIC Norfolk study
Sex-hormone levels, including E2 and T concentrations, were measured on subsets of the ~25 000 participants in the European Prospective Investigation of Cancer-Norfolk cohort study (see Day et al. (1999) for details). After recruitment, participants were invited to attend a first health check (HC1), at which a blood sample was taken. A second blood sample was taken from participants who attended the second health check (HC2), ~3 years after the first. For each set of blood samples, a subset was randomly selected for hormone level measurement from among the women who were considered to be post-menopausal based on being >55 years, not having menstruated in the last year, and having not taken HRT for at least 3 months before sampling (Supplementary Table 1C, see section on supplementary data given at the end of this article). The plasma and serum samples collected from these women had been stored at −70 °C until analysis, and their whole blood samples had been stored at −30 °C before DNA extraction. Of the women for whom hormone levels had been measured, 2368 had also been genotyped using the iCOGS array as BCAC control subjects (and thus were also controls in our iCOGS endometrial cancer analysis). Of these, 1333 women had hormones measured from their HC1 blood sample and 1536 from their HC2 sample, of whom 501 had hormones measured from both samples. Where two measurements existed, we chose the measurement from HC2, when women would be further from the menopause. After excluding women within 2 years of the menopause at blood draw or with missing E2 levels or E2 values >300 pmol/l (i.e., outside the possible range for a postmenopausal woman), our analysis was based on 1500 genotyped women with HC2 E2 levels and 425 women with HC1 levels. Of the HC2 women, 1431 also had a valid T measurement (T was not measured as part of HC1).
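The selection rules in this paragraph can be sketched in a few lines of Python. This is an illustration only: the `Sample` record, its field names and the order of operations (choose HC2 first, then apply the exclusions) are assumptions made for the example, and the source does not state whether an ineligible HC2 sample was replaced by the HC1 one.

```python
from dataclasses import dataclass
from typing import Optional

MAX_E2_PMOL_L = 300.0            # values above this are excluded (from the text)
MIN_YEARS_SINCE_MENOPAUSE = 2.0  # women within 2 years of menopause are excluded


@dataclass
class Sample:
    """Hypothetical record for one blood sample."""
    e2_pmol_l: Optional[float]    # None when E2 is missing
    years_since_menopause: float


def is_eligible(sample: Optional[Sample]) -> bool:
    """Apply the three exclusions: missing E2, E2 > 300 pmol/l, < 2 years post-menopause."""
    if sample is None or sample.e2_pmol_l is None:
        return False
    if sample.e2_pmol_l > MAX_E2_PMOL_L:
        return False
    return sample.years_since_menopause >= MIN_YEARS_SINCE_MENOPAUSE


def choose_sample(hc1: Optional[Sample], hc2: Optional[Sample]) -> Optional[Sample]:
    """Prefer the HC2 measurement when it exists, then apply the exclusions."""
    chosen = hc2 if hc2 is not None else hc1
    return chosen if is_eligible(chosen) else None
```

For a woman with both samples, `choose_sample` returns the HC2 record if it passes the exclusions and `None` otherwise.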
Ethical approval was obtained from the Norwich Local Research Ethics Committee, LREC 98CN01. All study participants provided written informed consent.

The SIBS study
Participants in the Sisters in Breast Screening study (SIBS) were identified through the National Health Service breast screening program in the UK (Prescott et al. 2012). A subset of 905 SIBS women who were aged over 55 years at recruitment, 2 or more years since their last menstrual period, and not currently using HRT at the time of blood collection were selected for hormone measurement (Prescott et al. 2012). After excluding women with missing or extreme E2 levels (as above), 889 women were left, of whom 302 had been genotyped using the iCOGS array as BCAC control subjects (after quality control exclusions), and thus were also controls in our iCOGS endometrial cancer analysis (Supplementary Table 1C, see section on supplementary data given at the end of this article). All participants gave informed written consent. This study was approved by the Eastern Multicentre Research Ethics Committee (SIBS).
For the EPIC and SIBS studies, plasma E2 concentrations were measured at The Royal Marsden Hospital (London, UK), using an in-house RIA with a highly specific rabbit antiserum which had been raised against an E2-6-carboxymethyloxime-BSA conjugate and E2-6-carboxymethyloxime-[2-125I]iodohistamine (Dowsett et al. 1987). The detection limit was 3 pmol/l and values were replaced with this limit when they were reported as being undetectable. For the SIBS study, at a concentration of 25 pmol/l the within-assay variation was 6.5% and the between-assay variation was 16% (n=18). For the EPIC studies, at a concentration of 18 pmol/l the within-assay and the between-batch coefficients of variation (CV) were 8.6 and 13% respectively. Testosterone was measured in the EPIC and SIBS studies using a solid-phase RIA kit (Diagnostic Products, Gwynedd, UK), with within- and between-batch CV at a concentration of 3.1 nmol/l of 6.1 and 10% respectively and with a detection limit of 0.14 nmol/l.
Additional SIBS replication set
To increase statistical power we genotyped all of the 889 SIBS women with E2 measurements (described above) for rs727479 using a Custom Taqman Assay (Life Technologies, ThermoFisher Scientific, Waltham, MA, USA) according to the manufacturer's instructions (details provided in Supplementary Table 5, see section on supplementary data given at the end of this article). After the quality control exclusions described below, 813 women had E2 measurements and genotypes for rs727479, of whom 264 also had valid iCOGS genotyping, i.e., 549 additional samples.

Regional imputation
We used IMPUTEv2 (Howie et al. 2009) to impute genotypes for SNPs in the 50 899 000–52 095 000 region of chromosome 15 in the 1000 Genomes dataset v3 (April 2012 release). We allowed the IMPUTE software to select the most appropriate haplotypes from among the complete set of 1000 Genomes haplotypes (Howie et al. 2011). Imputation was conducted separately for the four datasets, and SNPs with imputation information score <0.7 and/or MAF <0.01 in any of the four studies were excluded.
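The post-imputation exclusion rule can be written as a short filter. The sketch below is illustrative: the SNP names and per-study values are invented, and the input layout (one `(info, MAF)` pair per study) is an assumption rather than the format produced by the imputation software.

```python
INFO_MIN = 0.7   # imputation information score threshold from the text
MAF_MIN = 0.01   # minor allele frequency threshold from the text


def passes_qc(per_study_stats):
    """Return True if the SNP is retained.

    per_study_stats: list of (info_score, maf) tuples, one per study.
    A SNP is excluded if ANY study has info < 0.7 and/or MAF < 0.01,
    so it is kept only when every study clears both thresholds.
    """
    return all(info >= INFO_MIN and maf >= MAF_MIN
               for info, maf in per_study_stats)


# Hypothetical SNPs, four studies each.
snps = {
    "snp_a": [(0.98, 0.30), (0.95, 0.31), (0.99, 0.29), (0.97, 0.30)],
    "snp_b": [(0.98, 0.30), (0.65, 0.31), (0.99, 0.29), (0.97, 0.30)],   # low info in one study
    "snp_c": [(0.90, 0.005), (0.92, 0.02), (0.91, 0.02), (0.93, 0.02)],  # rare in one study
}
kept = sorted(name for name, stats in snps.items() if passes_qc(stats))
```

Here only `snp_a` is retained, because each of the other two fails a threshold in at least one study.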

Statistical analysis
The four imputed endometrial cancer datasets were analyzed separately using unconditional logistic regression with a per-allele (1 degree of freedom) model using SNPTEST v2 (Ferreira & Marchini 2011). For the iCOGS dataset, analyses were performed adjusting for country and for the first ten principal components, as in Painter et al. (2015). The GWAS datasets were each analyzed as a single stratum, with adjustment for the first two (ANECS and NSECG) and three (SEARCH) principal components. Our ongoing genome-wide analyses have shown that the inclusion of these principal components is sufficient to control for population stratification (genomic control λ=1.002–1.038).
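For readers unfamiliar with the genomic control statistic quoted above, the following sketch shows one common way to compute it: the median of the observed 1-d.f. chi-square test statistics divided by the median expected under the null. The function name and the use of z-scores as input are assumptions for illustration; the source does not describe how its λ values were calculated.

```python
from statistics import NormalDist, median

# Median of a 1-d.f. chi-square distribution (about 0.455),
# obtained as the square of the 75th percentile of the standard normal.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control_lambda(z_scores):
    """Genomic inflation factor from per-SNP association z-scores."""
    chi2 = [z * z for z in z_scores]
    return median(chi2) / CHI2_1DF_MEDIAN
```

A value close to 1 indicates that the bulk of the test statistics follow the null distribution, which is the sense in which the reported range of 1.002–1.038 suggests little residual stratification.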
The endometrial cancer odds ratios (OR) for the four studies were combined using standard fixed-effects meta-analyses. The I² statistic (Higgins & Thompson 2002) was used to estimate the proportion of the variance due to between-study heterogeneity. SNPs with significant between-study heterogeneity (P<0.05) were excluded. Analyses for all SNPs were repeated adjusting for the most significant SNP to assess whether multiple independent causal variants were present. A statistical significance cut-off of P≤10−4 was used for secondary and conditional analyses. To determine the most likely candidate causative SNPs, the log likelihoods of all tested SNPs were compared with that of the top SNP. SNPs with likelihood ratios of at least 1:100 relative to the top SNP and which were correlated with the top SNP (linkage disequilibrium (LD) r²>0.2) were prioritized as potentially causal variants for follow-up in bioinformatic and functional analyses (Udler et al. 2010).
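A minimal sketch of an inverse-variance fixed-effects meta-analysis with Cochran's Q and the I² statistic is given below. It is a textbook formulation written for illustration, not the code used in the study, and the function name and inputs are assumptions.

```python
import math


def fixed_effects_meta(log_ors, ses):
    """Inverse-variance fixed-effects meta-analysis.

    log_ors: per-study log odds ratios.
    ses:     per-study standard errors of the log odds ratios.
    Returns (pooled_log_or, pooled_se, cochran_q, i2_percent).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    # I^2 = max(0, (Q - df) / Q), expressed as a percentage.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, pooled_se, q, i2
```

The pooled OR is `math.exp(pooled)`, and a 95% CI follows from `pooled ± 1.96 * pooled_se` on the log scale. When Q does not exceed its degrees of freedom, I² is reported as 0%.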
The analyses were repeated restricting the iCOGS and NSECG studies to those cases with endometrioid or non-endometrioid histology (the ANECS and SEARCH GWAS sample sets contained only endometrioid histology cases).
Associations between SNPs and E2 concentrations were tested using the natural logarithm transformed ratio of E2 to T concentrations, adjusting for laboratory batch, study (EPIC HC2 or SIBS), age and BMI at blood draw, prior HRT use (yes or no) and menopausal status (2–5 years since menopause or >5 years since menopause) using ProbABEL software (Aulchenko et al. 2010). Given the family-based design of the SIBS study, we used the matrix of kinship coefficients to adjust for the non-independence of relatives. This approach is also expected to avoid the effects of population stratification (Chen & Abecasis 2007).
For the analysis including the additional genotyping in the SIBS samples, the data were re-analyzed for all 2767 women, using the sandwich variance estimator to obtain standard errors robust to familial clustering (in the absence of a kinship coefficient matrix for the complete set).
The analyses of association between the most significant SNP and the two phenotypes (endometrial cancer and E2 concentration) were repeated after stratifying the datasets by quartiles of age at diagnosis (cases) or interview (controls), or by quartiles of BMI. These analyses were restricted to the iCOGS dataset (plus the SIBS replication set for the E2 analysis), as BMI was not available for all cases and controls in the GWAS sets. Since T concentration had not been measured in the EPIC HC1 women, the analyses stratified by BMI and age were based on log-transformed E2 concentrations uncorrected for T, in order to maximise the sample size and hence the statistical power. Quartiles were based on the variables' distributions in cases, to ensure roughly equivalent statistical power across the quartiles. The same categories were used for the E2 analysis to allow direct comparisons between the two phenotypes.
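The case-based stratification can be illustrated as follows. The BMI values are invented, the "inclusive" quantile method and the convention that a value equal to a cut point falls in the upper quartile are assumptions, and the study's own software may have used different conventions.

```python
from bisect import bisect_right
from statistics import quantiles


def case_based_cutpoints(case_values):
    """Three cut points that split the CASE distribution into quartiles."""
    return quantiles(case_values, n=4, method="inclusive")


def assign_quartile(value, cutpoints):
    """Quartile index (1-4) for any subject, using the case-derived cut points."""
    return bisect_right(cutpoints, value) + 1


# Hypothetical BMI values for cases; the same cut points are then
# applied to controls and to the women with hormone measurements.
case_bmi = [20, 24, 28, 32, 36]
cuts = case_based_cutpoints(case_bmi)
control_quartiles = [assign_quartile(b, cuts) for b in [22, 26, 30, 40]]
```

Deriving the cut points from cases only, and reusing them elsewhere, is what keeps the strata comparable between the endometrial cancer and E2 analyses.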
We used a Mendelian randomization style approach to compare the observed association of the top SNP with endometrial cancer with that predicted based on the SNP's effect on E2 levels. For this we re-estimated the effect of each effect allele on E2 (β) adjusting only for study and laboratory batch. Using a published estimate of the endometrial cancer OR associated with a doubling of post-menopausal E2 concentration (Lukanova et al. 2004), we multiplied the natural logarithm of this OR by the ratio (ln β/ln 2) to obtain a predicted endometrial cancer OR per effect allele. We then compared this predicted OR to that observed, to assess whether the observed association between the SNP and endometrial cancer is compatible with a causal association between higher post-menopausal E2 concentration and endometrial cancer. In the same way, we compared the predicted effect of the top SNP on breast cancer risk (based on a published estimate of the effect of doubling E2 concentration on breast cancer risk (Key et al. 2002)) with that observed in the iCOGS BCAC study of 45 290 breast cancer cases and 41 880 controls of European ancestry (Michailidou et al. 2013).
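The prediction step described in this paragraph amounts to expressing the per-allele effect on E2 as a number of doublings and scaling the log OR per doubling accordingly. The sketch below assumes that the per-allele effect is supplied as a multiplicative fold change in E2 (for example 1.10 for a 10% increase); the function name and the illustrative inputs are not taken from the study's code.

```python
import math


def predicted_or_per_allele(fold_change_per_allele, or_per_doubling):
    """Predicted disease OR per allele under a causal biomarker model.

    fold_change_per_allele: multiplicative effect of one allele on E2.
    or_per_doubling:        disease OR associated with a doubling of E2.
    """
    # Number of doublings of E2 produced by one allele: ln(fold) / ln(2).
    doublings = math.log(fold_change_per_allele) / math.log(2.0)
    # Scale the log OR per doubling, then return to the OR scale.
    return math.exp(doublings * math.log(or_per_doubling))


# Illustrative inputs only: a 10% per-allele increase and an OR of 2.06 per doubling.
example = predicted_or_per_allele(1.10, 2.06)
```

With these illustrative inputs the result is close to 1.10. A prediction based on a coefficient re-estimated with fewer adjustments, as described above, need not match this figure exactly.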
All statistical analyses used R software unless otherwise stated, and all statistical tests were two-sided. The association plot was produced using LocusZoom (Pruim et al. 2010).

Results
The CYP19A1 association with endometrial cancer is explained by a single signal
Genetic imputation of the ~1.2 Mb region of chromosome 15 between 50 899 000 and 52 095 000 using the April 2012 release of the 1000 Genomes reference panel in four independent case-control sets yielded post-QC genotype information for 2937 SNPs in 6608 endometrial cancer cases and 37 925 controls (Supplementary Table 1A, see section on supplementary data given at the end of this article). Of these SNPs, 100 had been genotyped in the largest study (iCOGS), and 191, 201, and 187 in the three GWAS sets (SEARCH, ANECS, and NSECG GWASs respectively).
Combining results across the four studies, 171 SNPs had P<1×10−4, compared with an expected number of less than one under the null hypothesis (Supplementary Table 2, see section on supplementary data given at the end of this article). Fifty SNPs were significant at the conventional GWAS threshold of 5×10−8, of which rs727479 in intron 2 was the most significantly associated (OR per A allele=1.15, CI=1.11–1.21, P=4.81×10−11, Table 1, Fig. 2A). This SNP was directly genotyped in all four studies, and the strength of the association did not differ among studies (I²=0.0%, Phet=0.92; Supplementary Table 2 and Supplementary Fig. 1A). Conditioning on rs727479, no other SNPs reached P<10−4 (Supplementary Table 2).
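The statement that fewer than one SNP would be expected at this threshold under the null follows from simple arithmetic, shown here as a short check. It relies only on the SNP count and threshold given above; by linearity of expectation the result holds even though neighbouring SNPs are correlated.

```python
n_snps = 2937      # SNPs passing QC in the region
alpha = 1e-4       # significance threshold used in the text

# Expected number of SNPs with P < alpha if no SNP were truly associated.
expected_under_null = n_snps * alpha
```

The expected count is about 0.29, compared with the 171 SNPs observed.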
These results suggest that rs727479, or a SNP correlated with it, is causally related to disease. Based on a likelihood ratio threshold of 1:100 (Udler et al. 2010), 28 SNPs remain as possible causal variants (Supplementary Table 2, see section on supplementary data given at the end of this article); all are correlated with rs727479 at r²>0.2, and five were genotyped in iCOGS (rs7175531, rs727479, rs17601876, rs12050767, and rs749292).
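The candidate-set rule can be sketched as below. The SNP names, log likelihoods and r² values are invented for illustration, and the input layout is an assumption; only the two thresholds (likelihood ratio of 1:100 and r²>0.2) come from the text.

```python
import math

LR_THRESHOLD = 1.0 / 100.0  # odds of 1:100 relative to the top SNP
R2_MIN = 0.2                # minimum LD (r^2) with the top SNP


def candidate_set(snps):
    """Return SNPs that cannot be excluded as causal.

    snps: dict mapping SNP name -> (log_likelihood, r2_with_top_snp).
    The top SNP is the one with the highest log likelihood.
    """
    top_ll = max(ll for ll, _ in snps.values())
    keep = []
    for name, (ll, r2) in snps.items():
        likelihood_ratio = math.exp(ll - top_ll)  # <= 1 by construction
        if likelihood_ratio >= LR_THRESHOLD and r2 > R2_MIN:
            keep.append(name)
    return sorted(keep)


# Hypothetical values.
example_snps = {
    "snp_top": (-1000.0, 1.0),
    "snp_a": (-1002.0, 0.9),   # LR = exp(-2), retained
    "snp_b": (-1006.0, 0.8),   # LR = exp(-6) < 1/100, excluded
    "snp_c": (-1001.0, 0.1),   # weak LD with the top SNP, excluded
}
retained = candidate_set(example_snps)
```

In this toy example `snp_top` and `snp_a` are retained.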
Stronger associations between rs727479 and endometrial cancer in women of older age and higher BMI
There was some suggestion of a stronger association between rs727479 and endometrial cancer among older women (OR=1.28 (1.15–1.44), P=1.7×10−5 and OR=1.24 (1.08–1.42), P=1.9×10−3 for the third and fourth quartiles of age respectively; Table 1), although the interaction between rs727479 and age was not significant (P=0.19).
The set of correlated SNPs most significantly associated with endometrial cancer are all within the set of SNPs most significantly associated with the E2:T ratio
T is the substrate for aromatization to E2, and the ratio of E2 to T concentrations, in essence, corrects for the variation in T levels. This correction would be expected to lead to a more direct relationship with aromatase activity, hence we used the E2:T ratio as the hormonal phenotype in our initial fine-mapping of the CYP19A1 region. Circulating E2 and T concentrations were measured in 1733 healthy post-menopausal women from the EPIC Norfolk (N=1431) and SIBS (N=302) studies (Dunning et al. 2004, Prescott et al. 2012), who formed a subset of the controls in the iCOGS study. Imputation and post-imputation QC identical to that performed in the endometrial cancer analysis resulted in 1956 SNPs across the CYP19A1 region, of which 100 had been genotyped. Adjusting for age, BMI, HRT use and menopausal status, 105 SNPs were associated with the E2:T ratio at P<1×10−4, including the lead endometrial cancer SNP rs727479 (P=2.06×10−7). Two imputed SNPs had very slightly smaller P values than rs727479 (rs12592697, P=1.46×10−7 and rs4775935, P=1.89×10−7), both of which were in near-complete LD with rs727479 (r²=0.99). Ninety-four SNPs had odds of at least 1:100 compared with rs12592697 of being the causal E2:T SNP, and also have r²>0.2 with rs12592697 (Supplementary Table 2, see section on supplementary data given at the end of this article). Conditioning on rs12592697, no SNPs have P<1×10−4; hence there is no evidence of a second signal for E2:T in this region. The set of 95 SNPs contains all 28 non-excluded endometrial cancer candidate SNPs. Since rs727479 was the most significant of the genotyped SNPs and was statistically almost indistinguishable from the top two SNPs, rs727479 was used as the representative SNP for the set of 95 non-excluded SNPs.
The rs727479 A allele was associated with higher E2 concentration (β=0.092, P=3.80×10−5) and, to a lesser extent, with lower T concentration (β=−0.045, P=0.057).
Including an additional 485 women from the EPIC-Norfolk cohort for whom E2 but not T concentrations had been measured (and therefore the E2:T ratio could not be computed), the association between rs727479 and E2 became stronger (β=0.094, P=3.1×10−6). To further increase the statistical power we genotyped rs727479 in the remaining 549 SIBS samples for whom E2 concentrations had been measured. In the full set of 2767 women, the association between E2 concentrations and rs727479 approached the genome-wide significance threshold (β=0.096, P=7.4×10−8) (Table 1).
There was no evidence of a difference in the association between rs727479 and E2 concentration with age (Pinteraction=0.90, Table 1). The rs727479-E2 association was strongest among women with the highest BMIs, with borderline significant evidence of an interaction (Pinteraction=0.066, Table 1, Fig. 3B).

Evidence that higher E2 concentration is causal for endometrial cancer
Figure 2
Association of SNPs in the CYP19A1 region with (A) endometrial cancer and (B) E2:T, highlighting rs727479. Each point indicates the statistical significance of the association between a SNP and endometrial cancer (Fig. 2A) or between a SNP and the E2:T ratio (Fig. 2B). Squares denote SNPs directly genotyped by the iCOGS array; circles are SNPs for which genotypes were imputed. The larger purple square is rs727479, the SNP with the strongest evidence of association with endometrial cancer. Other colours show the strength of linkage disequilibrium between each SNP and rs727479.

Following a Mendelian randomization argument, if elevated E2 concentration were causally associated with endometrial cancer (as opposed to an association produced by confounding), then we would expect any SNP which raises E2 to be proportionally associated with endometrial cancer. We observed an rs727479 per-A-allele increase in adjusted E2 concentration of 10% (95% CI=6–14%, from the regression coefficient in Table 1 for log-transformed levels). Lukanova et al. (2004) estimated that the odds ratio for endometrial cancer associated with a doubling of post-menopausal E2 concentration was 2.06 (CI=1.47–2.89; it was necessary to use an external estimate because hormone levels had only been measured in control subjects in our study). Based on this published estimate, the predicted per-allele OR for endometrial cancer would be 1.09 (CI=1.03–1.21), which is consistent with that observed in our study (OR=1.15, CI=1.11–1.21) (Fig. 4).
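The 10% figure can be recovered from the regression coefficient reported for the full set of 2767 women, assuming (as the Methods indicate) that E2 was analysed on the natural-log scale. The helper name below is hypothetical.

```python
import math


def percent_change_per_allele(beta):
    """Percentage change per allele for a coefficient on the natural-log scale."""
    return (math.exp(beta) - 1.0) * 100.0


# Coefficient reported for rs727479 and E2 in the full set of 2767 women.
pct = percent_change_per_allele(0.096)
```

The result rounds to 10%, in line with the per-allele increase quoted in the text.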

Candidate causal variants may regulate CYP19 expression
Bioinformatic analysis defined three putative regulatory elements (PREs) coincident with 12 of 28 candidate endometrial cancer causal variants prioritized by genetic analysis (Fig. 5). Altered binding of transcription factors was predicted for 10/12 candidates located within PREs, including top candidate rs727479 (Supplementary Table 3, see section on supplementary data given at the end of this article). For four of these (rs8024515, rs7181429, rs28637352, and rs28490942) there was experimental evidence for differential transcription factor (TF) binding in the cell types tested by ENCODE (Fig. 5), and SNPs rs7181429 and rs28637352 overlap binding consensus sequences for NFIC and ZBTB7A in Ishikawa endometrial cancer cells (Fig. 5). Expression analysis identified nominal associations (P<0.05) between risk alleles for the 28 candidate causal variants and greater CYP19A1 expression in several tissues, with candidate SNP rs7181429's association with expression in blood passing a Bonferroni-corrected significance threshold (P=6.0×10−5; corrected P=1.62×10−4; Supplementary Table 4, see section on supplementary data given at the end of this article).

Discussion
We conducted the largest comprehensive genetic study to date of SNPs across the CYP19A1 hormone metabolism gene and their associations with both endometrial cancer risk and circulating E2 concentration. Using genotype information on nearly 3000 SNPs we have, for the first time, identified GWAS-level significant associations between SNPs in this region and endometrial cancer. Our finding that rs727479 is the most significantly associated SNP in this region confirms the findings of a previous candidate-SNP study (Setiawan et al. 2009), and our analysis provides a list of 28 SNPs which cannot be excluded as causal on the basis of statistical analyses. We found no evidence for further causal variants outside of this set. For example, rs749292, previously reported as a possible second signal (Setiawan et al. 2009), was not significantly associated with risk in our analysis after conditioning on rs727479, given the number of SNPs included in the analysis (Pcond=0.017). This is the first study to look at the CYP19A1 endometrial cancer association by histology. The most significant risk SNP, rs727479, appears to be more strongly associated with endometrioid histology endometrial tumors than with the rarer and poorer prognosis non-endometrioid cancers. However, the confidence intervals for the two ORs are not incompatible, and there was no significant difference in allele frequencies between the women with endometrioid and non-endometrioid tumors (P=0.15). Despite the common description of non-endometrioid tumors as 'estrogen independent', recent work has shown that the two subtypes largely share common risk factors, including those factors relating to endogenous or exogenous estrogen exposure (Setiawan et al. 2013), consistent with our findings.

Figure 3
Association of SNP rs727479 with (A) endometrial cancer and (B) E2 levels, by quartile of BMI distribution. In Fig. 3A the log(OR) of endometrial cancer associated with each A allele of SNP rs727479 is shown for each quartile of the BMI distribution, adjusting for age. There is a borderline significant interaction between genotype and BMI quartile (P=0.047). Figure 3B shows the regression coefficient (β) for the association between each A allele of rs727479 and log-transformed E2 levels (adjusted for laboratory batch, study, age at blood draw, BMI, HRT use and menopausal status) (Pinteraction=0.066). For both plots, the error bars are 95% CI, and the quartiles are based on the BMI distribution in endometrial cancer cases, to allow for comparability between plots, and to ensure sufficient cases in each quartile.

Figure 4
The observed and predicted risks of endometrial cancer associated with each rs727479 A allele. The observed per-A-allele OR is that observed in this study of 6608 endometrial cancer cases and 37 925 controls. The predicted per-A-allele OR is estimated based on the observed association between rs727479 and E2 levels in 2767 healthy post-menopausal women, and on the endometrial cancer OR associated with a doubling of post-menopausal E2 levels reported by Lukanova et al. (2004).

Figure 5
Candidate endometrial risk variants coincide with three PREs. The 28 best candidate causal SNPs map towards the 3′ end of the CYP19A1 gene. The functional elements displayed were accessed through the UCSC Genome Browser and include: H3K4Me1 and H3K27Ac histone modifications measured by the Encyclopedia of DNA Elements (ENCODE) project in seven cell lines; open chromatin as delineated by DNaseI hypersensitivity sites (HS) in Ishikawa endometrial cancer cells (previously incorrectly named ECC-1) and 125 other cell types; TF binding in Ishikawa cells and 91 cell lines within ENCODE: 21/28 candidates are predicted to overlap TF binding sites. Roadmap Epigenomics Project chromatin state segmentation of adipose-derived mesenchymal stem cells and adipocytes: orange bars represent enhancers and red bars represent regions flanking active transcription start sites. Twelve SNPs, marked by dbSNP rsIDs, are located in PREs (highlighted in blue). PREs were defined by the presence of histone modifications, DHS, TF binding and Roadmap enhancers.

We also confirmed at a borderline GWAS significance level (P=7.4×10−8) the association between rs727479 and E2 concentration previously reported in post-menopausal women (Haiman et al. 2007, Ahn et al. 2009, Beckmann et al. 2011, Prescott et al. 2012) and in males (Travis et al. 2009). It had been reported that rs749292 and rs727479 may act independently to alter levels (Haiman et al. 2007), but we found no association between rs749292 and E2 concentration after conditioning on rs727479 (P=0.20). Our sample set partially overlaps with that included in the GWAS of hormone levels reported by Prescott et al. (2012) (the additional 549 SIBS women genotyped here for rs727479 had all been included in the Prescott et al. GWAS), but a combination of nearly 2000 extra subjects, denser genotyping in and around the CYP19A1 gene and imputation to the 1000 Genomes reference panel allowed us to look in more detail at the region. Our results suggest the existence of a single causal variant in CYP19A1 underlying both E2 concentration and endometrial cancer, although we cannot exclude the possibility that there are instead multiple causal variants which are in sufficiently strong linkage disequilibrium that they are indistinguishable by epidemiological analysis.
We estimate that rs727479 accounts for 1.1% of the variance in post-menopausal E2 concentration (in contrast, BMI accounts for 16% of the variance). Given that the estimated heritability of post-menopausal E2 is around 40% (Varghese et al. 2012), it is clear that further genetic variants that affect E2 concentration remain to be found.
The predominant source of circulating estrogens in post-menopausal women is adrenal androgens (T), which are converted to estrogens in peripheral adipose tissues, with the final stage of this process requiring aromatase, the enzyme encoded by the CYP19A1 gene. Although E2 concentrations and endometrial cancer risks are both higher in women with larger BMI regardless of CYP19A1 genotype, there also appears to be a gene-environment interaction such that the associations of the rs727479 A allele with E2 concentration and also with endometrial cancer risk increase according to BMI, with BMI presumably serving as a proxy for the amount of adipose tissue (Fig. 1). Whole body aromatization is known to be directly associated with BMI and the aromatization rate per cell has been found to increase with increasing age (Cleland et al. 1985). Together these data suggest that the influence of the SNP may be more profound when the aromatization rate is already higher.
Twelve of the 28 candidate causal variants (including top candidate rs727479) lie in PREs. Further, the risk alleles of candidates located in PRE-3 associate with increased CYP19A1 expression, and ENCODE and other data (Eeckhoute et al. 2006, Lee & Maeda 2012) indicate that the NFIC and ZBTB7A TFs may affect PRE-3 repressor activity in endometrial cancer cells. Taken together these lines of evidence indicate that candidate causal variants within PRE-3 should have high priority for follow-up studies to test their effects on CYP19A1 promoter activity through long-range chromatin interactions.
Using a Mendelian randomization argument with CYP19A1 genotype as the instrumental variable, we have shown that the endometrial cancer OR per A-allele of rs727479 predicted on the basis of the per-allele effect on E2 (1.09, CI=1.03–1.21) is in line with the directly observed effect of each A allele on endometrial cancer (OR=1.15, CI=1.11–1.21) (Fig. 4). Whereas previous epidemiological studies have observed a positive correlation between E2 concentration and risk, it has not been possible to distinguish between a causal relationship and one produced by confounding. By exploiting the random allocation of alleles to individuals at conception, Mendelian randomization mimics a randomized controlled trial, thus removing possible confounding. We have therefore found good evidence that higher post-menopausal E2 concentrations are indeed a causal risk factor for endometrial cancer, in line with other evidence such as the observed increase in risk associated with estrogen-only HRT but not with estrogen+progesterone HRT (Beral et al. 2005). Hence lowering E2 levels has the potential to be a useful strategy for reducing risk. Ideally we would like to be able to test this hypothesis in a prospective study, whereby E2 levels are measured at baseline for a cohort of healthy post-menopausal women who are then followed up for endometrial cancer incidence.
However, such a study would need to be extremely large in order to accrue sufficient cancer cases within a reasonable time frame. It would also be interesting to repeat the study in a non-European setting in order to see whether the results are consistent across populations.
In Mendelian randomization, a genetic variant is a suitable instrument only if, in addition to being associated with the biomarker, it is i) independent of the unobserved confounders of the biomarker-disease relationship and ii) associated with the disease only via the biomarker. Population stratification is the most obvious way in which condition i) can be violated. To guard against this, we restricted our study to subjects of European ancestry and adjusted for principal components. Condition ii) can be violated by pleiotropy, or similarly if the variant is in LD with a separate disease-associated variant. It is impossible to be certain that this condition has been met, and our finding that the observed association between rs727479 and endometrial cancer is slightly stronger than that predicted from the effect of rs727479 on E2 levels may in part be due to the SNP additionally acting on endometrial cancer risk via a pathway not involving circulating E2. A Mendelian randomization analysis using an instrument consisting of multiple independent E2-associated SNPs, as and when they are reported (e.g., via larger GWAS), is one way to minimize the potential impact of pleiotropy on the results. We are confident that the results are not due to reverse causation, since all E2 measurements were carried out in women from the control arm of the endometrial cancer study.
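A multi-SNP instrument of the kind mentioned above is commonly analyzed by combining per-SNP ratio estimates with inverse-variance weights. The following is a minimal sketch of that idea and not the analysis performed in this study: the function names and all summary statistics are hypothetical and chosen only for illustration, and the standard error uses a first-order approximation that ignores uncertainty in the SNP-E2 estimates.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Per-SNP causal estimate: log OR for disease per unit of exposure.

    beta_exposure: per-allele effect on the exposure (e.g. log2 E2)
    beta_outcome:  per-allele log odds ratio for the disease
    se_outcome:    standard error of beta_outcome
    The SE is first-order only (exposure effect treated as known).
    """
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw_estimate(snps):
    """Fixed-effect inverse-variance-weighted mean of the Wald ratios.

    snps: iterable of (beta_exposure, beta_outcome, se_outcome) tuples,
    assumed to be independent (not in LD with one another).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for beta_x, beta_y, se_y in snps:
        ratio, se = wald_ratio(beta_x, beta_y, se_y)
        weight = 1.0 / se ** 2
        weighted_sum += weight * ratio
        total_weight += weight
    return weighted_sum / total_weight, math.sqrt(1.0 / total_weight)


# HYPOTHETICAL summary statistics, for illustration only (not study data).
hypothetical_snps = [
    (0.10, 0.12, 0.020),
    (0.05, 0.05, 0.030),
    (0.08, 0.10, 0.025),
]

log_or, se = ivw_estimate(hypothetical_snps)
low, high = log_or - 1.96 * se, log_or + 1.96 * se
print(f"OR per unit exposure: {math.exp(log_or):.2f} "
      f"(CI={math.exp(low):.2f}-{math.exp(high):.2f})")
```

Because each SNP contributes its own ratio estimate, marked disagreement between the per-SNP ratios would itself be a warning sign of pleiotropy at one or more variants.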
Despite higher endogenous E2 concentration also being a known risk factor for breast cancer (Key et al. 2002), candidate SNP studies of CYP19A1 have not reported an association with breast cancer (Haiman et al. 2007). Based on the 45 290 European-ancestry breast cancer cases and 41 880 controls from the BCAC iCOGS study, none of the 171 CYP19A1 locus SNPs with P<10−4 for endometrial cancer were associated with breast cancer (Michailidou et al. 2013) (minimum P=0.0033, data not shown). This may in part be because E2 concentration is less strongly related to breast than to endometrial cancer; a doubling of E2 concentration has been reported to be associated with an OR of 1.29 (CI=1.15–1.44) for breast cancer (Key et al. 2002), from which we would predict a breast cancer OR of 1.03 (1.01–1.07) per A allele of rs727479. This predicted effect size is consistent with that observed for breast cancer in BCAC (OR=1.02 (1.00–1.04); P=0.10), but the effect size is too small to be confidently detected, even in a breast cancer study of this size.
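The arithmetic behind such a prediction can be sketched as follows, assuming that the SNP acts on disease only through E2 and that the log odds of disease are linear in log2(E2). The function names are hypothetical, and the per-allele E2 effect is back-calculated here from the rounded figures quoted above rather than taken from the study's own estimate, so it is approximate.

```python
import math


def predicted_or_per_allele(or_per_doubling, doublings_per_allele):
    """Predicted disease OR per allele under the assumptions stated above.

    doublings_per_allele is the per-allele change in log2(E2).
    """
    return math.exp(doublings_per_allele * math.log(or_per_doubling))


def implied_doublings_per_allele(or_per_allele, or_per_doubling):
    """Invert the relation: per-allele change in log2(E2) implied by two ORs."""
    return math.log(or_per_allele) / math.log(or_per_doubling)


# Figures quoted in the text (rounded, so the results are approximate).
OR_BREAST_PER_DOUBLING = 1.29  # breast cancer OR per doubling of E2
OR_BREAST_PREDICTED = 1.03     # predicted breast cancer OR per A allele

# Back-calculated, NOT the study's reported per-allele effect on E2.
delta = implied_doublings_per_allele(OR_BREAST_PREDICTED, OR_BREAST_PER_DOUBLING)
fold_change = 2 ** delta

# Round trip: feeding delta back in recovers the predicted per-allele OR.
round_trip = predicted_or_per_allele(OR_BREAST_PER_DOUBLING, delta)

print(f"implied change in log2(E2) per allele: {delta:.3f}")
print(f"implied fold change in E2 per allele:  {fold_change:.3f}")
print(f"round-trip predicted OR per allele:    {round_trip:.2f}")
```

With these rounded inputs the implied per-allele effect is roughly 0.12 doublings, i.e. E2 about 8% higher per A allele, which illustrates why the predicted breast cancer OR is so close to 1.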
In conclusion, we have confirmed at a genome-wide level of significance the association between endometrial cancer and variants within the CYP19A1 gene, and shown that all of the reported associations can be explained by a single risk peak. We have also provided evidence that the same set of variants is associated with higher E2 concentration in post-menopausal women, supporting a causal role for E2 in endometrial cancer. For both traits, the SNP associations were stronger in women with a higher BMI, suggesting a biologically plausible gene-environment interaction.

Supplementary data
This is linked to the online version of the paper at http://dx.doi.org/10.1530/ERC-15-0386.

Declaration of interest
The authors declare that there is no conflict of interest that could be perceived as prejudicing the impartiality of the research reported.
... and J N Painter co-ordinated the endometrial cancer iCOGS genotyping, and associated data management. J Dennis, K Michailidou and J P Tyrer co-ordinated quality control and data cleaning for the iCOGS control datasets, and K Michailidou provided quality control for the SEARCH GWAS control set. M K Bolla, Q Wang, M Shah, and R Luben were responsible for data management. T A O'Mara and A B Spurdle co-ordinated the ANECS GWAS genotyping; A M Dunning co-ordinated the SEARCH GWAS genotyping; I Tomlinson co-ordinated the NSECG GWAS genotyping. E Folkerd, D Doody, and M Dürst carried out the hormone measurements. The remaining authors were involved in the co-ordination and/or extraction of phenotypic information for contributing studies. All authors provided critical review of the manuscript.