1 Department of Oncological Sciences, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah 84112, USA; 2 Agilent Technologies, Santa Clara, California 95051, USA
| Abstract |
|---|
[Keywords: ETS; transcription; gene families; cooperative binding; promoter specificity; ChIP–chip]
Received April 16, 2007; revised version accepted June 14, 2007.
Large gene families in mammalian genomes that encode transcription factors with highly related DNA-binding properties (e.g., ETS, GATA, HOX, or FOX proteins, and nuclear hormone receptors) (Messina et al. 2004) present a further challenge. The dilemma is how transcription factors with overlapping DNA sequence preferences direct distinct transcriptional responses in vivo. The problem itself is poorly characterized because many binding sites in promoters and enhancers have been assayed only for an arbitrary subset of family members. Furthermore, no mammalian ChIP–chip experiments have directly addressed this family conundrum. Thus, it is unresolved whether extensive genetic redundancy is a characteristic of these families or whether robust mechanisms operate that drive specificity.
The ETS family of transcription factors provides an excellent system to pursue these questions due to the extensive knowledge of the biological roles of ETS genes and the biochemical properties of ETS proteins (Sharrocks 2001; Oikawa and Yamada 2003). The family is defined by the conserved DNA-binding domain, termed the ETS domain, which bears a winged helix–turn–helix protein fold. Phylogenetic analysis of the 27 human ETS domains identifies subfamilies of more highly related members, termed clades (Fig. 1A). The DNA-binding properties of ETS proteins from all clades are remarkably similar due to the high conservation of amino acids within the ETS domain that are critical for DNA interaction. For example, in vitro site-selection studies performed on 10 ETS proteins each report preference for an invariant GGA core. In addition, five flanking positions also show conservation among these family members (Fig. 1B).
For individual ETS proteins, distinct functional domains that lie outside of the ETS domain could facilitate specificity. For example, one mechanism to enhance DNA-binding specificity is protein–protein interactions that mediate cooperative binding at distinct DNA sequences. The ETS family has a few examples of this phenomenon. The TCF clade (ELK1, SAP1, NET) functions with the DNA-binding factor, SRF, via a protein interaction domain (Price et al. 1995; Buchwalter et al. 2004). GABPα partners with GABPβ, which mediates dimerization and formation of a GABPα/β hetero-tetramer that binds two ETS sites (de la Brousse et al. 1994). High-resolution molecular models of these complexes are available (Batchelor et al. 1998; Hassler and Richmond 2001; Mo et al. 2001); however, other partnerships are less well understood. For example, ETS1 could function with as many as nine different transcription factors (Li et al. 2000). Only RUNX1 (also known as AML1, CBFα2, PEBP2) has been demonstrated to mediate DNA-binding cooperativity with ETS1 and, thus, potentially enhance specificity (Goetz et al. 2000; Gu et al. 2000). None of the potential ETS protein partnerships have been assayed by ChIP or shown to limit in vivo occupancy of other ETS proteins. Thus, the in vivo use of protein partnerships or any other specificity mechanism remains poorly characterized.
The unique biological function of ETS proteins predicts the selection of specific transcriptional targets. However, few target genes are linked definitively to individual ETS family members. Most of the >200 putative target genes for ETS proteins have been queried only by transcription effects that required overexpression in cell lines or by in vitro DNA binding, techniques that fail to identify the ETS protein(s) utilized in vivo (Sementchenko and Watson 2000). Furthermore, no genome-wide occupancy of an ETS protein has been reported.
Determining the genomic occupancy of ETS proteins by ChIP will provide an unprecedented view of in vivo DNA-binding specificity within a transcription factor family and allow us to test mechanisms regulating ETS protein targeting. By investigating the endogenous ETS proteins ETS1, ELF1, and GABPα in the Jurkat human T-cell line, we discovered that these divergent family members frequently occupied the same genomic regions. This redundant occupancy correlated with a match to a strong consensus DNA-binding site and proximity to the transcriptional start site (TSS). Specific binding of ETS1 was also detected, but did not correlate with a strong match to a consensus site. A subset of ETS1-binding events correlated with an ETS–RUNX composite site that differed dramatically from ETS1 or RUNX1 consensus sites. The finding of two classes of ETS1 targets suggests a versatility of the ETS family, with overlapping and specific DNA-binding modes that are mediated through distinct sequence motifs.
| Results |
|---|
Three genes reported to be regulated by ETS1 illustrate the need for more robust in vivo approaches and attention to family issues. The T-cell receptor (TCR) α and β enhancers have been characterized in vitro as sites of cooperative binding between ETS1 and RUNX1. However, in vivo specificity is not clear, as transient expression assays indicate that multiple ETS proteins can activate via this binding site (Sun et al. 1995), and no ChIP has been reported. In contrast, the promoter of the protein kinase encoding gene, CDC2L2, is implicated as an ETS1 target by ChIP, but no tests for specificity were performed (Feng et al. 2004). To test ETS protein in vivo specificity, we investigated occupancy of the CDC2L2 promoter and TCRα and TCRβ enhancers by four distantly related ETS proteins (ETS1, GABPα, ELF1, and ELK1) in Jurkat T cells (Fig. 1A). (Based on steady-state mRNA levels, these ETS genes rank first, second, ninth, and 11th, respectively, of 17 ETS genes that are expressed in Jurkat T cells [Hollenhorst et al. 2004].) ETS1, ELF1, and GABPα, but not ELK1, redundantly occupied the CDC2L2 promoter, whereas ETS1 specifically occupied the TCRα and TCRβ enhancers (Fig. 2A). RUNX1 also occupied the TCRα and TCRβ enhancers, supporting a role for RUNX1 in ETS1 specificity. These initial ChIP experiments detected the anticipated specific mode of binding for ETS proteins, but also found a surprising redundant mode.
To ascertain the biological significance of redundant occupancy and the relative importance of RUNX1 in specificity, we performed genome-wide promoter ChIP. The relative levels of specific and redundant binding of ETS proteins were assessed by a promoter microarray hybridized with ChIP DNA from the Jurkat human T-cell line. The promoter microarray represented the region from 5000 base pairs (bp) upstream to 2000 bp downstream of the TSS of ~17,000 human genes with 60-mer oligonucleotides at an average spacing of 200 bp. Promoters were scored as "bound" by statistical methods that considered the enrichment of multiple neighboring probes and consistent occupancy in experimental repetitions. Promoters occupied by ETS1, ELF1, or GABPα were frequently bound by one or more of the other ETS proteins (Fig. 2B). A second, independent set of ChIP–chip experiments, which was performed with a second promoter microarray that covered only regions within 1000 bp of the TSS, also indicated a very strong correlation between ETS1 and ELF1 occupancy (Supplementary Fig. S1).
This extensive overlap in potential targets was unexpected, and therefore we considered several possible nonbiological explanations. The overlap was not due to cross-reactivity of antibodies because the epitopes had no sequence similarity, and immunoprecipitation controls (Supplementary Fig. S2) as well as ChIP experiments (Fig. 2A) showed specificity. We considered a possible bias toward these genomic regions in the microarray design. ChIP–chip of E2F4, a transcription factor that does not belong to the ETS family, served as a negative control; a set of targets distinct from those bound by ETS1, ELF1, and GABPα, but similar to E2F targets in other cell types, was identified (Table 1; Supplementary Fig. S1; Boyer et al. 2005). Another concern was the sensitivity necessary to detect specific sites. Quantitative PCR detected ETS1-specific occupancy at the TCRα and TCRβ enhancers (Fig. 2A), but these sites were not near the TSS and, thus, were not on the promoter microarrays. To use this positive control we designed a third microarray that covered 20-kb regions surrounding these enhancers. ETS1-specific binding regions were detected and correlated with the known enhancers (data not shown). These controls indicated that the overlapping ChIP enrichments at ETS1, ELF1, and GABPα target promoters represent an accurate picture of genome-wide occupancy.
We postulated that the redundant and specific classes of target genes may have different biological functions and that distinct mechanisms would dictate ETS protein recruitment to each class. A more in-depth comparison of redundant and specific binding regions required a data set of each binding class that minimized false-positive results (albeit at the cost of increasing false-negative results). Therefore, data sets of segments bound by ETS1 and ELF1, ETS1 but not ELF1, or ELF1 but not ETS1 were created (Fig. 3A). ChIP and quantitative PCR with primers specific for each candidate segment showed 88% or greater concurrence with ChIP–chip data, thus validating ETS1 and ELF1 dual-bound as well as ETS1-specific data sets (Fig. 3B,C). Tests for occupancy by the ETS protein ELK1 yielded negative results. The ELF1-specific data set was less reliable (Fig. 3D). Thus, the ETS1 and ELF1 dual-bound and ETS1-specific data sets were used for further analyses.
(Fig. 1B). Greater than 70% of dual-bound segments had a sequence represented within this PWM. MEME did not identify any significantly overrepresented sequences in either the ETS1-specific data set or in any of 10 data sets randomly selected from the list of interrogated promoter regions (data not shown). To ensure that the distinction between dual and specific data sets was not due to the size of the data sets, randomly selected subsets of the dual-bound data set, similar in size to the specific data set, were tested; these smaller subsets also returned an ETS-like consensus sequence (data not shown). In conclusion, redundant binding by ETS proteins correlated with the presence of a strong consensus ETS-binding site.
The finding of a nondiscriminating consensus sequence led us to investigate other sequence features that might accompany redundant occupancy. To test whether there was a bias in the location of dual versus specific bound segments, the distance of each segment (measured from the highest-scoring oligonucleotide probe) to the TSS was determined for the ETS1 and ELF1 dual-bound and ETS1-specific data sets. Dual-bound segments clustered very strongly to a region within 200 bp of the TSS (Fig. 4B). (More detailed spacing conclusions are challenged by the limits of ChIP–chip resolution and TSS annotation.) Segments bound specifically by ETS1 showed significantly less constraint on their location and frequently appeared more distally (Fig. 4B) (t-test of the mean distance; P = 0.0002). In an independent approach to detect potential location bias, we measured the distance between the TSS and the best PWM matches from the PATSER analysis. A subset of the randomly selected genes had their strongest PWM matches in regions proximal to the TSS (Fig. 4C). However, the dual-bound genes were significantly enriched for this type of promoter, as indicated by a significant difference between the mean distances (t-test; P < 0.0001). Notably, the mean distance of the ETS1-specific genes was not significantly different from that of the random genes (Fig. 4C). Only five ETS1-specific promoters (4%) had a perfect match to the PWM consensus within 200 bp of the TSS compared with 87 (14%) of the dual-bound promoters. In conclusion, two sequence properties correlated with redundant occupancy by ETS transcription factors: the presence of a consensus ETS-binding site and the tendency of this site to be located proximal to the TSS.
Housekeeping genes have redundantly occupied promoters
To investigate the biological role of the nonselective ETS binding at strong proximal ETS-binding sites, we asked whether the dual-bound gene set represented a specific biological pathway. Overrepresented ontologies of the genes near the ETS1 and ELF1 dual-bound promoters were queried by GOstat (Beissbarth and Speed 2004). Housekeeping categories (e.g., RNA processing, ribosomal proteins, and cellular metabolism) had significant enrichment scores (Table 1). Random gene lists of similar size did not return any significant overrepresented categories. Additional informatics analyses supported this correlation between dual occupancy and housekeeping function. Eighty-five percent of the ETS1 and ELF1 dual-bound regions overlapped with CpG islands, a sequence feature consistent with the promoters of housekeeping genes (Bird 1986). In contrast, data sets built from promoter regions with matched GC content that were randomly selected from the extended promoter array regions only displayed an average of 44% overlap with CpG islands (P < 0.01). Next, we queried our data set against a human gene set annotated for sequence characteristics of housekeeping genes (De Ferrari and Aitken 2006). The promoters of 16% of all genes surveyed had evidence of ETS1 and ELF1 dual occupancy (see Materials and Methods), whereas this proportion was 52% and 4% among genes classified as "housekeeping" and "nonhousekeeping," respectively. Thus, three methods of classifying housekeeping genes each indicated an enrichment of redundant ETS occupancy at housekeeping promoters.
Co-occupancy of housekeeping promoters indicated a possible redundant function of ETS proteins at these promoters. We predicted that co-occupancy would not be cell-type specific, although different ETS protein combinations may be present in different cell types. To test this hypothesis, HT29 colon adenocarcinoma cell lines were used for ETS1 ChIP–chip. Promoters occupied by ETS1 were again overrepresented for housekeeping categories (Table 1). Therefore, ETS transcription factors appear to have a redundant role at the promoters of housekeeping genes, possibly in multiple cell types.
ETS1 and RUNX1 occupy promoters with a composite ETS1–RUNX-binding site
Specific occupancy of ETS1 in Jurkat T cells could be mediated through cooperative interactions only with RUNX1 or through a variety of cooperative partners. To differentiate between these two possibilities, a ChIP–chip experiment was performed with an antibody specific for RUNX1; 576 RUNX1-bound promoters were identified (Fig. 2C). However, only 36 of the 641 promoters bound by ETS1, but not ELF1, were also occupied by RUNX1. Therefore, cooperative interactions with RUNX1 likely represent one of a number of mechanisms that can mediate ETS1 specificity in Jurkat T cells.
Although RUNX1 occupancy could not explain the majority of the ETS1-specific binding, eight of the promoters with the strongest ETS1-binding signals also had strong RUNX1 signals (Fig. 5A, circled). In ChIP coupled with quantitative PCR, some of these segments showed weak binding signals for ELF1 and GABPα, but in every case the ETS1 antibody gave a strikingly higher signal (cf. Figs. 3B and 5B), indicating that these segments strongly favored ETS1 binding.
|
and TCR
enhancers (Gottschalk and Leiden 1990
Based on the poor fit of this ETS1–RUNX1 composite site to consensus sites, we hypothesized that in vivo occupancy might require cooperative DNA binding between ETS1 and RUNX1. DNA-binding assays were performed to determine the relative affinity of ETS1 in the presence and absence of RUNX1 (Fig. 6). ETS1 bound weakly alone with an affinity 10- to 100-fold lower than that of a consensus site (Goetz et al. 2000). The affinity increased more than fivefold in the presence of RUNX1. Detection of ETS1 binding to GGAG sites required RUNX1. In conclusion, the ETS1–RUNX1 composite sites were marked by poor matches to ETS consensus sites and displayed low affinity that was improved by cooperative DNA binding. Remarkably, these extremely low-affinity sites displayed strong ChIP–chip signals (Fig. 5A) and extremely strong quantitative ChIP signals with gene-specific primers (Fig. 5B). We speculate that cooperative interactions can mediate extremely stable in vivo occupancy comparable to that of consensus ETS-binding sites.
enhancer (Travis et al. 1991
| Discussion |
|---|
A redundant role for ETS proteins
Our survey of the in vivo occupancy of four ETS proteins from four different clades found that three of these proteins often occupy the same promoter regions. There are two possible models to explain the detection of multiple ETS proteins. This co-occupancy could represent a separate binding site for each factor or alternate occupancy of the same site in different cells, on different alleles, or at different times. Because the regions occupied redundantly by ETS family members correlated with a strong match to the consensus sites of multiple ETS proteins, we propose that the same binding site is bound alternatively by different ETS transcription factors.
Bioinformatics studies have identified hundreds of sequence motifs that are overrepresented in human promoters, and ETS-like binding sites are always present on these lists (Bina et al. 2004; FitzGerald et al. 2004; Xie et al. 2005). Our results indicate that some of these sequence motifs are likely to be occupied in vivo by multiple ETS proteins. The ETS protein ELK1 did not co-occupy promoters with ETS1, ELF1, and GABPα. This is consistent with in vitro data suggesting that ELK1 has low affinity for a monomeric ETS site, but requires SRF and an adjoining SRF site for high-affinity binding (Price et al. 1995). Additional ChIP data will be required to uncover how many of the 23 remaining ETS transcription factors participate in redundant occupancy of strong ETS-binding sites.
The observation that 5%–15% of the 17,000 promoters are occupied by multiple ETS transcription factors suggests biological importance. In considering the potential significance we noted that many of the redundantly bound regions were associated with housekeeping genes. These findings are consistent with a bioinformatics report of a collection of ETS-type sequences as one of three sequence motifs found in proximal promoters of housekeeping genes (FitzGerald et al. 2004). Our discovery of redundant binding of ETS transcription factors at these genes suggests that this mode of binding could facilitate consistent regulation of ubiquitously expressed housekeeping genes. In this model, sustained expression would be independent of the varying levels or identity of individual ETS proteins in distinct cell types.
More intriguingly, we speculate that the redundantly occupied regions identified in our study are targets for oncogenic ETS proteins. Preliminary support for this hypothesis comes from recent observations in human prostate cancer. It is now proposed that more than half of all cases correlate with a chromosomal rearrangement that leads to overexpression of one of three ETS proteins (Tomlins et al. 2005, 2006). A gene set expressed at higher levels in prostatic intraepithelial neoplasia (PIN) than in normal prostate tissue has two features that are similar to our dually occupied data set. An ETS-binding site is the most enriched site from the TRANSFAC database in the promoters of PIN-specific genes. By ontology analysis, genes involved in protein biosynthesis, including those encoding ribosomal proteins, are up-regulated in PINs (Tomlins et al. 2007). We speculate that some housekeeping genes, specifically those redundantly regulated by ETS proteins, could be a class of misregulated targets relevant to tumor progression.
Specific binding and protein partnerships
Specific ETS1 occupancy was uncovered by the ChIP–chip experiment, although this class of targets was less frequent than the dual-bound targets. Furthermore, unbiased searching for enriched sequences did not find a PWM closely related to the in vitro-derived PWM for ETS1 or the PWM derived from dual-bound targets. We propose that a high-affinity ETS1-binding site would preclude specific occupancy due to its concurrent high affinity for other ETS proteins. We speculated that ETS1 achieves sufficient affinity for specific sites only by DNA-binding cooperativity with additional transcription factor(s). Indeed, intersecting ETS1 and RUNX1 ChIP–chip data sets facilitated an informed, yet unbiased, search that discovered a composite site resembling both ETS- and RUNX1-binding sites, but not matching either consensus. As additional ChIP–chip data become available, a similar strategy may identify sequences important for the remaining ETS1-specific sites and for specificity of other ETS proteins. ELK1-specific occupancy of the EGR1 promoter provides an exception to the trend of weak ETS sites for specific binding. This promoter has a strong ETS consensus juxtaposed to an SRF-binding site. This exception may have evolved because constitutive ELK1/SRF occupancy can occlude occupancy by other ETS proteins. Whether ELK1-specific binding generally occurs at weak or strong ETS sites awaits genome-wide occupancy data. In summary, the characterization of an ETS1/RUNX1 cooperative partnership by genome-wide occupancy data provides global evidence for combinatorial control of gene expression within the ETS family.
Limitations of undirected bioinformatics approaches
Mapping genome-wide transcription factor binding is envisioned to enable prediction of gene regulatory pathways. Bioinformatics approaches have attempted to use factor-binding predictions that are based on experimentally derived PWMs that favor the strongest binding sites in vitro (Aerts et al. 2003; Blanchette et al. 2006). Our observation that strong binding sites do not correlate with specificity challenges the accuracy of these in vitro-based approaches, especially for gene families. Other ChIP–chip assays have revealed that many transcription factors bind to regions of the genome that contain no strong matches to the consensus. Our data show how this phenomenon is related to specificity requirements for that particular transcription factor. Thus, bioinformatics approaches that can successfully identify transcription factor-binding sites in silico will likely require the integration of rules for cooperative DNA-binding partnerships.
General rules for gene families
An interesting question is whether these features of the ETS family are predictive of other transcription factor families. There are >20 identified families whose members have conserved DNA-binding domains and common binding properties (Messina et al. 2004). The only mammalian transcription factor family with more than one member assayed by ChIP–chip is the six-member E2F family (Ren et al. 2002; Weinmann et al. 2002; Oberley et al. 2003; Wells et al. 2003). The microarrays available at the time of these studies only assayed a small subset of promoters, and none of these studies identified differences between redundantly and specifically occupied regions. Thus, our study expanded the use of genome-wide occupancy techniques to survey a larger mammalian transcription factor family and revealed characteristics of specifically and redundantly occupied loci. We expect our discovery of extremely weak sites, as a feature of the ETS1–RUNX1 interaction, will be generally applicable to many other DNA-binding partnerships. On the other hand, the use of the ETS family in a redundant manner at proximal promoters may represent a biologically important feature unique to this family.
In conclusion, an in vivo genomic occupancy approach demonstrated both redundant and specific roles for ETS transcription factors. Interchangeable occupancy of diverse members of the family at proximal promoters suggests an unexpected overlapping function for proteins that share no similarity outside of their DNA-binding domains. Furthermore, this occupancy uncovers a strategy that could mediate stable expression of housekeeping genes by making them relatively resistant to changes in transcription factor concentration and regulatory modifications. In contrast, weaker binding sites provide opportunities to enhance affinity and add specificity for biological regulation. Together, the distinct promoter occupancy patterns of ETS proteins demonstrate the versatile use of a transcription factor family.
| Materials and methods |
|---|
Two promoter microarrays were used in our studies. The proximal promoter microarray (Agilent Technologies, G4481A), which was used only for data in Supplementary Figure S1, consisted of two slides with a combined 88,000 60-mer oligonucleotide probes representing sequences from –1 kb to +0.3 kb relative to the TSS of ~17,000 best-defined human transcripts from University of California at Santa Cruz hg17/NCBI release 35 (May 2004). An average promoter region was represented by four to five probes. A second promoter microarray (Agilent Technologies, G4489A) consisted of two slides with a combined 488,000 60-mer oligonucleotides representing sequences from –5 kb to +2 kb relative to the same TSS as in the proximal microarray. A custom microarray was manufactured by Agilent Technologies to represent the region from 10 kb upstream to 10 kb downstream of the previously identified TCRα and TCRβ enhancers using probes from the Agilent genomic tiling database. The average spacing between probes within these regions was ~100 bp.
Cell culture
Cell lines were grown using standard tissue culture techniques. Jurkat cells were maintained in RPMI medium (GIBCO) plus 10% fetal bovine serum (FBS), 2 mM L-glutamine, 10 mM sodium pyruvate, and 10 mM HEPES. HT29 cells were maintained in McCoy's 5A medium (GIBCO) plus 10% FBS and 2 mM L-glutamine.
ChIP
Dynabeads (50 µL) conjugated to sheep anti-rabbit IgG (Dynal Biotechnology) were mixed with 1 mL of Dilution Buffer (20 mM Tris at pH 7.9, 2 mM EDTA, 150 mM NaCl, 1% Triton X-100, 4 mg/mL bovine serum albumin [BSA], mammalian protease inhibitors [Sigma, #P8340]) and rotated for 10 min at 4°C. Next, 5 µL of polyclonal rabbit antibody (ETS1, sc-350; ELF1, sc-631; GABPα, sc-22810; ELK1, sc-355; and E2F4, sc-1082; Santa Cruz Biotechnology) or monoclonal mouse antibody (RUNX1, 3.2.3.1; gift of Dr. Nancy Speck [Dartmouth Medical School, Hanover, NH]) was added and the slurry was rotated overnight at 4°C. Cross-linked and sheared chromatin extracts were prepared as described previously (Hollenhorst et al. 2004). The extract (100 µL) was added to the Dynabead/antibody slurry and rotated for 4 h at 4°C. Beads were washed four times for 5 min with immunoprecipitation wash buffer (20 mM Tris at pH 7.9, 2 mM EDTA, 250 mM NaCl, 0.25% NP-40, 0.05% SDS). Beads were resuspended in 100 µL of 10 mM Tris (pH 8.0) and 100 µg/mL RNase A and incubated for 30 min at 37°C. SDS concentration was brought to 1% and Proteinase K was added to 200 µg/mL. Bead slurries were then incubated for 3 h at 55°C and 6 h at 65°C. ChIP DNA was purified from slurries by phenol/chloroform extraction and QiaQuick PCR Purification Kit (Qiagen).
ChIP DNA amplification, labeling, hybridization, and scanning
ChIP DNA was amplified by a whole-genome amplification kit (WGA2, Sigma) according to the manufacturer's instructions, except that the fragmentation step was skipped and the number of PCR cycles was increased to 20. Alternate random primed and linker-mediated PCR amplification protocols were compared with the WGA2 kit on the proximal promoter microarray and gave similar results for ETS1 occupancy (data not shown). Amplified DNA was treated with a QiaQuick PCR purification kit (Qiagen), then labeled, hybridized to the Agilent microarrays, and washed as previously described (Boyer et al. 2005). Hybridized microarrays were scanned using an Agilent G2565BA microarray scanner and raw image files were processed with Agilent Feature Extraction software (version 8.5). Two replicates of each ChIP–chip experiment from independent cell cultures were performed, with the exception of three repetitions of Jurkat ETS1 on the proximal promoter microarray and one repetition of HT29 ETS1 on the proximal promoter microarray. The data discussed in this publication have been deposited in NCBI's Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo) and are accessible through GEO Series accession number GSE7449.
Classification of binding events
Data were analyzed with ChIP Analytics (version 1.3, Agilent Technologies) with the Whitehead Error Model. Normalization consisted of subtraction of median signals from negative control features, interarray median normalization, and dye-bias median normalization. A weighted average was used for replicates. A value, X, was calculated for each probe; this value correlates with a log ratio, but includes a correction for low intensities (for equations, see the ChIP Analytics 1.3 user's guide). A P value for each probe, P(X), represented the probability of observing an X value as high as or higher than its own, given a normal distribution defined by the mean and standard deviation. To incorporate the data of neighboring probes (peaks), a value X̄ was calculated as the mean of the X values from that probe and the neighboring probe on either side. If a neighboring probe was >1 kb away, a value of 0 was substituted for its X. A P value for X̄, P(X̄), was calculated just as for P(X).
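The neighbor-averaging step can be illustrated with a minimal sketch. It is not the ChIP Analytics implementation: the function names are hypothetical, a missing neighbor at the end of a probe series is assumed to contribute 0 in the same way as a neighbor more than 1 kb away, and the mean and standard deviation of the normal distribution are taken as inputs because the text does not specify how they are estimated.

```python
import math

def neighbor_average(positions, x_values, max_gap=1000):
    """Average each probe's X with its left and right neighbors.

    A neighbor farther than max_gap bp away contributes 0, as described
    in the text. Assumption: a neighbor that does not exist (first or
    last probe) also contributes 0. Positions must be sorted.
    """
    averaged = []
    for i, (pos, x) in enumerate(zip(positions, x_values)):
        total = x
        for j in (i - 1, i + 1):
            if 0 <= j < len(positions) and abs(positions[j] - pos) <= max_gap:
                total += x_values[j]
        averaged.append(total / 3.0)
    return averaged

def upper_tail_p(value, mean, sd):
    """Probability of a value this high or higher under Normal(mean, sd)."""
    z = (value - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

With probes at 0, 200, 400, and 2000 bp, the last probe has no neighbor within 1 kb, so its averaged value is one-third of its own X.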
For the data shown in Figures 2 and 5A and Table 1, a "bound" promoter designation required that one or more probes within a promoter region have a P(X) of <0.001 as determined by the "gene report" output of ChIP Analytics. The P(X) values used in Table 1 and Figure 5A were the lowest P(X) for any probe within that promoter region. Genes in the De Ferrari data set were considered to have evidence of dual occupancy if the corresponding promoters had minimum P(X) values of <0.01 for both ETS1 and ELF1.
The segments shown in Figure 3A were derived from the "segment report" of ChIP Analytics, which classifies a genomic region as a "bound" segment if there is a series of "bound" probes with <1 kb gaps. A "bound" probe must satisfy a significance heuristic of P(X̄) < 0.001 (the neighbor-averaged P value) and either [P(X) < 0.001 and one neighboring probe with P(X) < 0.01] or [P(X) < 0.005 for that probe and a neighbor, or P(X) < 0.005 for both neighbors]. For each bound segment, the probe in the segment with the lowest P(X) was considered the center of that segment and that P(X) value was used as the P(X) value for the segment.
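The bound-probe heuristic and the merging of bound probes into segments can be sketched as follows. This is an illustrative reading, not the ChIP Analytics code: the function names are hypothetical, the first threshold is assumed to apply to the neighbor-averaged P value, and a missing neighbor is assumed to have P = 1.

```python
def is_bound(p_avg, p_single, i):
    """Apply the bound-probe heuristic to probe i.

    p_avg: neighbor-averaged P values; p_single: single-probe P values.
    Assumption: a neighbor that does not exist is treated as P = 1.0.
    """
    left = p_single[i - 1] if i > 0 else 1.0
    right = p_single[i + 1] if i + 1 < len(p_single) else 1.0
    if p_avg[i] >= 0.001:
        return False
    best_neighbor = min(left, right)
    strong_center = p_single[i] < 0.001 and best_neighbor < 0.01
    center_and_neighbor = p_single[i] < 0.005 and best_neighbor < 0.005
    both_neighbors = left < 0.005 and right < 0.005
    return strong_center or center_and_neighbor or both_neighbors

def call_segments(positions, bound_flags, max_gap=1000):
    """Join bound probes separated by less than max_gap bp into segments.

    Returns (start, end) probe positions; positions must be sorted.
    """
    segments, current = [], []
    for pos, bound in zip(positions, bound_flags):
        if not bound:
            continue
        if current and pos - current[-1] >= max_gap:
            segments.append((current[0], current[-1]))
            current = []
        current.append(pos)
    if current:
        segments.append((current[0], current[-1]))
    return segments
```

An isolated probe with a very low P value but no supporting neighbor is rejected, which is the point of requiring evidence from adjacent probes.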
Dual-bound ETS1 and ELF1 or ETS1 and RUNX1, or specifically occupied ETS1 but not ELF1 were segregated by examining the P(X) value of all probes from the ELF1 or RUNX1 probe report that lie within 1 kb from the center of the ETS1 segment (ChIP Analytics). Segments were considered dual bound if one of these probes had a P(X) value <0.001 and specific if none of these probes had a P(X) value <0.01. Dual-bound and specific segments were matched to a specific gene by identifying the nearest TSS from the Ensembl database.
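The segregation rule above can be written as a small decision function. This is a sketch under stated assumptions: the function name and the "unclassified" label are hypothetical, and a segment with no probes from the second factor within the window is left unclassified, a case the text does not address.

```python
def classify_segment(center, other_probes, window=1000,
                     dual_cutoff=0.001, specific_cutoff=0.01):
    """Classify an ETS1 segment using a second factor's probe P values.

    center: position of the ETS1 segment center (bp).
    other_probes: (position, P value) pairs from the ELF1 or RUNX1 probe report.
    """
    nearby = [p for pos, p in other_probes if abs(pos - center) <= window]
    if not nearby:
        return "unclassified"  # assumption: no data means no call
    if any(p < dual_cutoff for p in nearby):
        return "dual"
    if all(p >= specific_cutoff for p in nearby):
        return "specific"
    # P values between the two cutoffs fall into neither data set
    return "unclassified"
```

The gap between the two cutoffs (0.001 and 0.01) is what minimizes false positives in both data sets at the cost of discarding intermediate cases.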
Analysis of bound segments by MEME, PATSER, and GOstat
Bound segments were shortened by utilizing only the region spanning the central probe [lowest P(X) value] and ending at probes that displayed a 100-fold increase in P(X) value. If the edge of a segment (gap of at least 1000 bp) was encountered first, the segment was extended by 200 bp and ended. Segments were analyzed by MEME (http://meme.sdsc.edu/meme/meme.html) (Bailey and Elkan 1994) to identify sequences of variable length that occur more often than expected. Such sequences are reported as PWMs and are given an E value (expect value) describing the number of times that PWM would be identified by chance in a set of sequences of that size. MEME was run with default settings, except that the maximum motif length was set at 15 nucleotides. All sequences shown had the lowest E value of all complex sequences returned. (Runs of a single nucleotide were not considered complex.)
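One possible reading of the segment-shortening rule is sketched below. The interpretation is an assumption: the walk moves outward from the central probe in each direction and stops at the first probe whose P value is at least 100-fold higher than the center's, or pads by 200 bp if the segment edge comes first. The function name is hypothetical.

```python
def trim_segment(positions, p_values, fold=100.0, max_gap=1000, pad=200):
    """Return (left, right) coordinates of a shortened segment.

    positions: sorted probe positions within one bound segment (bp).
    p_values: P value for each probe.
    """
    center = min(range(len(positions)), key=lambda i: p_values[i])
    limit = p_values[center] * fold

    def walk(step):
        i = center
        while True:
            j = i + step
            # Segment edge: no further probe, or a gap of at least max_gap
            if j < 0 or j >= len(positions) or abs(positions[j] - positions[i]) >= max_gap:
                return positions[i] + step * pad
            # Stop at the first probe with a 100-fold higher P value
            if p_values[j] >= limit:
                return positions[j]
            i = j

    return walk(-1), walk(+1)
```

Trimming in this way keeps the sequence submitted to motif discovery centered on the strongest signal.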
Three kilobases of sequence surrounding the TSS of each gene matched to ETS1/ELF1 dual-bound or ETS1 specifically bound segments and 937 randomly selected genes (Ensembl) were analyzed by PATSER (http://rsat.ulb.ac.be/rsat) (Hertz and Stormo 1999). The PATSER program moved a window equal to the length of the ETS PWM (Fig. 4A) along both strands of each sequence and assigned a score to each position. The position of the highest-scoring PWM match for each sequence relative to the TSS was recorded.
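The sliding-window scan can be sketched as follows. This is not a reimplementation of PATSER: the weight matrix is assumed to be a list of per-position score dictionaries, the function names are hypothetical, and bases other than A, C, G, and T are given a score of zero for simplicity.

```python
from typing import Dict, List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def score_window(window: str, pwm: List[Dict[str, float]]) -> float:
    # Sum the per-position weights; unknown bases contribute 0 (assumption).
    return sum(column.get(base, 0.0) for column, base in zip(pwm, window))


def best_pwm_match(seq: str, pwm: List[Dict[str, float]],
                   tss_index: int) -> Tuple[float, int, str]:
    """Return (score, window start relative to the TSS, strand) of the best match."""
    width = len(pwm)
    best = None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        for strand, oriented in (("+", window), ("-", reverse_complement(window))):
            score = score_window(oriented, pwm)
            if best is None or score > best[0]:
                best = (score, start - tss_index, strand)
    if best is None:
        raise ValueError("sequence is shorter than the PWM")
    return best
```

With a toy four-column matrix that rewards GGAA, the sequence TTTTGGAATTTT and a TSS at index 6 yield a best match on the plus strand starting 2 bp upstream of the TSS.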
The genes with the 400 lowest P(X) values for each category (except for ETS1-specific genes, where all 437 genes were used) were analyzed for overrepresented ontologies using GOstat (http://gostat.wehi.edu.au) (Beissbarth and Speed 2004). [For ETS1/ELF1/GABPα co-occupied genes, each P(X) value was <0.001 and the mean P(X) value was used.] A list of the 17,000 genes represented on the microarray was used as a background gene list. Random gene lists (400 each) were generated from this background gene list. The maximal P value for returned categories was set to 0.001. Redundant gene categories (differing by less than three genes) were collapsed to one category and uninformative gene categories were not recorded.
Real-time PCR
Real-time PCR was performed as described previously (Hollenhorst et al. 2004). Serial dilutions of Jurkat ChIP input DNA were used as a standard curve for real-time PCR. Primers designed for the specified genomic regions were found to amplify a single product from genomic input DNA based on a single melting peak. Each ChIP DNA sample was assayed for the levels of two negative control regions, the 3' ends of the albumin and BCL-XL genes. In all cases, the absolute levels of these control regions varied by less than twofold. The mean level of the control regions was considered the background level of genomic DNA. ChIP enrichments are reported as the ratio of the absolute measurement of each genomic locus to the background level of genomic DNA in the same sample. Primers used to assay genomic loci are listed in Supplementary Table S3.
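The enrichment calculation described above reduces to a simple ratio, sketched below. The function names and the example values are hypothetical. The inputs are assumed to be absolute levels already read from the standard curve.

```python
from typing import Sequence


def controls_within_twofold(control_levels: Sequence[float]) -> bool:
    """Check that the negative control regions differ by less than twofold."""
    return max(control_levels) < 2.0 * min(control_levels)


def chip_enrichment(locus_level: float, control_levels: Sequence[float]) -> float:
    """Ratio of a locus to the mean of the negative control regions."""
    background = sum(control_levels) / len(control_levels)
    return locus_level / background
```

For a locus measured at 12.0 units with control regions at 1.0 and 1.4 units, the background is 1.2 units and the enrichment is 10-fold.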
Protein expression and purification
Human ETS1 (p51) was cloned into the bacterial expression vector pET28A (Novagen) at a site that introduces a 6×His tag at the N terminus. ETS1 protein was expressed and purified as described previously (Jonsen et al. 1996). Protein concentration was determined by comparison to BSA standards on Coomassie brilliant blue-stained SDS-PAGE gels, and activity was determined by binding to the high-affinity ETS1-binding duplex 5'-TCGACGGCCAAGCCGGAAGTGAGTGCC-3' (Nye et al. 1992).
A fragment of the RUNX1 protein that includes amino acids 1–302 was a gift of Nancy Speck. This fragment retains all of the regions necessary for cooperative DNA binding with ETS1 and was purified from baculovirus-infected SF9 cells (Gu et al. 2000).
DNA-binding assays
Quantitative electrophoretic mobility shift assays were performed as described previously (Jonsen et al. 1996). In brief, indicated concentrations of ETS1 protein were mixed with ³²P-labeled double-stranded oligonucleotides at 1 × 10⁻¹¹ M, then incubated for 1 h on ice. Duplexes, designated either GGAA or GGAG, were composed of the following sequences, respectively: CACAGGAATGCTGGGAATTGTAGTTTTCGCTCTGT; CACAGGAATGCTGGGAGTTGTAGTTTTCGCTCTGT. Reaction mixtures with RUNX1 (amino acids 1–302) at 3 × 10⁻⁸ M were incubated on ice for 30 min before addition of ETS1 protein. Aliquots of binding mixtures were run on a 6% polyacrylamide gel, and relative radioactivity in bound or unbound DNA bands was quantified by PhosphorImager (Molecular Dynamics). KD values were calculated as described previously (Goetz et al. 2000) using a least-squares curve fit of fraction of DNA bound = 1/(1 + KD/[ETS1]).
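The binding isotherm can be fitted with a short sketch such as the one below. It is not the fitting routine used in the cited work: it minimizes the sum of squared errors by a golden-section search over log10(KD), and the function names, search bounds, and synthetic data are assumptions made for illustration. The model treats the total ETS1 concentration as the free concentration, which is reasonable when the DNA concentration is far below the KD.

```python
import math
from typing import Sequence


def fraction_bound(ets1_conc: float, kd: float) -> float:
    """Fraction of DNA bound = 1 / (1 + KD / [ETS1])."""
    return 1.0 / (1.0 + kd / ets1_conc)


def fit_kd(concs: Sequence[float], fractions: Sequence[float],
           lo: float = 1e-13, hi: float = 1e-6, iters: int = 200) -> float:
    """Least-squares estimate of KD (M) by golden-section search on log10(KD)."""
    def sse(log_kd: float) -> float:
        kd = 10.0 ** log_kd
        return sum((fraction_bound(c, kd) - f) ** 2
                   for c, f in zip(concs, fractions))

    a, b = math.log10(lo), math.log10(hi)
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if sse(c) < sse(d):
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
    return 10.0 ** ((a + b) / 2.0)
```

Searching in log space keeps the estimate positive and covers several orders of magnitude with a fixed number of iterations. With noisy data, the bounds should bracket the expected KD.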
| Acknowledgments |
|---|
|
|
|---|
| Footnotes |
|---|
E-MAIL barbara.graves{at}hci.utah.edu; FAX (801) 585-1980.
Supplemental material is available at www.genesdev.org.
Article published online ahead of print. Article and publication date are online at http://www.genesdev.org/cgi/doi/10.1101/gad.1561707
| References |
|---|
|
|
|---|
Bailey, T.L. and Elkan, C. 1994. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In Proceedings of the second international conference on intelligent systems for molecular biology (eds. R. Altman et al.), pp. 28–36. AAAI Press, Menlo Park, CA.
Batchelor, A.H., Piper, D.E., de la Brousse, F.C., McKnight, S.L., and Wolberger, C. 1998. The structure of GABPα/β: An ETS domain–ankyrin repeat heterodimer bound to DNA. Science 279: 1037–1041.
Beissbarth, T. and Speed, T.P. 2004. GOstat: Find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics 20: 1464–1465.
Bina, M., Wyss, P., Ren, W., Szpankowski, W., Thomas, E., Randhawa, R., Reddy, S., John, P.M., Pares-Matos, E.I., Stein, A., et al. 2004. Exploring the characteristics of sequence elements in proximal promoters of human genes. Genomics 84: 929–940.
Bird, A.P. 1986. CpG-rich islands and the function of DNA methylation. Nature 321: 209–213.
Blanchette, M., Bataille, A.R., Chen, X., Poitras, C., Laganiere, J., Lefebvre, C., Deblois, G., Giguere, V., Ferretti, V., Bergeron, D., et al. 2006. Genome-wide computational prediction of transcriptional regulatory modules reveals new insights into human gene expression. Genome Res. 16: 656–668.
Bories, J.-C., Willerford, D.M., Grevin, D., Davidson, L., Camus, A., Martin, P., Stehelin, D., and Alt, F.W. 1995. Increased T-cell apoptosis and terminal B-cell differentiation induced by inactivation of the Ets-1 proto-oncogene. Nature 377: 635–638.
Boyer, L.A., Lee, T.I., Cole, M.F., Johnstone, S.E., Levine, S.S., Zucker, J.P., Guenther, M.G., Kumar, R.M., Murray, H.L., Jenner, R.G., et al. 2005. Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 122: 947–956.
Brown, T.A. and McKnight, S.L. 1992. Specificities of protein–protein and protein–DNA interaction of GABPα and two newly defined ets-related proteins. Genes & Dev. 6: 2502–2512.
Buchwalter, G., Gross, C., and Wasylyk, B. 2004. Ets ternary complex transcription factors. Gene 324: 1–14.
De Ferrari, L. and Aitken, S. 2006. Mining housekeeping genes with a Naive Bayes classifier. BMC Genomics 7: 277. doi: 10.1186/1471-2164-7-277.
de la Brousse, F.C., Birkenmeier, E.H., King, D.S., Rowe, L.B., and McKnight, S.L. 1994. Molecular and genetic characterization of GABPβ. Genes & Dev. 8: 1853–1865.
Euskirchen, G., Royce, T.E., Bertone, P., Martone, R., Rinn, J.L., Nelson, F.K., Sayward, F., Luscombe, N.M., Miller, P., Gerstein, M., et al. 2004. CREB binds to multiple loci on human chromosome 22. Mol. Cell. Biol. 24: 3804–3814.
Feng, Y., Goulet, A.C., and Nelson, M.A. 2004. Identification and characterization of the human Cdc2l2 gene promoter. Gene 330: 75–84.
FitzGerald, P.C., Shlyakhtenko, A., Mir, A.A., and Vinson, C. 2004. Clustering of DNA sequences in human promoters. Genome Res. 14: 1562–1574.
Galang, C.K., Muller, W.J., Foos, G., Oshima, R.G., and Hauser, C.A. 2004. Changes in the expression of many Ets family transcription factors and of potential target genes in normal mammary tissue and tumors. J. Biol. Chem. 279: 11281–11292.
Goetz, T.L., Gu, T.L., Speck, N.A., and Graves, B.J. 2000. Auto-inhibition of Ets-1 is counteracted by DNA binding cooperativity with core-binding factor α2. Mol. Cell. Biol. 20: 81–90.
Gottschalk, L.R. and Leiden, J.M. 1990. Identification and functional characterization of the human T-cell receptor β gene transcriptional enhancer: Common nuclear proteins interact with the transcriptional regulatory elements of the T-cell receptor α and β genes. Mol. Cell. Biol. 10: 5486–5495.
Gu, T.L., Goetz, T.L., Graves, B.J., and Speck, N.A. 2000. Auto-inhibition and partner proteins, core-binding factor β (CBFβ) and Ets-1, modulate DNA binding by CBFα2 (AML1). Mol. Cell. Biol. 20: 91–103.
Hassler, M. and Richmond, T.J. 2001. The B-box dominates SAP-1–SRF interactions in the structure of the ternary complex. EMBO J. 20: 3018–3028.
Hertz, G.Z. and Stormo, G.D. 1999. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15: 563–577.
Hollenhorst, P.C., Jones, D.A., and Graves, B.J. 2004. Expression profiles frame the promoter specificity dilemma of the ETS family of transcription factors. Nucleic Acids Res. 32: 5693–5702.
John, S., Marais, R., Child, R., Light, Y., and Leonard, W.J. 1996. Importance of low affinity Elf-1 sites in the regulation of lymphoid-specific inducible gene expression. J. Exp. Med. 183: 743–750.
Jonsen, M.D., Petersen, J.M., Xu, Q., and Graves, B.J. 1996. Characterization of the cooperative function of inhibitory sequences of Ets-1. Mol. Cell. Biol. 16: 2065–2073.
Kopp, J.L., Wilder, P.J., Desler, M., Kim, J.H., Hou, J., Nowling, T., and Rizzino, A. 2004. Unique and selective effects of five Ets family members, Elf3, Ets1, Ets2, PEA3, and PU.1, on the promoter of the type II transforming growth factor-β receptor gene. J. Biol. Chem. 279: 19407–19420.
Krig, S.R., Jin, V.X., Bieda, M.C., O'Geen, H., Yaswen, P., Green, R., and Farnham, P.J. 2007. Identification of genes directly regulated by the oncogene ZNF217 using chromatin immunoprecipitation (ChIP)–chip assays. J. Biol. Chem. 282: 9703–9712.
Li, R., Pei, H., and Watson, D.K. 2000. Regulation of Ets function by protein–protein interactions. Oncogene 19: 6514–6523.
Li, Z., Van Calcar, S., Qu, C., Cavenee, W.K., Zhang, M.Q., and Ren, B. 2003. A global transcriptional regulatory role for c-Myc in Burkitt's lymphoma cells. Proc. Natl. Acad. Sci. 100: 8164–8169.
Mao, X., Miesfeldt, S., Yang, H., Leiden, J.M., and Thompson, C.B. 1994. The FLI-1 and chimeric EWS–FLI-1 oncoproteins display similar DNA binding specificities. J. Biol. Chem. 269: 18216–18222.
Martone, R., Euskirchen, G., Bertone, P., Hartman, S., Royce, T.E., Luscombe, N.M., Rinn, J.L., Nelson, F.K., Miller, P., Gerstein, M., et al. 2003. Distribution of NF-κB-binding sites across human chromosome 22. Proc. Natl. Acad. Sci. 100: 12247–12252.
Messina, D.N., Glasscock, J., Gish, W., and Lovett, M. 2004. An ORFeome-based analysis of human transcription factor genes and the construction of a microarray to interrogate their expression. Genome Res. 14: 2041–2047.
Meyers, S., Downing, J.R., and Hiebert, S.W. 1993. Identification of AML-1 and the (8;21) translocation protein (AML-1/ETO) as sequence-specific DNA-binding proteins: The runt homology domain is required for DNA binding and protein–protein interactions. Mol. Cell. Biol. 13: 6336–6345.
Mo, Y., Ho, W., Johnston, K., and Marmorstein, R. 2001. Crystal structure of a ternary SAP-1/SRF/c-fos SRE DNA complex. J. Mol. Biol. 314: 495–506.
Muthusamy, N., Barton, K., and Leiden, J.M. 1995. Defective activation and survival of T-cells lacking the Ets-1 transcription factor. Nature 377: 639–642.
Nye, J.A., Petersen, J.M., Gunther, C.V., Jonsen, M.D., and Graves, B.J. 1992. Interaction of murine Ets-1 with GGA-binding sites establishes the ETS domain as a new DNA-binding motif. Genes & Dev. 6: 975–990.