Vol. 13, No. 24, pp. 3217-3230, December 15, 1999
1 Laboratories of Molecular Biophysics, Howard Hughes Medical Institute, The Rockefeller University, New York, New York 10021 USA; 2 Departments of Internal Medicine and Biochemistry, Ryburn Cardiology Center, University of Texas Southwestern Medical Center, Dallas, Texas 75235-8573 USA
| |
Abstract |
|---|
|
|
|---|
Cocrystal structures of wild-type TATA box-binding protein (TBP)
recognizing 10 naturally occurring TATA elements have been determined
at 2.3-1.8 Å resolution, and compared with our 1.9 Å resolution
structure of TBP bound to the Adenovirus major late promoter (AdMLP)
TATA box (5'-TATAAAAG-3'). Minor-groove recognition by the
saddle-shaped protein induces the same conformational change in each of
these oligonucleotides, despite variations in promoter sequence that
reduce the efficiency of transcription initiation. Three molecular
mechanisms explain assembly of diverse TBP-TATA element complexes. (1)
T→A and A→T transversions leave the minor-groove face
unchanged, permitting formation of TBP-DNA complexes on many
A/T-rich core promoter sequences. (2) Cavities in the interface between TBP and the minor-groove face of the AdMLP TATA box
accommodate the exocyclic NH2 groups of G in a
TACA box and in a TATAAG box. (3) Formation of
a C:G Hoogsteen basepair in a TATAAAC box eliminates steric
clashes that would be produced by the Watson-Crick base pair. We
conclude that the structure of the TBP-TATA box complex found at the
heart of the polymerase II (pol II) transcription machinery has
remained constant over the course of evolution, despite variations in
TBP and its DNA targets.
[Key Words: TATA box; transcription; TBP-TATA complex; Pol II]
| |
Introduction |
|---|
|
|
|---|
In eukaryotes RNA polymerase II (Pol II) is responsible for
transcribing nuclear genes encoding the mRNAs and several small nuclear
RNAs. Like RNA Pol I and Pol III, Pol II cannot
recognize its target promoter directly and initiate transcription
without accessory proteins. Instead, this large multisubunit enzyme
relies on both general transcription factors (GTFs) and transcriptional activators and coactivators (both positive and negative) to regulate transcription from class-II nuclear gene promoters (for review, see
Roeder 1996
). The primary DNA anchor of this complicated macromolecular machine is transcription factor IID (TFIID), a 700-kD complex composed
of the TATA box-binding protein (TBP) and a set of phylogenetically conserved, Pol II-specific TBP-associated factors (for review, see
Burley and Roeder 1996
). DNA binding by human TFIID was first demonstrated with the adenovirus major-late promoter (AdMLP). DNase
I footprinting studies of the AdMLP and selected human gene promoters
revealed sequence-specific interactions with the TATA element, which
are primarily mediated by TBP. Protection outside of the TATA box
displays a nucleosome-like pattern of DNase I hypersensitivity, varies
radically among promoters, and can be induced by some activators (for
review, see Burley and Roeder 1996
).
Genes encoding TBPs have been cloned from organisms ranging from
archaea to human. The molecules share a phylogenetically conserved
180-residue carboxy-terminal or core segment, which contains two
imperfect direct repeats and supports all of the protein's
biochemically important functions in Pol II transcription (for review,
see Burley and Roeder 1996
). Since the original purification and
characterization of Saccharomyces cerevisiae TBP, there has
been considerable progress toward understanding its mechanisms of
action. Specific nanomolar-affinity (Hahn et al. 1989
) binding to a
TATA element entails DNA bending (Horikoshi et al. 1992
) and occurs
exclusively via minor groove interactions (Lee et al. 1991
; Starr and
Hawley 1991
). Three-dimensional structures of a full-length TBP from
Arabidopsis thaliana (Nikolov et al. 1992
; Nikolov and Burley
1994
), yeast core TBP (Chasman et al. 1993
), and full-length TBP from
the archaeon Pyrococcus woesei (DeDecker et al. 1996
) have
been determined. The monomeric protein consists of two nearly identical
domains and adopts a quasisymmetric α/β structure, resembling a molecular saddle complete with stirrups (Fig.
1). The concave underside of the saddle is a highly
curved, 10-stranded, antiparallel β-sheet, containing the amino
acids involved in DNA binding (Fig. 2). The convex
upper surface of the saddle consists of four α-helices, which
interact with other transcription factors (for review, see Nikolov and
Burley 1994
). Cocrystal structure determinations of A. thaliana, yeast core, and human core TBPs interacting with similar
TATA elements (Kim et al. 1993a
,b
; Kim and Burley 1994
; Juo et al.
1996
; Nikolov et al. 1996
) revealed an unusual protein-DNA complex,
characterized by extensive hydrophobic interactions with the minor
groove, severely distorted DNA, and phenylalanine side chains kinking
DNA by insertion between base pairs at the 5' and 3' ends of
the TATA box. The same conformational change has been observed in
triple-complex cocrystal structures of TBP plus DNA with human TFIIB
(Nikolov et al. 1995
), an archaeal homolog of TFIIB (Kosa et al. 1997
), and yeast TFIIA (Geiger et al. 1996
; Tan et al. 1996
). Further relevant
work on TBP includes examination of the kinetics and thermodynamics of
TATA element binding (Hoopes et al. 1992
; Petri et al. 1995
, 1998
;
Parkhurst et al. 1996
), and studies of the effects of prebending
promoter DNA (Parvin et al. 1995
) and TBP-induced DNA deformation (Sun
and Hurley 1995
; L. Hurley, unpubl.).
Computational studies of the eukaryotic promoter database (EPD) have
also yielded important insights into the function of TBP (see Table 1
for a summary of results available on the internet via
http://www.epd.isb-sib.ch/promoter_elements). An exhaustive statistical survey
documented that the TATA box is an A/T-rich 8-bp segment
often flanked by G/C-rich sequences (Bucher 1990
). Despite the marked preference for A:T and T:A base pairs, C:G and/or G:C base pairs occur frequently at five of the
eight positions. Thus, TBP can bind productively to a large number of
diverse TATA elements, some of which bear little resemblance to the
optimal TATA sequence (5'-TATATAAG-3'), identified by an in
vitro binding-site selection experiment with Acanthamoeba
TBP (Wong and Bateman 1994
).
In an effort to understand how TBP can support Pol II transcription initiation from many different TATA boxes in class-II nuclear gene promoters, we have made a systematic X-ray crystallographic study of a canonical wild-type TBP (A. thaliana TBP isoform 2) recognizing 10 naturally occurring variants of the AdMLP TATA element. The structure of the TBP-DNA complex is essentially independent of TATA element sequence. Three distinct molecular mechanisms allow TBP to induce the same DNA deformation, one of which uses Hoogsteen base pairs adjacent to the 3' kink site. Detailed analyses of these high-resolution cocrystal structures document that TATA element recognition has remained constant over the course of evolution.
| |
Results and Discussion |
|---|
|
|
|---|
Study design and functional characterization of TATA element variants
Table 1 illustrates the oligonucleotides used for crystallization
with wild-type TBP. The 10 sequences represent naturally occurring
single-base variants of the AdMLP TATA element
(5'-TATAAAAG-3'), including at least one substitution at each
position. Diffraction-quality cocrystals were obtained with C:G or G:C
base pairs at four of the five TATA box positions, where they occur
with non-zero estimated frequency in the EPD (ranging from 2.9% to
38.4%). Defining the intrinsic strength of the AdMLP TATA box to be
100% (see Materials and Methods), the variants show transcription
activity levels ranging between 5% and 110%. (Similar
cocrystallization trials were unsuccessful with oligonucleotides that
do not support transcription initiation or TBP binding, including
5'-TGTAAAAG-3', 5'-TCTAAAAG-3', 5'-TAAAAAAG-3', 5'-TATGAAAG-3', and
5'-TATACAAG-3'.) Table 2 provides a summary
of the crystallographic statistics, showing that all 10 newly
determined cocrystal structures are of the highest quality. The
presence of two protein-DNA complexes per asymmetric unit
in each case permits an estimate of the precision of each set of atomic
coordinates. Root-mean-square deviations (rmsds) between α-carbons
range between 0.2 and 0.4 Å, which is comparable to the precision of
the X-ray method at resolution limits between 1.8 and 2.3 Å (Brünger and Rice 1997
).
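The rmsd comparison described above can be illustrated with a minimal sketch. It assumes that the two α-carbon coordinate sets have already been superimposed and are listed in the same residue order; the function name and the coordinates are invented for illustration and are not taken from the cocrystal structures.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (in Å) between two equal-length
    lists of (x, y, z) tuples that are already superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical alpha-carbon positions for the two complexes in one asymmetric unit.
complex_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
# Second copy displaced by 0.3 Å along x, within the 0.2-0.4 Å range quoted in the text.
complex_2 = [(x + 0.3, y, z) for (x, y, z) in complex_1]

print(f"rmsd = {rmsd(complex_1, complex_2):.2f} Å")
```

In practice the superposition itself would be done first by least-squares fitting; the sketch covers only the final deviation calculation.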
TBP structure is not affected by TATA element sequence variation
Figure 1B illustrates the structures of TBP extracted from each
cocrystal structure and overlaid by least-squares superposition on our
original AdMLP cocrystal structure (Kim et al. 1993a
; Kim and Burley
1994
). With the exception of the short segment connecting α-helix
H1 to β-strand S2, TBP structural variation as a function of TATA
element sequence is comparable to the precision of the atomic
coordinates (α-carbon rmsds = 0.2-0.5 Å). All subsequent structural comparisons are based on the superpositions in Figure 1B.
The phenylalanine pairs (Phe-148 and Phe-165; Phe-57 and Phe-74)
responsible for kinking the TATA element at its 5' and 3' ends
also show no significant structural variation (Fig. 1C,D).
Trajectory of the DNA double-helix axis is not affected by TATA element sequence variation
Figure 3 illustrates only minimal variation in the
trajectory of the DNA double-helix axis as a function of TATA element
sequence. This similarity documents that TBP binding induces the same
deformation in the core promoter, independent of the precise sequence
and transcription activity of the TATA element. Detailed analyses of
the DNA structural parameters (Lavery and Sklenar 1989
; Stofer and
Lavery 1993
) for each TATA element (data not shown) reveal only minor
differences when compared with the AdMLP TATA box bound to the same
protein (Kim and Burley 1994
).
Structure of the TBP-DNA complex is not affected by TATA element sequence variation
Pairwise comparisons of our TBP-DNA cocrystal structures document
that even the detailed structure of the protein-DNA complex is not
significantly affected by variations in the TATA element (rmsds for
core α-carbon and TATA element C1' atoms = 0.2-0.5 Å). A
comparison of plant and archaeal TBP-DNA complexes reveals a very
similar protein-DNA interface (rmsds for TATA element C1' atoms = 1.1 Å), despite the fact that the upper surfaces of the two
molecular saddles are somewhat different (data not shown). We conclude
that the structure of the TBP-TATA element complex is independent of
the sequence of the TATA element.
Figure 4 provides a schematic view of the protein
surface responsible for TATA element recognition. In all available
crystal structures containing TBP and DNA, only 15 highly conserved
residues contribute to interactions with the minor-groove edges of the bases. Excluding Plasmodium falciparum TBP and Drosophila
melanogaster TBP-related factor (TRF) from the sequence comparison,
these 15 residues are almost absolutely invariant (Fig. 2). The only
exceptions are a Thr-82→Leu substitution in
Schizosaccharomyces pombe TBP, and Thr-173→Ser
substitutions in P. woesei and Thermococcus celer
TBPs. The sequence of P. falciparum TBP displays six
conservative substitutions, including Val-29→Thr,
Phe-57→Ile, Val-80→Met (which is an Ile in
Drosophila TRF), Val-119→Ile, Pro-149→Ala, and
Val-171→Ile. The residues responsible for direct and
water-mediated interactions with the DNA backbone are also highly
conserved, showing minimal variation particularly in the carboxy-terminal half of the protein (Fig. 2). Thus, it is not surprising that the three-dimensional structure of TBP and its interactions with DNA have remained unchanged throughout evolution.
Three mechanisms explain how TBP exploits the same induced fit strategy to recognize all 10 variants of the AdMLP TATA element
The major-groove face of B-DNA is varied chemically as a function of
sequence, which explains why most sequence-specific DNA-binding proteins interact with the major-groove edges of base pairs in their
recognition elements (for review, see Patikoglou and Burley 1997
). In
contrast, the minor-groove face is chemically monotonous as a function
of sequence (Seeman et al. 1976
). T→A and A→T
transversions leave the structure of the minor-groove face essentially unchanged, because both A:T and T:A display similarly positioned pairs
of hydrogen bond acceptors on their minor-groove edges. G:C and C:G, on
the other hand, provide some relative chemical variation. Minor-groove
differences between G:C and A:T or C:G and T:A arise from an exocyclic
NH2 protruding from G. It is not surprising that DNA-binding
proteins exhibiting little or no sequence specificity interact
primarily with the minor groove (for review, see Patikoglou and Burley
1997
). TBP exploits an induced-fit mechanism that relies, in part, on
the chemical monotony of the minor groove to recognize many different
core promoters during Pol II transcription initiation.
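The contrast between the two grooves can be made concrete with a small sketch. It uses a coarse, textbook-style encoding in the spirit of Seeman et al. (1976): each base pair is reduced to the ordered types of functional groups exposed in a groove (A = hydrogen bond acceptor, D = donor, H = nonpolar hydrogen, M = methyl). The dictionary and function names are illustrative assumptions, and the encoding records only group type and order, not exact geometry.

```python
# A = H-bond acceptor, D = H-bond donor, H = nonpolar hydrogen, M = methyl group.
# Coarse encoding of the groups each base pair exposes, read across the groove.
MAJOR_GROOVE = {"A:T": "ADAM", "T:A": "MADA", "G:C": "AADH", "C:G": "HDAA"}
MINOR_GROOVE = {"A:T": "AHA", "T:A": "AHA", "G:C": "ADA", "C:G": "ADA"}

def distinguishable(groove, pair_1, pair_2):
    """True if the two base pairs present different patterns in this groove."""
    return groove[pair_1] != groove[pair_2]

def distinct_patterns(groove):
    """Number of different patterns the four base pairs present in this groove."""
    return len(set(groove.values()))

print("major groove patterns:", distinct_patterns(MAJOR_GROOVE))
print("minor groove patterns:", distinct_patterns(MINOR_GROOVE))
print("A:T vs T:A in minor groove:", distinguishable(MINOR_GROOVE, "A:T", "T:A"))
print("A:T vs G:C in minor groove:", distinguishable(MINOR_GROOVE, "A:T", "G:C"))
```

At this level of description the only minor-groove feature that separates G:C and C:G from A:T and T:A is the donor contributed by the exocyclic NH2 of G, which is the group the cavity and Hoogsteen mechanisms have to accommodate.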
Minor-groove structural degeneracy of A:T and T:A permits TBP-DNA
complex formation on many A/T-rich promoter sequences,
albeit with dramatically reduced transcriptional efficiency in some
cases (Table 1). Figure 5 depicts the consequences of
changing the first base pair of the TATA element from T:A to A:T in our
A(−31) cocrystal structure of an AATA box
(5'-AATAAAAG-3', frequency = 5.0%,
activity = 11%-16%), and the second base pair from A:T to T:A in
our T(−30) structure of a TTTA box
(5'-TTTAAAAG-3', frequency = 13.9%,
activity = 20%-29%). In both cases, the two cocrystal structures
are essentially identical because the protein cannot readily
distinguish T:A from A:T. We obtained similar findings for cocrystal
structures of T(−28) [TATT box (5'-TATTAAAG-3', frequency = 8.4%, activity = 14%)] and T(−27) [TATAT box
(5'-TATATAAG-3', frequency = 27.7%,
activity = 25%-43%)], and T(−25) [TATAAAT box (5'-TATAAATG-3', frequency = 35.2%,
activity = 90%-110%)] and T(−24) [TATAAAAT box
(5'-TATAAAAT-3', frequency = 11.8%, activity not
measured)] (Figs. 6 and 7).
Although TBP cannot readily distinguish T:A from A:T, work by the
laboratories of Dervan and Rees has demonstrated that a class of small
molecules can do so (Kielkopf et al. 1998
).
Shape complementarity is not sufficient, however, to ensure that a
given sequence will bind productively to TBP. For example, a TAAAAAA
box is inactive in Pol II transcription (Wobbe and Struhl 1990
) and
does not form a stable complex with TBP (Starr et al. 1995
). Model
building suggests that the TAAAAAA box should function (data not
shown), but the A tract is probably too rigid (DiGabriele and Steitz
1993
) to undergo the deformation characteristic of all known TBP-DNA
complexes. Crystallization attempts with oligonucleotides bearing
TAAAAAA were entirely unsuccessful.
The second mechanism underlying relaxed DNA-binding specificity allows
TBP to accommodate the exocyclic NH2 of G in the C:G base
pair of a TACA box [C(−29) (5'-TACAAAAG-3',
frequency = 3.3%, activity = 15%-24%)] and in the G:C base pair
of a TATAAG box [G(−26) (5'-TATAAGAG-3',
frequency = 9.5%, activity = 18%)]. Inspection of our AdMLP
cocrystal structure (5'-TATAAAAG-3') reveals two cavities
between TBP and DNA. The first cavity receives the exocyclic NH2 of G in a TACA box, explaining why a C:G base pair at the third position can be accommodated without any structural changes in
protein or the DNA (Fig. 5C). The exocyclic NH2 of G at
position six in a TATAAG box is found in the second cavity (Fig. 7A),
again allowing for productive PIC formation and Pol II transcription initiation (Table 1). The quasispherical cavity at position 3 does not
appear to be able to accommodate the exocyclic NH2 of G in a
hypothetical TAGA box, which almost never appears in the EPD (frequency
<1%). The same constraints may not apply to a hypothetical TATAAC
box (frequency = 2.9%), because the cavity at position 6 is somewhat
elongated and may receive the protruding amino group from the bottom
strand of the promoter. Further analyses of all available TBP-TATA
element interfaces do not reveal the existence of any other sizable
cavities (data not shown), and we believe that the cavity mechanism of
specificity broadening is restricted to positions 3 and 6.
The final contributor to relaxed DNA-binding specificity involves
TBP-induced formation of Hoogsteen base pairs (Hoogsteen 1963
), which
have not been described previously in protein-DNA complexes.
Inspection of our AdMLP cocrystal structure suggests that TBP should
not tolerate C:G or G:C base pairs at position 7, because the exocyclic
NH2 would clash with the side chain of Leu-72 (see the
structure of TBP bound to 5'-TATAAATG-3', illustrated in Fig.
7C). Bucher's studies of the EPD, however, yielded non-zero estimates
for the frequencies of C:G and G:C base pairs at position 7 (3.4%
and 16.4%, respectively). Cocrystallization of TBP with the
oligonucleotide corresponding to a TATAAAC box [C(−25)
(5'-TATAAACG-3', frequency = 3.4%,
activity = 5%-6%)] allowed us to uncover a remarkable explanation
for broadened DNA-binding specificity at position 7. A 180° torsion
angle change about the C1'-N9 bond (Fig. 8), giving a syn instead of the normal anti conformation,
creates a C:G Hoogsteen base pair that is stabilized by interstrand
hydrogen bonding (C N4-G O6 = 2.9 Å and possibly C N3-G N7 = 3.0
Å) plus an intrastrand hydrogen bond with the backbone (G
N2-phosphate O = 2.9 Å). This additional DNA deformation prevents a
steric clash between the exocyclic NH2 of G and Leu-72 that
would be produced by the corresponding Watson-Crick base pair (data
not shown). At the same time, the Hoogsteen base pair preserves many of
the van der Waals interactions with Phe-57 and Phe-74, which are
largely responsible for the DNA kink at the 3' end of the TATA box.
Similar steric arguments apply to the problem of accommodating a G:C
base pair at this position, and model building suggests that TATA
elements bearing a G at position 7 exploit an analogous G:C Hoogsteen
base pair (data not shown). The corresponding TATAAAG box occurs often
in eukaryotic promoters (frequency = 16.4%) and is capable of
supporting Pol II transcription initiation both in vitro and in vivo
(Wobbe and Struhl 1990
).
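The anti to syn switch described above can be sketched numerically. The example assumes the common convention that the glycosidic torsion angle χ is syn when it lies within 0 ± 90° and anti otherwise; the starting value of −110° is an illustrative anti angle for a Watson-Crick nucleotide, not a value measured from the cocrystal structures, and the function names are hypothetical.

```python
def wrap_degrees(angle):
    """Map any angle in degrees onto the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def glycosidic_class(chi):
    """Classify a glycosidic torsion angle: syn within 0 ± 90°, anti otherwise."""
    return "syn" if abs(wrap_degrees(chi)) <= 90.0 else "anti"

# Illustrative anti value for G in a Watson-Crick pair (assumed, not measured).
chi_watson_crick = -110.0
# A 180° rotation about the C1'-N9 bond, as described for the TATAAAC box.
chi_hoogsteen = wrap_degrees(chi_watson_crick + 180.0)

print(chi_watson_crick, glycosidic_class(chi_watson_crick))
print(chi_hoogsteen, glycosidic_class(chi_hoogsteen))
```

Under these assumptions the rotated base falls in the syn range, so its Hoogsteen edge (O6 and N7) rather than its Watson-Crick edge faces the partner C.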
We believe that Hoogsteen base pairs can form at position 7 because the
energy barrier for the anti to syn glycosidic torsion angle change is very low for unstacked DNA bases (Ornstein et al.
1978
), which is precisely the case in the vicinity of the 3' kink.
Steric clashes between TBP and Watson-Crick C:G or G:C base pairs at
position 7 preclude assembly of stable protein-DNA complexes in the
absence of further conformational changes in the nucleic acid. TBP has
solved this problem by exploiting the phenylalanine-induced kink
between positions 7 and 8 to allow rotation of the position 7 G base
about the C1'-N9 bond. Noncrystallographic symmetry allows us to
rule out some lattice packing artifact in C(−25), because we see the
same behavior in both crystallographically independent TBP-TATAAAC box
cocrystal structures comprising the asymmetric unit.
Although Hoogsteen base pairs have not been described in protein-DNA
complexes, there is extensive literature on Hoogsteen base pairing in
drug-DNA complexes (for review, see Chen and Patel 1995
). C:G
Hoogsteen base pairs were detected in a cocrystal structure of
Triostin A bound to a self-complementary duplex oligonucleotide with
sequence 5'-GCGTACGC-3' (Wang et al. 1986
). Like our
observation at position 7 of the TATAAAC box, these Hoogsteen base
pairs flank drug intercalation sites where unstacking occurs. Formation
of a C:G (or G:C) Hoogsteen base pair stabilized by two hydrogen bonds
(Fig. 8) requires protonation of C at position N3 (pKa = 4.6 in the
absence of environmental effects). X-ray crystallography at 1.95 Å resolution cannot reveal the protonation state of a particular
titratable group, but we presume that the pKa of N3 is shifted toward
neutral in our Hoogsteen base pairs, as observed by NMR in drug-DNA
complexes (for review, see Escude et al. 1996
).
We also detected a G:C Hoogsteen base pair on the other side of the
3' kink, created by insertion of Phe-57 and Phe-74. Figure 8, C and
D, illustrates a portion of the TTTA box cocrystal structure [T(−30) 5'-TTTAAAAG-3', frequency = 13.9%,
activity = 20%-29%]. Bucher's (1990)
studies of the EPD reveal
no substantial base pair preference (Table 1), and our model building
results suggest that all four Watson-Crick base pairs can be
accommodated at position 8 (data not shown). In this case, we are
confident that the observed Hoogsteen base pair actually is an artifact
of lattice packing within this particular protein-DNA cocrystal (each
of the 10 newly determined cocrystal structures displays a unique
lattice packing arrangement). Arg-65, protruding from a nearby TBP-DNA
complex in the crystal lattice, makes two coplanar hydrogen bonds with G(−23) O6 and N7 (both 2.9 Å) and stacks on the π-electron
cloud of G(−24) (3.4 Å interplanar distance), thereby stabilizing
the observed Hoogsteen base pair (Fig. 8C,D).
Detection of Hoogsteen base pairs adjacent to the 3' kink site in
the TATA element suggests that a similar situation might be obtained at
the 5' end of the recognition element, where the DNA is equally
kinked. Many TATA boxes are quasisymmetric and the underside of the TBP
saddle displays approximate twofold symmetry about an axis passing
between β-strands S1 and S1' and running perpendicular to the
β-sheet (Fig. 4). Leu-72 even has a counterpart, Leu-163, which is
related by twofold quasisymmetry. On this side of the saddle, however,
Leu-163 really does preclude G:C or C:G base pairs at position 2 (frequencies = 1%), and wild-type TBP does not make productive
complexes with either TCTA or TGTA boxes (Wobbe and Struhl 1990
).
Site-directed mutagenesis of Leu-163→Val (with an accompanying
Ile→Phe substitution) yielded a mutant TBP capable of directing
Pol II transcription initiation in vivo from both TATA and TGTA boxes
(Strubin and Struhl 1992
). Thus, the TGTA box must be able to undergo
the obligate TBP-induced DNA deformation when it is bound to the mutant
protein. Our attempts to cocrystallize wild-type TBP with
oligonucleotides containing TCTA or TGTA boxes were unsuccessful (data
not shown), and we believe that Hoogsteen base pair formation does not
occur in the vicinity of the 5' kink site.
This functional asymmetry between the 5' and 3' ends of the
TATA element has been detected independently by Hurley and coworkers (Sun and Hurley 1995
; L. Hurley, unpubl.). Their chemical modification studies with pluramycin, a probe for deformed DNA, demonstrated that
TBP-dependent reactivity extends beyond the confines of the 3' end
of 5'-TATAAAAA-3' but never 5' of the T:A base pair at position 1. In contrast, a perfectly symmetric TATA element
(5'-TATATATA-3') shows equal TBP-dependent pluramycin
reactivity at both ends of the oligonucleotide. Thus, subtle
asymmetry in the deformability of various TATA element sequences may
contribute to asymmetric behavior during complex formation. If true,
this phenomenon could help dictate the polarity of TBP binding,
which continues to elude definitive explanation. This intrinsic effect
of the TATA element would be independent of any polarity-determining
role that TFIIB may play, via interactions with the promoter
upstream of the TATA box (Cox et al. 1997
; Lagrange et al. 1998
;
Qureshi and Jackson 1998
). It is remarkable that the TBP-dependent
effects of pluramycin reactivity are not influenced by addition of
TFIIB (L. Hurley, unpubl.).
Not surprisingly, all 10 newly determined cocrystal structures display
subtle changes at the level of both TBP and DNA, when compared with
our AdMLP cocrystal structure. The most remarkable difference was
observed in our TATAAT box cocrystal structure [T(−26)
(5'-TATAATAG-3', frequency = 2.8%,
activity = 6%)]. Modest structural rearrangements created a void in
the protein-DNA interface that is occupied by a water molecule (Fig.
6C,D) in both halves of the asymmetric unit. This finding was somewhat
unexpected, because there is no precedent for an ordered water molecule
being found between TBP and DNA. It is made even more interesting by the fact that this particular TATA box is almost 20-fold weaker than
its AdMLP counterpart, which differs by only 1 bp (A:T→T:A).
Functional definition of the TATA element
The wealth of information available on TBP-DNA complexes makes a
functional definition of the TATA element possible. Inspection of
extant cocrystal structures plus the results of model building exercises allow us to predict which of the four possible base pairs are
tolerated at each of the eight positions comprising a TATA box.
Specifically, we can predict which octameric sequences can present a
minor groove surface that is complementary to the underside of the
molecular saddle. Regrettably, we cannot make quantitative judgments
regarding whether or not a particular combination of allowed base pairs
will yield a sequence capable of undergoing the obligate conformational
change on binding to TBP. We have already discussed the example of an A
tract, but there are bound to be other sequence-specific effects
that preclude certain combinations of allowed base pairs. For reviews
on sequence-dependent effects in protein-DNA complexes see Dickerson
(1998)
and Olson et al. (1998)
.
Assuming that all functional TATA elements must present a complementary
minor-groove surface to TBP after they undergo the same DNA deformation
and taking each of the eight positions in turn, we derived a
structure-based definition for a TATA element that is also consistent
with the results of Bucher's (1990)
studies of the EPD.
TATA definition
T >> c > a ≈ g / A >> t / T >> a ≈ c / A >> t / T >> a / A >> g > c ≈ t / A ≈ T > g > c / G ≈ A > c ≈ t.
Position 1 (T >> c > a ≈ g)
All four base pairs are observed in nature and are compatible with
our structural insights. The EPD survey identifies a marked preference
for T (frequency = 79.5%), which may be related to the fact that the
energy of T:A stacking on the adjacent base pair is relatively small
(except for T:A stacking on C:G) (Ornstein et al. 1978
) and easy to
overcome by intercalation of Phe-148 and Phe-165.
Position 2 (A >> t)
A:T and T:A appear to be equally acceptable from the structural
standpoint. Both G:C and C:G are forbidden by steric clashes with
Leu-163. The EPD analysis confirms that G:C and C:G are vanishingly rare (frequencies = 1%). There is a marked preference for A:T over
T:A (frequencies = 83.5% and 13.9%, respectively), which is
probably correlated with the frequency of T:A in position 1. T:A on A:T
stacking energy is the lowest of all possible combinations (Ornstein et
al. 1978
), again favoring unstacking of the first two base pairs by
Phe-148 and Phe-165.
Position 3 (T >> a ≈ c)
T:A, A:T and C:G are all structurally compatible at this
position, whereas the exocyclic NH2 of G:C would clash with
Val-119. T:A dominates (frequency = 91.4%), which may reflect the
preference for T:A in position 1 followed by A:T in position 2. An A:T
in position 3 (frequency = 4.4%) would create an A tract.
Position 4 (A >> t) Our structures suggest that only A:T and T:A are permitted, which is consistent with database findings (frequencies = 89.2% and 8.4%, respectively). Val-119 precludes G:C and C:G (both frequencies = 1%).
Position 5 (T >> a) Like position 4, A:T and T:A are structurally permitted and observed in nature (frequencies = 71.0% and 27.7%, respectively), whereas G:C and C:G are precluded by Val-29 (both frequencies <1%). We infer that mutation of either of these critical valine residues (29 or 119) to alanine would greatly broaden the specificity of TBP binding.
Position 6 (A >> g > c ≈ t)
Our collection of cocrystal structures includes A:T, T:A and G:C,
all of which function in transcription (Wobbe and Struhl 1990
). The
remaining possibility, C:G, is present at a low but statistically
significant level (frequency = 2.9%) and is active in transcription
(Wobbe and Struhl 1990
).
Position 7 (A ≈ T > g > c)
The EPD contains examples of all four base pairs at this position.
Our structural study focused on C:G (frequency = 3.4%) and revealed the
most significant structural change: Hoogsteen base pairing with G in the syn conformation, thereby avoiding a steric clash with Leu-72.
Position 8 (G ≈ A > c ≈ t)
All four base pairs are found in the EPD and are compatible with the
structural data.
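The position-by-position rules above can be sketched as a simple check. The table lists, for each of the eight positions, the top-strand bases described as structurally tolerated; the function name and the choice to ignore relative frequencies are assumptions made for illustration. As the text stresses, passing this check indicates only a complementary minor-groove surface, not that the sequence can undergo the required deformation.

```python
# Top-strand bases tolerated at positions 1-8, taken from the position-by-position
# discussion (relative preferences such as ">>" are deliberately ignored here).
ALLOWED = ["TCAG", "AT", "TAC", "AT", "TA", "AGCT", "ATGC", "GACT"]

def disallowed_positions(sequence):
    """Return the 1-based positions of an 8-bp sequence that violate the definition."""
    sequence = sequence.upper()
    if len(sequence) != len(ALLOWED):
        raise ValueError("expected an 8-bp TATA element")
    return [
        index + 1
        for index, (base, allowed) in enumerate(zip(sequence, ALLOWED))
        if base not in allowed
    ]

for box in ["TATAAAAG", "TGTAAAAG", "TCTAAAAG", "TATGAAAG", "TATACAAG", "TAAAAAAG"]:
    print(box, disallowed_positions(box))
```

The AdMLP sequence passes, and four of the five oligonucleotides that failed to cocrystallize are rejected at positions 2, 4, or 5. TAAAAAAG passes the surface check even though it does not bind TBP productively, which matches the point made earlier that an A tract is probably too rigid to be deformed.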
Implications for promoter sequence conservation and regulation of Pol II transcription initiation
Our functional definition of the TATA element implies that 6144 of
the 65,536 possible octameric sequences could present complementary minor-groove surfaces to the underside of the molecular saddle. The
actual number must be lower, because some combinations of individually
allowed base pairs cannot undergo the required conformational change
during TBP binding. In advance of a systematic study, it is impossible
to know how many putative TATA elements bind TBP productively, but it
probably numbers in the thousands. Typical sequence-specific
DNA-binding proteins could not bind such large ensembles of sequences
with high affinity, because they interact with the major groove.
Minor-groove recognition of TATA elements by TBP, on the other hand,
represents an architecturally elegant solution to two potential
problems arising from errors occurring during DNA replication. First,
random point mutants in TATA boxes controlling expression of essential
genes are not invariably lethal. Second, variations in TATA box
sequence do not significantly perturb later steps in PIC assembly. TATA
element deformation by TBP creates a structurally invariant
nucleoprotein complex that serves as the receptor for TFIIB and TFIIA.
Subsequent entrants to the PIC (Pol II/TFIIF, TFIIE,
TFIIH) would also see the same multiprotein-DNA complex no matter what
TATA box was actually present in the promoter. It is, of course,
possible that one or more factors could gain access to the major-groove
face of the TATA box while it is bound to TBP and affect transcription
initiation from specific promoters, as suggested by the results of Lee
and Roeder (1997)
.
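The figure of 6144 candidate octamers can be reproduced with a short calculation. The sketch below assumes the per-position sets of tolerated bases from the functional definition (4, 2, 3, 2, 2, 4, 4, and 4 choices at positions 1 through 8) and counts sequences both by multiplying the set sizes and by brute-force enumeration; the variable names are illustrative.

```python
from itertools import product
from math import prod

# Tolerated top-strand bases at positions 1-8 under the functional definition.
ALLOWED = ["TCAG", "AT", "TAC", "AT", "TA", "AGCT", "ATGC", "GACT"]

# Count by multiplying the number of choices at each position.
count_by_product = prod(len(allowed) for allowed in ALLOWED)

# Cross-check by enumerating every possible octamer.
count_by_enumeration = sum(
    1
    for octamer in product("ACGT", repeat=8)
    if all(base in allowed for base, allowed in zip(octamer, ALLOWED))
)

total_octamers = 4 ** 8
print(count_by_product, count_by_enumeration, total_octamers)
print(f"fraction allowed = {count_by_product / total_octamers:.4f}")
```

Both counts agree with the 6144 of 65,536 quoted above, a little over 9% of all octamers. This is an upper bound on surface-complementary sequences only, since it takes no account of deformability.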
Transcription activation in vivo is usually thought of as being
composed of two distinct subprocesses, antirepression and true
activation. We have already discussed the connection between TBP
binding to the minor-groove face of the TATA box and the repressive effects of chromatin (Kim et al. 1993a
), and now go on to consider true
activation. Our crystallographic analysis of DNA recognition by TBP
does not specifically address the issue of how transcriptional activators work, but the structural and functional data do provide some
mechanistic insights. The fact that the cocrystal structures are
essentially identical, despite 20-fold differences in transcription activity (all other things being equal in two different in vivo assays), immediately tells us that thermodynamic, kinetic, or dynamic
differences must be responsible for the observed variations in
transcriptional efficiency.
Preliminary biophysical observations demonstrate that our
transcriptionally weaker TATA element variants bind to TBP with lower
affinities when compared with the AdMLP (A.K. Mollah, B. Gilden, E. Jamison, M.D. Librizzi, S.K. Burley, I.M. Willis, and M. Brenowitz, in
prep.). Variations in association (kon)
and/or dissociation (koff) rates could
reduce both binding affinity
(KD = koff/kon)
and the efficiency of transcription initiation. If the obligate,
TBP-induced conformational change takes the same amount of time (i.e.,
kon remains unchanged) for some TATA elements, the
half-lives of the corresponding TBP-DNA complexes must be decreased in
the weaker TATA boxes (i.e., koff increases).
Alternatively, changes in TATA-box sequence could also slow the rate of
TBP-DNA complex formation. During Hoogsteen base pair formation at
position 7, we know that kon is reduced (Mollah,
S.K. Burley, and M. Brenowitz, in prep.), presumably because it takes
longer to form a biochemically productive TBP-DNA complex dependent on
rotation about the glycosidic bond. We suggest that some
transcriptional activators will exert their positive effects on mRNA
production by increasing the half-life of the foundation on which the
PIC is assembled on a specific promoter. Presumably, this strategy
allows regulatory proteins bound upstream to overcome the effects of an
intrinsically weak TATA element, as seen for the Zta
trans-activator protein (Lieberman and Berk 1991). A
transcriptional activator could also up-regulate gene expression by
increasing TBP (or TFIID) recruitment and increasing kon, which may well be the case with the artificial
lex-TBP fusion transcription system (Chatterjee and Struhl 1995), the
Zta trans-activator (Lieberman and Berk 1994), and other
experimental systems cited in Chi et al. (1995).
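The kinetic argument above can be made concrete with a small numerical sketch. It uses the relation KD = koff/kon given in the text and, as an added assumption, treats dissociation of the TBP-DNA complex as a simple first-order process, so that the half-life is ln(2)/koff. The rate constants below are hypothetical values chosen only for illustration; they are not measurements from this study.

```python
import math


def dissociation_constant(k_on: float, k_off: float) -> float:
    """Return K_D = k_off / k_on (in M when k_on is in M^-1 s^-1 and k_off in s^-1)."""
    if k_on <= 0 or k_off <= 0:
        raise ValueError("rate constants must be positive")
    return k_off / k_on


def complex_half_life(k_off: float) -> float:
    """Return the half-life ln(2) / k_off, assuming first-order dissociation."""
    if k_off <= 0:
        raise ValueError("k_off must be positive")
    return math.log(2) / k_off


# Hypothetical rate constants (illustration only, not measured values).
# Both elements share the same k_on; the weaker element has a 10-fold larger k_off.
tata_elements = {
    "strong element": {"k_on": 1.0e6, "k_off": 1.0e-3},
    "weak element": {"k_on": 1.0e6, "k_off": 1.0e-2},
}

for name, rates in tata_elements.items():
    k_d = dissociation_constant(rates["k_on"], rates["k_off"])
    t_half = complex_half_life(rates["k_off"])
    print(f"{name}: K_D = {k_d:.2e} M, half-life = {t_half:.1f} s")
```

With kon held fixed, a 10-fold increase in koff raises KD 10-fold and shortens the half-life of the complex 10-fold, which is the scenario described for the weaker TATA boxes.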
Regrettably, our study does not provide any direct insight into the
precise molecular mechanisms responsible for transcription initiation
from the so-called TATA-less promoters (for review, see Smale et al.
1998). However, we do believe that the functional definition of the
TATA element discussed earlier will serve as a useful tool for
identifying bona fide TATA-less promoters (i.e., sequences that are not
capable of forming the protein-DNA complex illustrated in Fig. 1A).
This definition does not preclude TBP binding to G/C-rich
sequences upstream of transcription start sites with a different
molecular recognition strategy. Although such a scenario represents a
formal possibility, it seems more likely that other components of TFIID
interact with the core promoters of class-II nuclear genes in the
absence of a functional TATA element. Alternatively, initiator
element-binding proteins could recruit components of the Pol II
transcription machinery to a TATA-less promoter.
Conclusion
This paper presents an atomic resolution analysis of the molecular mechanisms responsible for TATA element recognition by TBP during Pol II transcription initiation. Our work provides a detailed picture of how TBP exploits minor-groove interactions and formation of Hoogsteen base pairs to recognize thousands of octameric sequences while inducing a dramatic distortion in the DNA double helix. The structure of TBP bound to the deformed core promoter is independent of the origin of TBP and of the sequence of the TATA box, demonstrating that the Pol II PIC is assembled on a nucleoprotein foundation that has remained unchanged throughout evolution.
Materials and methods
Reagent preparation and crystallization
Wild-type TBP isoform 2 from A. thaliana was overexpressed
in Escherichia coli and purified to homogeneity (Nikolov et
al. 1992). The 14-bp oligonucleotides containing variants of the AdMLP TATA box (Table 1) were prepared as described previously (Kim et al.
1993a). Cocrystals were obtained by mixing an equimolar ratio of DNA
and TBP to form the complex at a final concentration of ~0.5
mM in 40 mM 2-(N-morpholino)ethane
sulfonic acid (MES) at pH 5.9, 60 mM or 100 mM KCl
(depending on the oligonucleotide), 4 mM MgCl2,
14% (vol/vol) glycerol, 300 mM ammonium
acetate, and 10 mM DTT and equilibrating against a reservoir
containing 12% (vol/vol) glycerol, 25 mM MES
(pH 5.9), and 10 mM DTT with sitting drop vapor diffusion at
4°C. Following seeding, plate-like cocrystals (0.8 × 0.8 × 0.1 mm) grew in weeks.
X-ray data collection, structure determination, and refinement
Diffraction data were obtained from flash-frozen cocrystals via the
oscillation method and integrated, scaled, and merged with
DENZO/SCALEPACK (Otwinowski and Minor 1997). Most of the cocrystals (8/10) were isomorphous with our AdMLP
cocrystals (Kim et al. 1993a) (P2₁: a = 41.8,
b = 146.7, c = 57.4,
β = 90.5°, two complexes/asymmetric unit). The remaining two,
A(−31) and T(−24), adopted a similar but different lattice
packing arrangement (P2₁: a = 42,
b = 57, c = 147,
β = 96°, two
complexes per asymmetric unit). Initial phases for the A(−31) and
T(−24) structures were determined by molecular replacement with the
AdMLP cocrystal structure (Kim et al. 1993a) as the search model, after
removal of the altered base pair and nearby protein residues. The
remaining structures were phased directly with the search model.
XPLOR refinement (Brünger 1992b) for each structure converged,
giving crystallographic R-factors of 18.2%-21.0% and free R-factors of 23.9%-27.5%, with excellent stereochemistry
(Table 2). At the final resolution limits, no restraints were placed on
sugar pucker, DNA backbone torsion angles, or hydrogen bonding of the
altered base pair. The electron density for the polypeptide backbones
is continuous everywhere at 1.3σ in (2|Fobserved| − |Fcalculated|)
difference Fourier syntheses (data not shown). PROCHECK (Laskowski et
al. 1993) revealed no more than one or two unfavorable (φ, ψ)
combinations in any given structure, and main-chain and side-chain
structural parameters consistently better than those expected at these
resolution limits (overall G-factor = 0.2-0.3). Atomic
coordinates have been submitted to the Protein Data Bank (PDB).
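For readers unfamiliar with the statistics quoted above, the following is a minimal sketch of the conventional crystallographic R-factor, R = Σ| |Fobserved| − |Fcalculated| | / Σ|Fobserved|. It assumes the observed and calculated amplitudes are already on a common scale, and it is not a description of how XPLOR computes the value internally. The free R-factor uses the same formula, evaluated only over a set of reflections held out of refinement. The amplitudes shown are hypothetical.

```python
from typing import Sequence


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """Conventional R-factor: sum(||Fo| - |Fc||) / sum(|Fo|).

    Assumes f_obs and f_calc are paired reflection by reflection and
    have already been placed on a common scale.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / denominator


# Hypothetical structure-factor amplitudes (illustration only).
observed = [10.0, 20.0, 30.0, 40.0]
calculated = [9.0, 22.0, 27.0, 44.0]
print(f"R = {100.0 * r_factor(observed, calculated):.1f}%")
```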
Construction of reporter gene plasmids with variant TATA elements
A synthetic promoter was constructed by introducing the AdMLP TATA element into the yeast Gal1 promoter with a Gal4-binding site [consensus UAS (5'-CGGAGGACTGTCCTCCG-3') or the MEL1 UAS (5'-CGGCCATATGTCTTCCG-3')] located 200 bp upstream. The synthetic promoters were placed immediately upstream of a lacZ reporter gene, yielding two reporter plasmids (pLS1 and pLS2), which differ only in the sequence of the Gal4-binding site. By use of pLS1 and pLS2 as parent plasmids, other constructs containing nine variant TATA elements were made by site-directed mutagenesis.
Reporter gene experiments
The reporter plasmids described above were transformed into yeast
strain Sc18 (GAL4, gal80, ura3-52, leu2-3,112, his3, trp, MEL1).
Transformed cells were grown on minimal media lacking uracil to mid-log
phase and then harvested as described previously (Vashee and Kodadek
1995). The carbon source was 2% galactose, 3% glycerol, and 2%
lactic acid. Reported β-galactosidase values are accurate to
±15% and are the result of at least three independent, replicate measurements. For AdMLP, the absolute levels of β-galactosidase activity supported by the consensus and MEL1 UAS Gal4-binding sites
differed by an order of magnitude. In Table 1, each is listed as 100%
and the activity levels obtained with the other TATA elements are
appropriately normalized.
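The normalization described above can be sketched as follows: within each reporter series, the AdMLP activity is set to 100% and every other TATA element is expressed relative to it. The function name, the element labels, and the activity values are hypothetical and serve only to illustrate the arithmetic; they are not data from Table 1.

```python
from typing import Dict


def normalize_to_reference(activities: Dict[str, float], reference: str) -> Dict[str, float]:
    """Express each activity as a percentage of the reference element's activity."""
    reference_activity = activities[reference]
    if reference_activity <= 0:
        raise ValueError("reference activity must be positive")
    return {
        name: 100.0 * value / reference_activity
        for name, value in activities.items()
    }


# Hypothetical absolute activities (arbitrary units) for two reporter series
# whose AdMLP levels differ by an order of magnitude.
consensus_uas = {"AdMLP": 2000.0, "variant": 500.0}
mel1_uas = {"AdMLP": 200.0, "variant": 50.0}

for label, series in (("consensus UAS", consensus_uas), ("MEL1 UAS", mel1_uas)):
    print(label, normalize_to_reference(series, "AdMLP"))
```

Normalizing each series separately lets the relative strength of a variant TATA element be compared across the two Gal4-binding sites even though their absolute activities differ.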
Acknowledgments
We thank Drs. L. Berman, M. Capel, and R.M. Sweet for help at beamline X25 at the National Synchrotron Light Source; Drs. S. Ealick and D. Thiel and the MacCHESS staff for help at beamlines A1 and F1 at the Cornell High Energy Synchrotron Source; E. Halay for help with DNA production; Drs. J. Bonanno, D. Jeruzalmi, J. Marcotrigiano, D.B. Nikolov, S.K. Nair, and X. Xie for useful discussions and much help with computing and figure preparation. For their many useful suggestions, we are grateful to Drs. K. Arndt, M. Brenowitz, P. Bucher, R.E. Dickerson, J. Kahn, L. Hurley, J. Kuriyan, W.K. Olson, G.A. Petsko, R.G. Roeder, P.B. Sigler, and M. Vasseur. This work was supported by the Howard Hughes Medical Institute (S.K.B.).
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
Footnotes
Received September 2, 1999; revised version accepted October 28, 1999.
Present addresses: 3 Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06510 USA; 4 Kinetix Pharmaceuticals, Medford, Massachusetts 02155 USA.
5 Corresponding author.
Dedicated to the memory of Nikolaos Patikoglou.
E-MAIL burley{at}rockvax.rockefeller.edu; FAX (212) 327-8337.
References