Transcription Control in Eukaryotes: 2009

Sunday, April 12, 2009

Transcription Control in Eukaryotes

Transcription in eukaryotes differs from that in
prokaryotes in two main respects. In eukaryotes,
one gene codes for a single polypeptide
(monocistronic transcription unit), and the initial
transcript is processed into mature messenger
RNA (mRNA).

Prototype of a eukaryotic structural gene

A structural gene is a gene that codes for a polypeptide
gene product. It can be divided into sections
involved in transcription (transcription
unit) and regulatory sequences. Regulatory
sequences are located both upstream (the 5′
direction) and downstream (the 3′ direction) of
the gene. In addition, internal regulatory
sequences may occur in introns. Some regulatory
sequences are located far from the gene.
Together with the promoter

Prototype of mature eukaryotic mRNA

Mature eukaryotic mRNA is produced from its
precursor RNA by the removal of introns, addition
of a cap at the 5′ end, and addition of
numerous adenine nucleotides at the 3′ end
(polyadenylation). A noncoding sequence (5′
leader) precedes the translation start
signal (AUG), and a noncoding trailer sequence
follows the translation stop signal (UAA) at the 3′ end.
Both addition of the 5′ cap and polyadenylation
involve enzymatic reactions.
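
The layout described above (5′ leader, coding sequence, 3′ trailer) can be illustrated with a minimal Python sketch. It assumes the mRNA is given as a plain string, that translation starts at the first AUG, and that any of the three standard stop codons ends the reading frame. The function name `split_mrna` and the toy transcript are hypothetical and serve only as an illustration.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def split_mrna(mrna):
    """Split an mRNA string into (leader, coding, trailer).

    The coding region runs from the first AUG through the first
    in-frame stop codon. Returns None if no complete frame is found.
    """
    start = mrna.find("AUG")
    if start == -1:
        return None
    # Walk codon by codon from the start signal.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3
            return mrna[:start], mrna[start:end], mrna[end:]
    return None

# Hypothetical toy transcript: leader + coding region + trailer + poly(A) tail.
toy = "GGCACC" + "AUGGCUUCAUAA" + "CUUGA" + "A" * 10
leader, coding, trailer = split_mrna(toy)
```

In this toy case the leader is the six nucleotides in front of AUG, and the trailer includes the poly(A) tail.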

7-Methyl-guanosine cap

The translation of eukaryotic mRNA is similar to
that of prokaryotic mRNA, with two distinct
differences: (1) transcription and translation
occur at different locations in the eukaryotic
cell: transcription occurs in the cell nucleus,
and translation in the cytoplasm; (2) the 5′ and
3′ ends of eukaryotic mRNA have special structures.
The structure at the 5′ end is called a cap.
A guanosine is joined by a 5′–5′ triphosphate
bridge to the first nucleotide of the precursor
mRNA chain. Through the action of
guanine-7-methyltransferase, the guanosine
is methylated in position 7; the two initial
ribose residues at the beginning of
the RNA chain may also be methylated. Except for
the mRNAs transcribed by DNA viruses, eukaryotic mRNA usually
contains a single protein-coding sequence
(monocistronic messenger).

Polyadenylation at the 3! end

Eukaryotic termination signals are less
well characterized than the regulators of gene activity
at the 5′ end. Eukaryotic primary transcripts
are cleaved by a specific endonuclease
shortly after the sequence AAUAAA. Subsequently,
about 100–250 adenine nucleotides
are attached to the 3′ end of the transcript by
means of a poly(A) polymerase (polyadenylation).
The poly(A) end binds to a protein. All
mRNAs, except those that code for histone proteins,
possess a poly(A) terminus.
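
The two steps, cleavage downstream of the signal and addition of adenines, can be sketched in Python as follows. This is a toy model under stated assumptions: the signal is taken to be the canonical AAUAAA, the cut is placed a fixed 15 nucleotides after it (the real distance varies), and the tail length is an arbitrary value within the 100–250 range. The function name and parameters are hypothetical.

```python
def cleave_and_polyadenylate(transcript, signal="AAUAAA", offset=15, tail_length=200):
    """Cut the transcript `offset` nt after the signal and add a poly(A) tail.

    `offset` and `tail_length` are illustrative assumptions, not measured values.
    Transcripts without the signal are returned unchanged.
    """
    idx = transcript.find(signal)
    if idx == -1:
        return transcript
    # Cleavage site: a short distance downstream of the signal.
    cut = min(idx + len(signal) + offset, len(transcript))
    return transcript[:cut] + "A" * tail_length

# Hypothetical primary transcript with 40 nt of downstream sequence.
primary = "AUGGCU" + "AAUAAA" + "C" * 40
mature_end = cleave_and_polyadenylate(primary, offset=15, tail_length=20)
```

Everything beyond the cut site is discarded, so the downstream sequence of the primary transcript does not appear in the processed product.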

Regulation of Gene Expression in Eukaryotes

Precisely regulated gene expression is a prerequisite
for producing and maintaining the
many different types of cells and tissues of a
multicellular organism. Cells differentiate into
their particular cell types by means of combinations
of expressed and repressed genes. During
differentiation the tightly regulated genes function
in the order, usually sequential, required
for a particular cell fate (developmental pathways).
Many regulator genes and their proteins
have been identified (cf. part III, Genetics and
Medicine). The following outlines some important
principles of the specific control of gene expression
in eukaryotic cells.

Levels of control of eukaryotic gene expression

In principle, expression can be regulated at four
distinct levels. The first and by far the most important
is primary control of transcription.
Processing to mature RNA can be regulated at
the level of the primary RNA transcript. A
frequently observed process is alternative splicing
(see Alternative RNA Splicing below). Translation can be varied by RNA
editing (see RNA editing below for an example). At the protein
level, posttranslational modifications can determine
the activity of a protein. Examples include the cleavage of
preproinsulin to form mature insulin, glycosylation
or hydroxylation, and protein folding.

RNA editing

RNA editing modifies genetic information at the
RNA level. An important example is the apolipoprotein B
gene involved in lipid metabolism. It
encodes a protein of 4536 amino acids, apolipoprotein
B-100. This is synthesized in the liver and
secreted into the blood, where it transports
lipids. A related shorter form of the protein with
2152 amino acids, Apo B-48 (250 kDa, instead of
512 kDa for Apo B-100), is synthesized in the intestine.
An intestinal deaminase converts a cytosine
in codon 2153, CAA (glutamine), to uracil.
This change results in a stop codon (UAA)
and thereby terminates translation at this site.
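
The effect of a single C-to-U change on the protein product can be shown with a small Python sketch. It uses a hypothetical seven-codon toy transcript rather than the real apolipoprotein B sequence, and a codon table restricted to the codons that occur in the toy example. The function names are assumptions made for illustration.

```python
# Partial codon table: only the codons used in the toy transcript below.
CODON_TABLE = {"AUG": "M", "GCU": "A", "UCA": "S", "CAA": "Q", "GAU": "D",
               "UAA": "*", "UAG": "*", "UGA": "*"}

def translate(mrna):
    """Translate codon by codon until the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

def edit_c_to_u(mrna, codon_number):
    """Convert the first base of the given codon (1-based) from C to U."""
    pos = (codon_number - 1) * 3
    if mrna[pos] != "C":
        raise ValueError("first base of the codon is not cytosine")
    return mrna[:pos] + "U" + mrna[pos + 1:]

# Toy transcript: AUG GCU UCA CAA GAU GCU UAA
toy_mrna = "AUGGCUUCACAAGAUGCUUAA"
unedited_protein = translate(toy_mrna)
edited_protein = translate(edit_c_to_u(toy_mrna, 4))
```

Editing codon 4 turns CAA into UAA, so the edited transcript yields a protein that stops after the third amino acid, while the unedited one is translated to the original stop codon.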

Long-range gene activation by an enhancer

Enhancers control gene activity at a distance.
An enhancer is a distant site involved in initiation
of transcription (see p. 206). It may be located
either upstream or downstream of the
same DNA strand (cis-acting) or on a different
DNA strand (trans-acting). Enhancer elements
provide tissue-specific or time-dependent regulation.
It is unclear how enhancers can exert
their effect from a considerable distance. One
model suggests that DNA forms a loop between
enhancer and promoter. Activator proteins
bound to the enhancer, e.g., a steroid hormone receptor,
could then come into contact with the general
transcription factor complex at the promoter.
Others might function as repressors.

Alternative RNA Splicing

A DNA segment can code for different forms of
mRNA when different introns are removed from
the primary transcript (alternative splicing). By
means of alternative splicing, a gene can
code for different, albeit similar, gene products.
This allows a high degree of functional flexibility.
Numerous examples of differential RNA
splicing are known for mammalian genes. For
example, the primary transcript for the calcitonin
gene contains six exons. They are spliced
into two different types of mature mRNA. One,
consisting of exons 1–4 (but not exons 5 and 6),
is produced in the thyroid and codes for calcitonin.
The other consists of exons 1, 2, 3, 5, and
6, but not exon 4. It codes for a calcitonin-like
protein in the hypothalamus (calcitonin gene-related
peptide, CGRP).
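
The calcitonin example can be written out as a short Python sketch in which each exon is a placeholder string and a mature mRNA is the concatenation of the exons retained. The exon contents are hypothetical labels, not real sequence, and `splice` is a name chosen for illustration.

```python
def splice(exons, keep):
    """Join the selected exons, in order, into one mature mRNA string."""
    return "".join(exons[number] for number in keep)

# Placeholder exons 1-6 of the primary transcript (labels, not real sequence).
exons = {number: f"[exon{number}]" for number in range(1, 7)}

CALCITONIN_EXONS = (1, 2, 3, 4)   # thyroid
CGRP_EXONS = (1, 2, 3, 5, 6)      # hypothalamus

calcitonin_mrna = splice(exons, CALCITONIN_EXONS)
cgrp_mrna = splice(exons, CGRP_EXONS)
```

Both products share exons 1 to 3 and differ only in which downstream exons are retained.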

DNA-Binding Proteins

Regulatory DNA sequences interact with proteins
to exert proper functional control. Regulatory
proteins can recognize specific DNA
sequences because the surface of the proteins
fits precisely onto the DNA surface. Three basic
groups of regulatory DNA sequences can be distinguished:
(1) sequences that establish the
exact beginning of transcription; (2) DNA segments
that regulate the end, or termination;
and (3) DNA sequences near the promoter that
have specific effects on gene activity (repressors,
activators, enhancers, and others).

Binding of a regulatory protein to DNA

Gene regulatory proteins can recognize DNA
sequence information without having to open
the hydrogen bonds within the helix. Each base
pair presents a distinctive pattern of hydrogen
bond donors and hydrogen bond acceptors.
These proteins recognize the major groove of
DNA, where binding takes place. One example is a single
contact of an asparagine (Asn) of a gene-regulatory
protein with the DNA base adenine (A).
A typical area of surface-to-surface contact
involves 10–20 such interactions.

An α helix inserts into a major groove of operator DNA

One part of the protein, an α helix (the
sequence-reading or recognition helix), is inserted
into the major groove of DNA. Here the
sequence Q-Q-Q-S-T (glutamine Q, serine S,
threonine T) in the recognition sequence of the
bacteriophage 434 repressor bonds with
specific bases in a major groove of operator
DNA.

Zinc finger motif

Another group of proteins are called zinc fingers
because they resemble fingers. They are
involved in important functions during embryonic
development and differentiation. The basic
zinc finger motif consists of a zinc atom connected
to four amino acids of a polypeptide
chain, here two histidine (H) and two cysteine
(C) residues. The three-dimensional structure
consists of an antiparallel β sheet (amino
acids 1–10), an α helix (amino acids 12–24),
and the zinc connection. Four amino acids, cysteines
3 and 6 and histidines 19 and 23, are
bonded to the zinc atom and hold the carboxy
(COOH) end of the α helix to one end of the β
sheet.
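
The spacing of the four zinc-binding residues given above (cysteines 3 and 6, histidines 19 and 23) corresponds to the pattern C, 2 residues, C, 12 residues, H, 3 residues, H. A minimal Python sketch can search a protein sequence for that exact spacing with a regular expression. Real zinc fingers tolerate some variation in these gaps, so the fixed spacing is a simplifying assumption, and the 24-residue sequence below is a hypothetical example.

```python
import re

# Fixed spacing taken from the positions in the text: C3, C6, H19, H23.
ZINC_FINGER = re.compile(r"C.{2}C.{12}H.{3}H")

def zinc_ligand_positions(protein):
    """Return the 1-based positions of the two Cys and two His, or None."""
    match = ZINC_FINGER.search(protein)
    if match is None:
        return None
    start = match.start()
    # Offsets of the four ligands within the matched pattern: 0, 3, 16, 20.
    return tuple(start + offset + 1 for offset in (0, 3, 16, 20))

# Hypothetical 24-residue finger in one-letter amino acid code.
toy_finger = "YKCPECGKSFSQKSNLQKHQRTHT"
positions = zinc_ligand_positions(toy_finger)
```

For the toy sequence the function reports the same four positions that the text lists for the zinc-binding residues.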

Zinc finger proteins bind to DNA

The interaction with DNA is strong and specific.
Each protein recognizes a specific DNA
sequence. As the number of zinc fingers can be
varied, this type of DNA-binding has great evolutionary
flexibility.

Binding to a response element

Many hormones and growth factors activate
cell-surface receptors. In contrast, steroid hormones
enter the target cells and interact there
with a specific receptor protein in the cytosol.
The hormone–receptor complex then migrates
to specific sites of DNA. The hormone-binding
domain will prevent binding to DNA unless the
hormone is present. Activated receptors bind to
specific DNA sequences called hormone response
elements (HREs). Each polypeptide
chain of the receptor contains a zinc atom
bound to four cysteines (1). The skeletal model
shows the two DNA-binding domains binding
to the HRE in two adjacent major grooves of the
target DNA (2). The space-filling model shows
how tightly the recognition helix of each dimer
of this protein fits into the major groove of DNA
(shown in red and green).

Other Transcription Activators

Transcription activators are dimeric proteins
with distinct functional domains: a DNA-binding
domain and an activation domain. The DNA-binding
domain interacts with specific regulatory
DNA sequences. The activation domain interacts
with other proteins that stimulate transcription.
Transcription activators participate in
the assembly of the initiation complex, for example,
by stimulating the binding of transcription
factor IID (TFIID, see p. 212) to the promoter.
Other activators may interact with
general transcription factors. They provide a
second level of transcriptional control.

Leucine zipper dimer

Most DNA-binding regulatory proteins recognize
specific sites as dimers. One part of the
molecule serves as the recognition molecule,
the other stabilizes the structure. A particularly
striking example is given by proteins with a
leucine zipper motif. The name is derived from
the basic structure. Two α helices are joined like
a zipper by periodically repeated leucine residues
located at the interface of the two helices.
The two helices separate, form a Y-shaped
structure, and extend into the major groove of
the DNA (1). Leucine zipper proteins may be homodimers
with identical subunits (2, 3) or heterodimers
with different, albeit similar, subunits
(4). The ability to form unlike dimers (heterodimerization)
greatly expands the spectrum of
specificities. The use of combinations of different
proteins to control cellular functions is
called combinatorial control. (Figure redrawn
from Alberts et al., 1994.)

A DNA-binding motif related to the leucine zipper
is the helix–loop–helix (HLH) motif (not
shown). The HLH motif consists of one short α
helix and one longer α helix. The two α helices
are connected by a flexible loop of protein.
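
How heterodimerization expands the number of possible regulators under combinatorial control can be made concrete with a short Python sketch. It assumes, for illustration only, that every monomer can pair with itself and with every other monomer, which real proteins do not necessarily do. The monomer names A, B, and C are hypothetical.

```python
from itertools import combinations_with_replacement

def possible_dimers(monomers):
    """List every unordered pair of monomers, including self-pairs."""
    return list(combinations_with_replacement(monomers, 2))

# Three hypothetical leucine zipper monomers.
monomers = ["A", "B", "C"]
dimers = possible_dimers(monomers)
homodimers = [pair for pair in dimers if pair[0] == pair[1]]
heterodimers = [pair for pair in dimers if pair[0] != pair[1]]
```

Under this assumption n monomers give n homodimers plus n(n − 1)/2 heterodimers, so three monomers already yield six distinct dimers.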

Activation by steroid hormone binding

Transcriptional enhancers are regulatory regions
of DNA that increase the rate of transcription.
Their spacing and orientation vary relative
to the starting point of transcription. An enhancer
is activated by the binding of a hormone–receptor
complex. This activates the promoter,
and transcription begins (active gene). Numerous
important genes in mammalian development
are regulated by steroids (steroid-responsive
transcription). The latter include glucocorticoids
and mineralocorticoids, the steroids of
glycogen and mineral metabolism; sex hormones,
which function in embryonic sex differentiation
and control of reproduction; and
others. Normal bone development and function
are under the control of steroidlike vitamin D.
Another steroidlike hormone is retinoic acid, an
important regulator of differentiation during
embryogenesis (morphogen). These hormones
initiate their physiological effects by association
with corresponding steroid-specific intracellular
receptors (hormone–receptor complex).

Evidence of a protein-binding region in DNA

Protein-binding regions in DNA represent regulatory
areas; thus, their analysis can yield some
insights into gene regulation. Protein-binding
DNA regions can be demonstrated in several
ways. With band-shift analysis (1), protein-bound
and non-protein-bound DNA fragments
are differentiated using gel electrophoresis. As
fragments migrate through the gel, a DNA
fragment that is part of a DNA-protein complex
migrates more slowly than a free DNA fragment
of the same size. The DNA-protein complex is
found at a different position (“band shift”). DNA
footprinting (2) is another procedure for identifying
protein-binding sites on DNA. The principle
of DNA footprinting is that a protein-bound
DNA region, e.g., the polymerase-promoter
complex, is protected from the effects of
a DNA-cleaving enzyme (DNase I). Previously
isolated DNA is cut into different fragments by
DNase I, and the fragments are sorted according
to size by gel electrophoresis. Since the
protein-binding region of the DNA is protected from cleavage
by DNase I (DNase I protection experiment),
DNA bands from the binding region are
missing (“footprint”).
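
The logic of reading a footprint can be sketched in Python as a comparison of two fragment ladders. The model is deliberately simplified: it assumes end-labeled DNA, at most one cut per molecule, and a cut after position i producing a labeled fragment of length i. The function names and the 30-bp example are hypothetical.

```python
def fragment_ladder(dna_length, protected=None):
    """Fragment lengths produced by single cuts in end-labeled DNA.

    `protected` is an optional (first, last) range of positions that the
    bound protein shields from cleavage.
    """
    first, last = protected if protected else (0, -1)
    return [i for i in range(1, dna_length) if not first <= i <= last]

def footprint(free_ladder, bound_ladder):
    """Return (first, last) of the fragment lengths missing from the bound lane."""
    missing = sorted(set(free_ladder) - set(bound_ladder))
    return (missing[0], missing[-1]) if missing else None

# Hypothetical 30-bp fragment with a protein covering positions 12-18.
free_lane = fragment_ladder(30)
bound_lane = fragment_ladder(30, protected=(12, 18))
protected_region = footprint(free_lane, bound_lane)
```

The gap in the bound lane maps directly back to the positions covered by the protein, which is the "footprint" described above.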
