39 Regulation of gene expression
The evolution of Eukaryotes brought with it many changes to organismal form and function. As a general rule, eukaryotic cells are larger and more complex than either bacteria or Archaea. Eukaryotes are also frequently multicellular, which creates options for cell and tissue specialisation that are not possible in a single-celled organism. The increase in complexity that comes with multicellularity also required an evolution of the eukaryotic genome. As a result, the eukaryotic genome tends to be quite a bit larger.
Genes and their gene products are heavily regulated at every stage of their life cycle.
The cell determines not only when they will be transcribed and translated but also at what speed this will happen and for how long. Once the proteins have been synthesised, they continue to be regulated through post-translational modifications, such as phosphorylation.
Dive deeper
Watch this video to review DNA transcription and translation: CrashCourse. (2012, April 10). DNA, hot pockets, & the longest word ever: Crash course biology #11 [YouTube, 14:07mins]
How are genes and proteins regulated?
- Transcriptional control determines when and how often genes are transcribed
- RNA processing control determines which introns are removed and which combinations of exons are kept, so different proteins can be made from the same gene
- Once the processed mRNA leaves the nucleus, the cell continues to regulate the gene products by controlling:
  - when and how translation happens,
  - when the mRNA is degraded,
  - what kinds of post-translational modifications take place, and
  - when the protein is tagged for destruction.

Transcriptional control
Eukaryotic cells deal with the overwhelming amount of DNA in their nuclei by tightly packing the DNA away in the form of heterochromatin. By packing DNA tightly, the cell can influence how accessible genes are to the transcription machinery. Epigenetics is the study of how gene expression can be regulated at the chromatin level. This kind of regulation can be inherited.
Heterochromatin transitions to euchromatin so that genes in the area can be transcribed. This is facilitated by a combination of proteins known as histone-modifying enzymes and chromatin-remodelling complexes. Modification of histone tails regulates chromatin packing (Figure 8.6). Each type of modification has a different effect on the histones, which, in turn, affects the availability of the genes in that region. For example, acetylation is usually associated with an increase in gene expression: the change in electrical charge that results from acetylation reduces the ability of the histone to interact efficiently with the negatively charged DNA. Phosphorylation also changes the charge of the histone, so it can work similarly to acetylation. Phosphorylation of histones has specifically been shown to play a role in DNA repair mechanisms and in the extreme DNA packing that is required during mitosis and meiosis. Additionally, to pack and unpack the DNA, the histones need to be shifted around or even removed so that the DNA can be accessed. Chromatin-remodelling complexes use ATP to drive reactions that affect nucleosome location and/or structure.

Transcription factors provide an important additional level of control. Not only are general transcription factors required to allow the RNA polymerase to bind to the DNA for initiation, but additional gene-specific transcription factors can enhance or inhibit transcription.
The structure of a eukaryotic gene
All genes have two major regions: the regulatory region, where all regulatory sequences reside, including the site where the promoter is located, and the transcribed region, from which the gene transcript (mRNA, tRNA, rRNA) is derived (Figure 8.7).

Reflective question
If only one of the DNA strands is ever used for transcription, how do we identify which strand is which?
The template strand is the DNA strand that the polymerase physically binds to and uses as the template to transcribe the RNA. The sequence on this strand is complementary to the RNA sequence. The sequence on the other DNA strand, the coding strand, is the same as the RNA sequence, except that it has the base T where the RNA has U. (This strand is much easier for molecular biologists to refer to when doing genetic research. We say that it ‘carries the code’ because, read 5′ to 3′, it runs in the same direction in which the RNA will be read to translate the protein.) In other words, the 5′ to 3′ direction of the coding strand is the direction in which the ribosome reads the mRNA during translation.
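The relationship between the two strands can be checked with a short Python sketch. The sequence is a made-up example and the function names are illustrative only; both DNA strands are written 5′ to 3′.

```python
# Base-pairing rules for DNA.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Base-pairing rules used when RNA is built against a DNA template.
RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_from_coding(coding_5to3: str) -> str:
    """Return the template strand, written 5' to 3' (the reverse complement)."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(coding_5to3))


def transcribe(template_5to3: str) -> str:
    """Return the RNA (5' to 3') that is complementary to the template strand."""
    # The template is read 3' to 5', so walk it in reverse.
    return "".join(RNA_COMPLEMENT[base] for base in reversed(template_5to3))


coding = "ATGGCCTTAGACTGA"  # hypothetical coding-strand sequence
template = template_from_coding(coding)
rna = transcribe(template)

print("coding  :", coding)
print("template:", template)
print("RNA     :", rna)
# The RNA matches the coding strand once T is swapped for U.
print(rna == coding.replace("T", "U"))
```

The final comparison prints `True`, which is the point of the reflective question: the RNA is complementary to the template strand and therefore reads like the coding strand.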
The product of transcription is an RNA transcript. When a gene codes for a protein, the RNA produced from that gene is known as messenger RNA (mRNA) (Figure 8.7). mRNA also has several key sections you should know:
- 5′ and 3′ untranslated regions (UTRs): sequences upstream and downstream of the protein coding region on the RNA. They are used to help the ribosome hold onto the RNA.
- Protein coding region: the section that will be translated into protein once the mRNA has been exported to the cytosol and bound by the ribosome.
- Translation start site: the site of the start codon, which initiates translation. The start codon marks the beginning of the protein coding region. (Note: the start codon is not in the same spot as the +1 site on the DNA. The 5′ UTR comes first, and the start codon is downstream of that.)
- Translation stop site: the site of the stop codon, which marks the end of the protein coding region. Once the stop codon is read, the ribosome is released from the RNA, and translation is terminated.
- In between the start and stop codon, there are introns and exons. Introns are intervening sequences, which will be removed before the mRNA is mature and ready to be sent to the cytosol for translation. Exons are left in the sequence so that they can be expressed.
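As a rough illustration of how these sections sit along a mature (already spliced) mRNA, the Python sketch below splits a sequence into its 5′ UTR, protein coding region and 3′ UTR. It makes a simplifying assumption that the first AUG is the start codon; the sequence and the function name `annotate_mrna` are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def annotate_mrna(mrna: str) -> dict:
    """Split a mature mRNA into 5' UTR, coding region and 3' UTR.

    Simplification: the first AUG in the sequence is treated as the start codon.
    """
    start = mrna.find("AUG")
    if start == -1:
        raise ValueError("no start codon found")
    # Step through the sequence one codon (three bases) at a time.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3  # the stop codon is the last codon of the coding region
            return {
                "five_prime_utr": mrna[:start],
                "coding_region": mrna[start:end],
                "three_prime_utr": mrna[end:],
            }
    raise ValueError("no in-frame stop codon found")


# Hypothetical mRNA: 5' UTR + coding region + 3' UTR.
example = "GGCACC" + "AUGGCCUUAGACUGA" + "CCUUAAAA"
for region, sequence in annotate_mrna(example).items():
    print(f"{region}: {sequence}")
```

Note that the start codon is found only after the 5′ UTR has been skipped, which mirrors the point that the start codon is not at the +1 site.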
There are many genes in the genome that code for RNA only (i.e., the RNA is transcribed but will not be translated by the ribosome). For example, the building of proteins also requires ribosomal RNA (rRNA), which is what the catalytic regions of the ribosome are made of, and transfer RNA (tRNA), which is covalently bound to the amino acids and will be used to translate the mRNA codons into a sequence of amino acids. In addition to these three types, a number of additional forms of RNA have also been discovered in recent years, and virtually none of them code for protein. RNA that is not going to be used as mRNA will not have a translation start/stop site, nor will it have introns or exons. Additionally, RNA that is not going to be used for building proteins should generally not be discussed in terms of codons, as that is a language for translation. As little as 1–2% of the human genome is thought to actually code for proteins.
Transcription factors control when and how transcription happens
Transcription requires a number of different proteins, in addition to the RNA polymerase, to bind to the DNA. A eukaryotic transcription complex assembles on the DNA in the regulatory region of the gene when the cell is ready to begin transcribing DNA into RNA (Figure 8.8). The key components are:
- Chromatin-remodelling complexes help shift or remove nucleosomes to allow access to the DNA.
- General transcription factors help the RNA polymerase to bind, and other transcription regulators determine when gene expression is activated, and to what level.
- Mediator is a large protein complex that can act as a hub, bringing general transcription factors and other transcription regulators together. This is helpful, since some of the regulatory DNA they bind to can be far away on the linear strand.
These all assemble to form a multipart complex to initiate transcription.

Gene expression can be controlled by a number of different types of transcription regulators:
- Activator proteins bind to enhancer regions on the DNA
- Repressor proteins bind to silencer (suppressor) regions on the DNA
- Cofactors (with other regulatory proteins) change the transcriptional response of the gene
- Histone-modifying enzymes chemically modify the histones to facilitate additional changes to the chromatin
- Chromatin-remodelling complexes bind to nucleosomes and help open up the DNA and make it accessible for transcription
While some of these regulatory proteins act at all genes, each gene has its own unique set of regulatory sequences by which it is controlled, and these may bind only a subset of the regulators in the list. These sequences are often spread over hundreds to thousands of base pairs, and they accomplish very complex regulatory tasks. The following are some examples:
- different genes can be transcribed at different rates,
- the same gene can be transcribed at different rates in different tissues,
- the same gene can be transcribed at different rates at different times during development in the same tissue, and
- some genes will not be transcribed at all, as they are not required in that particular cell type or at that stage in development.
The binding of different combinations of transcription regulators in different tissues and during different times in development is what allows such flexibility in the expression of eukaryotic genes. This concept is often referred to as combinatorial control.
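Combinatorial control can be pictured with a deliberately simplified Python model. It assumes that a gene is transcribed only when all of its activators are present and none of its repressors are; real regulation is graded rather than on/off. The gene names, regulator names (`TF1` to `TF3`) and tissues are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    activators: frozenset                # all must be present for transcription
    repressors: frozenset = frozenset()  # any one present switches the gene off


def is_transcribed(gene: Gene, regulators_present: set) -> bool:
    """On/off simplification of combinatorial control."""
    all_activators_bound = gene.activators <= regulators_present
    any_repressor_bound = bool(gene.repressors & regulators_present)
    return all_activators_bound and not any_repressor_bound


genes = [
    Gene("gene_A", frozenset({"TF1", "TF2"})),
    Gene("gene_B", frozenset({"TF1"}), frozenset({"TF3"})),
]

# Each tissue contains a different combination of transcription regulators.
tissues = {
    "tissue_1": {"TF1", "TF2", "TF3"},
    "tissue_2": {"TF1"},
}

expression = {
    tissue: [g.name for g in genes if is_transcribed(g, regulators)]
    for tissue, regulators in tissues.items()
}
print(expression)
```

In this toy example the same two genes give different expression patterns in the two tissues, purely because the set of regulators differs.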
Post-transcriptional control: mRNA processing
The physical separation of transcription (in the nucleus) and translation (in the cytosol) in Eukaryotes has created space for increased flexibility in gene expression, as well as providing additional protection from mutation. RNA processing is extremely complex and an area of active research.
The first and most important concept to remember is that all transcripts—mRNA, rRNA, tRNA, and all other forms of noncoding (nc)RNA—are processed in the nucleus before they are exported to the cytoplasm. Each type of RNA will require unique processing steps. Processing may include any of the following modifications, depending on the class of RNA:
- addition of sequences (e.g., 5′ cap and poly[A] tail in mRNA)
- cleavage of the transcript into several pieces (rRNA)
- removal of some sequences
- splicing (this is how introns are removed from mRNA)
To simplify the rest of this section, we will focus solely on the processing of mRNA. Remember that other types of RNA (tRNA, rRNA, and other ncRNA) will have their own unique steps to complete before they are considered mature RNA.
mRNA processing
There are three major processing events that are required before a pre-mRNA is considered to be mature and ready for export (Figure 8.9):
- RNA capping at the 5’ end of the RNA. The 5’ cap consists of a modified guanosine (G) with an extra methyl group attached to it, which is joined to the initial 5′ nucleotide of the nascent RNA using a triphosphate linkage. This is added by enzymes that are part of the RNA polymerase complex right at the start of transcription, when the transcript is still only about 25 nucleotides long.
- Polyadenylation: a poly(A) (polyadenylic acid) tail of about 100–200 adenylic acid (A) residues is added at the 3′ end of the primary transcript. A specific base sequence near the 3′ end of the pre-mRNA (AAUAAA in the RNA, AATAAA in the DNA) acts as the signal site. That sequence is recognised by a protein complex that includes an endonuclease (an enzyme that cuts nucleic acids). The endonuclease cuts the transcript 20–30 bases downstream of the recognition sequence, and a separate enzyme, poly(A) polymerase, then adds the A residues.
- Splicing: during splicing, portions of the coding region of the pre-mRNA transcript (the introns) are removed.

The roles of RNA capping and polyadenylation are similar; they both serve to increase the stability of the final mRNA molecule and to identify it as a completed, mature transcript that is ready to be exported out of the nucleus. Nuclear export proteins need to bind to these regions of the mRNA transcript in order to facilitate mRNA export from the nucleus.
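The cut-and-extend logic of polyadenylation can be sketched in a few lines of Python. This is a simplified illustration, not a model of the enzymes involved: it assumes the signal is the commonly cited consensus AAUAAA, cuts a fixed 20 bases downstream of it, and adds a 150-residue tail. The sequence, the function name `polyadenylate` and its parameters are hypothetical.

```python
POLY_A_SIGNAL = "AAUAAA"  # commonly cited consensus signal (assumption of this sketch)


def polyadenylate(pre_mrna: str, cut_offset: int = 20, tail_length: int = 150) -> str:
    """Cut the transcript downstream of the signal and append a poly(A) tail."""
    signal_at = pre_mrna.find(POLY_A_SIGNAL)
    if signal_at == -1:
        raise ValueError("no polyadenylation signal found")
    # The cut site lies cut_offset bases past the end of the signal.
    cut_site = signal_at + len(POLY_A_SIGNAL) + cut_offset
    if cut_site > len(pre_mrna):
        raise ValueError("transcript ends before the cut site")
    # Everything downstream of the cut is discarded; the tail is added in its place.
    return pre_mrna[:cut_site] + "A" * tail_length


# Hypothetical 3' end of a pre-mRNA: stop codon, signal, spacer, extra sequence.
pre_mrna = "AUGGCCUGA" + "GCGC" + "AAUAAA" + "C" * 20 + "GUGUGUGU"
mature = polyadenylate(pre_mrna)
print(len(pre_mrna), len(mature))
print(mature[:45] + "...")
```

The default values are chosen to fall within the ranges given above (20–30 bases downstream, 100–200 A residues).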
mRNA splicing is when a portion of the RNA is excised from the coding region of the transcript, leaving behind a shorter mRNA that will be used for translation. A typical coding region in a primary mRNA transcript will include the following:
- Introns (which stands for intervening sequences) are noncoding RNA segments that are recognised and removed from the primary transcript. Usually, 75–80% of the initial primary mRNA transcript is lost as a result of splicing. In some cases, it has been shown to be as much as 95%. Despite introns being noncoding (not translated into protein), they often carry important information, such as regulatory sequences that regulate the gene in which they sit. They can also regulate other genes that are upstream or downstream of that site.
- Exons (which stands for expressed sequences) are the coding sequences that are left behind in the transcript. They contain the sequence that codes for the protein and are destined for export to the cytoplasm. A gene can have many exons (some genes have more than 50!) that are joined together to produce a processed transcript.
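To get a feel for how much of a primary transcript is lost, the Python sketch below joins the exons of a made-up transcript and reports the fraction removed. The exon positions are invented for illustration, and the letters `E` and `i` are placeholders for exon and intron bases rather than real sequence.

```python
def splice(pre_mrna: str, exons: list) -> str:
    """Join the exon segments, given as (start, end) half-open index pairs."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def fraction_removed(pre_mrna: str, exons: list) -> float:
    """Fraction of the primary transcript that is lost as introns."""
    kept = sum(end - start for start, end in exons)
    return 1 - kept / len(pre_mrna)


# Hypothetical 1,000-base primary transcript: three exons and two introns.
pre_mrna = "E" * 100 + "i" * 350 + "E" * 50 + "i" * 400 + "E" * 100
exons = [(0, 100), (450, 500), (900, 1000)]

mature = splice(pre_mrna, exons)
print(len(pre_mrna), "bases before splicing")
print(len(mature), "bases after splicing")
print(f"{fraction_removed(pre_mrna, exons):.0%} of the transcript removed")
```

Here 750 of 1,000 bases are introns, so the example sits at the lower end of the 75–80% range quoted above.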
Mechanism of RNA splicing
It is interesting that the sequences required for splicing are quite short (compared to the length of the genes themselves) and have a lot of variation built into them, and yet intron removal is an extremely precise process (Figure 8.10). A large complex known as the spliceosome binds the ends of the introns, cuts them out, and then rejoins the ends of the exons. This complex combines both proteins and special protein-RNA complexes called snRNPs (small nuclear ribonucleoproteins), pronounced “snurps.”

In Eukaryotes, virtually all protein coding genes are made of a combination of introns and exons. There are thought to be several advantages to this, including the fact that it may protect against mutations impacting protein sequence. Another advantage, for which we see the evidence in many genomes, is the ability to produce multiple different variations of a single protein, all of which can be produced from the same gene. This is known as alternative splicing, and it is a relatively common occurrence in Eukaryotes: about 95% of all human protein coding genes are thought to undergo alternative splicing. Differences in splicing patterns often produce tissue- or developmental stage–specific protein variants.
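One simple way to see how a single gene can give rise to several transcripts is to list the exon combinations that are possible when some exons can be skipped. The Python sketch below assumes a hypothetical gene with four exons, two of which are optional; exon skipping is only one form of alternative splicing, and the names used are illustrative.

```python
from itertools import product


def isoforms(exons: list, optional: set):
    """Yield every exon combination, keeping the exons in their original order."""
    # Constitutive exons are always kept; optional exons may be kept or skipped.
    choices = [(True, False) if name in optional else (True,) for name in exons]
    for keep in product(*choices):
        yield [name for name, kept in zip(exons, keep) if kept]


gene_exons = ["E1", "E2", "E3", "E4"]  # hypothetical gene
skippable = {"E2", "E3"}               # exons that may be left out

variants = list(isoforms(gene_exons, skippable))
for variant in variants:
    print("-".join(variant))
print(len(variants), "possible transcripts from one gene")
```

With two optional exons there are four possible transcripts, and the number doubles with each additional optional exon.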

Dive deeper
Watch: Amoeba Sisters. (2024, October 1). Gene expression and regulation [YouTube, 9:54mins]