Transcriptional control is at the heart of developmental, physiological, and disease processes, yet our understanding of transcriptional regulation is primarily limited to events occurring at gene promoters. It is now clear that canonical transcriptional control at promoters is only a portion of the regulation that occurs during normal transcription. Regulation of gene expression through changes in post-initiation events is likely to mediate a significant part of cellular and organismal processes. Furthermore, the control of transcriptional elongation not only affects gene expression, but it is also involved in the recruitment of chromatin modifiers that adjust chromatin structure, set histone marks and repair DNA damage.
How do co-transcriptional processes regulate RNA polymerase as it travels along gene bodies? How does this control result in the correct identity and subsequent fate of RNA transcripts? Are these events coordinated across the genome during the execution of gene expression programs? We study the molecular mechanism and functional consequence of transcription elongation within the context of the cell.
For a layman's version of our work, see the talk Stirling gave on the lab's research at an art gallery.
Visualizing global transcription with single-nucleotide resolution
Dissecting post-initiation regulatory mechanisms requires high-resolution strategies for following transcripts precisely as they are produced. We established an approach, native elongating transcript sequencing (NET-seq), that exploits the extraordinary stability of the DNA-RNA-RNA polymerase ternary complex to capture nascent transcripts directly from live cells (Churchman and Weissman, Nature, 2011; Mayer, di Iulio et al., Cell, 2015). Deep sequencing reveals the identity and abundance of the 3’ ends of the purified transcripts, providing a quantitative, strand-specific measure of RNA polymerase (RNAP) density with single-nucleotide precision. Because NET-seq yields a non-perturbative measure of transcription initiation, elongation and termination, it allows in-depth investigation of transcriptional complexities and provides insight into the in vivo dynamics of RNAP.
Figure 1. Yeast NET-seq method. A) Nascent RNA is co-purified via an immunoprecipitation (IP) of the RNAP II elongation complex. Conversion of RNA into DNA results in a DNA library with reverse-transcribed RNA as an insert between DNA sequencing linkers. B) Each sequencing read is mapped to the yeast genome and the position of the 3’ end of each RNA transcript (red dot in panel A) is determined. The number of reads ending at each position in the genome is quantified. The results for the coding region of RPL30 are shown. These data represent the RNAP II density at single-nucleotide resolution.
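The counting step described in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the lab's actual pipeline: the function name `three_prime_end_counts`, the tuple layout of the alignments, and the toy chromosome name are all assumptions made for the example. In particular, it assumes each alignment is already expressed in the orientation of the RNA, with 0-based, half-open coordinates; which end of a raw sequencing read corresponds to the RNA 3’ end depends on the library preparation.

```python
from collections import Counter


def three_prime_end_counts(alignments):
    """Count reads whose RNA 3' end falls at each (chrom, strand, position).

    `alignments` is an iterable of (chrom, start, end, strand) tuples with
    0-based, half-open coordinates, given in the orientation of the RNA
    (an assumption of this sketch, not a statement about any aligner).
    """
    counts = Counter()
    for chrom, start, end, strand in alignments:
        if strand == "+":
            position = end - 1   # last transcribed base on the plus strand
        elif strand == "-":
            position = start     # last transcribed base on the minus strand
        else:
            raise ValueError(f"unknown strand: {strand!r}")
        counts[(chrom, strand, position)] += 1
    return counts


# Toy alignments (hypothetical, not real data).
reads = [
    ("chrToy", 100, 130, "+"),
    ("chrToy", 105, 130, "+"),
    ("chrToy", 110, 141, "+"),
    ("chrToy", 200, 230, "-"),
]
density = three_prime_end_counts(reads)
for key in sorted(density):
    print(key, density[key])
```

Keeping the strand in the key preserves the strand-specific nature of the measurement: sense and antisense polymerases at the same coordinate are counted separately.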
Control of transcriptional elongation in S. cerevisiae
How are cellular factors involved in controlling transcription elongation and in coordinating co-transcriptional activities? We are using NET-seq to investigate how these factors affect transcriptional activity genome-wide. Moreover, we are investigating how and when they work together to achieve transcriptional regulation. Finally, we are investigating how these factors (trans factors) and cis elements affect frequent transcriptional pausing. Beyond NET-seq, we are starting projects to interrogate the composition of the elongation complex using mass spectrometry and single-molecule fluorescence.
Figure 2. Cellular factors have a strong effect on transcription elongation, shown here for Dst1. A) A comparison of NET-seq data for wild-type (WT) and dst1Δ strains at the GPM1 gene. B) Cross-correlation analysis between WT and dst1Δ data. (Churchman & Weissman, Nature, 2011)
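A cross-correlation of the kind named in panel B asks how similar two density profiles are when one is shifted relative to the other. The sketch below is one common way to compute it (a mean-centered, normalized cross-correlation) and is not necessarily the exact procedure used in the cited paper. The function name `cross_correlation` and the toy profiles are assumptions for illustration; the 3-nucleotide shift in the toy data is arbitrary and is not a result.

```python
from math import sqrt


def cross_correlation(x, y, max_lag):
    """Normalized cross-correlation of two equal-length profiles.

    Returns a dict mapping lag -> sum_i xc[i] * yc[i + lag] / norm, where
    xc and yc are the mean-centered profiles. A peak at a positive lag
    means features in `y` sit downstream of the matching features in `x`.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("profiles must have the same length")
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(x)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    xc = [v - mean_x for v in x]
    yc = [v - mean_y for v in y]
    norm = sqrt(sum(v * v for v in xc) * sum(v * v for v in yc))
    if norm == 0:
        raise ValueError("a flat profile has no defined correlation")
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:   # ignore positions shifted past either end
                total += xc[i] * yc[j]
        result[lag] = total / norm
    return result


# Toy profiles (hypothetical): two pause peaks, and the same peaks moved
# 3 nt downstream in the second profile.
wild_type = [0] * 50
wild_type[10] = 5
wild_type[25] = 3
mutant = [0, 0, 0] + wild_type[:-3]

cc = cross_correlation(wild_type, mutant, max_lag=10)
best_lag = max(cc, key=cc.get)
print(best_lag)
```

A peak at lag 0 would indicate that pauses occur at the same positions in both strains, whereas a peak at a nonzero lag indicates a systematic positional offset between them.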
Mechanisms of transcription elongation in human cells
Many questions remain about how transcription proceeds in human cells and how transcription elongation is coupled to other gene expression processes. How does transcriptional pausing affect alternative splicing? How does transcription termination occur in the human genome? Do nucleosomes create a barrier to transcription in human cells? How do unstable transcripts originate? Finally, how do the answers to all these questions relate to other aspects of the human genome, such as 3D structure, locations of regulatory regions and chromatin modifications? These are the basic questions that we are answering by applying NET-seq to human cells. Beyond answering these fundamental questions, we are interested in understanding how transcriptional activity changes as human cells differentiate in development and in cancer.
Coupling of transcription and splicing
In mammalian cells, alternative splicing of RNA transcripts allows extensive diversification and tailoring of cells’ protein repertoires, yet this process frequently goes awry in diseases ranging from cancer to neurological disorders. Splicing is largely co-transcriptional, and transcription shapes splicing outcomes. Two mechanisms have been proposed to mediate the observed coupling of transcription and splicing. In the recruitment model, alternative splicing outcomes arise through the physical interactions of different splicing factors with RNA polymerase II (Pol II). In the kinetic coupling model, transcription events such as pausing vary the elongation rate and thereby adjust the relative “windows of opportunity” for competing alternative splicing reactions. The recruitment and kinetic models are not mutually exclusive, and combinations of the two could create a range of control mechanisms. The molecular mechanisms that control transcriptional pausing around exons and the factors that connect transcription elongation to splicing remain largely uncharacterized. NET-seq provides the nucleotide resolution necessary to resolve pausing events in single mammalian exons. Our NET-seq data show pronounced pausing at exon junctions, in a manner that correlates with the splicing fate of the exon. In this sense, RNA polymerase not only detects the exon but also “knows” whether it will be retained in the RNA. We are now starting projects to determine how the pausing occurs and, ideally, to determine its functional consequence. This work will contribute to our understanding of the kinetic model of alternative splicing.
Figure 3. Heat maps and meta-exon analysis of HeLa S3 Pol II density across different types of exons. NET-seq signal from each exon (+/- 25 bp) is normalized to vary from 0 to 1 (white to black scale in the heatmaps). Solid lines on the meta-exon plots indicate the mean values and the gray shading represents the 95% confidence interval. (Mayer, di Iulio et al. Cell, 2015)
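The normalization and averaging described in the caption can be sketched as follows. This is an illustrative outline only: the function names `min_max_normalize` and `meta_profile` are hypothetical, the windows are assumed to have already been brought to a common length, and the 95% confidence interval is computed here with a normal approximation (mean ± 1.96 standard errors), which is an assumption and may differ from the method used for the published figure.

```python
from math import sqrt
from statistics import mean, stdev


def min_max_normalize(signal):
    """Rescale one exon window so its values run from 0 to 1."""
    lowest, highest = min(signal), max(signal)
    if highest == lowest:
        return [0.0] * len(signal)   # flat window: no dynamic range to scale
    return [(v - lowest) / (highest - lowest) for v in signal]


def meta_profile(signals, z=1.96):
    """Per-position mean and approximate 95% CI across normalized windows.

    `signals` is a list of equal-length windows (assumed already rescaled
    to a common length). Returns (normalized_rows, profile), where profile
    holds one (mean, lower, upper) tuple per position.
    """
    if len(signals) < 2:
        raise ValueError("need at least two windows to estimate an interval")
    length = len(signals[0])
    if any(len(s) != length for s in signals):
        raise ValueError("all windows must have the same length")
    normalized = [min_max_normalize(s) for s in signals]
    profile = []
    for column in zip(*normalized):
        center = mean(column)
        half_width = z * stdev(column) / sqrt(len(column))
        profile.append((center, center - half_width, center + half_width))
    return normalized, profile


# Toy windows (hypothetical read counts, not real data).
windows = [
    [0, 2, 4, 2, 0],
    [1, 1, 3, 1, 1],
    [0, 5, 10, 5, 0],
]
rows, profile = meta_profile(windows)
for center, lower, upper in profile:
    print(f"{center:.3f} [{lower:.3f}, {upper:.3f}]")
```

The normalized rows correspond to the rows of a heat map, and the per-position tuples correspond to the line and shaded band of a meta-exon plot. Normalizing each window separately keeps highly expressed genes from dominating the average.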
The regulation of mitochondrial gene expression
Intracellular descendants of engulfed α-proteobacteria, mitochondria have gradually transferred many of the genes from their genome to the nucleus, retaining a small but significant subset on a circular genome tucked within the mitochondrial matrix. As eukaryotic cells evolved, the mitochondrial genome diverged dramatically from its prokaryotic and eukaryotic nuclear counterparts to adopt a different genetic code, tRNA structure and RNA polymerase. As a result of this divergence, decades of groundbreaking research on eukaryotic nuclear and prokaryotic gene expression do not provide sufficient insight into the mechanisms that ensure faithful mitochondrial gene expression, which permits mitochondria to serve flexible roles in oxidative phosphorylation and signal transduction pathways, including the regulation of apoptosis.
Our goal is to directly observe all stages of mitochondrial gene expression and determine their regulatory mechanisms. For the nuclear genome, this goal would be formidable, but for the mitochondrial genome it seems feasible, as it expresses only a handful of proteins, using machinery that is simpler than its nuclear counterparts and controlled by fewer factors. We are developing a systematic approach that exploits recent advances in molecular biology to dissect mitochondrial gene expression.
Massively parallel sequencing has allowed whole-genome analyses that have brought new insight into how nuclear gene expression is regulated, but these studies have tended to overlook the mitochondrial genome, either through their experimental design or through their computational analysis. We are adapting and applying a select set of quantitative genomic approaches to fully dissect the mitochondrial gene expression process, from DNA to protein. In this manner, we plan to map the landscape of mitochondrial gene expression in both budding yeast and human cells using a combination of high-resolution, quantitative approaches that query across the mitochondrial genome and the mitochondrial-encoded proteome.