Areas of research in the Churchman lab
1) Visualizing global transcription with single-nucleotide resolution
Dissecting post-initiation regulatory mechanisms requires high-resolution strategies for precisely following transcripts as they are being produced. We established an approach, native elongating transcript sequencing (NET-seq), that exploits the extraordinary stability of the DNA-RNA-RNA polymerase ternary complex to capture nascent transcripts directly from live cells (Churchman and Weissman, Nature, 2011; Mayer, di Iulio et al., Cell, 2015). Deep sequencing reveals the identity and abundance of the 3’ ends of the purified transcripts, thus providing a quantitative, strand-specific measure of RNA polymerase (RNAP) density with single-nucleotide precision. Because NET-seq provides a non-perturbative measure of transcription initiation, elongation and termination, it allows in-depth investigation of transcriptional complexities and provides insight into the in vivo dynamics of RNAP.
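To make the idea of reading RNAP density from transcript 3’ ends concrete, here is a minimal Python sketch. It is an illustration only, not the lab's analysis pipeline. It assumes each aligned read is already given as a tuple of chromosome, 0-based half-open start and end coordinates, and the strand of the nascent RNA; the function names `three_prime_end` and `polymerase_density` and the example reads are hypothetical.

```python
from collections import Counter


def three_prime_end(start, end, strand):
    """Return the 0-based position of the RNA 3' end.

    Assumes [start, end) is a half-open interval and that `strand`
    is the strand of the nascent RNA, not of the sequencing read.
    """
    if strand == "+":
        return end - 1  # last transcribed base on the plus strand
    if strand == "-":
        return start    # last transcribed base on the minus strand
    raise ValueError(f"unknown strand: {strand!r}")


def polymerase_density(alignments):
    """Count 3' ends per (chromosome, strand, position)."""
    counts = Counter()
    for chrom, start, end, strand in alignments:
        counts[(chrom, strand, three_prime_end(start, end, strand))] += 1
    return counts


# Hypothetical alignments for illustration.
reads = [
    ("chrI", 100, 130, "+"),
    ("chrI", 105, 130, "+"),
    ("chrI", 200, 240, "-"),
]
density = polymerase_density(reads)
```

Only the terminal base of each transcript is counted, which is what gives the signal its single-nucleotide and strand-specific character. How the read strand maps to the RNA strand depends on the library preparation, so that conversion is left outside this sketch.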
Control of transcriptional elongation in S. cerevisiae
How are cellular factors involved in controlling transcription elongation and in coordinating co-transcriptional activities? We are using NET-seq to investigate how these factors affect transcriptional activity genome-wide, and how and when they work together to achieve transcriptional regulation. Finally, we are investigating how these trans-acting factors and cis elements affect frequent transcriptional pausing.
Mechanisms of transcription elongation in human cells
Many questions remain about how transcription proceeds in human cells and how transcription elongation is coupled to other gene expression processes. How does transcriptional pausing affect alternative splicing? How does transcription termination occur in the human genome? Do nucleosomes create a barrier to transcription in human cells? How do unstable transcripts originate? Finally, how do the answers to all these questions relate to other aspects of the human genome, such as 3D structure, locations of regulatory regions and chromatin modifications? These are the basic questions that we are answering by applying NET-seq to human cells. Beyond answering these fundamental questions, we are interested in understanding how transcriptional activity changes as human cells differentiate in development and in cancer.
Coupling of transcription and splicing
In mammalian cells, alternative splicing of RNA transcripts allows extensive diversification and tailoring of cells’ protein repertoires, yet this process frequently goes awry in diseases ranging from cancer to neurological disorders. Splicing is largely co-transcriptional, and transcription shapes splicing outcomes. NET-seq provides the nucleotide resolution necessary to resolve pausing events within individual mammalian exons. Our NET-seq data show pronounced pausing at exon junctions, in a manner that correlates with the splicing fate of the exon. RNA polymerase not only detects the exon, but also knows whether it will be retained in the RNA (Mayer, di Iulio et al., Cell, 2015). We are now starting projects to determine how the pausing occurs and, ideally, to determine its functional consequences. This work will contribute to our understanding of the kinetic model of alternative splicing.
For a layman's version of our transcription work, see the talk Stirling gave on the lab's research at an art gallery.
2) The regulation of mitochondrial gene expression
Intracellular descendants of engulfed α-proteobacteria, mitochondria have gradually transferred many of the genes from their genome to the nucleus, retaining a small but significant subset on a circular genome tucked within the mitochondrial matrix. As eukaryotic cells evolved, the mitochondrial genome diverged dramatically from its prokaryotic and eukaryotic nuclear counterparts to adopt a different genetic code, tRNA structure and RNA polymerase. As a result of this divergence, decades of groundbreaking research on eukaryotic nuclear and prokaryotic gene expression do not provide sufficient insight into the mechanisms that guarantee faithful mitochondrial gene expression, which permits mitochondria to serve flexible roles in oxidative phosphorylation and signal transduction pathways, including the regulation of apoptosis.
Oxidative phosphorylation complexes pose a unique challenge for the cell, because their subunits are encoded on both the nuclear genome and the mitochondrial genome. Recently we showed that the mitochondrial and nuclear transcription programs are not coordinated during mitochondrial biogenesis in S. cerevisiae. Rather, translation programs are synchronized across compartments (Couvillion et al., Nature, 2016). In human cells, it remains unclear whether translation programs are coordinated, given the vast differences between yeast and human mitochondrial gene expression, especially with respect to translation regulation. Furthermore, many mysteries remain about human mitochondrial translation, including how initiation occurs without 5’ leader sequences. To investigate mitochondrial translation in human cells, we are using ribosome profiling to robustly capture the unique human mitoribosome. We hope to reveal the dynamics of human mitochondrial translation, shedding light on mitochondrial translation initiation and the precision of mitochondrial translation termination. Experiments are ongoing to determine how human mitochondrial translation responds to stress and disease conditions and whether mitochondrial and cytosolic translation programs are co-regulated in human cells.
For a quick overview, see our video on HMS news.
3) Computational approaches to maximally extract information from functional genomics datasets
Numerous advances in sequencing technologies have revolutionized genomics by enabling many methods, such as ChIP-seq, RNA-seq and NET-seq, that report on different aspects of the genome, epigenome and gene expression. Statistical tools have been developed to analyze individual data types, but strategies to integrate disparate datasets under a unified framework are lacking. Moreover, most analysis techniques rely heavily on feature selection and data preprocessing, which increase the difficulty of addressing biological questions through the integration of multiple datasets. We are actively developing frameworks that address this fundamental bottleneck in genomics research using various strategies, including deep neural networks. Recently we developed FIDDLE (Flexible Integration of Data with Deep LEarning), an open-source, data-agnostic, flexible integrative framework that learns a unified representation from multiple data types to infer another data type (Eser and Churchman, bioRxiv 081380, 2016). As a case study, we use multiple Saccharomyces cerevisiae functional genomic datasets to predict global transcription start sites (TSS) through the simulation of TSS-seq data. We demonstrate that one type of data (e.g., TSS-seq data) can be inferred from other data types (e.g., NET-seq, ChIP-seq, etc.) without manually specifying the relevant features and preprocessing. We find that the average relative entropy of our model’s predictions is almost equal to the relative entropy obtained by comparing two biological replicate datasets. Furthermore, we show that models built from multiple genome-wide datasets perform substantially better than models built from individual datasets. FIDDLE learns the complex epistatic relationships within individual datasets and, importantly, across datasets. We are now expanding FIDDLE to the human genome, where we aim to create models that will permit cross-cell-type predictions.
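For readers unfamiliar with relative entropy, the following Python sketch shows how a predicted signal can be compared with an observed one over the same genomic window. It uses the standard Kullback-Leibler definition and is not taken from the FIDDLE code. The function names, the pseudocount used to avoid division by zero, and the choice of which profile plays the reference role are all assumptions made for illustration.

```python
import math


def normalize(signal, pseudocount=1e-9):
    """Convert non-negative per-position counts into a probability distribution."""
    total = sum(signal) + pseudocount * len(signal)
    return [(x + pseudocount) / total for x in signal]


def relative_entropy(reference, prediction, pseudocount=1e-9):
    """Kullback-Leibler divergence D(reference || prediction) in nats.

    Both inputs are per-position signals over the same window.
    The pseudocount is an illustrative assumption, not a value from the source.
    """
    if len(reference) != len(prediction):
        raise ValueError("signals must cover the same number of positions")
    p = normalize(reference, pseudocount)
    q = normalize(prediction, pseudocount)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical per-position signals over a five-base window.
observed = [0, 2, 10, 3, 0]
predicted = [0, 3, 9, 2, 1]
score = relative_entropy(observed, predicted)
```

A value of zero means the two profiles have identical shape, and larger values mean greater disagreement. Computing the same quantity between two biological replicates gives a natural baseline against which a model's score can be judged, which is the comparison described above.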
For more, see Umut present on his work at the Broad's MIA lecture series.