1
Hummel NFC, Markel K, Stefani J, Staller MV, Shih PM. Systematic identification of transcriptional activation domains from non-transcription factor proteins in plants and yeast. Cell Syst 2024; 15:662-672.e4. [PMID: 38866009 DOI: 10.1016/j.cels.2024.05.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 04/26/2024] [Accepted: 05/22/2024] [Indexed: 06/14/2024]
Abstract
Transcription factors can promote gene expression through activation domains. Whole-genome screens have systematically mapped activation domains in transcription factors but not in non-transcription factor proteins (e.g., chromatin regulators and coactivators). To fill this knowledge gap, we employed the activation domain predictor PADDLE to analyze the proteomes of Arabidopsis thaliana and Saccharomyces cerevisiae. We screened 18,000 predicted activation domains from >800 non-transcription factor genes in both species, confirming that 89% of candidate proteins contain active fragments. Our work enables the annotation of hundreds of nuclear proteins as putative coactivators, many of which have never been ascribed any function in plants. Analysis of peptide sequence compositions reveals how the distribution of key amino acids dictates activity. Finally, we validated short, "universal" activation domains with comparable performance to state-of-the-art activation domains used for genome engineering. Our approach enables the genome-wide discovery and annotation of activation domains that can function across diverse eukaryotes.
Affiliation(s)
- Niklas F C Hummel
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA; Feedstocks Division, Joint BioEnergy Institute, Emeryville, CA 94608, USA; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Department of Biology, Technische Universität Darmstadt, 64287 Darmstadt, Germany
- Kasey Markel
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA; Feedstocks Division, Joint BioEnergy Institute, Emeryville, CA 94608, USA; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Jordan Stefani
- Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
- Max V Staller
- Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA; Center for Computational Biology, University of California, Berkeley, CA 94720, USA; Chan Zuckerberg Biohub-San Francisco, San Francisco, CA 9415, USA.
- Patrick M Shih
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA; Feedstocks Division, Joint BioEnergy Institute, Emeryville, CA 94608, USA; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA.
2
Morffy N, Van den Broeck L, Miller C, Emenecker RJ, Bryant JA, Lee TM, Sageman-Furnas K, Wilkinson EG, Pathak S, Kotha SR, Lam A, Mahatma S, Pande V, Waoo A, Wright RC, Holehouse AS, Staller MV, Sozzani R, Strader LC. Identification of plant transcriptional activation domains. Nature 2024:10.1038/s41586-024-07707-3. [PMID: 39020176 DOI: 10.1038/s41586-024-07707-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 06/12/2024] [Indexed: 07/19/2024]
Abstract
Gene expression in Arabidopsis is regulated by more than 1,900 transcription factors (TFs), which have been identified genome-wide by the presence of well-conserved DNA-binding domains. Activator TFs contain activation domains (ADs) that recruit coactivator complexes; however, for nearly all Arabidopsis TFs, we lack knowledge about the presence, location and transcriptional strength of their ADs. To address this gap, here we use a yeast library approach to experimentally identify Arabidopsis ADs on a proteome-wide scale, and find that more than half of the Arabidopsis TFs contain an AD. We annotate 1,553 ADs, the vast majority of which are, to our knowledge, previously unknown. Using the dataset generated, we develop a neural network to accurately predict ADs and to identify sequence features that are necessary to recruit coactivator complexes. We uncover six distinct combinations of sequence features that result in activation activity, providing a framework to interrogate the subfunctionalization of ADs. Furthermore, we identify ADs in the ancient AUXIN RESPONSE FACTOR family of TFs, revealing that AD positioning is conserved in distinct clades. Our findings provide a deep resource for understanding transcriptional activation, a framework for examining function in intrinsically disordered regions and a predictive model of ADs.
Affiliation(s)
- Lisa Van den Broeck
- Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC, USA
- Caelan Miller
- Department of Biology, Duke University, Durham, NC, USA
- Ryan J Emenecker
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA
- Center for Biomolecular Condensates, Washington University in St. Louis, St. Louis, MO, USA
- John A Bryant
- Biological Systems Engineering, Virginia Tech, Blacksburg, VA, USA
- Tyler M Lee
- Department of Biology, Duke University, Durham, NC, USA
- Sunita Pathak
- Department of Biology, Duke University, Durham, NC, USA
- Sanjana R Kotha
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Angelica Lam
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Saloni Mahatma
- Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC, USA
- Vikram Pande
- Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC, USA
- Aman Waoo
- Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC, USA
- R Clay Wright
- Biological Systems Engineering, Virginia Tech, Blacksburg, VA, USA
- Alex S Holehouse
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA
- Center for Biomolecular Condensates, Washington University in St. Louis, St. Louis, MO, USA
- Max V Staller
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Rosangela Sozzani
- Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC, USA
3
Naderi J, Magalhaes AP, Kibar G, Stik G, Zhang Y, Mackowiak SD, Wieler HM, Rossi F, Buschow R, Christou-Kent M, Alcoverro-Bertran M, Graf T, Vingron M, Hnisz D. An activity-specificity trade-off encoded in human transcription factors. Nat Cell Biol 2024:10.1038/s41556-024-01411-0. [PMID: 38969762 DOI: 10.1038/s41556-024-01411-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Accepted: 03/20/2024] [Indexed: 07/07/2024]
Abstract
Transcription factors (TFs) control specificity and activity of gene transcription, but whether a relationship between these two features exists is unclear. Here we provide evidence for an evolutionary trade-off between the activity and specificity in human TFs encoded as submaximal dispersion of aromatic residues in their intrinsically disordered protein regions. We identified approximately 500 human TFs that encode short periodic blocks of aromatic residues in their intrinsically disordered regions, resembling imperfect prion-like sequences. Mutation of periodic aromatic residues reduced transcriptional activity, whereas increasing the aromatic dispersion of multiple human TFs enhanced transcriptional activity and reprogramming efficiency, promoted liquid-liquid phase separation in vitro and more promiscuous DNA binding in cells. Together with recent work on enhancer elements, these results suggest an important evolutionary role of suboptimal features in transcriptional control. We propose that rational engineering of amino acid features that alter phase separation may be a strategy to optimize TF-dependent processes, including cellular reprogramming.
Affiliation(s)
- Julian Naderi
- Department of Genome Regulation, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Institute of Chemistry and Biochemistry, Department of Biology, Chemistry and Pharmacy, Freie Universität Berlin, Berlin, Germany
- Alexandre P Magalhaes
- Department of Genome Regulation, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Gözde Kibar
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Gregoire Stik
- Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
- Josep Carreras Leukaemia Research Institute, Badalona, Spain
- Yaotian Zhang
- Department of Genome Regulation, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Sebastian D Mackowiak
- Department of Genome Regulation, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Hannah M Wieler
- Department of Genome Regulation, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Francesca Rossi
- Department of Genome Regulation, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Rene Buschow
- Microscopy Core Facility, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Marie Christou-Kent
- Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
- Marc Alcoverro-Bertran
- Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
- Thomas Graf
- Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
- Martin Vingron
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Denes Hnisz
- Department of Genome Regulation, Max Planck Institute for Molecular Genetics, Berlin, Germany.
4
Cornwell AB, Zhang Y, Thondamal M, Johnson DW, Thakar J, Samuelson AV. The C. elegans Myc-family of transcription factors coordinate a dynamic adaptive response to dietary restriction. GeroScience 2024:10.1007/s11357-024-01197-x. [PMID: 38878153 DOI: 10.1007/s11357-024-01197-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Accepted: 05/08/2024] [Indexed: 06/25/2024] Open
Abstract
Dietary restriction (DR), the process of decreasing overall food consumption over an extended period of time, has been shown to increase longevity across evolutionarily diverse species and delay the onset of age-associated diseases in humans. In Caenorhabditis elegans, the Myc-family transcription factors (TFs) MXL-2 (Mlx) and MML-1 (MondoA/ChREBP), which function as obligate heterodimers, and PHA-4 (orthologous to FOXA) are both necessary for the full physiological benefits of DR. However, the adaptive transcriptional response to DR and the role of MML-1::MXL-2 and PHA-4 remains elusive. We identified the transcriptional signature of C. elegans DR, using the eat-2 genetic model, and demonstrate broad changes in metabolic gene expression in eat-2 DR animals, which requires both mxl-2 and pha-4. While the requirement for these factors in DR gene expression overlaps, we found many of the DR genes exhibit an opposing change in relative gene expression in eat-2;mxl-2 animals compared to wild-type, which was not observed in eat-2 animals with pha-4 loss. Surprisingly, we discovered more than 2000 genes synthetically dysregulated in eat-2;mxl-2, out of which the promoters of down-regulated genes were substantially enriched for PQM-1 and ELT-1/3 GATA TF binding motifs. We further show functional deficiencies of the mxl-2 loss in DR outside of lifespan, as eat-2;mxl-2 animals exhibit substantially smaller brood sizes and lay a proportion of dead eggs, indicating that MML-1::MXL-2 has a role in maintaining the balance between resource allocation to the soma and to reproduction under conditions of chronic food scarcity. While eat-2 animals do not show a significantly different metabolic rate compared to wild-type, we also find that loss of mxl-2 in DR does not affect the rate of oxygen consumption in young animals. 
The gene expression signature of eat-2 mutant animals is consistent with optimization of energy utilization and resource allocation, rather than induction of canonical gene expression changes associated with acute metabolic stress, such as induction of autophagy after TORC1 inhibition. Consistently, eat-2 animals are not substantially resistant to stress, providing further support to the idea that chronic DR may benefit healthspan and lifespan through efficient use of limited resources rather than broad upregulation of stress responses, and also indicates that MML-1::MXL-2 and PHA-4 may have distinct roles in promotion of benefits in response to different pro-longevity stimuli.
Affiliation(s)
- Adam B Cornwell
- Department of Biomedical Genetics, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY, 14642, USA
- Yun Zhang
- Department of Biomedical Genetics, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY, 14642, USA
- Manjunatha Thondamal
- Department of Biomedical Genetics, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY, 14642, USA
- MURTI Centre and Department of Biotechnology, School of Technology, Gandhi Institute of Technology and Management (GITAM), Visakhapatnam, Andhra Pradesh, 530045, India
- David W Johnson
- Department of Biomedical Genetics, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY, 14642, USA
- Department of Math and Science, Genesee Community College, One College Rd, Batavia, NY, 14020, USA
- Juilee Thakar
- Department of Biomedical Genetics, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY, 14642, USA
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY, 14642, USA
- Department of Microbiology and Immunology, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY, 14642, USA
- Andrew V Samuelson
- Department of Biomedical Genetics, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY, 14642, USA.
5
Ginell GM, Emenecker RJ, Lotthammer JM, Usher ET, Holehouse AS. Direct prediction of intermolecular interactions driven by disordered regions. bioRxiv 2024:2024.06.03.597104. [PMID: 38895487 PMCID: PMC11185574 DOI: 10.1101/2024.06.03.597104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
Intrinsically disordered regions (IDRs) are critical for a wide variety of cellular functions, many of which involve interactions with partner proteins. Molecular recognition is typically considered through the lens of sequence-specific binding events. However, a growing body of work has shown that IDRs often interact with partners in a manner that does not depend on the precise order of the amino acids, instead driven by complementary chemical interactions leading to disordered bound-state complexes. Despite this emerging paradigm, we lack tools to describe, quantify, predict, and interpret these types of structurally heterogeneous interactions from the underlying amino acid sequences. Here, we repurpose the chemical physics developed originally for molecular simulations to develop an approach for predicting intermolecular interactions between IDRs and partner proteins. Our approach enables the direct prediction of phase diagrams, the identification of chemically-specific interaction hotspots on IDRs, and a route to develop and test mechanistic hypotheses regarding IDR function in the context of molecular recognition. We use our approach to examine a range of systems and questions to highlight its versatility and applicability.
Affiliation(s)
- Garrett M. Ginell
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO
- Center for Biomolecular Condensates (CBC), Washington University in St. Louis, St. Louis, MO
- Ryan J. Emenecker
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO
- Center for Biomolecular Condensates (CBC), Washington University in St. Louis, St. Louis, MO
- Jeffrey M. Lotthammer
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO
- Center for Biomolecular Condensates (CBC), Washington University in St. Louis, St. Louis, MO
- Emery T. Usher
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO
- Center for Biomolecular Condensates (CBC), Washington University in St. Louis, St. Louis, MO
- Alex S. Holehouse
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO
- Center for Biomolecular Condensates (CBC), Washington University in St. Louis, St. Louis, MO
6
Farheen F, Broyles BK, Zhang Y, Ibtehaz N, Erkine AM, Kihara D. Predicting transcriptional activation domain function using Graph Neural Networks. bioRxiv 2024:2024.05.08.593266. [PMID: 38766093 PMCID: PMC11100744 DOI: 10.1101/2024.05.08.593266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Analysis of factors that lead to the functionality of transcriptional activation domains remains a crucial and yet challenging task owing to the significant diversity in their sequences and their intrinsically disordered nature. Almost all existing methods that have aimed to predict activation domains have involved traditional machine learning approaches, such as logistic regression, that are unable to capture complex patterns in data or plain convolutional neural networks and have been limited in exploration of structural features. However, there is a tremendous potential in the inspection of the structural properties of activation domains, and an opportunity to investigate complex relationships between features of residues in the sequence. To address these, we have utilized the power of graph neural networks which can represent structural data in the form of nodes and edges, allowing nodes to exchange information among themselves. We have experimented with two kinds of graph formulations, one involving residues as nodes and the other assigning atoms to be the nodes. A logistic regression model was also developed to analyze feature importance. For all the models, several feature combinations were experimented with. The residue-level GNN model with amino acid type, residue position, acidic/basic/aromatic property and secondary structure feature combination gave the best performing model with accuracy, F1 score and AUROC of 97.9%, 71% and 97.1% respectively which outperformed other existing methods in the literature when applied on the dataset we used. Among the other structure-based features that were analyzed, the amphipathic property of helices also proved to be an important feature for classification. Logistic regression results showed that the most dominant feature that makes a sequence functional is the frequency of different types of amino acids in the sequence. 
Our results have consistently shown that functional sequences have more acidic and aromatic residues, whereas basic residues are seen more in non-functional sequences.
Affiliation(s)
- Farhanaz Farheen
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Bradley K. Broyles
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Yuanyuan Zhang
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Nabil Ibtehaz
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Alexandre M. Erkine
- College of Pharmacy and Health Sciences, Butler University, Indianapolis, IN, USA
- Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
7
Struhl K. Intrinsically disordered regions (IDRs): A vague and confusing concept for protein function. Mol Cell 2024; 84:1186-1187. [PMID: 38579676 PMCID: PMC11090402 DOI: 10.1016/j.molcel.2024.02.023] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/12/2023] [Revised: 02/13/2024] [Accepted: 02/23/2024] [Indexed: 04/07/2024]
Abstract
The term "intrinsically disordered region" (IDR) in proteins has been used in numerous publications. However, most proteins contain IDRs, the term refers to very different types of structures and functions, and many IDRs become structured upon interaction with other biomolecules. Thus, IDR is an unnecessary, vague, and ultimately confusing concept.
Affiliation(s)
- Kevin Struhl
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA.
8
Singleton MD, Eisen MB. Evolutionary analyses of intrinsically disordered regions reveal widespread signals of conservation. PLoS Comput Biol 2024; 20:e1012028. [PMID: 38662765 PMCID: PMC11075841 DOI: 10.1371/journal.pcbi.1012028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 05/07/2024] [Accepted: 03/28/2024] [Indexed: 05/08/2024] Open
Abstract
Intrinsically disordered regions (IDRs) are segments of proteins without stable three-dimensional structures. As this flexibility allows them to interact with diverse binding partners, IDRs play key roles in cell signaling and gene expression. Despite the prevalence and importance of IDRs in eukaryotic proteomes and various biological processes, associating them with specific molecular functions remains a significant challenge due to their high rates of sequence evolution. However, by comparing the observed values of various IDR-associated properties against those generated under a simulated model of evolution, a recent study found most IDRs across the entire yeast proteome contain conserved features. Furthermore, it showed clusters of IDRs with common "evolutionary signatures," i.e. patterns of conserved features, were associated with specific biological functions. To determine if similar patterns of conservation are found in the IDRs of other systems, in this work we applied a series of phylogenetic models to over 7,500 orthologous IDRs identified in the Drosophila genome to dissect the forces driving their evolution. By comparing models of constrained and unconstrained continuous trait evolution using the Brownian motion and Ornstein-Uhlenbeck models, respectively, we identified signals of widespread constraint, indicating conservation of distributed features is a mechanism of IDR evolution common to multiple biological systems. In contrast to the previous study in yeast, however, we observed limited evidence of IDR clusters with specific biological functions, which suggests a more complex relationship between evolutionary constraints and function in the IDRs of multicellular organisms.
Affiliation(s)
- Marc D. Singleton
- Howard Hughes Medical Institute, UC Berkeley, Berkeley, California, United States of America
- Michael B. Eisen
- Howard Hughes Medical Institute, UC Berkeley, Berkeley, California, United States of America
- Department of Molecular and Cell Biology, UC Berkeley, Berkeley, California, United States of America
9
Monté D, Lens Z, Dewitte F, Villeret V, Verger A. Assessment of machine-learning predictions for the Mediator complex subunit MED25 ACID domain interactions with transactivation domains. FEBS Lett 2024; 598:758-773. [PMID: 38436147 DOI: 10.1002/1873-3468.14837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 02/01/2024] [Accepted: 02/10/2024] [Indexed: 03/05/2024]
Abstract
The human Mediator complex subunit MED25 binds transactivation domains (TADs) present in various cellular and viral proteins using two binding interfaces, named H1 and H2, which are found on opposite sides of its ACID domain. Here, we use and compare deep learning methods to characterize human MED25-TAD interfaces and assess the predicted models to published experimental data. For the H1 interface, AlphaFold produces predictions with high-reliability scores that agree well with experimental data, while the H2 interface predictions appear inconsistent, preventing reliable binding modes. Despite these limitations, we experimentally assess the validity of MED25 interface predictions with the viral transcriptional activators Lana-1 and IE62. AlphaFold predictions also suggest the existence of a unique hydrophobic pocket for the Arabidopsis MED25 ACID domain.
Affiliation(s)
- Didier Monté
- CNRS EMR 9002 Integrative Structural Biology, Inserm U 1167 - RID-AGE, Univ. Lille, CHU Lille, Institut Pasteur de Lille, France
- Zoé Lens
- CNRS EMR 9002 Integrative Structural Biology, Inserm U 1167 - RID-AGE, Univ. Lille, CHU Lille, Institut Pasteur de Lille, France
- Frédérique Dewitte
- CNRS EMR 9002 Integrative Structural Biology, Inserm U 1167 - RID-AGE, Univ. Lille, CHU Lille, Institut Pasteur de Lille, France
- Vincent Villeret
- CNRS EMR 9002 Integrative Structural Biology, Inserm U 1167 - RID-AGE, Univ. Lille, CHU Lille, Institut Pasteur de Lille, France
- Alexis Verger
- CNRS EMR 9002 Integrative Structural Biology, Inserm U 1167 - RID-AGE, Univ. Lille, CHU Lille, Institut Pasteur de Lille, France
10
Mindel V, Brodsky S, Cohen A, Manadre W, Jonas F, Carmi M, Barkai N. Intrinsically disordered regions of the Msn2 transcription factor encode multiple functions using interwoven sequence grammars. Nucleic Acids Res 2024; 52:2260-2272. [PMID: 38109289 PMCID: PMC10954448 DOI: 10.1093/nar/gkad1191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Revised: 11/04/2023] [Accepted: 12/11/2023] [Indexed: 12/20/2023] Open
Abstract
Intrinsically disordered regions (IDRs) are abundant in eukaryotic proteins, but their sequence-function relationship remains poorly understood. IDRs of transcription factors (TFs) can direct promoter selection and recruit coactivators, as shown for the budding yeast TF Msn2. To examine how IDRs encode both these functions, we compared genomic binding specificity, coactivator recruitment, and gene induction amongst a large set of designed Msn2-IDR mutants. We find that both functions depend on multiple regions across the > 600AA IDR. Yet, transcription activity was readily disrupted by mutations that showed no effect on the Msn2 binding specificity. Our data attribute this differential sensitivity to the integration of a relaxed, composition-based code directing binding specificity with a more stringent, motif-based code controlling the recruitment of coactivators and transcription activity. Therefore, Msn2 utilizes interwoven sequence grammars for encoding multiple functions, suggesting a new IDR design paradigm of potentially general use.
Affiliation(s)
- Vladimir Mindel
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Sagie Brodsky
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Aileen Cohen
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Wajd Manadre
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Felix Jonas
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Miri Carmi
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Naama Barkai
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
11
Gan P, Eppert M, De La Cruz N, Lyons H, Shah AM, Veettil RT, Chen K, Pradhan P, Bezprozvannaya S, Xu L, Liu N, Olson EN, Sabari BR. Coactivator condensation drives cardiovascular cell lineage specification. Sci Adv 2024; 10:eadk7160. [PMID: 38489358 PMCID: PMC10942106 DOI: 10.1126/sciadv.adk7160] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Accepted: 02/12/2024] [Indexed: 03/17/2024]
Abstract
During development, cells make switch-like decisions to activate new gene programs specifying cell lineage. The mechanisms underlying these decisive choices remain unclear. Here, we show that the cardiovascular transcriptional coactivator myocardin (MYOCD) activates cell identity genes by concentration-dependent and switch-like formation of transcriptional condensates. MYOCD forms such condensates and activates cell identity genes at critical concentration thresholds achieved during smooth muscle cell and cardiomyocyte differentiation. The carboxyl-terminal disordered region of MYOCD is necessary and sufficient for condensate formation. Disrupting this region's ability to form condensates disrupts gene activation and smooth muscle cell reprogramming. Rescuing condensate formation by replacing this region with disordered regions from functionally unrelated proteins rescues gene activation and smooth muscle cell reprogramming. Our findings demonstrate that MYOCD condensate formation is required for gene activation during cardiovascular differentiation. We propose that the formation of transcriptional condensates at critical concentrations of cell type-specific regulators provides a molecular switch underlying the activation of key cell identity genes during development.
Affiliation(s)
- Peiheng Gan
- Department of Molecular Biology, Hamon Center for Regenerative Science and Medicine, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Mikayla Eppert
- Laboratory of Nuclear Organization, Cecil H. and Ida Green Center for Reproductive Biology Sciences, Division of Basic Research, Department of Obstetrics and Gynecology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Nancy De La Cruz
- Laboratory of Nuclear Organization, Cecil H. and Ida Green Center for Reproductive Biology Sciences, Division of Basic Research, Department of Obstetrics and Gynecology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Heankel Lyons
- Laboratory of Nuclear Organization, Cecil H. and Ida Green Center for Reproductive Biology Sciences, Division of Basic Research, Department of Obstetrics and Gynecology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Akansha M. Shah
- Department of Molecular Biology, Hamon Center for Regenerative Science and Medicine, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Reshma T. Veettil
- Laboratory of Nuclear Organization, Cecil H. and Ida Green Center for Reproductive Biology Sciences, Division of Basic Research, Department of Obstetrics and Gynecology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Kenian Chen
- Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Prashant Pradhan
- Laboratory of Nuclear Organization, Cecil H. and Ida Green Center for Reproductive Biology Sciences, Division of Basic Research, Department of Obstetrics and Gynecology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Svetlana Bezprozvannaya
- Department of Molecular Biology, Hamon Center for Regenerative Science and Medicine, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Lin Xu
- Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Ning Liu
- Department of Molecular Biology, Hamon Center for Regenerative Science and Medicine, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Eric N. Olson
- Department of Molecular Biology, Hamon Center for Regenerative Science and Medicine, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Benjamin R. Sabari
- Department of Molecular Biology, Hamon Center for Regenerative Science and Medicine, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Laboratory of Nuclear Organization, Cecil H. and Ida Green Center for Reproductive Biology Sciences, Division of Basic Research, Department of Obstetrics and Gynecology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
|
12
|
Lobel JH, Ingolia NT. Defining the mechanisms and properties of post-transcriptional regulatory disordered regions by high-throughput functional profiling. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.01.578453. [PMID: 38370681 PMCID: PMC10871298 DOI: 10.1101/2024.02.01.578453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
Disordered regions within RNA binding proteins are required to control mRNA decay and protein synthesis. To understand how these disordered regions modulate gene expression, we surveyed regulatory activity across the entire disordered proteome using a high-throughput functional assay. We identified hundreds of regulatory sequences within intrinsically disordered regions and demonstrate how these elements cooperate with core mRNA decay machinery to promote transcript turnover. Coupling high-throughput functional profiling with mutational scanning revealed diverse molecular features, ranging from defined motifs to overall sequence composition, underlying the regulatory effects of disordered peptides. Machine learning analysis implicated aromatic residues in particular contexts as critical determinants of repressor activity, consistent with their roles in forming protein-protein interactions with downstream effectors. Our results define the molecular principles and biochemical mechanisms that govern post-transcriptional gene regulation by disordered regions and exemplify the encoding of diverse yet specific functions in the absence of well-defined structure.
Affiliation(s)
- Joseph H Lobel
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Nicholas T Ingolia
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
  - Lead contact
|
13
|
Sreenivasan S, Heffren P, Suh K, Rodnin MV, Kosa E, Fenton AW, Ladokhin AS, Smith PE, Fontes JD, Swint‐Kruse L. The intrinsically disordered transcriptional activation domain of CIITA is functionally tuneable by single substitutions: An exception or a new paradigm? Protein Sci 2024; 33:e4863. [PMID: 38073129 PMCID: PMC10806935 DOI: 10.1002/pro.4863] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 12/04/2023] [Accepted: 12/07/2023] [Indexed: 01/27/2024]
Abstract
During protein evolution, some amino acid substitutions modulate protein function ("tuneability"). In most proteins, the tuneable range is wide and can be sampled by a set of protein variants that each contains multiple amino acid substitutions. In other proteins, the full tuneable range can be accessed by a set of variants that each contains a single substitution. Indeed, in some globular proteins, the full tuneable range can be accessed by the set of site-saturating substitutions at an individual "rheostat" position. However, in proteins with intrinsically disordered regions (IDRs), most functional studies (which would also detect tuneability) used multiple substitutions or small deletions. In disordered transcriptional activation domains (ADs), studies with multiple substitutions led to the "acidic exposure" model, which does not anticipate the existence of rheostat positions. In the few studies that did assess effects of single substitutions on AD function, results were mixed: the ADs of two full-length transcription factors did not show tuneability, whereas a fragment of a third AD was tuneable by single substitutions. In this study, we tested tuneability in the AD of full-length human class II transactivator (CIITA). Sequence analyses and experiments showed that CIITA's AD is an IDR. Functional assays of singly-substituted AD variants showed that CIITA's function was highly tuneable, with outcomes not predicted by the acidic exposure model. Four tested positions showed rheostat behavior for transcriptional activation. Thus, tuneability of different IDRs can vary widely. Future studies are needed to illuminate the biophysical features that govern whether an IDR is tuneable by single substitutions.
Affiliation(s)
- Shwetha Sreenivasan
  - Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, Kansas, USA
- Paul Heffren
  - Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, Kansas, USA
  - Present address: Department of Biosciences, Kansas City University, Kansas City, Missouri, USA
- Kyung‐Shin Suh
  - Department of Chemistry, Kansas State University, Manhattan, Kansas, USA
- Mykola V. Rodnin
  - Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, Kansas, USA
- Edina Kosa
  - Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, Kansas, USA
- Aron W. Fenton
  - Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, Kansas, USA
- Alexey S. Ladokhin
  - Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, Kansas, USA
- Paul E. Smith
  - Department of Chemistry, Kansas State University, Manhattan, Kansas, USA
- Joseph D. Fontes
  - Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, Kansas, USA
- Liskin Swint‐Kruse
  - Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, Kansas, USA
|
14
|
Udupa A, Kotha SR, Staller MV. Commonly asked questions about transcriptional activation domains. Curr Opin Struct Biol 2024; 84:102732. [PMID: 38056064 PMCID: PMC11193542 DOI: 10.1016/j.sbi.2023.102732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 10/23/2023] [Accepted: 10/27/2023] [Indexed: 12/08/2023]
Abstract
Eukaryotic transcription factors activate gene expression with their DNA-binding domains and activation domains. DNA-binding domains bind the genome by recognizing structurally related DNA sequences; they are structured, conserved, and predictable from protein sequences. Activation domains recruit chromatin modifiers, coactivator complexes, or basal transcriptional machinery via structurally diverse protein-protein interactions. Activation domains and DNA-binding domains have been called independent, modular units, but there are many departures from modularity, including interactions between these regions and overlap in function. Compared to DNA-binding domains, activation domains are poorly understood because they are poorly conserved, intrinsically disordered, and difficult to predict from protein sequences. This review, organized around commonly asked questions, describes recent progress that the field has made in understanding the sequence features that control activation domains and predicting them from sequence.
Affiliation(s)
- Aditya Udupa
  - Department of Molecular and Cell Biology, University of California, Berkeley, 94720, USA
- Sanjana R Kotha
  - Department of Molecular and Cell Biology, University of California, Berkeley, 94720, USA; Center for Computational Biology, University of California, Berkeley, 94720, USA
- Max V Staller
  - Department of Molecular and Cell Biology, University of California, Berkeley, 94720, USA; Center for Computational Biology, University of California, Berkeley, 94720, USA; Chan Zuckerberg Biohub-San Francisco, San Francisco, CA 94158, USA
|
15
|
DelRosso N, Bintu L. Using High-Throughput Measurements to Identify Principles of Transcriptional and Epigenetic Regulators. Methods Mol Biol 2024; 2842:79-101. [PMID: 39012591 DOI: 10.1007/978-1-0716-4051-7_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/17/2024]
Abstract
To achieve exquisite control over the epigenome, we need a better predictive understanding of how transcription factors, chromatin regulators, and their individual domains function, both as modular parts and as full proteins. Transcriptional effector domains are one class of protein domains that regulate transcription and chromatin. These effector domains either repress or activate gene expression by interacting with chromatin-modifying enzymes, transcriptional cofactors, and/or general transcriptional machinery. Here, we discuss important design considerations for high-throughput investigations of effector domains, as well as recent advances in discovering new domains in human cells and in testing how domain function depends on amino acid sequence. For every effector domain, we would like to know the following: What roles do the cell type, signaling state, and targeted context play in activation, silencing, and epigenetic memory? Large-scale measurements of transcriptional activities can help systematically answer these questions and identify general rules for how all these parameters affect effector domain activities. Last, we discuss what steps need to be taken to turn a newly discovered effector domain into a robust, precise epigenome editor. With more carefully considered high-throughput investigations, soon we will have better predictive control over the epigenome.
|
16
|
Cornwell A, Zhang Y, Thondamal M, Johnson DW, Thakar J, Samuelson AV. The C. elegans Myc-family of transcription factors coordinate a dynamic adaptive response to dietary restriction. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.22.568222. [PMID: 38045350 PMCID: PMC10690244 DOI: 10.1101/2023.11.22.568222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
Dietary restriction (DR), the process of decreasing overall food consumption over an extended period of time, has been shown to increase longevity across evolutionarily diverse species and delay the onset of age-associated diseases in humans. In Caenorhabditis elegans, the Myc-family transcription factors (TFs) MXL-2 (Mlx) and MML-1 (MondoA/ChREBP), which function as obligate heterodimers, and PHA-4 (orthologous to forkhead box transcription factor A) are both necessary for the full physiological benefits of DR. However, the adaptive transcriptional response to DR and the roles of MML-1::MXL-2 and PHA-4 remain elusive. We identified the transcriptional signature of C. elegans DR, using the eat-2 genetic model, and demonstrate broad changes in metabolic gene expression in eat-2 DR animals, which require both mxl-2 and pha-4. While the requirement for these factors in DR gene expression overlaps, we found many of the DR genes exhibit an opposing change in relative gene expression in eat-2;mxl-2 animals compared to wild-type, which was not observed in eat-2 animals with pha-4 loss. We further show functional deficiencies of the mxl-2 loss in DR outside of lifespan, as eat-2;mxl-2 animals exhibit substantially smaller brood sizes and lay a proportion of dead eggs, indicating that MML-1::MXL-2 has a role in maintaining the balance between resource allocation to the soma and to reproduction under conditions of chronic food scarcity. While eat-2 animals do not show a significantly different metabolic rate compared to wild-type, we also find that loss of mxl-2 in DR does not affect the rate of oxygen consumption in young animals. The gene expression signature of eat-2 mutant animals is consistent with optimization of energy utilization and resource allocation, rather than induction of canonical gene expression changes associated with acute metabolic stress, such as induction of autophagy after TORC1 inhibition. Consistently, eat-2 animals are not substantially resistant to stress, providing further support to the idea that chronic DR may benefit healthspan and lifespan through efficient use of limited resources rather than broad upregulation of stress responses, and also indicating that MML-1::MXL-2 and PHA-4 may have different roles in promotion of benefits in response to different pro-longevity stimuli.
Affiliation(s)
- Adam Cornwell
  - Department of Biomedical Genetics, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY 14642, USA
- Yun Zhang
  - Department of Biomedical Genetics, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY 14642, USA
- Manjunatha Thondamal
  - Department of Biomedical Genetics, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY 14642, USA
  - Department of Biological Sciences, GITAM University, Andhra Pradesh, India
- David W Johnson
  - Department of Biomedical Genetics, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY 14642, USA
  - Department of Math and Science, Genesee Community College, One College Rd Batavia, NY 14020, USA
- Juilee Thakar
  - Department of Biomedical Genetics, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY 14642, USA
  - Department of Biostatistics and Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY 14642, USA
  - Department of Microbiology and Immunology, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY 14642, USA
- Andrew V Samuelson
  - Department of Biomedical Genetics, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY 14642, USA
|
17
|
Kotha SR, Staller MV. Clusters of acidic and hydrophobic residues can predict acidic transcriptional activation domains from protein sequence. Genetics 2023; 225:iyad131. [PMID: 37462277 PMCID: PMC10550315 DOI: 10.1093/genetics/iyad131] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/16/2023] [Accepted: 07/03/2023] [Indexed: 10/06/2023] Open
Abstract
Transcription factors activate gene expression in development, homeostasis, and stress with DNA binding domains and activation domains. Although there exist excellent computational models for predicting DNA binding domains from protein sequence, models for predicting activation domains from protein sequence have lagged, particularly in metazoans. We recently developed a simple and accurate predictor of acidic activation domains on human transcription factors. Here, we show how the accuracy of this human predictor arises from the clustering of aromatic, leucine, and acidic residues, which together are necessary for acidic activation domain function. When we combine our predictor with the predictions of convolutional neural network (CNN) models trained in yeast, the intersection is more accurate than individual models, emphasizing that each approach carries orthogonal information. We synthesize these findings into a new set of activation domain predictions on human transcription factors.
Affiliation(s)
- Sanjana R Kotha
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
  - Center for Computational Biology, University of California, Berkeley, CA 94720, USA
- Max Valentín Staller
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
  - Center for Computational Biology, University of California, Berkeley, CA 94720, USA
  - Chan Zuckerberg Biohub-San Francisco, San Francisco, CA 94158, USA
|
18
|
Jores T, Hamm M, Cuperus JT, Queitsch C. Frontiers and techniques in plant gene regulation. CURRENT OPINION IN PLANT BIOLOGY 2023; 75:102403. [PMID: 37331209 DOI: 10.1016/j.pbi.2023.102403] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/30/2023] [Revised: 05/12/2023] [Accepted: 05/19/2023] [Indexed: 06/20/2023]
Abstract
Understanding plant gene regulation has been a priority for generations of plant scientists. However, due to its complex nature, the regulatory code governing plant gene expression has yet to be deciphered comprehensively. Recently developed methods, often relying on next-generation sequencing technology and state-of-the-art computational approaches, have started to further our understanding of the gene regulatory logic used by plants. In this review, we discuss these methods and the insights into the regulatory code of plants that they can yield.
Affiliation(s)
- Tobias Jores
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Morgan Hamm
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Josh T Cuperus
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Christine Queitsch
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
|
19
|
Hummel NFC, Markel K, Stefani J, Staller MV, Shih PM. Systematic identification of transcriptional activator domains from non-transcription factor proteins in plants and yeast. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.09.12.557247. [PMID: 37745555 PMCID: PMC10515812 DOI: 10.1101/2023.09.12.557247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
Transcription factors promote gene expression via trans-regulatory activation domains. Although whole-genome-scale screens in model organisms (e.g., human, yeast, fly) have helped identify activation domains from transcription factors, such screens have been less extensively used to explore the occurrence of activation domains in non-transcription factor proteins, such as transcriptional coactivators, chromatin regulators, and some cytosolic proteins, leaving a blind spot regarding the role that activation domains in these proteins could play in regulating transcription. We utilized the activation domain predictor PADDLE to mine the entire proteomes of two model eukaryotes, Arabidopsis thaliana and Saccharomyces cerevisiae (1). We characterized 18,000 fragments covering predicted activation domains from >800 non-transcription factor genes in both species, and experimentally validated that 89% of proteins contained fragments capable of activating transcription in yeast. Peptides with similar sequence composition show a broad range of activities, which is explained by the arrangement of key amino acids. We also annotated hundreds of nuclear proteins with activation domains as putative coactivators, many of which have never been ascribed any function in plants. Furthermore, our library contains >250 non-nuclear proteins containing peptides with activation domain function across both eukaryotic lineages, suggesting that there are unknown biological roles of these peptides beyond transcription. Finally, we identify and validate short, 'universal' eukaryotic activation domains that activate transcription in both yeast and plants with performance comparable to or stronger than that of state-of-the-art activation domains. Overall, our dual-host screen provides a blueprint for systematically discovering novel genetic parts for synthetic biology that function across a wide diversity of eukaryotes.
Significance Statement
Activation domains promote transcription and play a critical role in regulating gene expression. Although the mapping of activation domains from transcription factors has been carried out in previous genome-wide screens, their occurrence in non-transcription factors has been less explored. We utilize an activation domain predictor to mine the entire proteomes of Arabidopsis thaliana and Saccharomyces cerevisiae for new activation domains on non-transcription factor proteins. We validate peptides derived from >750 non-transcription factor proteins capable of activating transcription, discovering many potentially new coactivators in plants. Importantly, we identify novel genetic parts that can function across both species, representing unique synthetic biology tools.
|
20
|
Mahendrawada L, Warfield L, Donczew R, Hahn S. Surprising connections between DNA binding and function for the near-complete set of yeast transcription factors. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.25.550593. [PMID: 37546716 PMCID: PMC10402042 DOI: 10.1101/2023.07.25.550593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
DNA sequence-specific transcription factors (TFs) modulate transcription and chromatin architecture, acting from regulatory sites in enhancers and promoters of eukaryotic genes. How TFs locate their DNA targets and how multiple TFs cooperate to regulate individual genes is still unclear. Most yeast TFs are thought to regulate transcription via binding to upstream activating sequences, situated within a few hundred base pairs upstream of the regulated gene. While this model has been validated for individual TFs and specific genes, it has not been tested in a systematic way with the large set of yeast TFs. Here, we have integrated information on the binding and expression targets for the near-complete set of yeast TFs. While we found many instances of functional TF binding sites in upstream regulatory regions, we found many more instances that do not fit this model. In many cases, rapid TF depletion affects gene expression where there is no detectable binding of that TF to the upstream region of the affected gene. In addition, for most TFs, only a small fraction of bound TFs regulates the nearby gene, showing that TF binding does not automatically correspond to regulation of the linked gene. Finally, we found that only a small percentage of TFs are exclusively strong activators or repressors, with most TFs having dual function. Overall, our comprehensive mapping of TF binding and regulatory targets has both confirmed known TF relationships and revealed surprising properties of TF function.
|
21
|
Hummel NFC, Zhou A, Li B, Markel K, Ornelas IJ, Shih PM. The trans-regulatory landscape of gene networks in plants. Cell Syst 2023; 14:501-511.e4. [PMID: 37348464 DOI: 10.1016/j.cels.2023.05.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/01/2022] [Revised: 03/21/2023] [Accepted: 05/11/2023] [Indexed: 06/24/2023]
Abstract
The transcriptional effector domains of transcription factors play a key role in controlling gene expression; however, their functional nature is poorly understood, hampering our ability to explore this fundamental dimension of gene regulatory networks. To map the trans-regulatory landscape in a complex eukaryote, we systematically characterized the putative transcriptional effector domains of over 400 Arabidopsis thaliana transcription factors for their capacity to modulate transcription. We demonstrate that transcriptional effector activity can be integrated into gene regulatory networks capable of elucidating the functional dynamics underlying gene expression patterns. We further show how our characterized domains can enhance genome engineering efforts and reveal how plant transcriptional activators share regulatory features conserved across distantly related eukaryotes. Our results provide a framework to systematically characterize the regulatory role of transcription factors at a genome-scale in order to understand the transcriptional wiring of biological systems.
Affiliation(s)
- Niklas F C Hummel
  - Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA 94720, USA; Feedstocks Division, Joint BioEnergy Institute, Emeryville, CA 94608, USA; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94705, USA; Department of Biology, Technische Universität Darmstadt, Darmstadt 64287, Germany
- Andy Zhou
  - Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA 94720, USA; Feedstocks Division, Joint BioEnergy Institute, Emeryville, CA 94608, USA; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94705, USA
- Baohua Li
  - Feedstocks Division, Joint BioEnergy Institute, Emeryville, CA 94608, USA; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94705, USA
- Kasey Markel
  - Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA 94720, USA; Feedstocks Division, Joint BioEnergy Institute, Emeryville, CA 94608, USA; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94705, USA
- Izaiah J Ornelas
  - Feedstocks Division, Joint BioEnergy Institute, Emeryville, CA 94608, USA; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94705, USA
- Patrick M Shih
  - Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA 94720, USA; Feedstocks Division, Joint BioEnergy Institute, Emeryville, CA 94608, USA; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94705, USA; Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA 94720, USA
|
22
|
Jonas F, Carmi M, Krupkin B, Steinberger J, Brodsky S, Jana T, Barkai N. The molecular grammar of protein disorder guiding genome-binding locations. Nucleic Acids Res 2023; 51:4831-4844. [PMID: 36938874 PMCID: PMC10250222 DOI: 10.1093/nar/gkad184] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/15/2022] [Revised: 01/25/2023] [Accepted: 03/15/2023] [Indexed: 03/21/2023] Open
Abstract
Intrinsically disordered regions (IDRs) direct transcription factors (TFs) towards selected genomic occurrences of their binding motif, as exemplified by budding yeast's Msn2. However, the sequence basis of IDR-directed TF binding selectivity remains unknown. To reveal this sequence grammar, we analyze the genomic localizations of >100 designed IDR mutants, each carrying up to 122 mutations within this 567-AA region. Our data point to multivalent interactions, carried by hydrophobic (mostly aliphatic) residues dispersed within a disordered environment and independent of linear sequence motifs, as the key determinants of Msn2 genomic localization. The implications of our results for the mechanistic basis of IDR-based TF binding preferences are discussed.
Affiliation(s)
- Felix Jonas
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Miri Carmi
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Beniamin Krupkin
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Joseph Steinberger
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Sagie Brodsky
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Tamar Jana
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Naama Barkai
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
|
23
|
Davey NE, Simonetti L, Ivarsson Y. The next wave of interactomics: Mapping the SLiM-based interactions of the intrinsically disordered proteome. Curr Opin Struct Biol 2023; 80:102593. [PMID: 37099901 DOI: 10.1016/j.sbi.2023.102593] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 01/30/2023] [Revised: 03/09/2023] [Accepted: 03/17/2023] [Indexed: 04/28/2023]
Abstract
Short linear motifs (SLiMs) are a unique and ubiquitous class of protein interaction modules that perform key regulatory functions and drive dynamic complex formation. For decades, interactions mediated by SLiMs have accumulated through detailed low-throughput experiments. Recent methodological advances have opened this previously underexplored area of the human interactome to high-throughput protein-protein interaction discovery. In this article, we discuss how SLiM-based interactions represent a significant blind spot in current interactomics data, introduce the key methods that are illuminating the elusive SLiM-mediated interactome of the human cell on a large scale, and discuss the implications for the field.
Affiliation(s)
- Norman E Davey
- Division of Cancer Biology, The Institute of Cancer Research, 237 Fulham Road, London, SW3 6JB, UK.
- Leandro Simonetti
- Department of Chemistry - BMC, Uppsala University, Box 576, Husargatan 3, 751 23, Uppsala, Sweden
- Ylva Ivarsson
- Department of Chemistry - BMC, Uppsala University, Box 576, Husargatan 3, 751 23, Uppsala, Sweden.
24
Reynaud K, McGeachy AM, Noble D, Meacham ZA, Ingolia NT. Surveying the global landscape of post-transcriptional regulators. Nat Struct Mol Biol 2023; 30:740-752. [PMID: 37231154 PMCID: PMC10279529 DOI: 10.1038/s41594-023-00999-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/01/2021] [Accepted: 04/17/2023] [Indexed: 05/27/2023]
Abstract
Numerous proteins regulate gene expression by modulating mRNA translation and decay. To uncover the full scope of these post-transcriptional regulators, we conducted an unbiased survey that quantifies regulatory activity across the budding yeast proteome and delineates the protein domains responsible for these effects. Our approach couples a tethered function assay with quantitative single-cell fluorescence measurements to analyze ~50,000 protein fragments and determine their effects on a tethered mRNA. We characterize hundreds of strong regulators, which are enriched for canonical and unconventional mRNA-binding proteins. Regulatory activity typically maps outside the RNA-binding domains themselves, highlighting a modular architecture that separates mRNA targeting from post-transcriptional regulation. Activity often aligns with intrinsically disordered regions that can interact with other proteins, even in core mRNA translation and degradation factors. Our results thus reveal networks of interacting proteins that control mRNA fate and illuminate the molecular basis for post-transcriptional gene regulation.
Affiliation(s)
- Kendra Reynaud
- California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, CA, USA
- Anna M McGeachy
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- David Noble
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Zuriah A Meacham
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Nicholas T Ingolia
- California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, CA, USA.
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA.
25
Cermakova K, Hodges HC. Interaction modules that impart specificity to disordered protein. Trends Biochem Sci 2023; 48:477-490. [PMID: 36754681 PMCID: PMC10106370 DOI: 10.1016/j.tibs.2023.01.004] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 09/01/2022] [Revised: 01/09/2023] [Accepted: 01/12/2023] [Indexed: 02/09/2023]
Abstract
Intrinsically disordered regions (IDRs) are especially enriched among proteins that regulate chromatin and transcription. As a result, mechanisms that influence specificity of IDR-driven interactions have emerged as exciting unresolved issues for understanding gene regulation. We review the molecular elements frequently found within IDRs that confer regulatory specificity. In particular, we summarize the differing roles of disordered low-complexity regions (LCRs) and short linear motifs (SLiMs) towards selective nuclear regulation. Examination of IDR-driven interactions highlights SLiMs as organizers of selectivity, with widespread roles in gene regulation and integration of cellular signals. Analysis of recurrent interactions between SLiMs and folded domains suggests diverse avenues for SLiMs to influence phase-separated condensates and highlights opportunities to manipulate these interactions for control of biological activity.
Affiliation(s)
- Katerina Cermakova
- Department of Molecular and Cellular Biology, Center for Precision Environmental Health, Baylor College of Medicine, Houston, TX, USA
- H Courtney Hodges
- Department of Molecular and Cellular Biology, Center for Precision Environmental Health, Baylor College of Medicine, Houston, TX, USA; Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, USA; Department of Bioengineering, Rice University, Houston, TX, USA; Center for Cancer Epigenetics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
26
DelRosso N, Tycko J, Suzuki P, Andrews C, Aradhana, Mukund A, Liongson I, Ludwig C, Spees K, Fordyce P, Bassik MC, Bintu L. Large-scale mapping and mutagenesis of human transcriptional effector domains. Nature 2023; 616:365-372. [PMID: 37020022 PMCID: PMC10484233 DOI: 10.1038/s41586-023-05906-y] [Citation(s) in RCA: 33] [Impact Index Per Article: 33.0] [Received: 07/14/2022] [Accepted: 03/01/2023] [Indexed: 04/07/2023]
Abstract
Human gene expression is regulated by more than 2,000 transcription factors and chromatin regulators. Effector domains within these proteins can activate or repress transcription. However, for many of these regulators we do not know what type of effector domains they contain, their location in the protein, their activation and repression strengths, and the sequences that are necessary for their functions. Here, we systematically measure the effector activity of more than 100,000 protein fragments tiling across most chromatin regulators and transcription factors in human cells (2,047 proteins). By testing the effect they have when recruited at reporter genes, we annotate 374 activation domains and 715 repression domains, roughly 80% of which are new and have not been previously annotated. Rational mutagenesis and deletion scans across all the effector domains reveal aromatic and/or leucine residues interspersed with acidic, proline, serine and/or glutamine residues are necessary for activation domain activity. Furthermore, most repression domain sequences contain sites for small ubiquitin-like modifier (SUMO)ylation, short interaction motifs for recruiting corepressors or are structured binding domains for recruiting other repressive proteins. We discover bifunctional domains that can both activate and repress, some of which dynamically split a cell population into high- and low-expression subpopulations. Our systematic annotation and characterization of effector domains provide a rich resource for understanding the function of human transcription factors and chromatin regulators, engineering compact tools for controlling gene expression and refining predictive models of effector domain function.
Affiliation(s)
- Josh Tycko
- Department of Genetics, Stanford University, Stanford, CA, USA
- Peter Suzuki
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Cecelia Andrews
- Department of Developmental Biology, Stanford University, Stanford, CA, USA
- Aradhana
- Department of Genetics, Stanford University, Stanford, CA, USA
- Adi Mukund
- Biophysics Program, Stanford University, Stanford, CA, USA
- Ivan Liongson
- Department of Biology, Stanford University, Stanford, CA, USA
- Connor Ludwig
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Kaitlyn Spees
- Department of Genetics, Stanford University, Stanford, CA, USA
- Polly Fordyce
- Department of Genetics, Stanford University, Stanford, CA, USA
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- ChEM-H Institute, Stanford University, Stanford, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Lacramioara Bintu
- Department of Bioengineering, Stanford University, Stanford, CA, USA.
27
Klaus L, de Almeida BP, Vlasova A, Nemčko F, Schleiffer A, Bergauer K, Hofbauer L, Rath M, Stark A. Systematic identification and characterization of repressive domains in Drosophila transcription factors. EMBO J 2023; 42:e112100. [PMID: 36545802 PMCID: PMC9890238 DOI: 10.15252/embj.2022112100] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 07/13/2022] [Revised: 11/21/2022] [Accepted: 12/01/2022] [Indexed: 12/24/2022]
Abstract
All multicellular life relies on differential gene expression, determined by regulatory DNA elements and DNA-binding transcription factors that mediate activation and repression via cofactor recruitment. While activators have been extensively characterized, repressors are less well studied: the identities and properties of their repressive domains (RDs) are typically unknown and the specific co-repressors (CoRs) they recruit have not been determined. Here, we develop a high-throughput, next-generation sequencing-based screening method, repressive-domain (RD)-seq, to systematically identify RDs in complex DNA-fragment libraries. Screening more than 200,000 fragments covering the coding sequences of all transcription-related proteins in Drosophila melanogaster, we identify 195 RDs in known repressors and in proteins not previously associated with repression. Many RDs contain recurrent short peptide motifs, which are conserved between fly and human and are required for RD function, as demonstrated by motif mutagenesis. Moreover, we show that RDs that contain one of five distinct repressive motifs interact with and depend on different CoRs, such as Groucho, CtBP, Sin3A, or Smrter. These findings advance our understanding of repressors, their sequences, and the functional impact of sequence-altering mutations and should provide a valuable resource for further studies.
Affiliation(s)
- Loni Klaus
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria
- Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, Vienna, Austria
- Bernardo P de Almeida
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria
- Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, Vienna, Austria
- Anna Vlasova
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria
- Filip Nemčko
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria
- Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, Vienna, Austria
- Alexander Schleiffer
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria
- Institute of Molecular Biotechnology (IMBA), Vienna BioCenter (VBC), Vienna, Austria
- Katharina Bergauer
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria
- Lorena Hofbauer
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria
- Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, Vienna, Austria
- Martina Rath
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria
- Alexander Stark
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria
- Medical University of Vienna, Vienna BioCenter (VBC), Vienna, Austria
28
Lambert É, Puwakdandawa K, Tao YF, Robert F. From structure to molecular condensates: emerging mechanisms for Mediator function. FEBS J 2023; 290:286-309. [PMID: 34698446 DOI: 10.1111/febs.16250] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/13/2021] [Revised: 10/15/2021] [Accepted: 10/25/2021] [Indexed: 02/05/2023]
Abstract
Mediator is a large modular protein assembly whose function as a coactivator of transcription is conserved in all eukaryotes. The Mediator complex can integrate and relay signals from gene-specific activators bound at enhancers to activate the general transcription machinery located at promoters. It has thus been described as a bridge between these elements during initiation of transcription. Here, we review recent studies on Mediator relating to its structure, gene specificity and general requirement, roles in chromatin architecture as well as novel concepts involving phase separation and transcriptional bursting. We revisit the mechanism of action of Mediator and ultimately put forward models for its mode of action in gene activation.
Affiliation(s)
- Élie Lambert
- Institut de recherches cliniques de Montréal, Canada
- Yi Fei Tao
- Institut de recherches cliniques de Montréal, Canada
- François Robert
- Institut de recherches cliniques de Montréal, Canada; Département de Médecine, Faculté de Médecine, Université de Montréal, Canada
29
Warfield L, Donczew R, Mahendrawada L, Hahn S. Yeast Mediator facilitates transcription initiation at most promoters via a Tail-independent mechanism. Mol Cell 2022; 82:4033-4048.e7. [PMID: 36208626 PMCID: PMC9637718 DOI: 10.1016/j.molcel.2022.09.016] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 12/14/2021] [Revised: 05/12/2022] [Accepted: 09/13/2022] [Indexed: 11/06/2022]
Abstract
Mediator (MED) is a conserved factor with important roles in basal and activated transcription. Here, we investigate the genome-wide roles of yeast MED by rapid depletion of its activator-binding domain (Tail) and monitoring changes in nascent transcription. Rapid Tail depletion surprisingly reduces transcription from only a small subset of genes. At most of these Tail-dependent genes, in unperturbed conditions, MED is detected at both the UASs and promoters. In contrast, at most Tail-independent genes, we find MED primarily at promoters but not at the UASs. These results suggest that MED Tail and activator-mediated MED recruitment regulates only a small subset of genes. Furthermore, we define three classes of genes that differ in PIC assembly pathways and the requirements for MED Tail, SAGA, TFIID, and BET factors Bdf1/2. Our combined results have broad implications for the roles of MED, other coactivators, and mechanisms of transcriptional regulation at different gene classes.
Affiliation(s)
- Linda Warfield
- Fred Hutchinson Cancer Center, 1100 Fairview Ave N, Mailstop A1-162, Seattle, WA 98109, USA
- Rafal Donczew
- Fred Hutchinson Cancer Center, 1100 Fairview Ave N, Mailstop A1-162, Seattle, WA 98109, USA
- Lakshmi Mahendrawada
- Fred Hutchinson Cancer Center, 1100 Fairview Ave N, Mailstop A1-162, Seattle, WA 98109, USA
- Steven Hahn
- Fred Hutchinson Cancer Center, 1100 Fairview Ave N, Mailstop A1-162, Seattle, WA 98109, USA.
30
Staller MV. Transcription factors perform a 2-step search of the nucleus. Genetics 2022; 222:iyac111. [PMID: 35939561 PMCID: PMC9526044 DOI: 10.1093/genetics/iyac111] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 05/10/2022] [Accepted: 07/14/2022] [Indexed: 01/02/2023]
Abstract
Transcription factors regulate gene expression by binding to regulatory DNA and recruiting regulatory protein complexes. The DNA-binding and protein-binding functions of transcription factors are traditionally described as independent functions performed by modular protein domains. Here, I argue that genome binding can be a 2-part process with both DNA-binding and protein-binding steps, enabling transcription factors to perform a 2-step search of the nucleus to find their appropriate binding sites in a eukaryotic genome. I support this hypothesis with new and old results in the literature, discuss how this hypothesis parsimoniously resolves outstanding problems, and present testable predictions.
Affiliation(s)
- Max Valentín Staller
- Corresponding author: Center for Computational Biology, University of California, Berkeley, Berkeley, CA 94720, USA.
31
Sahu M, Gupta R, Ambasta RK, Kumar P. Artificial intelligence and machine learning in precision medicine: A paradigm shift in big data analysis. Prog Mol Biol Transl Sci 2022; 190:57-100. [PMID: 36008002 DOI: 10.1016/bs.pmbts.2022.03.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 10/18/2022]
Abstract
The integration of artificial intelligence in precision medicine has revolutionized healthcare delivery. Precision medicine identifies the phenotype of particular patients with less-common responses to treatment. Recent studies have demonstrated that translational research exploring the convergence between artificial intelligence and precision medicine will help solve the most difficult challenges facing precision medicine. Here, we discuss different aspects of artificial intelligence in precision medicine that improve healthcare delivery. First, we discuss how artificial intelligence changes the landscape of precision medicine and the evolution of artificial intelligence in precision medicine. Second, we highlight the synergies between artificial intelligence and precision medicine and promises of artificial intelligence and precision medicine in healthcare delivery. Third, we briefly explain the promise of big data analytics and the integration of nanomaterials in precision medicine. Last, we highlight the challenges and opportunities of artificial intelligence in precision medicine.
Affiliation(s)
- Mehar Sahu
- Molecular Neuroscience and Functional Genomics Laboratory, Delhi Technological University (Formerly Delhi College of Engineering), Shahbad Daulatpur, Delhi, India
- Rohan Gupta
- Molecular Neuroscience and Functional Genomics Laboratory, Delhi Technological University (Formerly Delhi College of Engineering), Shahbad Daulatpur, Delhi, India
- Rashmi K Ambasta
- Molecular Neuroscience and Functional Genomics Laboratory, Delhi Technological University (Formerly Delhi College of Engineering), Shahbad Daulatpur, Delhi, India
- Pravir Kumar
- Molecular Neuroscience and Functional Genomics Laboratory, Delhi Technological University (Formerly Delhi College of Engineering), Shahbad Daulatpur, Delhi, India.
32
Abstract
"De novo" genes evolve from previously non-genic DNA. This strikes many of us as remarkable, because it seems extraordinarily unlikely that random sequence would produce a functional gene. How is this possible? In this two-part review, I first summarize what is known about the origins and molecular functions of the small number of de novo genes for which such information is available. I then speculate on what these examples may tell us about how de novo genes manage to emerge despite what seem like enormous opposing odds.
Affiliation(s)
- Caroline M Weisman
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA.
33
Discovering molecular features of intrinsically disordered regions by using evolution for contrastive learning. PLoS Comput Biol 2022; 18:e1010238. [PMID: 35767567 PMCID: PMC9275697 DOI: 10.1371/journal.pcbi.1010238] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/04/2021] [Revised: 07/12/2022] [Accepted: 05/23/2022] [Indexed: 02/07/2023]
Abstract
A major challenge to the characterization of intrinsically disordered regions (IDRs), which are widespread in the proteome, but relatively poorly understood, is the identification of molecular features that mediate functions of these regions, such as short motifs, amino acid repeats and physicochemical properties. Here, we introduce a proteome-scale feature discovery approach for IDRs. Our approach, which we call “reverse homology”, exploits the principle that important functional features are conserved over evolution. We use this as a contrastive learning signal for deep learning: given a set of homologous IDRs, the neural network has to correctly choose a held-out homolog from another set of IDRs sampled randomly from the proteome. We pair reverse homology with a simple architecture and standard interpretation techniques, and show that the network learns conserved features of IDRs that can be interpreted as motifs, repeats, or bulk features like charge or amino acid propensities. We also show that our model can be used to produce visualizations of what residues and regions are most important to IDR function, generating hypotheses for uncharacterized IDRs. Our results suggest that feature discovery using unsupervised neural networks is a promising avenue to gain systematic insight into poorly understood protein sequences.
Intrinsically disordered regions (IDRs) are widespread in proteins but are poorly understood on a systematic level because they evolve too rapidly for classic bioinformatics methods to be effective. We designed a neural network that learns what features (for example, electrostatic charge, or the presence of certain motifs) might be important to the function of IDRs, even when we don’t have prior knowledge of function. Our neural network learns by exploiting principles of evolution. Important features tend to be conserved over species, so guessing what sequences evolved from the same common ancestor helps the neural network identify these features. Importantly, training a neural network this way can be defined as a fully automatic operation, so no manual effort is required. After our neural network is trained, we can apply interpretation techniques to understand what kinds of features are important to IDRs globally in the proteome, and to form hypotheses about specific IDRs. We show that many of the features our neural network learns are consistent with features we already know to be important to IDRs. We hope that our neural network can be applied to help biologists form hypotheses about poorly characterized IDRs.
34
Knight A, Piskacek M. Cryptic inhibitory regions nearby activation domains. Biochimie 2022; 200:19-26. [PMID: 35561946 DOI: 10.1016/j.biochi.2022.05.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/12/2022] [Revised: 04/23/2022] [Accepted: 05/05/2022] [Indexed: 11/27/2022]
Abstract
Previously, the Nine amino acid TransActivation Domain (9aaTAD) was identified in the Gal4 region 862-870 (DDVYNYLFD). Here, we identified 9aaTADs in the distal Gal4 orthologs by our prediction algorithm and found their conservation in the family. The function of these 9aaTADs as strong activators was demonstrated. We identified the adjacent Gal4 region 871-881 (DEDTPPNPKKE) as a natural 9aaTAD inhibitory domain located at the extreme Gal4 terminus. Moreover, we identified the conserved Gal4 region 172-185 (FDWSEEDDMSDGLP), which was capable of reversing the 9aaTAD inhibition. In conclusion, our results uncover the existence of cryptic inhibitory domains, which need to be carefully considered in all functional studies with transcription factors to avoid incorrect conclusions.
Affiliation(s)
- Andrea Knight
- Department of Pathological Physiology, Faculty of Medicine, Masaryk University Brno, Kamenice 5, 625 00, Brno, Czech Republic
- Martin Piskacek
- Department of Pathological Physiology, Faculty of Medicine, Masaryk University Brno, Kamenice 5, 625 00, Brno, Czech Republic.
35
Bi XA, Li L, Wang Z, Wang Y, Luo X, Xu L. IHGC-GAN: influence hypergraph convolutional generative adversarial network for risk prediction of late mild cognitive impairment based on imaging genetic data. Brief Bioinform 2022; 23:6554128. [PMID: 35348583 DOI: 10.1093/bib/bbac093] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 12/18/2021] [Revised: 01/28/2022] [Accepted: 02/23/2022] [Indexed: 11/13/2022]
Abstract
Predicting disease progression at the initial stage to enable early intervention and treatment can effectively prevent further deterioration of the condition. Traditional methods for medical data analysis usually perform poorly because they cannot mine the correlation patterns of pathogenic factors. Therefore, many computational methods have been drawn from the field of deep learning. In this study, we propose a novel method, the influence hypergraph convolutional generative adversarial network (IHGC-GAN), for disease risk prediction. First, a hypergraph is constructed with genes and brain regions as nodes. Second, an influence transmission model is built to portray the associations between nodes and the transmission rule of disease information. Third, an IHGC-GAN method is constructed based on this model. This method innovatively combines the graph convolutional network (GCN) and GAN. The GCN is used as the generator in the GAN to spread and update the lesion information of nodes in the brain region-gene hypergraph. Finally, the prediction accuracy of the method is improved by the mutual competition and repeated iteration between generator and discriminator. This method can not only capture the evolutionary pattern from early mild cognitive impairment (EMCI) to late MCI (LMCI) but also extract the pathogenic factors and predict the deterioration risk from EMCI to LMCI. The results on the two datasets indicate that the IHGC-GAN method has better prediction performance than advanced methods across a variety of indicators.
Affiliation(s)
- Xia-An Bi
- Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, and the College of Information Science and Engineering in Hunan Normal University, Changsha 410081, P.R. China
- Lou Li
- Department of Computing, School of Information Science and Engineering, Hunan Normal University, Changsha, China
- Zizheng Wang
- Department of Computing, School of Information Science and Engineering, Hunan Normal University, Changsha, China
- Yu Wang
- Department of Computing, School of Information Science and Engineering, Hunan Normal University, Changsha, China
- Xun Luo
- Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, and the College of Information Science and Engineering in Hunan Normal University, Changsha 410081, P.R. China
- Luyun Xu
- College of Business, Hunan Normal University, Changsha 410081, P.R. China
36
Soto L, Li Z, Santoso CS, Berenson A, Ho I, Shen VX, Yuan S, Bass JIF. Compendium of human transcription factor effector domains. Mol Cell 2022; 82:514-526. [PMID: 34863368 PMCID: PMC8818021 DOI: 10.1016/j.molcel.2021.11.007] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 08/09/2021] [Revised: 10/16/2021] [Accepted: 11/03/2021] [Indexed: 02/08/2023]
Abstract
Transcription factors (TFs) regulate gene expression by binding to DNA sequences and modulating transcriptional activity through their effector domains. Despite the central role of effector domains in TF function, there is a current lack of a comprehensive resource and characterization of effector domains. Here, we provide a catalog of 924 effector domains across 594 human TFs. Using this catalog, we characterized the amino acid composition of effector domains, their conservation across species and across the human population, and their roles in human diseases. Furthermore, we provide a classification system for effector domains that constitutes a valuable resource and a blueprint for future experimental studies of TF effector domain function.
Affiliation(s)
- Luis Soto
- Escuela Profesional de Genética y Biotecnología, Facultad de Ciencias Biológicas, Universidad Nacional Mayor de San Marcos, Lima 15081, Perú
- Zhaorong Li
- Bioinformatics Program, Boston University, Boston MA 02215
- Clarissa S Santoso
- Biology Department, Boston University, Boston MA 02215; Molecular Biology, Cellular Biology and Biochemistry Program, Boston University, Boston MA 02215
- Anna Berenson
- Biology Department, Boston University, Boston MA 02215; Molecular Biology, Cellular Biology and Biochemistry Program, Boston University, Boston MA 02215
- Isabella Ho
- Biology Department, Boston University, Boston MA 02215
- Vivian X Shen
- Biology Department, Boston University, Boston MA 02215
- Samson Yuan
- Biology Department, Boston University, Boston MA 02215
- Juan I Fuxman Bass
- Bioinformatics Program, Boston University, Boston MA 02215; Biology Department, Boston University, Boston MA 02215; Molecular Biology, Cellular Biology and Biochemistry Program, Boston University, Boston MA 02215
37
Abstract
In this issue of Molecular Cell, Alerasool et al. (2022) present a proteome-scale functional screen to systematically uncover human proteins that can activate transcription.
Affiliation(s)
- Filip Nemčko
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Campus-Vienna-Biocenter 1, Vienna, Austria; Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, Vienna, Austria
- Alexander Stark
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Campus-Vienna-Biocenter 1, Vienna, Austria; Medical University of Vienna, Vienna BioCenter (VBC), Vienna, Austria.
38
Staller MV, Ramirez E, Kotha SR, Holehouse AS, Pappu RV, Cohen BA. Directed mutational scanning reveals a balance between acidic and hydrophobic residues in strong human activation domains. Cell Syst 2022; 13:334-345.e5. [PMID: 35120642 PMCID: PMC9241528 DOI: 10.1016/j.cels.2022.01.002] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Received: 04/15/2021] [Revised: 10/20/2021] [Accepted: 01/05/2022] [Indexed: 01/01/2023]
Abstract
Acidic activation domains are intrinsically disordered regions of the transcription factors that bind coactivators. The intrinsic disorder and low evolutionary conservation of activation domains have made it difficult to identify the sequence features that control activity. To address this problem, we designed thousands of variants in seven acidic activation domains and measured their activities with a high-throughput assay in human cell culture. We found that strong activation domain activity requires a balance between the number of acidic residues and aromatic and leucine residues. These findings motivated a predictor of acidic activation domains that scans the human proteome for clusters of aromatic and leucine residues embedded in regions of high acidity. This predictor identifies known activation domains and accurately predicts previously unidentified ones. Our results support a flexible acidic exposure model of activation domains in which the acidic residues solubilize hydrophobic motifs so that they can interact with coactivators. A record of this paper’s transparent peer review process is included in the supplemental information.
Transcriptional activation domains are poorly conserved, intrinsically disordered regions of the transcription factors that remain difficult to predict from protein sequences. A high-throughput method reveals how strong activation domains require a balance between acidic and hydrophobic residues. This balance powers an accurate predictor of activation domains on human transcription factors.
Affiliation(s)
- Max V Staller
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine in St. Louis, Saint Louis, MO 63110, USA; Department of Genetics, Washington University School of Medicine in St. Louis, Saint Louis, MO 63110, USA; Center for Computational Biology, University of California Berkeley, Berkeley, CA 94720, USA.
- Eddie Ramirez
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine in St. Louis, Saint Louis, MO 63110, USA; Department of Genetics, Washington University School of Medicine in St. Louis, Saint Louis, MO 63110, USA
- Sanjana R Kotha
- Center for Computational Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Alex S Holehouse
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine in St. Louis, Saint Louis, MO 63110, USA; Center for Science and Engineering of Living Systems, Washington University in St. Louis, St. Louis, MO 63130, USA
- Rohit V Pappu
- Center for Science and Engineering of Living Systems, Washington University in St. Louis, St. Louis, MO 63130, USA; Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, MO 63130, USA
- Barak A Cohen
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine in St. Louis, Saint Louis, MO 63110, USA; Department of Genetics, Washington University School of Medicine in St. Louis, Saint Louis, MO 63110, USA.
|
39
|
Alerasool N, Leng H, Lin ZY, Gingras AC, Taipale M. Identification and functional characterization of transcriptional activators in human cells. Mol Cell 2022; 82:677-695.e7. [PMID: 35016035 DOI: 10.1016/j.molcel.2021.12.008] [Citation(s) in RCA: 44] [Impact Index Per Article: 22.0] [Received: 06/09/2021] [Revised: 11/04/2021] [Accepted: 12/09/2021] [Indexed: 12/13/2022]
Abstract
Transcription is orchestrated by thousands of transcription factors (TFs) and chromatin-associated proteins, but how these are causally connected to transcriptional activation is poorly understood. Here, we conduct an unbiased proteome-scale screen to systematically uncover human proteins that activate transcription in a natural chromatin context. By combining interaction proteomics and chemical inhibitors, we delineate the preference of these transcriptional activators for specific co-activators, highlighting how even closely related TFs can function via distinct cofactors. We also identify potent transactivation domains among the hits and use AlphaFold2 to predict and experimentally validate interaction interfaces of two activation domains with BRD4. Finally, we show that many novel activators are partners in fusion events in tumors and functionally characterize a myofibroma-associated fusion between SRF and C3orf62, a potent p300-dependent activator. Our work provides a functional catalog of potent transactivators in the human proteome and a platform for discovering transcriptional regulators at genome scale.
Affiliation(s)
- Nader Alerasool
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada; Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada
- He Leng
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada
- Zhen-Yuan Lin
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Sinai Health System, Toronto, ON M5G 1X5, Canada
- Anne-Claude Gingras
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada; Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Sinai Health System, Toronto, ON M5G 1X5, Canada.
- Mikko Taipale
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada; Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada.
|
40
|
Abstract
Auxin signaling regulates growth and developmental processes in plants. The core of nuclear auxin signaling relies on just three components: TIR1/AFBs, Aux/IAAs, and ARFs. Each component is itself made up of several domains, all of which contribute to the regulation of auxin signaling. Studies of the structural aspects of these three core signaling components have deepened our understanding of auxin signaling dynamics and regulation. In addition to the structured domains of these components, intrinsically disordered regions within the proteins also impact auxin signaling outcomes. New research is beginning to uncover the role intrinsic disorder plays in auxin-regulated degradation and subcellular localization. Structured and intrinsically disordered domains affect auxin perception, protein degradation dynamics, and DNA binding. Taken together, subtle differences within the domains and motifs of each class of auxin signaling component affect signaling outcomes and specificity.
Affiliation(s)
- Nicholas Morffy
- Department of Biology, Duke University, Durham, North Carolina 27708, USA
- Center for Science and Engineering Living Systems (CSELS), Washington University, St. Louis, Missouri 63130, USA
- Lucia C Strader
- Department of Biology, Duke University, Durham, North Carolina 27708, USA
- Center for Science and Engineering Living Systems (CSELS), Washington University, St. Louis, Missouri 63130, USA
- Center for Engineering Mechanobiology, Washington University, St. Louis, Missouri 63130, USA
|
41
|
Broyles BK, Gutierrez AT, Maris TP, Coil DA, Wagner TM, Wang X, Kihara D, Class CA, Erkine AM. Activation of gene expression by detergent-like protein domains. iScience 2021; 24:103017. [PMID: 34522860 PMCID: PMC8426559 DOI: 10.1016/j.isci.2021.103017] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/11/2021] [Revised: 07/08/2021] [Accepted: 08/18/2021] [Indexed: 11/24/2022]
Abstract
The mechanisms by which transcriptional activation domains (tADs) initiate eukaryotic gene expression have been an enigma for decades because most tADs lack specificity in sequence, structure, and interactions with targets. Machine learning analysis of data sets of tAD sequences generated in vivo elucidated several functionality rules: the functional tAD sequences should (i) be devoid of or depleted with basic amino acid residues, (ii) be enriched with aromatic and acidic residues, (iii) be with aromatic residues localized mostly near the terminus of the sequence, and acidic residues localized more internally within a span of 20-30 amino acids, (iv) be with both aromatic and acidic residues preferably spread out in the sequence and not clustered, and (v) not be separated by occasional basic residues. These and other more subtle rules are not absolute, reflecting absence of a tAD consensus sequence, enormous variability, and consistent with surfactant-like tAD biochemical properties. The findings are compatible with the paradigm-shifting nucleosome detergent mechanism of gene expression activation, contributing to the development of the liquid-liquid phase separation model and the biochemistry of near-stochastic functional allosteric interactions.
Affiliation(s)
- Bradley K Broyles
- College of Pharmacy and Health Sciences, Butler University, Indianapolis, IN 46208, USA
- Andrew T Gutierrez
- College of Pharmacy and Health Sciences, Butler University, Indianapolis, IN 46208, USA
- Theodore P Maris
- College of Pharmacy and Health Sciences, Butler University, Indianapolis, IN 46208, USA
- Daniel A Coil
- College of Pharmacy and Health Sciences, Butler University, Indianapolis, IN 46208, USA
- Thomas M Wagner
- College of Pharmacy and Health Sciences, Butler University, Indianapolis, IN 46208, USA
- Xiao Wang
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
- Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
- Caleb A Class
- College of Pharmacy and Health Sciences, Butler University, Indianapolis, IN 46208, USA
- Alexandre M Erkine
- College of Pharmacy and Health Sciences, Butler University, Indianapolis, IN 46208, USA
|
42
|
Griffith D, Holehouse AS. PARROT is a flexible recurrent neural network framework for analysis of large protein datasets. eLife 2021; 10:e70576. [PMID: 34533455 PMCID: PMC8448528 DOI: 10.7554/elife.70576] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/21/2021] [Accepted: 09/06/2021] [Indexed: 11/29/2022]
Abstract
The rise of high-throughput experiments has transformed how scientists approach biological questions. The ubiquity of large-scale assays that can test thousands of samples in a day has necessitated the development of new computational approaches to interpret this data. Among these tools, machine learning approaches are increasingly being utilized due to their ability to infer complex nonlinear patterns from high-dimensional data. Despite their effectiveness, machine learning (and in particular deep learning) approaches are not always accessible or easy to implement for those with limited computational expertise. Here we present PARROT, a general framework for training and applying deep learning-based predictors on large protein datasets. Using an internal recurrent neural network architecture, PARROT is capable of tackling both classification and regression tasks while only requiring raw protein sequences as input. We showcase the potential uses of PARROT on three diverse machine learning tasks: predicting phosphorylation sites, predicting transcriptional activation function of peptides generated by high-throughput reporter assays, and predicting the fibrillization propensity of amyloid beta with data generated by deep mutational scanning. Through these examples, we demonstrate that PARROT is easy to use, performs comparably to state-of-the-art computational tools, and is applicable for a wide array of biological problems.
Affiliation(s)
- Daniel Griffith
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St Louis, United States
- Center for Science and Engineering Living Systems, Washington University, St Louis, United States
- Alex S Holehouse
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St Louis, United States
- Center for Science and Engineering Living Systems, Washington University, St Louis, United States
|
43
|
Lindorff-Larsen K, Kragelund BB. On the potential of machine learning to examine the relationship between sequence, structure, dynamics and function of intrinsically disordered proteins. J Mol Biol 2021; 433:167196. [PMID: 34390736 DOI: 10.1016/j.jmb.2021.167196] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 06/02/2021] [Revised: 08/03/2021] [Accepted: 08/04/2021] [Indexed: 11/29/2022]
Abstract
Intrinsically disordered proteins (IDPs) constitute a broad set of proteins with few uniting and many diverging properties. IDPs-and intrinsically disordered regions (IDRs) interspersed between folded domains-are generally characterized as having no persistent tertiary structure; instead they interconvert between a large number of different and often expanded structures. IDPs and IDRs are involved in an enormously wide range of biological functions and reveal novel mechanisms of interactions, and while they defy the common structure-function paradigm of folded proteins, their structural preferences and dynamics are important for their function. We here discuss open questions in the field of IDPs and IDRs, focusing on areas where machine learning and other computational methods play a role. We discuss computational methods aimed to predict transiently formed local and long-range structure, including methods for integrative structural biology. We discuss the many different ways in which IDPs and IDRs can bind to other molecules, both via short linear motifs, as well as in the formation of larger dynamic complexes such as biomolecular condensates. We discuss how experiments are providing insight into such complexes and may enable more accurate predictions. Finally, we discuss the role of IDPs in disease and how new methods are needed to interpret the mechanistic effects of genomic variants in IDPs.
Affiliation(s)
- Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory & Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen. Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark.
- Birthe B Kragelund
- Structural Biology and NMR Laboratory & Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen. Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark.
|
44
|
Sanborn AL, Yeh BT, Feigerle JT, Hao CV, Townshend RJ, Lieberman Aiden E, Dror RO, Kornberg RD. Simple biochemical features underlie transcriptional activation domain diversity and dynamic, fuzzy binding to Mediator. eLife 2021; 10:e68068. [PMID: 33904398 PMCID: PMC8137143 DOI: 10.7554/elife.68068] [Citation(s) in RCA: 66] [Impact Index Per Article: 22.0] [Received: 03/03/2021] [Accepted: 04/25/2021] [Indexed: 01/07/2023]
Abstract
Gene activator proteins comprise distinct DNA-binding and transcriptional activation domains (ADs). Because few ADs have been described, we tested domains tiling all yeast transcription factors for activation in vivo and identified 150 ADs. By mRNA display, we showed that 73% of ADs bound the Med15 subunit of Mediator, and that binding strength was correlated with activation. AD-Mediator interaction in vitro was unaffected by a large excess of free activator protein, pointing to a dynamic mechanism of interaction. Structural modeling showed that ADs interact with Med15 without shape complementarity (‘fuzzy’ binding). ADs shared no sequence motifs, but mutagenesis revealed biochemical and structural constraints. Finally, a neural network trained on AD sequences accurately predicted ADs in human proteins and in other yeast proteins, including chromosomal proteins and chromatin remodeling complexes. These findings solve the longstanding enigma of AD structure and function and provide a rationale for their role in biology. Cells adapt and respond to changes by regulating the activity of their genes. To turn genes on or off, they use a family of proteins called transcription factors. Transcription factors influence specific but overlapping groups of genes, so that each gene is controlled by several transcription factors that act together like a dimmer switch to regulate gene activity. The presence of transcription factors attracts proteins such as the Mediator complex, which activates genes by gathering the protein machines that read the genes. The more transcription factors are found near a specific gene, the more strongly they attract Mediator and the more active the gene is. A specific region on the transcription factor called the activation domain is necessary for this process. The biochemical sequences of these domains vary greatly between species, yet activation domains from, for example, yeast and human proteins are often interchangeable. 
To understand why this is the case, Sanborn et al. analyzed the genome of baker’s yeast and identified 150 activation domains, each very different in sequence. Three-quarters of them bound to a subunit of the Mediator complex called Med15. Sanborn et al. then developed a machine learning algorithm to predict activation domains in both yeast and humans. This algorithm also showed that negatively charged and greasy regions on the activation domains were essential to be activated by the Mediator complex. Further analyses revealed that activation domains used different poses to bind multiple sites on Med15, a behavior known as ‘fuzzy’ binding. This creates a high overall affinity even though the binding strength at each individual site is low, enabling the protein complexes to remain dynamic. These weak interactions together permit fine control over the activity of several genes, allowing cells to respond quickly and precisely to many changes. The computer algorithm used here provides a new way to identify activation domains across species and could improve our understanding of how living things grow, adapt and evolve. It could also give new insights into mechanisms of disease, particularly cancer, where transcription factors are often faulty.
Affiliation(s)
- Adrian L Sanborn
- Department of Structural Biology, Stanford University School of Medicine, Stanford, United States; Department of Computer Science, Stanford University, Stanford, United States
- Benjamin T Yeh
- Department of Computer Science, Stanford University, Stanford, United States
- Jordan T Feigerle
- Department of Structural Biology, Stanford University School of Medicine, Stanford, United States
- Cynthia V Hao
- Department of Structural Biology, Stanford University School of Medicine, Stanford, United States
- Erez Lieberman Aiden
- The Center for Genome Architecture, Baylor College of Medicine, Houston, United States; Center for Theoretical Biological Physics, Rice University, Houston, United States
- Ron O Dror
- Department of Computer Science, Stanford University, Stanford, United States
- Roger D Kornberg
- Department of Structural Biology, Stanford University School of Medicine, Stanford, United States
|
45
|
Mediator subunit Med15 dictates the conserved "fuzzy" binding mechanism of yeast transcription activators Gal4 and Gcn4. Nat Commun 2021; 12:2220. [PMID: 33850123 PMCID: PMC8044209 DOI: 10.1038/s41467-021-22441-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 11/14/2019] [Accepted: 03/11/2021] [Indexed: 02/05/2023]
Abstract
The acidic activation domain (AD) of yeast transcription factor Gal4 plays a dual role in transcription repression and activation through binding to Gal80 repressor and Mediator subunit Med15. The activation function of Gal4 arises from two hydrophobic regions within the 40-residue AD. We show by NMR that each AD region binds the Mediator subunit Med15 using a “fuzzy” protein interface. Remarkably, comparison of chemical shift perturbations shows that Gal4 and Gcn4, two intrinsically disordered ADs of different sequence, interact nearly identically with Med15. The finding that two ADs of different sequence use an identical fuzzy binding mechanism shows a common sequence-independent mechanism for AD-Mediator binding, similar to interactions within a hydrophobic cloud. In contrast, the same region of Gal4 AD interacts strongly with Gal80 via a distinct structured complex, implying that the structured binding partner of an intrinsically disordered protein dictates the type of protein–protein interaction. The intrinsically disordered acidic activation domain (AD) of the yeast transcription factor Gal4 acts through binding to the Med15 subunit of the Mediator complex. Here, the authors show that Gal4 interacts with Med15 through an identical fuzzy binding mechanism as Gcn4 AD, which has a different sequence, revealing a common sequence-independent mechanism for AD-Mediator binding. In contrast, Gal4 AD binds to the Gal80 repressor as a structured polypeptide, which strongly suggests that the structured binding partner dictates the type of protein–protein interaction for an intrinsically disordered protein.
|
46
|
Sabari BR. Biomolecular Condensates and Gene Activation in Development and Disease. Dev Cell 2020; 55:84-96. [PMID: 33049213 DOI: 10.1016/j.devcel.2020.09.005] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 07/14/2020] [Revised: 08/18/2020] [Accepted: 09/04/2020] [Indexed: 01/04/2023]
Abstract
Activating the right gene at the right time and place is essential for development. Emerging evidence suggests that this process is regulated by the mesoscale compartmentalization of the gene-control machinery, RNA polymerase II and its cofactors, within biomolecular condensates. Coupling gene activity to the reversible and dynamic process of condensate formation is proposed to enable the robust and precise changes in gene-regulatory programs during signaling and development. The macromolecular features that enable condensates and the regulatory pathways that control them are dysregulated in disease, highlighting their importance for normal physiology. In this review, we will discuss the role of condensates in gene activation; the multivalent features of protein, RNA, and DNA that enable reversible condensate formation; and how these processes are utilized in normal and disease biology. Understanding the regulation of condensates promises to provide novel insights into how organization of the gene-control machinery regulates development and disease.
Affiliation(s)
- Benjamin R Sabari
- Laboratory of Nuclear Organization, Cecil H. and Ida Green Center for Reproductive Biology Sciences, Division of Basic Research, Department of Obstetrics and Gynecology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA; Department of Molecular Biology, Hamon Center for Regenerative Science and Medicine, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA.
|
47
|
Bugge K, Staby L, Salladini E, Falbe-Hansen RG, Kragelund BB, Skriver K. αα-Hub domains and intrinsically disordered proteins: A decisive combo. J Biol Chem 2021; 296:100226. [PMID: 33361159 PMCID: PMC7948954 DOI: 10.1074/jbc.rev120.012928] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 10/07/2020] [Revised: 12/22/2020] [Accepted: 12/22/2020] [Indexed: 01/02/2023]
Abstract
Hub proteins are central nodes in protein-protein interaction networks with critical importance to all living organisms. Recently, a new group of folded hub domains, the αα-hubs, was defined based on a shared αα-hairpin supersecondary structural foundation. The members PAH, RST, TAFH, NCBD, and HHD are found in large proteins such as Sin3, RCD1, TAF4, CBP, and harmonin, which organize disordered transcriptional regulators and membrane scaffolds in interactomes of importance to human diseases and plant quality. In this review, studies of structures, functions, and complexes across the αα-hubs are described and compared to provide a unified description of the group. This analysis expands the associated molecular concepts of "one domain-one binding site", motif-based ligand binding, and coupled folding and binding of intrinsically disordered ligands to additional concepts of importance to signal fidelity. These include context, motif reversibility, multivalency, complex heterogeneity, synergistic αα-hub:ligand folding, accessory binding sites, and supramodules. We propose that these multifaceted protein-protein interaction properties are made possible by the characteristics of the αα-hub fold, including supersite properties, dynamics, variable topologies, accessory helices, and malleability and abetted by adaptability of the disordered ligands. Critically, these features provide additional filters for specificity. With the presentations of new concepts, this review opens for new research questions addressing properties across the group, which are driven from concepts discovered in studies of the individual members. Combined, the members of the αα-hubs are ideal models for deconvoluting signal fidelity maintained by folded hubs and their interactions with intrinsically disordered ligands.
Affiliation(s)
- Katrine Bugge
- REPIN and The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark; Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Lasse Staby
- REPIN and The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark; Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Edoardo Salladini
- REPIN and The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Rasmus G Falbe-Hansen
- REPIN and The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Birthe B Kragelund
- REPIN and The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark; Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark.
- Karen Skriver
- REPIN and The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark.
|
48
|
Serebreni L, Stark A. Insights into gene regulation: From regulatory genomic elements to DNA-protein and protein-protein interactions. Curr Opin Cell Biol 2020; 70:58-66. [PMID: 33385708 DOI: 10.1016/j.ceb.2020.11.009] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 10/01/2020] [Revised: 11/19/2020] [Accepted: 11/29/2020] [Indexed: 01/19/2023]
Abstract
Transcription is orchestrated by non-coding regulatory elements embedded in chromatin, which exist within the larger context of chromosome topology. Here, we review recent insights into the functions of non-coding regulatory elements and their protein interactors during transcription control. A picture emerges in which the topological environment constrains enhancer-promoter interactions and specific enhancer-bound proteins with distinct promoter-compatibilities refine target promoter choice. Such compatibilities are encoded within the sequences of enhancers and promoters and realized by diverse transcription factors and cofactors with distinct biochemical activities. An emerging property of transcription factors and cofactors is the formation of nuclear microenvironments or membraneless compartments that can have properties of phase-separated liquids. These environments are able to selectively enrich certain proteins and small molecules over others. Further investigation into the interaction of transcriptional regulators with themselves and regulatory DNA elements will help reveal the complexities of gene regulation within the context of the nucleus.
Affiliation(s)
- Leonid Serebreni
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria
- Alexander Stark
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria; Medical University of Vienna, Vienna BioCenter (VBC), Vienna, Austria.
|
49
|
Tycko J, DelRosso N, Hess GT, Aradhana, Banerjee A, Mukund A, Van MV, Ego BK, Yao D, Spees K, Suzuki P, Marinov GK, Kundaje A, Bassik MC, Bintu L. High-Throughput Discovery and Characterization of Human Transcriptional Effectors. Cell 2020; 183:2020-2035.e16. [PMID: 33326746 PMCID: PMC8178797 DOI: 10.1016/j.cell.2020.11.024] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Received: 08/21/2020] [Revised: 09/22/2020] [Accepted: 11/13/2020] [Indexed: 02/07/2023]
Abstract
Thousands of proteins localize to the nucleus; however, it remains unclear which contain transcriptional effectors. Here, we develop HT-recruit, a pooled assay where protein libraries are recruited to a reporter, and their transcriptional effects are measured by sequencing. Using this approach, we measure gene silencing and activation for thousands of domains. We find a relationship between repressor function and evolutionary age for the KRAB domains, discover that Homeodomain repressor strength is collinear with Hox genetic organization, and identify activities for several domains of unknown function. Deep mutational scanning of the CRISPRi KRAB maps the co-repressor binding surface and identifies substitutions that improve stability/silencing. By tiling 238 proteins, we find repressors as short as ten amino acids. Finally, we report new activator domains, including a divergent KRAB. These results provide a resource of 600 human proteins containing effectors and demonstrate a scalable strategy for assigning functions to protein domains.
Affiliation(s)
- Josh Tycko
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Nicole DelRosso
- Biophysics Program, Stanford University, Stanford, CA 94305, USA
- Gaelen T Hess
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Aradhana
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Aditya Mukund
- Biophysics Program, Stanford University, Stanford, CA 94305, USA
- Mike V Van
- Department of Biology, Stanford University, Stanford, CA 94305, USA
- Braeden K Ego
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- David Yao
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Kaitlyn Spees
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Peter Suzuki
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Georgi K Marinov
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Anshul Kundaje
- Department of Genetics, Stanford University, Stanford, CA 94305, USA; Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Michael C Bassik
- Department of Genetics, Stanford University, Stanford, CA 94305, USA.
- Lacramioara Bintu
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA.
|
50
|
Salladini E, Jørgensen MLM, Theisen FF, Skriver K. Intrinsic Disorder in Plant Transcription Factor Systems: Functional Implications. Int J Mol Sci 2020; 21:E9755. [PMID: 33371315 PMCID: PMC7767404 DOI: 10.3390/ijms21249755] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/30/2020] [Revised: 12/17/2020] [Accepted: 12/18/2020] [Indexed: 01/07/2023]
Abstract
Eukaryotic cells are complex biological systems that depend on highly connected molecular interaction networks with intrinsically disordered proteins as essential components. Through specific examples, we relate the conformational ensemble nature of intrinsic disorder (ID) in transcription factors to functions in plants. Transcription factors contain large regulatory ID-regions with numerous orphan sequence motifs, representing potential important interaction sites. ID-regions may affect DNA-binding through electrostatic interactions or allosterically as for the bZIP transcription factors, in which the DNA-binding domains also populate ensembles of dynamic transient structures. The flexibility of ID is well-suited for interaction networks requiring efficient molecular adjustments. For example, Radical Induced Cell Death1 depends on ID in transcription factors for its numerous, structurally heterogeneous interactions, and the JAZ:MYC:MED15 regulatory unit depends on protein dynamics, including binding-associated unfolding, for regulation of jasmonate-signaling. Flexibility makes ID-regions excellent targets of posttranslational modifications. For example, the extent of phosphorylation of the NAC transcription factor SOG1 regulates target gene expression and the DNA-damage response, and phosphorylation of the AP2/ERF transcription factor DREB2A acts as a switch enabling heat-regulated degradation. ID-related phase separation is emerging as being important to transcriptional regulation with condensates functioning in storage and inactivation of transcription factors. The applicative potential of ID-regions is apparent, as removal of an ID-region of the AP2/ERF transcription factor WRI1 affects its stability and consequently oil biosynthesis. The highlighted examples show that ID plays essential functional roles in plant biology and has a promising potential in engineering.
Affiliation(s)
- Karen Skriver
- REPIN and the Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, DK-2200 Copenhagen, Denmark (E.S., M.L.M.J., F.F.T.)
|