1
Roessner C, Griep S, Becker A. A land plant phylogenetic framework for GLABROUS INFLORESCENCE STEMS (GIS), SUPERMAN, JAGGED and allies plus their TOPLESS co-repressor. Mol Phylogenet Evol 2024; 201:108195. [PMID: 39260627 DOI: 10.1016/j.ympev.2024.108195] [Received: 04/17/2024] [Revised: 08/27/2024] [Accepted: 09/07/2024] [Indexed: 09/13/2024]
Abstract
Members of the plant-specific family of C1-1i zinc-finger transcription factors (ZF-TFs), such as SUPERMAN, JAGGED, KNUCKLES or GIS, regulate diverse developmental processes, including sexual reproduction. C1-1is consist of one zinc finger and one to two EAR domains, connected by large intrinsically disordered regions (IDRs). While the role of C1-1i ZF-TFs in developmental processes is well known for some genes in Arabidopsis, rice or tomato, a comprehensive and broad phylogenetic background is lacking, yet knowledge of orthology is a requirement for a better understanding of the diverse roles of C1-1i ZF-TFs in plants. Here, we provide a fine-grained, land-plant-wide classification of C1-1i subfamilies and their known co-repressors TOPLESS and TOPLESS RELATED. Our work combines the identification of orthologous groups with maximum-likelihood phylogeny reconstructions and digital gene expression analyses, mining high-quality land plant genomes and transcriptomes to generate a comprehensive framework of C1-1i ZF-TF evolution. We show that C1-1is are low- to moderate-copy genes and that some orthologous genes have only partially conserved subfamily- and life-cycle-stage-dependent expression patterns across land plants, while others are highly diverged. Our work provides the phylogenetic framework for C1-1i ZF-TFs and strengthens them as a potential model for IDR research in plants.
Affiliation(s)
- Sven Griep
- Bioinformatics and Systems Biology, Justus-Liebig-University, Giessen, Germany
- Annette Becker
- Institute of Botany, Justus-Liebig-University, Giessen, Germany.
2
Halpin JC, Keating AE. PairK: Pairwise k-mer alignment for quantifying protein motif conservation in disordered regions. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.07.23.604860. [PMID: 39091826 PMCID: PMC11291154 DOI: 10.1101/2024.07.23.604860] [Indexed: 08/04/2024]
Abstract
Protein-protein interactions are often mediated by a modular peptide recognition domain binding to a short linear motif (SLiM) in the disordered region of another protein. The ability to predict domain-SLiM interactions would allow researchers to map protein interaction networks, predict the effects of perturbations to those networks, and develop biologically meaningful hypotheses. Unfortunately, sequence database searches for SLiMs generally yield mostly biologically irrelevant motif matches or false positives. To improve the prediction of novel SLiM interactions, researchers employ filters to discriminate between biologically relevant and improbable motif matches. One promising criterion for identifying biologically relevant SLiMs is the sequence conservation of the motif, exploiting the fact that functional motifs are more likely to be conserved than spurious motif matches. However, the difficulty of aligning disordered regions has significantly hampered the utility of this approach. We present PairK (pairwise k-mer alignment), an MSA-free method to quantify motif conservation in disordered regions. PairK outperforms both standard MSA-based conservation scores and a modern LLM-based conservation score predictor on the task of identifying biologically important motif instances. PairK can quantify conservation over wider phylogenetic distances than MSAs, indicating that SLiMs may be more conserved than is implied by MSA-based metrics. PairK is available as open-source code at https://github.com/jacksonh1/pairk.
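As a rough illustration of the MSA-free idea described in this abstract, the sketch below scores each k-mer of a query disordered region by its best gapless match in each ortholog and averages those scores across orthologs. It is not PairK's implementation: the identity scoring, the function names (`kmers`, `identity_score`, `best_match_score`, `kmer_conservation`) and the normalisation by k are assumptions chosen for clarity, and a substitution matrix could replace the identity score.

```python
from typing import List


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def identity_score(a: str, b: str) -> int:
    """Assumed scoring: count of identical residues at aligned positions (no gaps)."""
    return sum(x == y for x, y in zip(a, b))


def best_match_score(query_kmer: str, ortholog: str) -> int:
    """Best gapless score of query_kmer against every k-mer of the ortholog."""
    k = len(query_kmer)
    return max(identity_score(query_kmer, km) for km in kmers(ortholog, k))


def kmer_conservation(query: str, orthologs: List[str], k: int) -> List[float]:
    """Per query k-mer: mean best-match score across orthologs, scaled to 0..1.

    Orthologs shorter than k are skipped because they contain no k-mer to compare.
    """
    scores = []
    for q in kmers(query, k):
        hits = [best_match_score(q, o) / k for o in orthologs if len(o) >= k]
        scores.append(sum(hits) / len(hits) if hits else 0.0)
    return scores


if __name__ == "__main__":
    # Hypothetical toy sequences: the k-mer "PPLP" (query index 2) occurs in both orthologs.
    scores = kmer_conservation("GSPPLPKR", ["TTPPLPEE", "PPLPQQ"], k=4)
    print(scores[2])  # 1.0
```

Because every query k-mer is compared to every ortholog k-mer without gaps, no multiple sequence alignment is needed; the price is a brute-force scan that grows with the product of the two sequence lengths.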
Affiliation(s)
- Jackson C. Halpin
- MIT Department of Biology, 77 Massachusetts Ave., Cambridge, MA 02139
- Amy E. Keating
- MIT Department of Biology, 77 Massachusetts Ave., Cambridge, MA 02139
- MIT Department of Biological Engineering, 77 Massachusetts Ave., Cambridge, MA 02139
- Koch Institute for Integrative Cancer Research, 77 Massachusetts Ave., Cambridge, MA 02139
3
Hocher A, Warnecke T. Nucleosomes at the Dawn of Eukaryotes. Genome Biol Evol 2024; 16:evae029. [PMID: 38366053 PMCID: PMC10919886 DOI: 10.1093/gbe/evae029] [Received: 11/27/2023] [Revised: 01/09/2024] [Accepted: 02/11/2024] [Indexed: 02/18/2024]
Abstract
Genome regulation in eukaryotes revolves around the nucleosome, the fundamental building block of eukaryotic chromatin. Its constituent parts, the four core histones (H3, H4, H2A, H2B), are universal to eukaryotes. Yet despite its exceptional conservation and central role in orchestrating transcription, repair, and other DNA-templated processes, the origins and early evolution of the nucleosome remain opaque. Histone-fold proteins are also found in archaea, but the nucleosome we know (a hetero-octameric complex composed of histones with long, disordered tails) is a hallmark of eukaryotes. What were the properties of the earliest nucleosomes? Did ancestral histones inevitably assemble into nucleosomes? When and why did the four core histones evolve? This review will look at the evolution of the eukaryotic nucleosome from the vantage point of archaea, focusing on the key evolutionary transitions required to build a modern nucleosome. We will highlight recent work on the closest archaeal relatives of eukaryotes, the Asgardarchaea, and discuss what their histones can and cannot tell us about the early evolution of eukaryotic chromatin. We will also discuss how viruses have become an unexpected source of information about the evolutionary path toward the nucleosome. Finally, we highlight the properties of early nucleosomes as an area where new tools and data promise tangible progress in the not-too-distant future.
Affiliation(s)
- Antoine Hocher
- Medical Research Council Laboratory of Medical Sciences, London, UK
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, UK
- Tobias Warnecke
- Medical Research Council Laboratory of Medical Sciences, London, UK
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, UK
- Trinity College, University of Oxford, Oxford, UK
4
Holehouse AS, Kragelund BB. The molecular basis for cellular function of intrinsically disordered protein regions. Nat Rev Mol Cell Biol 2024; 25:187-211. [PMID: 37957331 DOI: 10.1038/s41580-023-00673-0] [Accepted: 09/26/2023] [Indexed: 11/15/2023]
Abstract
Intrinsically disordered protein regions exist in a collection of dynamic interconverting conformations that lack a stable 3D structure. These regions are structurally heterogeneous, ubiquitous and found across all kingdoms of life. Despite the absence of a defined 3D structure, disordered regions are essential for cellular processes ranging from transcriptional control and cell signalling to subcellular organization. Through their conformational malleability and adaptability, disordered regions extend the repertoire of macromolecular interactions and are readily tunable by their structural and chemical context, making them ideal responders to regulatory cues. Recent work has led to major advances in understanding the link between protein sequence and conformational behaviour in disordered regions, yet the link between sequence and molecular function is less well defined. Here we consider the biochemical and biophysical foundations that underlie how and why disordered regions can engage in productive cellular functions, provide examples of emerging concepts and discuss how protein disorder contributes to intracellular information processing and regulation of cellular function.
Affiliation(s)
- Alex S Holehouse
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St Louis, MO, USA.
- Center for Biomolecular Condensates, Washington University in St Louis, St Louis, MO, USA.
- Birthe B Kragelund
- REPIN, Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark.
5
Taneja I, Lasker K. Machine-learning-based methods to generate conformational ensembles of disordered proteins. Biophys J 2024; 123:101-113. [PMID: 38053335 PMCID: PMC10808026 DOI: 10.1016/j.bpj.2023.12.001] [Received: 07/12/2023] [Revised: 10/24/2023] [Accepted: 12/01/2023] [Indexed: 12/07/2023]
Abstract
Intrinsically disordered proteins are characterized by a conformational ensemble. While computational approaches such as molecular dynamics simulations have been used to generate such ensembles, their computational costs can be prohibitive. An alternative approach is to learn from data and train machine-learning models to generate conformational ensembles of disordered proteins. This has been a relatively unexplored approach, and in this work we demonstrate a proof-of-principle approach to do so. Specifically, we devised a two-stage computational pipeline: in the first stage, we employed supervised machine-learning models to predict ensemble-derived two-dimensional (2D) properties of a sequence, given the conformational ensemble of a closely related sequence. In the second stage, we used denoising diffusion models to generate three-dimensional (3D) coarse-grained conformational ensembles, given the two-dimensional predictions outputted by the first stage. We trained our models on a data set of coarse-grained molecular dynamics simulations of thousands of rationally designed synthetic sequences. The accuracy of our 2D and 3D predictions was validated across multiple metrics, and our work demonstrates the applicability of machine-learning techniques to predicting higher-dimensional properties of disordered proteins.
Affiliation(s)
- Ishan Taneja
- Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, California
- Keren Lasker
- Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, California.
6
Schuck P, Zhao H. Diversity of short linear interaction motifs in SARS-CoV-2 nucleocapsid protein. mBio 2023; 14:e0238823. [PMID: 38018991 PMCID: PMC10746173 DOI: 10.1128/mbio.02388-23] [Received: 09/01/2023] [Accepted: 10/16/2023] [Indexed: 11/30/2023]
Abstract
IMPORTANCE Short linear motifs (SLiMs) are 3-10 amino acid long binding motifs in intrinsically disordered protein regions (IDRs) that serve as ubiquitous protein-protein interaction modules in eukaryotic cells. Through molecular mimicry, viruses hijack these sequence motifs to control host cellular processes. It is thought that the small size of SLiMs and the high mutation frequencies of viral IDRs allow rapid host adaptation. However, a salient characteristic of RNA viruses, due to high replication errors, is their obligate existence as mutant swarms. Taking advantage of the uniquely large genomic database of SARS-CoV-2, here, we analyze the role of sequence diversity in the presentation of SLiMs, focusing on the highly abundant, multi-functional nucleocapsid protein. We find that motif mimicry is a highly dynamic process that produces an abundance of motifs transiently present in subsets of mutant species. This diversity allows the virus to efficiently explore eukaryotic motifs and evolve the host-virus interface.
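Short motifs of 3 to 10 residues are commonly written as regular expressions (the ELM resource uses this convention), so a first-pass search for candidate SLiMs can be a pattern scan. The sketch below reports every match of a pattern, overlapping ones included, and can optionally keep only hits lying inside given disordered-region coordinates. The function name `find_motifs`, the pattern `P..P` (a PxxP-like core) and the region bounds are illustrative assumptions, not motifs or coordinates from the study.

```python
import re
from typing import List, Optional, Tuple


def find_motifs(
    sequence: str,
    pattern: str,
    regions: Optional[List[Tuple[int, int]]] = None,
) -> List[Tuple[int, int, str]]:
    """Return (start, end, match) for every match of `pattern`, overlaps included.

    Coordinates are 0-based and end-exclusive. If `regions` (hypothetical IDR
    bounds, same convention) is given, keep only matches lying entirely inside
    one of them.
    """
    hits = []
    # A zero-width lookahead lets re.finditer report overlapping matches.
    for m in re.finditer(f"(?=({pattern}))", sequence):
        start = m.start()
        match = m.group(1)
        end = start + len(match)
        if regions is None or any(lo <= start and end <= hi for lo, hi in regions):
            hits.append((start, end, match))
    return hits


if __name__ == "__main__":
    seq = "MAPAPPLPKS"  # toy sequence, not taken from the nucleocapsid protein
    print(find_motifs(seq, "P..P"))                     # [(2, 6, 'PAPP'), (4, 8, 'PPLP')]
    print(find_motifs(seq, "P..P", regions=[(3, 10)]))  # [(4, 8, 'PPLP')]
```

As the PairK abstract notes, pattern matches alone yield many biologically irrelevant hits, so a scan like this is only a starting point before conservation or context filters are applied.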
Affiliation(s)
- Peter Schuck
- Laboratory of Dynamics of Macromolecular Assembly, National Institute of Biomedical Imaging and Bioengineering, National Institutes of Health, Bethesda, Maryland, USA
- Huaying Zhao
- Laboratory of Dynamics of Macromolecular Assembly, National Institute of Biomedical Imaging and Bioengineering, National Institutes of Health, Bethesda, Maryland, USA
7
Alston JJ, Soranno A, Holehouse AS. Conserved molecular recognition by an intrinsically disordered region in the absence of sequence conservation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.06.552128. [PMID: 37609146 PMCID: PMC10441348 DOI: 10.1101/2023.08.06.552128] [Indexed: 08/24/2023]
Abstract
Intrinsically disordered regions (IDRs) are critical for cellular function, yet often appear to lack sequence conservation when assessed by multiple sequence alignments. This raises the question of whether and how function can be encoded and preserved in these regions despite massive sequence variation. To address this question, we have applied coarse-grained molecular dynamics simulations to investigate non-specific RNA binding of coronavirus nucleocapsid proteins. Coronavirus nucleocapsid proteins consist of multiple interspersed disordered and folded domains that bind RNA. We focussed here on the first two domains of coronavirus nucleocapsid proteins, the disordered N-terminal domain (NTD) followed by the folded RNA binding domain (RBD). While the NTD is highly variable across evolution, the RBD is structurally conserved. This combination makes the NTD-RBD a convenient model system to explore the interplay between an IDR adjacent to a folded domain, and how changes in IDR sequence can influence molecular recognition of a partner. Our results reveal a surprising degree of sequence-specificity encoded by both the composition and the precise order of the amino acids in the NTD. The presence of an NTD can, depending on the sequence, either suppress or enhance RNA binding. Despite this sensitivity, large-scale variation in NTD sequences is possible while certain sequence features are retained. Consequently, a conformationally conserved fuzzy RNA:protein complex is found across nucleocapsid protein orthologs, despite large-scale changes in both NTD sequence and RBD surface chemistry. Taken together, these insights shed light on the ability of disordered regions to preserve functional characteristics despite their sequence variability.
8
Schuck P, Zhao H. Diversity of Short Linear Interaction Motifs in SARS-CoV-2 Nucleocapsid Protein. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.01.551467. [PMID: 37790474 PMCID: PMC10542142 DOI: 10.1101/2023.08.01.551467] [Indexed: 10/05/2023]
Abstract
Molecular mimicry of short linear interaction motifs has emerged as a key mechanism for viral proteins binding host domains and hijacking host cell processes. Here, we examine the role of RNA-virus sequence diversity in the dynamics of the virus-host interface, by analyzing the uniquely vast sequence record of viable SARS-CoV-2 species with focus on the multi-functional nucleocapsid protein. We observe the abundant presentation of motifs encoding several essential host protein interactions, alongside a majority of possibly non-functional and randomly occurring motif sequences absent in subsets of viable virus species. A large number of motifs emerge ex nihilo through transient mutations relative to the ancestral consensus sequence. The observed mutational landscape implies an accessible motif space that spans at least 25% of known eukaryotic motifs. This reveals motif mimicry as a highly dynamic process with the capacity to broadly explore host motifs, allowing the virus to rapidly evolve the virus-host interface.
Affiliation(s)
- Peter Schuck
- Laboratory of Dynamics of Macromolecular Assembly, National Institute of Biomedical Imaging and Bioengineering, National Institutes of Health, Bethesda, MD 20892, USA
- Huaying Zhao
- Laboratory of Dynamics of Macromolecular Assembly, National Institute of Biomedical Imaging and Bioengineering, National Institutes of Health, Bethesda, MD 20892, USA
9
Abstract
Multivalent proteins and nucleic acids, collectively referred to as multivalent associative biomacromolecules, provide the driving forces for the formation and compositional regulation of biomolecular condensates. Here, we review the key concepts of phase transitions of aqueous solutions of associative biomacromolecules, specifically proteins that include folded domains and intrinsically disordered regions. The phase transitions of these systems come under the rubric of coupled associative and segregative transitions. The concepts underlying these processes are presented, and their relevance to biomolecular condensates is discussed.
Affiliation(s)
- Rohit V Pappu
- Department of Biomedical Engineering, Center for Biomolecular Condensates (CBC), Washington University in St. Louis, St. Louis, Missouri 63130, United States
- Samuel R Cohen
- Department of Biomedical Engineering, Center for Biomolecular Condensates (CBC), Washington University in St. Louis, St. Louis, Missouri 63130, United States
- Center of Regenerative Medicine, Washington University in St. Louis, St. Louis, Missouri 63130, United States
- Furqan Dar
- Department of Biomedical Engineering, Center for Biomolecular Condensates (CBC), Washington University in St. Louis, St. Louis, Missouri 63130, United States
- Mina Farag
- Department of Biomedical Engineering, Center for Biomolecular Condensates (CBC), Washington University in St. Louis, St. Louis, Missouri 63130, United States
- Mrityunjoy Kar
- Max Planck Institute of Cell Biology and Genetics, 01307 Dresden, Germany
10
Evolution of SLiM-mediated hijack functions in intrinsically disordered viral proteins. Essays Biochem 2022; 66:945-958. [DOI: 10.1042/ebc20220059] [Received: 09/07/2022] [Revised: 11/07/2022] [Accepted: 11/08/2022] [Indexed: 12/07/2022]
Abstract
Viruses and their hosts are involved in an ‘arms race’ where they continually evolve mechanisms to overcome each other. It has long been proposed that intrinsic disorder provides a substrate for the evolution of viral hijack functions and that short linear motifs (SLiMs) are important players in this process. Here, we review evidence in support of this tenet from two model systems: the papillomavirus E7 protein and the adenovirus E1A protein. Phylogenetic reconstructions reveal that SLiMs appear and disappear multiple times across evolution, providing evidence of convergent evolution within individual viral phylogenies. Multiple functionally related SLiMs show strong coevolution signals that persist across long distances in the primary sequence and occur in unrelated viral proteins. Moreover, changes in SLiMs are associated with changes in phenotypic traits such as host range and tropism. Tracking viral evolutionary events reveals that host switch events are associated with the loss of several SLiMs, suggesting that SLiMs are under functional selection and that changes in SLiMs support viral adaptation. Fine-tuning of viral SLiM sequences can improve affinity, allowing them to outcompete host counterparts. However, viral SLiMs are not always competitive by themselves, and tethering of two suboptimal SLiMs by a disordered linker may instead enable viral hijack. Coevolution between the SLiMs and the linker indicates that the evolution of disordered regions may be more constrained than previously thought. In summary, experimental and computational studies support a role for SLiMs and intrinsic disorder in viral hijack functions and in viral adaptive evolution.