1
Bradley D, Garand C, Belda H, Gagnon-Arsenault I, Treeck M, Elowe S, Landry CR. The substrate quality of CK2 target sites has a determinant role on their function and evolution. Cell Syst 2024; 15:544-562.e8. [PMID: 38861992 DOI: 10.1016/j.cels.2024.05.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Revised: 02/29/2024] [Accepted: 05/20/2024] [Indexed: 06/13/2024]
Abstract
Most biological processes are regulated by signaling modules that bind to short linear motifs. For protein kinases, substrates may have full or only partial matches to the kinase recognition motif, a property known as "substrate quality." However, it is not clear whether differences in substrate quality represent neutral variation or if they have functional consequences. We examine this question for the kinase CK2, which has many fundamental functions. We show that optimal CK2 sites are phosphorylated at maximal stoichiometries and found in many conditions, whereas minimal substrates are more weakly phosphorylated and have regulatory functions. Optimal CK2 sites tend to be more conserved, and substrate quality is often tuned by selection. For intermediate sites, increases or decreases in substrate quality may be deleterious, as we demonstrate for a CK2 substrate at the kinetochore. The results together suggest a strong role for substrate quality in phosphosite function and evolution. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- David Bradley
- Département de Biochimie, de Microbiologie et de Bio-informatique, Faculté des Sciences et de Génie, Université Laval, Québec City, QC G1V 0A6, Canada; Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec City, QC G1V 0A6, Canada; PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, Québec City, QC G1V 0A6, Canada; Centre de Recherche sur les Données Massives (CRDM), Université Laval, Québec City, QC G1V 0A6, Canada; Département de Biologie, Faculté des Sciences et de Génie, Université Laval, Québec City, QC G1V 0A6, Canada.
- Chantal Garand
- PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, Québec City, QC G1V 0A6, Canada; Axe de Reproduction, Santé de la mère et de l'enfant, CHU de Québec, Université Laval, Québec City, QC, Canada
- Hugo Belda
- Signalling in Host-Pathogen Interaction Laboratory, The Francis Crick Institute, London NW1 1AT, UK
- Isabelle Gagnon-Arsenault
- Département de Biochimie, de Microbiologie et de Bio-informatique, Faculté des Sciences et de Génie, Université Laval, Québec City, QC G1V 0A6, Canada; Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec City, QC G1V 0A6, Canada; PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, Québec City, QC G1V 0A6, Canada; Centre de Recherche sur les Données Massives (CRDM), Université Laval, Québec City, QC G1V 0A6, Canada; Département de Biologie, Faculté des Sciences et de Génie, Université Laval, Québec City, QC G1V 0A6, Canada
- Moritz Treeck
- Signalling in Host-Pathogen Interaction Laboratory, The Francis Crick Institute, London NW1 1AT, UK; Cell Biology of Host-Pathogen Interaction Laboratory, The Gulbenkian Institute of Science, Oeiras 2780-156, Portugal
- Sabine Elowe
- PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, Québec City, QC G1V 0A6, Canada; Axe de Reproduction, Santé de la mère et de l'enfant, CHU de Québec, Université Laval, Québec City, QC, Canada; Department of Pediatrics, Faculty of Medicine, Université Laval, Québec City, QC, Canada; Centre de Recherche sur le Cancer, CHU de Québec, Université Laval, Québec City, QC, Canada
- Christian R Landry
- Département de Biochimie, de Microbiologie et de Bio-informatique, Faculté des Sciences et de Génie, Université Laval, Québec City, QC G1V 0A6, Canada; Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec City, QC G1V 0A6, Canada; PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, Québec City, QC G1V 0A6, Canada; Centre de Recherche sur les Données Massives (CRDM), Université Laval, Québec City, QC G1V 0A6, Canada; Département de Biologie, Faculté des Sciences et de Génie, Université Laval, Québec City, QC G1V 0A6, Canada.
2
Singleton MD, Eisen MB. Evolutionary analyses of intrinsically disordered regions reveal widespread signals of conservation. PLoS Comput Biol 2024; 20:e1012028. [PMID: 38662765 PMCID: PMC11075841 DOI: 10.1371/journal.pcbi.1012028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 05/07/2024] [Accepted: 03/28/2024] [Indexed: 05/08/2024] Open
Abstract
Intrinsically disordered regions (IDRs) are segments of proteins without stable three-dimensional structures. As this flexibility allows them to interact with diverse binding partners, IDRs play key roles in cell signaling and gene expression. Despite the prevalence and importance of IDRs in eukaryotic proteomes and various biological processes, associating them with specific molecular functions remains a significant challenge due to their high rates of sequence evolution. However, by comparing the observed values of various IDR-associated properties against those generated under a simulated model of evolution, a recent study found most IDRs across the entire yeast proteome contain conserved features. Furthermore, it showed clusters of IDRs with common "evolutionary signatures," i.e., patterns of conserved features, were associated with specific biological functions. To determine if similar patterns of conservation are found in the IDRs of other systems, in this work we applied a series of phylogenetic models to over 7,500 orthologous IDRs identified in the Drosophila genome to dissect the forces driving their evolution. By comparing models of unconstrained and constrained continuous trait evolution using the Brownian motion and Ornstein-Uhlenbeck models, respectively, we identified signals of widespread constraint, indicating that conservation of distributed features is a mechanism of IDR evolution common to multiple biological systems. In contrast to the previous study in yeast, however, we observed limited evidence of IDR clusters with specific biological functions, which suggests a more complex relationship between evolutionary constraints and function in the IDRs of multicellular organisms.
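To illustrate why Brownian motion and Ornstein-Uhlenbeck models can separate unconstrained from constrained trait evolution, the following minimal Python sketch compares the expected trait variance along a single lineage under each model. It is not taken from the paper; the function names and the parameter values (`sigma2` for the diffusion rate, `alpha` for the strength of the pull toward an optimum) are illustrative assumptions, and the formulas are the standard textbook variances for a process started from a fixed value.

```python
import math

def bm_variance(sigma2: float, t: float) -> float:
    """Expected trait variance after time t under Brownian motion."""
    return sigma2 * t

def ou_variance(sigma2: float, alpha: float, t: float) -> float:
    """Expected trait variance after time t under an Ornstein-Uhlenbeck
    process started from a fixed value, with pull strength alpha > 0."""
    return sigma2 / (2.0 * alpha) * (1.0 - math.exp(-2.0 * alpha * t))

# Illustrative parameter values (assumptions, not estimates from any dataset).
sigma2, alpha = 1.0, 0.5
for t in (1.0, 10.0, 100.0):
    print(t, bm_variance(sigma2, t), round(ou_variance(sigma2, alpha, t), 4))
```

Under Brownian motion the variance grows without bound, whereas under the Ornstein-Uhlenbeck process it levels off near `sigma2 / (2 * alpha)`. A trait whose spread across species is smaller than Brownian motion would predict is therefore a candidate for constraint.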
Affiliation(s)
- Marc D. Singleton
- Howard Hughes Medical Institute, UC Berkeley, Berkeley, California, United States of America
- Michael B. Eisen
- Howard Hughes Medical Institute, UC Berkeley, Berkeley, California, United States of America
- Department of Molecular and Cell Biology, UC Berkeley, Berkeley, California, United States of America
3
Saikia B, Baruah A. Recent advances in de novo computational design and redesign of intrinsically disordered proteins and intrinsically disordered protein regions. Arch Biochem Biophys 2024; 752:109857. [PMID: 38097100 DOI: 10.1016/j.abb.2023.109857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 12/10/2023] [Accepted: 12/10/2023] [Indexed: 12/17/2023]
Abstract
In the early 2000s, the concept of "unstructured biology" emerged as an important field in protein science, generating various new research directions. Many novel strategies and methods have been developed that focus on effectively identifying or predicting intrinsically disordered proteins (IDPs) and intrinsically disordered protein regions (IDPRs), identifying their potential functions, disorder-based drug design, etc. Because of the range of functions of IDPs/IDPRs and their involvement in various debilitating diseases, they are of contemporary interest to the scientific community. Recent research has focused on designing or redesigning specific IDPs/IDPRs de novo. These de novo designs and redesigns are carried out by altering compositional biases and specific sequence patterning parameters. The main aim of this research is to influence specific molecular functions, phase behavior, cellular phenotypes, etc. In this review, we first describe the differences between natively folded proteins and natively unfolded proteins (IDPs) with respect to their potential energy landscapes. We then present the current understanding of the different computational design strategies and methods that have been used in the de novo design and redesign of IDPs and IDPRs. Finally, we conclude the review by discussing the challenges faced in computational design/redesign attempts of IDPs/IDPRs.
Affiliation(s)
- Bondeepa Saikia
- Department of Chemistry, Dibrugarh University, Dibrugarh, 786004, Assam, India
- Anupaul Baruah
- Department of Chemistry, Dibrugarh University, Dibrugarh, 786004, Assam, India.
4
Sangster AG, Zarin T, Moses AM. Evolution of short linear motifs and disordered proteins Topic: yeast as model system to study evolution. Curr Opin Genet Dev 2022; 76:101964. [PMID: 35939968 DOI: 10.1016/j.gde.2022.101964] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 05/04/2022] [Revised: 06/29/2022] [Accepted: 07/08/2022] [Indexed: 11/26/2022]
Abstract
Evolutionary preservation of protein structure had a major influence on the field of molecular evolution: changes in individual amino acids that did not disrupt protein folding would either have no effect or subtly change the 'lock' so that it could fit a new 'key'. Homology of individual amino acids could be confidently assigned through sequence alignments, and models of evolution could be tested. This view of molecular evolution excluded large regions of proteins that could not be confidently aligned, such as intrinsically disordered regions (IDRs) that do not fold into stable structures. In the last decade, major progress has been made in understanding the evolution of IDRs, much of it facilitated by new experimental and computational approaches in yeast. Here, we review this progress as well as several still outstanding questions.
Affiliation(s)
- Ami G Sangster
- Cell & Systems Biology, University of Toronto, 25 Harbord St., Toronto, ON M5S 3G5, Canada
- Taraneh Zarin
- Cell & Systems Biology, University of Toronto, 25 Harbord St., Toronto, ON M5S 3G5, Canada. https://twitter.com/@taraneh_z
- Alan M Moses
- Cell & Systems Biology, University of Toronto, 25 Harbord St., Toronto, ON M5S 3G5, Canada.
5
Kulkarni P, Behal A, Mohanty A, Salgia R, Nedelcu AM, Uversky VN. Co-opting disorder into order: Intrinsically disordered proteins and the early evolution of complex multicellularity. Int J Biol Macromol 2022; 201:29-36. [PMID: 34998872 DOI: 10.1016/j.ijbiomac.2021.12.182] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/29/2021] [Revised: 12/18/2021] [Accepted: 12/28/2021] [Indexed: 02/07/2023]
Abstract
Intrinsically disordered proteins (IDPs) are proteins that lack rigid structures yet play important roles in myriad biological phenomena. A distinguishing feature of IDPs is that they often mediate specific biological outcomes via multivalent weak cooperative interactions with multiple partners. Here, we show that several proteins specifically associated with processes that were key in the evolution of complex multicellularity in the lineage leading to the multicellular green alga Volvox carteri are IDPs. We suggest that, by rewiring cellular protein interaction networks, IDPs facilitated the co-option of ancestral pathways for specialized multicellular functions, underscoring the importance of IDPs in the early evolution of complex multicellularity.
Affiliation(s)
- Prakash Kulkarni
- Department of Medical Oncology and Experimental Therapeutics, City of Hope National Medical Center, Duarte, CA, USA.
- Amita Behal
- Department of Medical Oncology and Experimental Therapeutics, City of Hope National Medical Center, Duarte, CA, USA
- Atish Mohanty
- Department of Medical Oncology and Experimental Therapeutics, City of Hope National Medical Center, Duarte, CA, USA
- Ravi Salgia
- Department of Medical Oncology and Experimental Therapeutics, City of Hope National Medical Center, Duarte, CA, USA
- Aurora M Nedelcu
- Department of Biology, University of New Brunswick, Fredericton, Canada.
- Vladimir N Uversky
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, USA; Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Institutskiy pereulok, 9, Dolgoprudny, Moscow region 141700, Russia.
6
Schultz CJ, Wu Y, Baumann U. A targeted bioinformatics approach identifies highly variable cell surface proteins that are unique to Glomeromycotina. Mycorrhiza 2022; 32:45-66. [PMID: 35031894 PMCID: PMC8786786 DOI: 10.1007/s00572-021-01066-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/07/2021] [Accepted: 12/24/2021] [Indexed: 06/14/2023]
Abstract
Diversity in arbuscular mycorrhizal fungi (AMF) contributes to biodiversity and resilience in natural environments and healthy agricultural systems. Functional complementarity exists among species of AMF in symbiosis with their plant hosts, but the molecular basis of this is not known. We hypothesise this is in part due to the difficulties that current sequence assembly methodologies have assembling sequences for intrinsically disordered proteins (IDPs) due to their low sequence complexity. IDPs are potential candidates for functional complementarity because they often exist as extended (non-globular) proteins providing additional amino acids for molecular interactions. Rhizophagus irregularis arabinogalactan-protein-like proteins (AGLs) are small secreted IDPs with no known orthologues in AMF or other fungi. We developed a targeted bioinformatics approach to identify highly variable AGLs/IDPs in RNA-sequence datasets. The approach includes a modified multiple k-mer assembly approach (Oases) to identify candidate sequences, followed by targeted sequence capture and assembly (mirabait-mira). All AMF species analysed, including the ancestral family Paraglomeraceae, have small families of proteins rich in disorder promoting amino acids such as proline and glycine, or glycine and asparagine. Glycine- and asparagine-rich proteins also were found in Geosiphon pyriformis (an obligate symbiont of a cyanobacterium), from the same subphylum (Glomeromycotina) as AMF. The sequence diversity of AGLs likely translates to functional diversity, based on predicted physical properties of tandem repeats (elastic, amyloid, or interchangeable) and their broad pI ranges. We envisage that AGLs/IDPs could contribute to functional complementarity in AMF through processes such as self-recognition, retention of nutrients, soil stability, and water movement.
Affiliation(s)
- Carolyn J Schultz
- School of Agriculture, Food, and Wine, Waite Research Institute, University of Adelaide, Adelaide, SA, Australia.
- Yue Wu
- School of Agriculture, Food, and Wine, Waite Research Institute, University of Adelaide, Adelaide, SA, Australia
- Ute Baumann
- School of Agriculture, Food, and Wine, Waite Research Institute, University of Adelaide, Adelaide, SA, Australia
7
Dasmeh P, Wagner A. Natural Selection on the Phase-Separation Properties of FUS during 160 My of Mammalian Evolution. Mol Biol Evol 2021; 38:940-951. [PMID: 33022038 PMCID: PMC7947763 DOI: 10.1093/molbev/msaa258] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 12/17/2022] Open
Abstract
Protein phase separation can help explain the formation of many nonmembranous organelles. However, we know little about its ability to change in evolution. Here we studied the evolution of the mammalian RNA-binding protein Fused in Sarcoma (FUS), a protein whose prion-like domain (PLD) contributes to the formation of stress granules through liquid–liquid phase separation. Although the PLD evolves three times as rapidly as the remainder of FUS, it harbors absolutely conserved tyrosine residues that are crucial for phase separation. Ancestral reconstruction shows that the phosphorylation sites within the PLD are subject to stabilizing selection. They toggle among a small number of amino acid states. One exception to this pattern is primates, where the number of such phosphosites has increased through positive selection. In addition, we find frequent glutamine to proline changes that help maintain the unstructured state of FUS that is necessary for phase separation. Our work provides evidence that natural selection has stabilized the liquid forming potential of FUS and minimized the propensity of cytotoxic liquid-to-solid phase transitions during 160 My of mammalian evolution.
Affiliation(s)
- Pouria Dasmeh
- Institute for Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland; Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Andreas Wagner
- Institute for Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
8
Trivedi R, Nagarajaram HA. Substitution scoring matrices for proteins - An overview. Protein Sci 2020; 29:2150-2163. [PMID: 32954566 DOI: 10.1002/pro.3954] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 08/17/2020] [Revised: 09/17/2020] [Accepted: 09/18/2020] [Indexed: 01/17/2023]
Abstract
Sequence analysis is the primary and simplest approach to discover structural, functional and evolutionary details of related proteins. All the alignment-based approaches of sequence analysis make use of amino acid substitution matrices, and the accuracy of the results largely depends on the type of scoring matrices used to perform alignment tasks. An amino acid substitution matrix is a 20 × 20 matrix in which the individual elements encapsulate the rates at which each of the 20 amino acid residues in proteins is substituted by other amino acid residues over time. In contrast to most globular/ordered proteins, whose amino acid composition is considered standard, there are several classes of proteins (e.g., transmembrane proteins) in which certain types of amino acid (e.g., hydrophobic residues) are enriched. These compositional differences among various classes of proteins are manifested in their underlying residue substitution frequencies. Therefore, each compositionally distinct class of proteins or protein segments should be studied using specific scoring matrices that reflect its distinct residue substitution pattern. In this review, we describe the development and application of various substitution scoring matrices peculiar to proteins with standard and biased compositions. Along with the most commonly used standard matrices (PAM, BLOSUM, MD and VTML) that act as default parameters in various homolog search and alignment tools, different substitution scoring matrices specific to compositionally distinct classes of proteins are discussed in detail.
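As a hedged illustration of how one element of such a matrix can be derived, the sketch below computes a log-odds score in the style used for BLOSUM-type matrices: the observed frequency of an aligned residue pair is divided by the frequency expected from the background composition, and the logarithm is scaled and rounded. The function name, the default scale factor, and the toy frequencies are assumptions for illustration only and are not taken from the review.

```python
import math

def log_odds_score(pair_freq: float, freq_a: float, freq_b: float,
                   scale: float = 2.0) -> float:
    """Scaled log2 ratio of observed to expected pair frequency.

    pair_freq: observed frequency of residues a and b aligned together
    freq_a, freq_b: background frequencies of a and b
    scale: multiplier applied before rounding (2.0 gives half-bit units)
    """
    expected = freq_a * freq_b
    return scale * math.log2(pair_freq / expected)

# Toy numbers, made up for illustration: the pair is seen twice as often
# as expected by chance, so the score is positive.
score = log_odds_score(pair_freq=0.02, freq_a=0.1, freq_b=0.1)
print(round(score))
```

Because the expected frequency depends on background composition, a protein class with a skewed composition yields different scores from the same observed pair counts. This is the motivation the abstract gives for class-specific matrices.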
Affiliation(s)
- Rakesh Trivedi
- Laboratory of Computational Biology, Centre for DNA Fingerprinting and Diagnostics, Uppal, Hyderabad, Telangana, India; Graduate School, Manipal Academy of Higher Education, Manipal, Karnataka, India
- Hampapathalu Adimurthy Nagarajaram
- Laboratory of Computational Biology, Department of Systems and Computational Biology, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana, India; Centre for Modelling, Simulation and Design, University of Hyderabad, Hyderabad, Telangana, India
9
Evolutionary Forces and Codon Bias in Different Flavors of Intrinsic Disorder in the Human Proteome. J Mol Evol 2019; 88:164-178. [DOI: 10.1007/s00239-019-09921-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 06/10/2019] [Accepted: 11/26/2019] [Indexed: 12/22/2022]
10
Trivedi R, Nagarajaram HA. Amino acid substitution scoring matrices specific to intrinsically disordered regions in proteins. Sci Rep 2019; 9:16380. [PMID: 31704957 PMCID: PMC6841959 DOI: 10.1038/s41598-019-52532-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 07/18/2019] [Accepted: 10/15/2019] [Indexed: 01/09/2023] Open
Abstract
An amino acid substitution scoring matrix encapsulates the rates at which various amino acid residues in proteins are substituted by other amino acid residues over time. Database search methods make use of substitution scoring matrices to identify sequences with homologous relationships. However, widely used substitution scoring matrices, such as the BLOSUM series, have been developed using aligned blocks that are mostly devoid of disordered regions in proteins. Hence, these substitution scoring matrices are mostly inappropriate for homology searches involving proteins enriched with disordered regions, as the disordered regions have a distinct amino acid compositional bias and are therefore expected to have undergone amino acid substitutions that are distinct from those in the ordered regions. We, therefore, developed a novel series of substitution scoring matrices referred to as EDSSMat by exclusively considering the substitution frequencies of amino acids in the disordered regions of eukaryotic proteins. The newly developed matrices were tested for their ability to detect homologs of proteins enriched with disordered regions by means of the SSEARCH tool. The results unequivocally demonstrate that EDSSMat matrices detect more homologs than the widely used BLOSUM, PAM and other standard matrices, indicating their utility value for homology searches of intrinsically disordered proteins.
Affiliation(s)
- Rakesh Trivedi
- Laboratory of Computational Biology, Centre for DNA Fingerprinting and Diagnostics, Uppal, Hyderabad, Telangana, 500039, India
- Graduate School, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India
- Hampapathalu Adimurthy Nagarajaram
- Department of Systems and Computational Biology, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana, 500 046, India.
- Centre for Modelling, Simulation and Design, University of Hyderabad, Hyderabad, Telangana, 500 046, India.
11
Narasumani M, Harrison PM. Discerning evolutionary trends in post-translational modification and the effect of intrinsic disorder: Analysis of methylation, acetylation and ubiquitination sites in human proteins. PLoS Comput Biol 2018; 14:e1006349. [PMID: 30096183 PMCID: PMC6105011 DOI: 10.1371/journal.pcbi.1006349] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 02/26/2018] [Revised: 08/22/2018] [Accepted: 07/07/2018] [Indexed: 11/18/2022] Open
Abstract
Intrinsically disordered regions (IDRs) of proteins play significant biological functional roles despite lacking a well-defined 3D structure. For example, IDRs provide efficient housing for large numbers of post-translational modification (PTM) sites in eukaryotic proteins. Here, we study the distribution of more than 15,000 experimentally determined human methylation, acetylation and ubiquitination sites (collectively termed 'MAU' sites) in ordered and disordered regions, and analyse their conservation across 380 eukaryotic species. Conservation signals for the maintenance and novel emergence of MAU sites are examined at 11 evolutionary levels from the whole eukaryotic domain down to the ape superfamily, in both ordered and disordered regions. We discover that MAU PTM is a major driver of conservation for arginines and lysines in both ordered and disordered regions, across the 11 levels, most significantly across the mammalian clade. Conservation of human methylatable arginines is very strongly favoured for ordered regions rather than for disordered, whereas methylatable lysines are conserved in either set of regions, and conservation of acetylatable and ubiquitinatable lysines is favoured in disordered over ordered. Notably, we find evidence for the emergence of new lysine MAU sites in disordered regions of proteins in deuterostomes and mammals, and in ordered regions after the dawn of eutherians. For histones specifically, MAU sites demonstrate an idiosyncratic significant conservation pattern that is evident since the last common ancestor of mammals. Similarly, folding-on-binding (FB) regions are highly enriched for MAU sites relative to either ordered or disordered regions, with ubiquitination sites in FBs being highly conserved at all evolutionary levels back as far as mammals. 
This investigation clearly demonstrates the complex patterns of PTM evolution across the human proteome and that it is necessary to consider conservation of sequence features at multiple evolutionary levels in order not to get an incomplete or misleading picture.
12
Afanasyeva A, Bockwoldt M, Cooney CR, Heiland I, Gossmann TI. Human long intrinsically disordered protein regions are frequent targets of positive selection. Genome Res 2018; 28:975-982. [PMID: 29858274 PMCID: PMC6028134 DOI: 10.1101/gr.232645.117] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 11/21/2017] [Accepted: 06/01/2018] [Indexed: 12/20/2022]
Abstract
Intrinsically disordered regions occur frequently in proteins and are characterized by a lack of a well-defined three-dimensional structure. Although these regions do not show a higher order of structural organization, they are known to be functionally important. Disordered regions are rapidly evolving, largely attributed to relaxed purifying selection and an increased role of genetic drift. It has also been suggested that positive selection might contribute to their rapid diversification. However, for our own species, it is currently unknown whether positive selection has played a role during the evolution of these protein regions. Here, we address this question by investigating the evolutionary pattern of more than 6600 human proteins with intrinsically disordered regions and their ordered counterparts. Our comparative approach with data from more than 90 mammalian genomes uses a priori knowledge of disordered protein regions, and we show that this increases the power to detect positive selection by an order of magnitude. We can confirm that human intrinsically disordered regions evolve more rapidly, not only within humans but also across the entire mammalian phylogeny. They have, however, experienced substantial evolutionary constraint, hinting at their fundamental functional importance. We find compelling evidence that disordered protein regions are frequent targets of positive selection and estimate that the relative rate of adaptive substitutions differs fourfold between disordered and ordered protein regions in humans. Our results suggest that disordered protein regions are important targets of genetic innovation and that the contribution of positive selection in these regions is more pronounced than in other protein parts.
Affiliation(s)
- Arina Afanasyeva
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield S10 2TN, United Kingdom; Institute of Nanobiotechnologies, Peter the Great St. Petersburg Polytechnic University, Saint-Petersburg 195251, Russia; Petersburg Nuclear Physics Institute, B.P. Konstantinov NRC Kurchatov Institute, Gatchina, Leningrad District 188300, Russia; National Institutes of Biomedical Innovation, Health and Nutrition, Ibaraki City, Osaka 567-0085, Japan
- Mathias Bockwoldt
- Department of Arctic and Marine Biology, UiT The Arctic University of Norway, 9037 Tromsø, Norway
- Christopher R Cooney
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield S10 2TN, United Kingdom
- Ines Heiland
- Department of Arctic and Marine Biology, UiT The Arctic University of Norway, 9037 Tromsø, Norway
- Toni I Gossmann
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield S10 2TN, United Kingdom
13
Saravanan KM, Dunker AK, Krishnaswamy S. Sequence fingerprints distinguish erroneous from correct predictions of intrinsically disordered protein regions. J Biomol Struct Dyn 2017; 36:4338-4351. [PMID: 29228892 DOI: 10.1080/07391102.2017.1415822] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 01/05/2023]
Abstract
More than 60 prediction methods for intrinsically disordered proteins (IDPs) have been developed over the years, many of which are accessible on the World Wide Web. Nearly all of these predictors give balanced accuracies in the ~65%-~80% range. Since predictors are not perfect, further studies are required to uncover the role of amino acid residues in native IDP as compared to predicted IDP regions. In the present work, we make use of sequences of 100% predicted IDP regions, false positive disorder predictions, and experimentally determined IDP regions to distinguish the characteristics of native versus predicted IDP regions. A higher occurrence of asparagine is observed in sequences of native IDP regions but not in sequences of false positive predictions of IDP regions. The occurrences of certain combinations of amino acids at the pentapeptide level provide a distinguishing feature in the IDPs with respect to globular proteins. The distinguishing features presented in this paper provide insights into the sequence fingerprints of amino acid residues in experimentally determined as compared to predicted IDP regions. These observations and additional work along these lines should enable improvements in the accuracy of disorder prediction algorithms.
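For readers unfamiliar with the metric quoted above, balanced accuracy is the mean of sensitivity and specificity, which keeps a predictor from scoring well simply by favouring the majority class. The minimal Python sketch below shows the calculation on per-residue counts; the function name and the example counts are illustrative assumptions and do not come from the paper.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (recall on disordered residues) and
    specificity (recall on ordered residues)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2.0

# Hypothetical counts: 100 truly disordered residues, 900 truly ordered ones.
ba = balanced_accuracy(tp=70, fn=30, tn=720, fp=180)
print(ba)
```

With these made-up counts, sensitivity is 0.70 and specificity is 0.80, so the balanced accuracy falls inside the range the abstract reports for existing predictors.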
Collapse
Affiliation(s)
- Konda Mani Saravanan
- Centre of Advanced Study in Crystallography & Biophysics, University of Madras, Guindy Campus, Chennai 600 025, Tamilnadu, India
- A Keith Dunker
- Centre for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
- Sankaran Krishnaswamy
- Institute of Mathematical Sciences, CIT Campus, Tharamani, Chennai 600 113, Tamilnadu, India
14
Ahrens JB, Nunez-Castilla J, Siltberg-Liberles J. Evolution of intrinsic disorder in eukaryotic proteins. Cell Mol Life Sci 2017; 74:3163-3174. [PMID: 28597295 PMCID: PMC11107722 DOI: 10.1007/s00018-017-2559-0] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Received: 05/17/2017] [Accepted: 06/01/2017] [Indexed: 12/23/2022]
Abstract
Conformational flexibility conferred through regions of intrinsic structural disorder allows proteins to behave as dynamic molecules. While it is well-known that intrinsically disordered regions can undergo disorder-to-order transitions in real-time as part of their function, we also are beginning to learn more about the dynamics of disorder-to-order transitions along evolutionary time-scales. Intrinsically disordered regions endow proteins with functional promiscuity, which is further enhanced by the ability of some of these regions to undergo real-time disorder-to-order transitions. Disorder content affects gene retention after whole genome duplication, but it is not necessarily conserved. Altered patterns of disorder resulting from evolutionary disorder-to-order transitions indicate that disorder evolves to modify function through refining stability, regulation, and interactions. Here, we review the evolution of intrinsically disordered regions in eukaryotic proteins. We discuss the interplay between secondary structure and disorder on evolutionary time-scales, the importance of disorder for eukaryotic proteome expansion and functional divergence, and the evolutionary dynamics of disorder.
Affiliation(s)
- Joseph B Ahrens
- Department of Biological Sciences, Biomolecular Sciences Institute, Florida International University, 11200 SW 8th St, Miami, FL, 33199, USA
- Janelle Nunez-Castilla
- Department of Biological Sciences, Biomolecular Sciences Institute, Florida International University, 11200 SW 8th St, Miami, FL, 33199, USA
- Jessica Siltberg-Liberles
- Department of Biological Sciences, Biomolecular Sciences Institute, Florida International University, 11200 SW 8th St, Miami, FL, 33199, USA.
15
Johnson KL, Cassin AM, Lonsdale A, Bacic A, Doblin MS, Schultz CJ. Pipeline to Identify Hydroxyproline-Rich Glycoproteins. Plant Physiol 2017; 174:886-903. [PMID: 28446635 PMCID: PMC5462032 DOI: 10.1104/pp.17.00294] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Received: 03/06/2017] [Accepted: 04/21/2017] [Indexed: 05/14/2023]
Abstract
Intrinsically disordered proteins (IDPs) are functional proteins that lack a well-defined three-dimensional structure. The study of IDPs is a rapidly growing area as the crucial biological functions of more of these proteins are uncovered. In plants, IDPs are implicated in plant stress responses, signaling, and regulatory processes. A superfamily of cell wall proteins, the hydroxyproline-rich glycoproteins (HRGPs), have characteristic features of IDPs. Their protein backbones are rich in the disordering amino acid proline, they contain repeated sequence motifs and extensive posttranslational modifications (glycosylation), and they have been implicated in many biological functions. HRGPs are evolutionarily ancient, having been isolated from the protein-rich walls of chlorophyte algae to the cellulose-rich walls of embryophytes. Examination of HRGPs in a range of plant species should provide valuable insights into how they have evolved. Commonly divided into the arabinogalactan proteins, extensins, and proline-rich proteins, in reality, a continuum of structures exists within this diverse and heterogenous superfamily. An inability to accurately classify HRGPs leads to inconsistent gene ontologies limiting the identification of HRGP classes in existing and emerging omics data sets. We present a novel and robust motif and amino acid bias (MAAB) bioinformatics pipeline to classify HRGPs into 23 descriptive subclasses. Validation of MAAB was achieved using available genomic resources and then applied to the 1000 Plants transcriptome project (www.onekp.com) data set. Significant improvement in the detection of HRGPs using multiple-k-mer transcriptome assembly methodology was observed. The MAAB pipeline is readily adaptable and can be modified to optimize the recovery of IDPs from other organisms.
Affiliation(s)
- Kim L Johnson
- Australian Research Council Centre of Excellence in Plant Cell Walls, School of BioSciences, University of Melbourne, Parkville, Victoria 3010, Australia (K.L.J., A.M.C., A.L., A.B., M.S.D.); and
- School of Agriculture, Food, and Wine, University of Adelaide, Waite Research Institute, Glen Osmond, South Australia 5064, Australia (C.J.S.)
- Andrew M Cassin
- Australian Research Council Centre of Excellence in Plant Cell Walls, School of BioSciences, University of Melbourne, Parkville, Victoria 3010, Australia (K.L.J., A.M.C., A.L., A.B., M.S.D.); and
- School of Agriculture, Food, and Wine, University of Adelaide, Waite Research Institute, Glen Osmond, South Australia 5064, Australia (C.J.S.)
- Andrew Lonsdale
- Australian Research Council Centre of Excellence in Plant Cell Walls, School of BioSciences, University of Melbourne, Parkville, Victoria 3010, Australia (K.L.J., A.M.C., A.L., A.B., M.S.D.); and
- School of Agriculture, Food, and Wine, University of Adelaide, Waite Research Institute, Glen Osmond, South Australia 5064, Australia (C.J.S.)
- Antony Bacic
- Australian Research Council Centre of Excellence in Plant Cell Walls, School of BioSciences, University of Melbourne, Parkville, Victoria 3010, Australia (K.L.J., A.M.C., A.L., A.B., M.S.D.); and
- School of Agriculture, Food, and Wine, University of Adelaide, Waite Research Institute, Glen Osmond, South Australia 5064, Australia (C.J.S.)
- Monika S Doblin
- Australian Research Council Centre of Excellence in Plant Cell Walls, School of BioSciences, University of Melbourne, Parkville, Victoria 3010, Australia (K.L.J., A.M.C., A.L., A.B., M.S.D.); and
- School of Agriculture, Food, and Wine, University of Adelaide, Waite Research Institute, Glen Osmond, South Australia 5064, Australia (C.J.S.)
- Carolyn J Schultz
- Australian Research Council Centre of Excellence in Plant Cell Walls, School of BioSciences, University of Melbourne, Parkville, Victoria 3010, Australia (K.L.J., A.M.C., A.L., A.B., M.S.D.); and
- School of Agriculture, Food, and Wine, University of Adelaide, Waite Research Institute, Glen Osmond, South Australia 5064, Australia (C.J.S.)
16
Trujillo JT, Beilstein MA, Mosher RA. The Argonaute-binding platform of NRPE1 evolves through modulation of intrinsically disordered repeats. New Phytol 2016; 212:1094-1105. [PMID: 27431917 PMCID: PMC5125548 DOI: 10.1111/nph.14089] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 02/19/2016] [Accepted: 06/04/2016] [Indexed: 05/26/2023]
Abstract
Argonaute (Ago) proteins are important effectors in RNA silencing pathways, but they must interact with other machinery to trigger silencing. Ago hooks have emerged as a conserved motif responsible for interaction with Ago proteins, but little is known about the sequence surrounding Ago hooks that must restrict or enable interaction with specific Argonautes. Here we investigated the evolutionary dynamics of an Ago-binding platform in NRPE1, the largest subunit of RNA polymerase V. We compared NRPE1 sequences from > 50 species, including dense sampling of two plant lineages. This study demonstrates that the Ago-binding platform of NRPE1 retains Ago hooks, intrinsic disorder, and repetitive character while being highly labile at the sequence level. We reveal that loss of sequence conservation is the result of relaxed selection and frequent expansions and contractions of tandem repeat arrays. These factors allow a complete restructuring of the Ago-binding platform over 50-60 million yr. This evolutionary pattern is also detected in a second Ago-binding platform, suggesting it is a general mechanism. The presence of labile repeat arrays in all analyzed NRPE1 Ago-binding platforms indicates that selection maintains repetitive character, potentially to retain the ability to rapidly restructure the Ago-binding platform.
Affiliation(s)
- Joshua T Trujillo
- The School of Plant Sciences, The University of Arizona, Tucson, AZ, 85721-0036, USA
- Mark A Beilstein
- The School of Plant Sciences, The University of Arizona, Tucson, AZ, 85721-0036, USA
- Rebecca A Mosher
- The School of Plant Sciences, The University of Arizona, Tucson, AZ, 85721-0036, USA
17
Abstract
Repeats are ubiquitous elements of proteins and they play important roles in cellular function and during evolution. Repeats are, however, also notoriously difficult to capture computationally, and large-scale studies have so far had difficulties in linking genetic causes, structural properties and evolutionary trajectories of protein repeats. Here we apply recently developed methods for repeat detection and analysis to a large dataset comprising over a hundred metazoan genomes. We find that repeats in larger protein families generally experience very few insertions or deletions (indels) of repeat units, but there is also a significant fraction of noteworthy volatile outliers with very high indel rates. Analysis of structural data indicates that repeats with an open structure and independently folding units are more volatile and more likely to be intrinsically disordered. Such disordered repeats are also significantly enriched in sites with a high functional potential such as linear motifs. Furthermore, the most volatile repeats have a high sequence similarity between their units. Since many volatile repeats also show signs of recombination, we conclude they are often shaped by concerted evolution. Intriguingly, many of these conserved yet volatile repeats are involved in host-pathogen interactions where they might foster fast but subtle adaptation in biological arms races. KEY WORDS: protein evolution, domain rearrangements, protein repeats, concerted evolution.
Affiliation(s)
- Andreas Schüler
- Institute for Evolution and Biodiversity, Westfalian Wilhelms University, Huefferstrasse 1, Muenster, Germany
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, Westfalian Wilhelms University, Huefferstrasse 1, Muenster, Germany
18
Chi PB, Liberles DA. Selection on protein structure, interaction, and sequence. Protein Sci 2016; 25:1168-78. [PMID: 26808055 PMCID: PMC4918422 DOI: 10.1002/pro.2886] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 11/12/2015] [Revised: 01/18/2016] [Accepted: 01/19/2016] [Indexed: 11/10/2022]
Abstract
Characterizing the probabilities of observing amino acid substitutions at specific sites in a protein over evolutionary time is a major goal in the field of molecular evolution. While purely statistical approaches at different levels of complexity exist, approaches rooted in underlying biological processes are necessary both to characterize the context-dependence of sequence changes (epistasis) and to extrapolate to sequences not observed in biological databases. To develop such approaches, an understanding of the different selective forces that act on amino acid substitution is necessary. Here, an overview is presented of selection on, and corresponding modeling of, folding stability, folding specificity, binding affinity and specificity for ligands, the evolution of new binding sites on protein surfaces, protein dynamics, intrinsic disorder, and protein aggregation, as well as the interplay with protein expression level (concentration) and biased mutational processes.
Affiliation(s)
- Peter B Chi
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, Pennsylvania, 19122
- Department of Mathematics and Computer Science, Ursinus College, Collegeville, Pennsylvania, 19426
- David A Liberles
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, Pennsylvania, 19122
19
Houchmandzadeh B, Vallade M. A Simple, General Result for the Variance of Substitution Number in Molecular Evolution. Mol Biol Evol 2016; 33:1858-69. [PMID: 27189545 PMCID: PMC4915360 DOI: 10.1093/molbev/msw063] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/14/2022]
Abstract
The number of substitutions (of nucleotides, amino acids, etc.) that take place during the evolution of a sequence is a stochastic variable of fundamental importance in the field of molecular evolution. Although the mean number of substitutions during molecular evolution of a sequence can be estimated for a given substitution model, no simple solution exists for the variance of this random variable. We show in this article that the computation of the variance is as simple as that of the mean number of substitutions for both short and long times. Apart from its fundamental importance, this result can be used to investigate the dispersion index R, that is, the ratio of the variance to the mean substitution number, which is of prime importance in the neutral theory of molecular evolution. By investigating large classes of substitution models, we demonstrate that although R≥1, to obtain R significantly larger than unity necessitates in general additional hypotheses on the structure of the substitution model.
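As a reading aid, the dispersion index named in this abstract can be written out explicitly. The symbols N_t (number of substitutions accumulated by time t) and λ (a constant substitution rate) are notation assumed here for illustration and are not taken from the paper; the Poisson case is only the standard textbook baseline, not the authors' general result.

```latex
% Dispersion index: ratio of the variance to the mean substitution number
R(t) = \frac{\operatorname{Var}(N_t)}{\mathbb{E}[N_t]}

% Baseline: if substitutions follow a homogeneous Poisson process with rate \lambda,
% then \mathbb{E}[N_t] = \operatorname{Var}(N_t) = \lambda t, so
R(t) = \frac{\lambda t}{\lambda t} = 1
```

Values of R above 1 therefore indicate substitution counts that are more dispersed than this Poisson baseline, which is the regime the abstract refers to when it states R≥1.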
20
Bioinformatical parsing of folding-on-binding proteins reveals their compositional and evolutionary sequence design. Sci Rep 2015; 5:18586. [PMID: 26678310 PMCID: PMC4683461 DOI: 10.1038/srep18586] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 07/22/2015] [Accepted: 11/20/2015] [Indexed: 12/29/2022]
Abstract
Intrinsic disorder occurs when (part of) a protein remains unfolded during normal functioning. Intrinsically-disordered regions can contain segments that ‘fold on binding’ to another molecule. Here, we perform bioinformatical parsing of human ‘folding-on-binding’ (FB) proteins, into four subsets: Ordered regions, FB regions, Disordered regions that surround FB regions (‘Disordered-around-FB’), and Other-Disordered regions. We examined the composition and evolutionary behaviour (across vertebrate orthologs) of these subsets. From a convergence of three separate analyses, we find that for hydrophobicity, Ordered regions segregate from the other subsets, but the Ordered and FB regions group together as highly conserved, and the Disordered-around-FB and Other-Disordered regions as less conserved (with a lesser significant difference between Ordered and FB regions). FB regions are highly-conserved with net positive charge, whereas Disordered-around-FB have net negative charge and are relatively less hydrophobic than FB regions. Indeed, these Disordered-around-FB regions are excessively hydrophilic compared to other disordered regions generally. We describe how our results point towards a possible compositionally-based steering mechanism of folding-on-binding.
21
Vidhyasagar V, He Y, Guo M, Ding H, Talwar T, Nguyen V, Nwosu J, Katselis G, Wu Y. C-termini are essential and distinct for nucleic acid binding of human NABP1 and NABP2. Biochim Biophys Acta Gen Subj 2015; 1860:371-83. [PMID: 26550690 DOI: 10.1016/j.bbagen.2015.11.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 07/15/2015] [Revised: 10/30/2015] [Accepted: 11/04/2015] [Indexed: 01/03/2023]
Abstract
BACKGROUND: Human Nucleic Acid Binding Protein 1 and 2 (hNABP1 and 2; also known as hSSB2 and 1, respectively) are two newly identified single-stranded (ss) DNA binding proteins (SSB). Both NABP1 and NABP2 have a conserved oligonucleotide/oligosaccharide-binding (OB)-fold domain and a divergent carboxy-terminal domain, the functional importance of which is unknown. METHODS: Recombinant hNABP1/2 proteins were purified using affinity and size exclusion chromatography and their identities confirmed by mass spectrometry. Oligomerization state was checked by sucrose gradient centrifugation. Secondary structure was determined by circular dichroism spectroscopy. Nucleic acid binding ability was examined by EMSA and ITC. RESULTS: Both hNABP1 and hNABP2 exist as monomers in solution; however, hNABP2 exhibits anomalous behavior. CD spectroscopy revealed that the C-terminus of hNABP2 is highly disordered. Deletion of the C-terminal tail diminishes the DNA binding ability and protein stability of hNABP2. Although both hNABP1 and hNABP2 prefer to bind ssDNA rather than double-stranded (ds) DNA, hNABP1 has a higher affinity for ssDNA than hNABP2. Unlike hNABP2, hNABP1 protein binds and multimerizes on ssDNA, with the C-terminal tail responsible for its multimerization. Both hNABP1 and hNABP2 are able to bind single-stranded RNA, with hNABP2 having a higher affinity than hNABP1. CONCLUSIONS: Biochemical evidence suggests that the C-terminal region of NABP1 and NABP2 is essential for their functionality and may lead to different roles in DNA and RNA metabolism. GENERAL SIGNIFICANCE: This is the first report demonstrating the regulation and functional properties of the C-terminal domain of hNABP1/2, which might be a general characteristic of OB-fold proteins.
Affiliation(s)
- Venkatasubramanian Vidhyasagar
- Department of Biochemistry, University of Saskatchewan, Health Sciences Building, 107 Wiggins Road, Saskatoon, Saskatchewan S7N 5E5, Canada
- Yujiong He
- Department of Biochemistry, University of Saskatchewan, Health Sciences Building, 107 Wiggins Road, Saskatoon, Saskatchewan S7N 5E5, Canada
- Manhong Guo
- Department of Biochemistry, University of Saskatchewan, Health Sciences Building, 107 Wiggins Road, Saskatoon, Saskatchewan S7N 5E5, Canada
- Hao Ding
- Department of Biochemistry, University of Saskatchewan, Health Sciences Building, 107 Wiggins Road, Saskatoon, Saskatchewan S7N 5E5, Canada
- Tanu Talwar
- Department of Biochemistry, University of Saskatchewan, Health Sciences Building, 107 Wiggins Road, Saskatoon, Saskatchewan S7N 5E5, Canada
- Vi Nguyen
- Department of Biochemistry, University of Saskatchewan, Health Sciences Building, 107 Wiggins Road, Saskatoon, Saskatchewan S7N 5E5, Canada
- Jessica Nwosu
- Department of Biochemistry, University of Saskatchewan, Health Sciences Building, 107 Wiggins Road, Saskatoon, Saskatchewan S7N 5E5, Canada
- George Katselis
- Department of Medicine, University of Saskatchewan, Health Sciences Building, 107 Wiggins Road, Saskatoon, Saskatchewan S7N 5E5, Canada; Canadian Centre for Health and Safety in Agriculture, University of Saskatchewan, Health Sciences Building, 107 Wiggins Road, Saskatoon, Saskatchewan S7N 5E5, Canada
- Yuliang Wu
- Department of Biochemistry, University of Saskatchewan, Health Sciences Building, 107 Wiggins Road, Saskatoon, Saskatchewan S7N 5E5, Canada.
22
Khan T, Douglas GM, Patel P, Nguyen Ba AN, Moses AM. Polymorphism Analysis Reveals Reduced Negative Selection and Elevated Rate of Insertions and Deletions in Intrinsically Disordered Protein Regions. Genome Biol Evol 2015; 7:1815-26. [PMID: 26047845 PMCID: PMC4494057 DOI: 10.1093/gbe/evv105] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 11/14/2022]
Abstract
Intrinsically disordered protein regions are abundant in eukaryotic proteins and lack stable tertiary structures and enzymatic functions. Previous studies of disordered region evolution based on interspecific alignments have revealed an increased propensity for indels and rapid rates of amino acid substitution. How disordered regions are maintained at high abundance in the proteome and across taxa, despite apparently weak evolutionary constraints, remains unclear. Here, we use single nucleotide and indel polymorphism data in yeast and human populations to survey the population variation within disordered regions. First, we show that single nucleotide polymorphisms in disordered regions are under weaker negative selection compared with more structured protein regions and have a higher proportion of neutral non-synonymous sites. We also confirm previous findings that nonframeshifting indels are much more abundant in disordered regions relative to structured regions. We find that the rate of nonframeshifting indel polymorphism in intrinsically disordered regions resembles that of noncoding DNA and pseudogenes, and that large indels segregate in disordered regions in the human population. Our survey of polymorphism confirms patterns of evolution in disordered regions inferred based on longer evolutionary comparisons.
Affiliation(s)
- Tahsin Khan
- Department of Cell & Systems Biology, University of Toronto, Ontario, Canada
- Gavin M Douglas
- Department of Ecology & Evolutionary Biology, University of Toronto, Ontario, Canada
- Priyenbhai Patel
- Department of Cell & Systems Biology, University of Toronto, Ontario, Canada
- Alex N Nguyen Ba
- Department of Cell & Systems Biology, University of Toronto, Ontario, Canada
- Alan M Moses
- Department of Cell & Systems Biology, University of Toronto, Ontario, Canada; Department of Ecology & Evolutionary Biology, University of Toronto, Ontario, Canada; Centre for the Analysis of Genome Evolution and Function, University of Toronto, Ontario, Canada
23
Limongelli I, Marini S, Bellazzi R. PaPI: pseudo amino acid composition to score human protein-coding variants. BMC Bioinformatics 2015; 16:123. [PMID: 25928477 PMCID: PMC4411653 DOI: 10.1186/s12859-015-0554-8] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 07/20/2014] [Accepted: 01/15/2015] [Indexed: 12/31/2022]
Abstract
Background: High-throughput sequencing technologies are able to identify the whole genomic variation of an individual. Gene-targeted and whole-exome experiments are mainly focused on coding sequence variants related to a single or multiple nucleotides. The analysis of the biological significance of this multitude of genomic variants is challenging and computationally demanding. Results: We present PaPI, a new machine-learning approach to classify and score human coding variants by estimating the probability that they damage protein function. The novelty of this approach consists in using pseudo amino acid composition, through which wild-type and mutated protein sequences are represented in a discrete model. A machine learning classifier has been trained on a set of known deleterious and benign coding variants with the aim of scoring unobserved variants by taking into account hidden sequence patterns in the human genome potentially leading to diseases. We show how the combination of amphiphilic pseudo amino acid composition, evolutionary conservation and homologous-protein-based methods outperforms several prediction algorithms and is also able to score complex variants such as deletions, insertions and indels. Conclusions: This paper describes a machine-learning approach to predict the deleteriousness of human coding variants. A freely available web application (http://papi.unipv.it) has been developed with the presented method, able to score up to thousands of variants in a single run.
Affiliation(s)
- Ivan Limongelli
- IRCCS Policlinico S. Matteo, Pzz.le Volontari del Sangue 2, 27100, Pavia, Italy; Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 1, 27100, Pavia, Italy.
- Simone Marini
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 1, 27100, Pavia, Italy.
- Riccardo Bellazzi
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 1, 27100, Pavia, Italy.
24
DBC1/CCAR2 and CCAR1 Are Largely Disordered Proteins that Have Evolved from One Common Ancestor. Biomed Res Int 2014; 2014:418458. [PMID: 25610865 PMCID: PMC4287135 DOI: 10.1155/2014/418458] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 06/11/2014] [Revised: 09/18/2014] [Accepted: 09/18/2014] [Indexed: 01/07/2023]
Abstract
Deleted in breast cancer 1 (DBC1, CCAR2, KIAA1967) is a large, predominantly nuclear, multidomain protein that modulates gene expression by inhibiting several epigenetic modifiers, including the deacetylases SIRT1 and HDAC3, and the methyltransferase SUV39H1. DBC1 shares many highly conserved protein domains with its paralog cell cycle and apoptosis regulator 1 (CCAR1, CARP-1). In this study, we examined the full-length sequential and structural properties of DBC1 and CCAR1 from multiple species and correlated these properties with evolution. Our data shows that the conserved domains shared between DBC1 and CCAR1 have similar domain structures, as well as similar patterns of predicted disorder in less-conserved intrinsically disordered regions. Our analysis indicates similarities between DBC1, CCAR1, and the nematode protein lateral signaling target 3 (LST-3), suggesting that DBC1 and CCAR1 may have evolved from LST-3. Our data also suggests that DBC1 emerged later in evolution than CCAR1. DBC1 contains regions that show less conservation across species as compared to the same regions in CCAR1, suggesting a continuously evolving scenario for DBC1. Overall, this study provides insight into the structure and evolution of DBC1 and CCAR1, which may impact future studies on the biological functions of these proteins.
25
Schaper E, Gascuel O, Anisimova M. Deep conservation of human protein tandem repeats within the eukaryotes. Mol Biol Evol 2014; 31:1132-48. [PMID: 24497029 PMCID: PMC3995336 DOI: 10.1093/molbev/msu062] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Indexed: 01/01/2023]
Abstract
Tandem repeats (TRs) are a major element of protein sequences in all domains of life. They are particularly abundant in mammals, where by conservative estimates one in three proteins contain a TR. High generation-scale duplication and deletion rates were reported for nucleic TR units. However, it is not known whether protein TR units can also be frequently lost or gained providing a source of variation for rapid adaptation of protein function, or alternatively, tend to have conserved TR unit configurations over long evolutionary times. To obtain a systematic picture, we performed a proteome-wide analysis of the mode of evolution for human protein TRs. For this purpose, we propose a novel method for the detection of orthologous TRs based on circular profile hidden Markov models. For all detected TRs, we reconstructed bispecies TR unit phylogenies across 61 eukaryotes ranging from human to yeast. Moreover, we performed additional analyses to correlate functional and structural annotations of human TRs with their mode of evolution. Surprisingly, we find that the vast majority of human TRs are ancient, with TR unit number and order preserved intact since distant speciation events. For example, ≥61% of all human TRs have been strongly conserved at least since the root of all mammals, approximately 300 Ma. Further, we find no human protein TR that shows evidence for strong recent duplications and deletions. The results are in contrast to the high generation-scale mutability of nucleic TRs. Presumably, most protein TRs fold into stable and conserved structures that are indispensable for the function of the TR-containing protein. All of our data and results are available for download from http://www.atgc-montpellier.fr/TRE.
Affiliation(s)
- Elke Schaper
- Department of Computer Science, ETH Zürich, Zürich, Switzerland
26
Peeters N, Carrère S, Anisimova M, Plener L, Cazalé AC, Genin S. Repertoire, unified nomenclature and evolution of the Type III effector gene set in the Ralstonia solanacearum species complex. BMC Genomics 2013; 14:859. [PMID: 24314259 PMCID: PMC3878972 DOI: 10.1186/1471-2164-14-859] [Citation(s) in RCA: 139] [Impact Index Per Article: 11.6] [Received: 07/24/2013] [Accepted: 11/29/2013] [Indexed: 12/21/2022]
Abstract
Background: Ralstonia solanacearum is a soil-borne beta-proteobacterium that causes bacterial wilt disease in many food crops and is a major problem for agriculture in intertropical regions. R. solanacearum is a heterogeneous species, both phenotypically and genetically, and is considered a species complex. Pathogenicity of R. solanacearum relies on the Type III secretion system that injects Type III effector (T3E) proteins into plant cells. T3E collectively perturb host cell processes and modulate plant immunity to enable bacterial infection. Results: We provide the catalogue of T3E in the R. solanacearum species complex, as well as candidates in newly sequenced strains. 94 T3E orthologous groups were defined on phylogenetic bases and ordered using a uniform nomenclature. This curated T3E catalog is available on a public website and a bioinformatic pipeline has been designed to rapidly predict T3E genes in newly sequenced strains. Systematic analyses were performed to detect lateral T3E gene transfer events and identify T3E genes under positive selection. Our analyses also pinpoint the RipF translocon proteins as major discriminating determinants among the phylogenetic lineages. Conclusions: Establishing T3E repertoires in strains representative of R. solanacearum biodiversity allowed a set of 22 T3E present in all the strains to be determined but provided no clues on host specificity determinants. The definition of a standardized nomenclature and the optimization of predictive tools will pave the way to understanding how variation of these repertoires is correlated to the diversification of this species complex and how they contribute to the different strain pathotypes.
Affiliation(s)
- Nemo Peeters
- INRA, Laboratoire des Interactions Plantes-Microorganismes (LIPM), UMR441, F-31326 Castanet-Tolosan, France.
|
27
|
Szalkowski AM, Anisimova M. Graph-based modeling of tandem repeats improves global multiple sequence alignment. Nucleic Acids Res 2013; 41:e162. [PMID: 23877246 PMCID: PMC3783189 DOI: 10.1093/nar/gkt628] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 01/16/2023]
Abstract
Tandem repeats (TRs) are often present in proteins with crucial functions, including proteins responsible for resistance and pathogenicity and proteins associated with infectious or neurodegenerative diseases. This motivates numerous studies of TRs and their evolution, which require accurate multiple sequence alignment. TRs may be lost or inserted at any position of a TR region by replication slippage or recombination, but current methods assume fixed unit boundaries and are nonetheless of high complexity. We present a new global graph-based alignment method that does not restrict TR unit indels by unit boundaries. TR indels are modeled separately and penalized using the phylogeny-aware alignment algorithm. This ensures enhanced accuracy of reconstructed alignments, disentangling TRs and measuring indel events and rates in a biologically meaningful way. Our method detects not only duplication events but also all changes in TR regions owing to recombination, strand slippage and other events inserting or deleting TR units. We evaluate our method by simulation incorporating TR evolution, either by sampling TRs from a profile hidden Markov model or by mimicking strand slippage with duplications. The new method is illustrated on a family of type III effectors, a pathogenicity determinant in the agriculturally important bacterium Ralstonia solanacearum. We show that TR indel rate variation contributes to the diversification of this protein family.
Affiliation(s)
- Adam M Szalkowski
- Swiss Institute of Bioinformatics, Quartier Sorge Batiment Genopode, 1015 Lausanne, Switzerland and Department of Computer Science, ETH Zürich, Universitätstrasse 6, 8092 Zürich, Switzerland
|
28
|
Chemes LB, Glavina J, Alonso LG, Marino-Buslje C, de Prat-Gay G, Sánchez IE. Sequence evolution of the intrinsically disordered and globular domains of a model viral oncoprotein. PLoS One 2012; 7:e47661. [PMID: 23118886 PMCID: PMC3485249 DOI: 10.1371/journal.pone.0047661] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 07/21/2012] [Accepted: 09/14/2012] [Indexed: 12/11/2022]
Abstract
In the present work, we have used the papillomavirus E7 oncoprotein to pursue structure-function and evolutionary studies that take into account intrinsic disorder and the conformational diversity of globular domains. The intrinsically disordered (E7N) and globular (E7C) domains of E7 show similar degrees of conservation and co-evolution. We found that E7N can be described in terms of conserved and coevolving linear motifs separated by variable linkers, while sequence evolution of E7C is compatible with the known homodimeric structure yet suggests other activities for the domain. Within E7N, inter-residue relationships such as residue co-evolution and restricted intermotif distances map functional coupling and co-occurrence of linear motifs that evolve in a coordinated manner. Within E7C, additional cysteine residues proximal to the zinc-binding site may allow redox regulation of E7 function. Moreover, we describe a conserved binding site for disordered domains on the surface of E7C and suggest a putative target linear motif. Both homodimerization and peptide binding activities of E7C are also present in the distantly related host PHD domains, showing that these two proteins share not only structural homology but also functional similarities, and strengthening the view that they evolved from a common ancestor. Finally, we integrate the multiple activities and conformations of E7 into a hierarchy of structure-function relationships.
Affiliation(s)
- Lucía B. Chemes
- Protein Structure-Function and Engineering Laboratory, Fundación Instituto Leloir and IIBBA-CONICET, Buenos Aires, Argentina
- Juliana Glavina
- Protein Physiology Laboratory, Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Universitaria, Buenos Aires, Argentina
- Leonardo G. Alonso
- Protein Structure-Function and Engineering Laboratory, Fundación Instituto Leloir and IIBBA-CONICET, Buenos Aires, Argentina
- Cristina Marino-Buslje
- Structural Bioinformatics Laboratory, Fundación Instituto Leloir and IIBBA-CONICET, Buenos Aires, Argentina
- Gonzalo de Prat-Gay
- Protein Structure-Function and Engineering Laboratory, Fundación Instituto Leloir and IIBBA-CONICET, Buenos Aires, Argentina
- Ignacio E. Sánchez
- Protein Physiology Laboratory, Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Universitaria, Buenos Aires, Argentina
|
29
|
Huang H, Sarai A. Analysis of the relationships between evolvability, thermodynamics, and the functions of intrinsically disordered proteins/regions. Comput Biol Chem 2012; 41:51-7. [PMID: 23153654 DOI: 10.1016/j.compbiolchem.2012.10.001] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 01/20/2012] [Revised: 08/23/2012] [Accepted: 10/11/2012] [Indexed: 01/08/2023]
Abstract
The evolvability of proteins is not only restricted by functional and structural importance, but also by other factors such as gene duplication, protein stability, and an organism's robustness. Recently, intrinsically disordered proteins (IDPs)/regions (IDRs) have been suggested to play a role in facilitating protein evolution. However, the mechanisms by which this occurs remain largely unknown. To address this, we have systematically analyzed the relationship between the evolvability, stability, and function of IDPs/IDRs. Evolutionary analysis shows that more recently emerged IDRs have higher evolutionary rates with more functional constraints relaxed (or experiencing more positive selection), and that this may have caused accelerated evolution in the flanking regions and in the whole protein. A systematic analysis of observed stability changes due to single amino acid mutations in IDRs and ordered regions shows that while most mutations induce a destabilizing effect in proteins, mutations in IDRs cause smaller stability changes than in ordered regions. The weaker impact of mutations in IDRs on protein stability may have advantages for protein evolvability in the gain of new functions. Interestingly, however, an analysis of functional motifs in the PROSITE and ELM databases showed that motifs in IDRs are more conserved, characterized by smaller entropy and lower evolutionary rate, than in ordered regions. This apparently opposing evolutionary effect may be partly due to the flexible nature of motifs in IDRs, which require some key amino acid residues to engage in tighter interactions with other molecules. Our study suggests that the unique conformational and thermodynamic characteristics of IDPs/IDRs play an important role in the evolvability of proteins to gain new functions.
Affiliation(s)
- He Huang
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.
|
30
|
Szalkowski AM. Fast and robust multiple sequence alignment with phylogeny-aware gap placement. BMC Bioinformatics 2012; 13:129. [PMID: 22694311 PMCID: PMC3495709 DOI: 10.1186/1471-2105-13-129] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 11/03/2011] [Accepted: 06/13/2012] [Indexed: 12/02/2022]
Abstract
Background: ProGraphMSA is a state-of-the-art multiple sequence alignment tool which produces phylogenetically sensible gap patterns while maintaining robustness by allowing alternative splicings and errors in the branching pattern of the guide tree. Results: This is achieved by incorporating a graph-based sequence representation combined with the advantages of the phylogeny-aware gap placement algorithm of Prank. Further, we account for variations in the substitution pattern by implementing context-specific profiles as in CS-Blast and by estimating amino acid frequencies from input data. Conclusions: ProGraphMSA shows good performance and competitive execution times in various benchmarks.
Affiliation(s)
- Adam M Szalkowski
- Department of Computer Science, ETH Zürich, Universitätstrasse, Switzerland.
|
31
|
Kipnis Y, Dellus-Gur E, Tawfik DS. TRINS: a method for gene modification by randomized tandem repeat insertions. Protein Eng Des Sel 2012; 25:437-44. [DOI: 10.1093/protein/gzs023] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Indexed: 01/08/2023]
|
32
|
Dalquen DA, Anisimova M, Gonnet GH, Dessimoz C. ALF--a simulation framework for genome evolution. Mol Biol Evol 2011; 29:1115-23. [PMID: 22160766 PMCID: PMC3341827 DOI: 10.1093/molbev/msr268] [Citation(s) in RCA: 111] [Impact Index Per Article: 7.9] [Indexed: 01/21/2023]
Abstract
In computational evolutionary biology, verification and benchmarking are challenging tasks because the evolutionary history of the studied biological entities is usually not known. Computer programs for simulating sequence evolution in silico have been shown to be viable test beds for verifying newly developed methods and comparing different algorithms. However, current simulation packages tend to focus either on gene-level aspects of genome evolution such as character substitutions and insertions and deletions (indels) or on genome-level aspects such as genome rearrangement and speciation events. Here, we introduce Artificial Life Framework (ALF), which aims at simulating the entire range of evolutionary forces that act on genomes: nucleotide, codon, or amino acid substitution (under simple or mixture models), indels, GC-content amelioration, gene duplication, gene loss, gene fusion, gene fission, genome rearrangement, lateral gene transfer (LGT), or speciation. The other distinctive feature of ALF is its user-friendly yet powerful web interface. We illustrate the utility of ALF with two possible applications: 1) we reanalyze data from a study of selection after globin gene duplication and test the statistical significance of the original conclusions and 2) we demonstrate that LGT can dramatically decrease the accuracy of two well-established orthology inference methods. ALF is available as a stand-alone application or via a web interface at http://www.cbrg.ethz.ch/alf.
Affiliation(s)
- Daniel A Dalquen
- Computational Biochemistry Research Group, Department of Computer Science, ETH Zurich, Universitätstrasse 6, Zürich, Switzerland.
|
33
|
The evolution of protein structures and structural ensembles under functional constraint. Genes (Basel) 2011; 2:748-62. [PMID: 24710290 PMCID: PMC3927589 DOI: 10.3390/genes2040748] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Received: 09/24/2011] [Revised: 10/15/2011] [Accepted: 10/19/2011] [Indexed: 02/06/2023]
Abstract
Protein sequence, structure, and function are inherently linked through evolution and population genetics. Our knowledge of protein structure comes from solved structures in the Protein Data Bank (PDB), our knowledge of sequence through sequences found in the NCBI sequence databases (http://www.ncbi.nlm.nih.gov/), and our knowledge of function through a limited set of in-vitro biochemical studies. How these intersect through evolution is described in the first part of the review. In the second part, our understanding of a series of questions is addressed. This includes how sequences evolve within structures, how evolutionary processes enable structural transitions, how the folding process can change through evolution and what the fitness impacts of this might be. Moving beyond static structures, the evolution of protein kinetics (including normal modes) is discussed, as is the evolution of conformational ensembles and structurally disordered proteins. This ties back to a question of the role of neostructuralization and how it relates to selection on sequences for functions. The relationship between metastability, the fitness landscape, sequence divergence, and organismal effective population size is explored. Lastly, a brief discussion of modeling the evolution of sequences of ordered and disordered proteins is entertained.
|