1
Halpin JC, Keating AE. PairK: Pairwise k-mer alignment for quantifying protein motif conservation in disordered regions. bioRxiv 2024:2024.07.23.604860. [PMID: 39091826 PMCID: PMC11291154 DOI: 10.1101/2024.07.23.604860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/04/2024]
Abstract
Protein-protein interactions are often mediated by a modular peptide recognition domain binding to a short linear motif (SLiM) in the disordered region of another protein. The ability to predict domain-SLiM interactions would allow researchers to map protein interaction networks, predict the effects of perturbations to those networks, and develop biologically meaningful hypotheses. Unfortunately, sequence database searches for SLiMs generally yield mostly biologically irrelevant motif matches or false positives. To improve the prediction of novel SLiM interactions, researchers employ filters to discriminate between biologically relevant and improbable motif matches. One promising criterion for identifying biologically relevant SLiMs is the sequence conservation of the motif, exploiting the fact that functional motifs are more likely to be conserved than spurious motif matches. However, the difficulty of aligning disordered regions has significantly hampered the utility of this approach. We present PairK (pairwise k-mer alignment), an MSA-free method to quantify motif conservation in disordered regions. PairK outperforms both standard MSA-based conservation scores and a modern LLM-based conservation score predictor on the task of identifying biologically important motif instances. PairK can quantify conservation over wider phylogenetic distances than MSAs, indicating that SLiMs may be more conserved than is implied by MSA-based metrics. PairK is available as open-source code at https://github.com/jacksonh1/pairk.
Affiliation(s)
- Jackson C. Halpin
- MIT Department of Biology, 77 Massachusetts Ave., Cambridge, MA 02139
- Amy E. Keating
- MIT Department of Biology, 77 Massachusetts Ave., Cambridge, MA 02139
- MIT Department of Biological Engineering, 77 Massachusetts Ave., Cambridge, MA 02139
- Koch Institute for Integrative Cancer Research, 77 Massachusetts Ave., Cambridge, MA 02139
2
Saikia B, Baruah A. Recent advances in de novo computational design and redesign of intrinsically disordered proteins and intrinsically disordered protein regions. Arch Biochem Biophys 2024; 752:109857. [PMID: 38097100 DOI: 10.1016/j.abb.2023.109857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 12/10/2023] [Accepted: 12/10/2023] [Indexed: 12/17/2023]
Abstract
In the early 2000s, the concept of "unstructured biology" emerged as an important field in protein science, generating various new research directions. Many novel strategies and methods have been developed that focus on effectively identifying or predicting intrinsically disordered proteins (IDPs) and intrinsically disordered protein regions (IDPRs), identifying their potential functions, disorder-based drug design, etc. Because of their range of functions and their involvement in various debilitating diseases, IDPs/IDPRs are of contemporary interest to the scientific community. Recent research has focused on designing or redesigning specific IDPs/IDPRs de novo. These de novo designs and redesigns are carried out by altering compositional biases and specific sequence patterning parameters. The main aim of this research is to influence specific molecular functions, phase behavior, cellular phenotypes, etc. In this review, we first describe the differences between natively folded proteins and natively unfolded proteins (IDPs) with respect to their potential energy landscapes. We then summarize the current understanding of the computational design strategies and methods that have been used in the de novo design and redesign of IDPs and IDPRs. Finally, we conclude by discussing the challenges faced during computational design/redesign attempts of IDPs/IDPRs.
Affiliation(s)
- Bondeepa Saikia
- Department of Chemistry, Dibrugarh University, Dibrugarh, 786004, Assam, India
- Anupaul Baruah
- Department of Chemistry, Dibrugarh University, Dibrugarh, 786004, Assam, India
3
Chakraborty A, Hussain A, Sabnam N. Uncovering the structural stability of Magnaporthe oryzae effectors: a secretome-wide in silico analysis. J Biomol Struct Dyn 2023:1-22. [PMID: 38109060 DOI: 10.1080/07391102.2023.2292795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2023] [Accepted: 11/23/2023] [Indexed: 12/19/2023]
Abstract
Rice blast, caused by the ascomycete fungus Magnaporthe oryzae, is a deadly disease and a major threat to global food security. The pathogen secretes small proteinaceous effectors, virulence factors, inside the host to manipulate and perturb the host immune system, allowing the pathogen to colonize and establish a successful infection. While the molecular functions of several effectors are characterized, very little is known about the structural stability of these effectors. We analyzed a total of 554 small secretory proteins (SSPs) from the M. oryzae secretome to decipher key features of intrinsic disorder (ID) and the structural dynamics of the selected putative effectors through thorough and systematic in silico studies. Our results suggest that out of the total SSPs, 66% were predicted as effector proteins, released either into the apoplast or cytoplasm of the host cell. Of these, 68% were found to be intrinsically disordered effector proteins (IDEPs). Among the six distinct classes of disordered effectors, we observed peculiar relationships between the localization of several effectors in the apoplast or cytoplasm and the degree of disorder. We determined the degree of structural disorder and its impact on protein foldability across all the putative small secretory effector proteins from the blast pathogen, further validated by molecular dynamics simulation studies. This study provides definite clues toward unraveling the mystery behind the importance of structural distortions in effectors and their impact on plant-pathogen interactions. The study of these dynamical segments may help identify new effectors as well. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Afzal Hussain
- Department of Bioinformatics, Maulana Azad National Institute of Technology, Bhopal, India
- Nazmiara Sabnam
- Department of Life Sciences, Presidency University, Kolkata, India
4
Riley AC, Ashlock DA, Graether SP. The difficulty of aligning intrinsically disordered protein sequences as assessed by conservation and phylogeny. PLoS One 2023; 18:e0288388. [PMID: 37440576 DOI: 10.1371/journal.pone.0288388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Accepted: 06/26/2023] [Indexed: 07/15/2023] [Open Access]
Abstract
Intrinsically disordered proteins (IDPs) are proteins that lack a stable 3D structure but maintain a biological function. It has been frequently suggested that IDPs are difficult to align because they tend to have fewer conserved residues compared to ordered proteins, but to our knowledge this has never been directly tested. To compare the alignments of ordered proteins to IDPs, their multiple sequence alignments (MSAs) were assessed using two different methods. The first compared the similarity between MSAs produced using the same sequences but created with Clustal Omega, MAFFT, and MUSCLE. The second assessed MSAs based on how well they recapitulated the species tree. These two methods measure the "correctness" of an MSA with two different approaches; the first method measures consistency while the second measures the underlying phylogenetic signal. Proteins that contained both regions of disorder and order were analyzed along with proteins that were fully disordered and fully ordered, using nucleotide, codon and peptide sequence alignments. We observed that IDPs had less similar MSAs than ordered proteins, which is most likely linked to the lower sequence conservation in IDPs. However, comparisons of tree distances found that trees from the ordered sequence MSAs were not significantly closer to the species tree than those inferred from disordered sequence MSAs. Our results show that it is correct to say that IDPs are difficult to align on the basis of MSA consistency, but that this does not equate with alignments being of poor quality when assessed by their ability to correctly infer a species tree.
Affiliation(s)
- Andrew C Riley
- Graduate Program in Bioinformatics, University of Guelph, Guelph, Ontario, Canada
- Department of Molecular and Cellular Biology, University of Guelph, Guelph, Ontario, Canada
- Daniel A Ashlock
- Graduate Program in Bioinformatics, University of Guelph, Guelph, Ontario, Canada
- Department of Mathematics & Statistics, University of Guelph, Guelph, Ontario, Canada
- Steffen P Graether
- Graduate Program in Bioinformatics, University of Guelph, Guelph, Ontario, Canada
- Department of Molecular and Cellular Biology, University of Guelph, Guelph, Ontario, Canada
5
Funneling modulatory peptide design with generative models: Discovery and characterization of disruptors of calcineurin protein-protein interactions. PLoS Comput Biol 2023; 19:e1010874. [PMID: 36730443 PMCID: PMC9928118 DOI: 10.1371/journal.pcbi.1010874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2022] [Revised: 02/14/2023] [Accepted: 01/16/2023] [Indexed: 02/04/2023] [Open Access]
Abstract
Design of peptide binders is an attractive strategy for targeting "undruggable" protein-protein interfaces. Current design protocols rely on the extraction of an initial sequence from one known protein interactor of the target protein, followed by in-silico or in-vitro mutagenesis-based optimization of its binding affinity. Wet lab protocols can explore only a minor portion of the vast sequence space and cannot efficiently screen for other desirable properties such as high specificity and low toxicity, while in-silico design requires intensive computational resources and often relies on simplified binding models. Yet, for a multivalent protein target, dozens to hundreds of natural protein partners already exist in the cellular environment. Here, we describe a peptide design protocol that harnesses this diversity via a machine learning generative model. After identifying putative natural binding fragments by literature and homology search, a compositional Restricted Boltzmann Machine is trained and sampled to yield hundreds of diverse candidate peptides. The latter are further filtered via flexible molecular docking and an in-vitro microchip-based binding assay. We validate and test our protocol on calcineurin, a calcium-dependent protein phosphatase involved in various cellular pathways in health and disease. In a single screening round, we identified multiple 16-length peptides with up to six mutations from their closest natural sequence that successfully interfere with the binding of calcineurin to its substrates. In summary, integrating protein interaction and sequence databases, generative modeling, molecular docking and interaction assays enables the discovery of novel protein-protein interaction modulators.
6
Ilzhöfer D, Heinzinger M, Rost B. SETH predicts nuances of residue disorder from protein embeddings. Front Bioinform 2022; 2:1019597. [PMID: 36304335 PMCID: PMC9580958 DOI: 10.3389/fbinf.2022.1019597] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/15/2022] [Accepted: 09/20/2022] [Indexed: 11/07/2022] [Open Access]
Abstract
Predictions for millions of protein three-dimensional structures are only a few clicks away since the release of AlphaFold2 results for UniProt. However, many proteins have so-called intrinsically disordered regions (IDRs) that do not adopt unique structures in isolation. These IDRs are associated with several diseases, including Alzheimer’s Disease. We showed that three recent disorder measures of AlphaFold2 predictions (pLDDT, “experimentally resolved” prediction and “relative solvent accessibility”) correlated to some extent with IDRs. However, expert methods predict IDRs more reliably by combining complex machine learning models with expert-crafted input features and evolutionary information from multiple sequence alignments (MSAs). MSAs are not always available, especially for IDRs, and are computationally expensive to generate, limiting the scalability of the associated tools. Here, we present the novel method SETH that predicts residue disorder from embeddings generated by the protein Language Model ProtT5, which explicitly only uses single sequences as input. Thereby, our method, relying on a relatively shallow convolutional neural network, outperformed much more complex solutions while being much faster, allowing to create predictions for the human proteome in about 1 hour on a consumer-grade PC with one NVIDIA GeForce RTX 3060. Trained on a continuous disorder scale (CheZOD scores), our method captured subtle variations in disorder, thereby providing important information beyond the binary classification of most methods. High performance paired with speed revealed that SETH’s nuanced disorder predictions for entire proteomes capture aspects of the evolution of organisms. Additionally, SETH could also be used to filter out regions or proteins with probable low-quality AlphaFold2 3D structures to prioritize running the compute-intensive predictions for large data sets. SETH is freely publicly available at: https://github.com/Rostlab/SETH.
Affiliation(s)
- Dagmar Ilzhöfer
- Faculty of Informatics, TUM (Technical University of Munich), Munich, Germany
- Michael Heinzinger
- Faculty of Informatics, TUM (Technical University of Munich), Munich, Germany
- Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), TUM Graduate School, Garching, Germany
- *Correspondence: Michael Heinzinger
- Burkhard Rost
- Faculty of Informatics, TUM (Technical University of Munich), Munich, Germany
- Institute for Advanced Study (TUM-IAS), TUM (Technical University of Munich), Garching, Germany
- TUM School of Life Sciences Weihenstephan (WZW), TUM (Technical University of Munich), Freising, Germany
7
Ghosh K, Huihui J, Phillips M, Haider A. Rules of Physical Mathematics Govern Intrinsically Disordered Proteins. Annu Rev Biophys 2022; 51:355-376. [PMID: 35119946 PMCID: PMC9190209 DOI: 10.1146/annurev-biophys-120221-095357] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Indexed: 12/29/2022]
Abstract
In stark contrast to foldable proteins with a unique folded state, intrinsically disordered proteins and regions (IDPs) persist in perpetually disordered ensembles. Yet an IDP ensemble has conformational features, even when averaged, that are specific to its sequence. In fact, subtle changes in an IDP sequence can modulate its conformational features and its function. Recent advances in theoretical physics reveal a set of elegant mathematical expressions that describe the intricate relationships among IDP sequences, their ensemble conformations, and the regulation of their biological functions. These equations also describe the molecular properties of IDP sequences that predict similarities and dissimilarities in their functions and facilitate classification of sequences by function, an unmet challenge to traditional bioinformatics. These physical sequence-patterning metrics offer a promising new avenue for advancing synthetic biology at a time when multiple novel functional modes mediated by IDPs are emerging.
Affiliation(s)
- Kingshuk Ghosh
- Department of Physics and Astronomy, University of Denver, Denver, Colorado, USA
- Molecular and Cellular Biophysics Program, University of Denver, Denver, Colorado, USA
- Jonathan Huihui
- Department of Physics and Astronomy, University of Denver, Denver, Colorado, USA
- Michael Phillips
- Department of Physics and Astronomy, University of Denver, Denver, Colorado, USA
- Austin Haider
- Molecular and Cellular Biophysics Program, University of Denver, Denver, Colorado, USA
8
Pajkos M, Dosztányi Z. Functions of intrinsically disordered proteins through evolutionary lenses. Prog Mol Biol Transl Sci 2021; 183:45-74. [PMID: 34656334 DOI: 10.1016/bs.pmbts.2021.06.017] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/29/2022]
Abstract
Protein sequences are the result of an evolutionary process that involves the balancing act of experimenting with novel mutations and selecting out those that have an undesirable functional outcome. In the case of globular proteins, the function relies on a well-defined conformation, therefore, there is a strong evolutionary pressure to preserve the structure. However, different evolutionary rules might apply for the group of intrinsically disordered regions and proteins (IDR/IDPs) that exist as an ensemble of fluctuating conformations. The function of IDRs can directly originate from their disordered state or arise through different types of molecular recognition processes. There is an amazing variety of ways IDRs can carry out their functions, and this is also reflected in their evolutionary properties. In this chapter we give an overview of the different types of evolutionary behavior of disordered proteins and associated functions in normal and disease settings.
Affiliation(s)
- Mátyás Pajkos
- Department of Biochemistry, ELTE Eötvös Loránd University, Budapest, Hungary
- Zsuzsanna Dosztányi
- Department of Biochemistry, ELTE Eötvös Loránd University, Budapest, Hungary
9
Iqbal S, Halim Z. Orienting Conflicted Graph Edges Using Genetic Algorithms to Discover Pathways in Protein-Protein Interaction Networks. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:1970-1985. [PMID: 31944985 DOI: 10.1109/tcbb.2020.2966703] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 05/23/2023]
Abstract
Advanced computational techniques of the current era help to identify proteins from the complex biological network that interact with each other and with the cell's environment. Biological pathways are a chain of molecular actions that leads to a new molecular product creation or alters the cellular state. These pathways are helpful in the prediction of many real-world issues. Rebuilding these pathways is a challenging task due to the fact that protein interactions are undirected, whereas pathways are directed. To discover these pathways in protein-protein interaction data from specified source and target, it is essential to orient protein interactions. Unfortunately, the edge orientation problem is NP-hard, which makes it challenging to develop effective algorithms. This work rebuilds biologically important pathways in a weighted network of protein interactions of yeast species. The proposed algorithm, pseudo-guided multi-objective genetic algorithm (PGMOGA), rebuilds pathways by assigning orientation to the edges of the weighted network. Extending the past research, mathematical modeling of single-objective and multi-objective functions is performed. The PGMOGA is compared with four state-of-the-art approaches, namely, random orientation plus local search (ROLS), single-objective genetic algorithm (SOGA), multi-objective genetic algorithm (MOGA), and multi random search (MRS). The comparison is based on three general and four path-specific metrics. Results show that the current proposal performs better.
10
Huihui J, Ghosh K. Intrachain interaction topology can identify functionally similar intrinsically disordered proteins. Biophys J 2021; 120:1860-1868. [PMID: 33865811 DOI: 10.1016/j.bpj.2020.11.2282] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/20/2020] [Revised: 10/17/2020] [Accepted: 11/19/2020] [Indexed: 01/06/2023] [Open Access]
Abstract
Functionally similar IDPs (intrinsically disordered proteins) often have little sequence similarity. This is in stark contrast to folded proteins and poses a challenge for the inverse problem, functional classification of IDPs using sequence alignment. The problem is further compounded by the lack of structure in IDPs, which prevents structural alignment as an alternate tool for classification. Recent advances in heteropolymer theory unveiled a powerful set of sequence-patterning metrics bridging molecular interaction with chain conformation. Focusing only on charge patterning, this set of metrics yields a sequence charge decoration matrix (SCDM). SCDMs can potentially identify functionally similar IDPs not apparent from sequence alignment alone. Here, we illustrate how these information-rich "molecular blueprints" encoded in SCDMs can be used for functional classification of IDPs, with specific application to three protein families (Ste50, PSC, and RAM) in which electrostatics is known to be important. For both the Ste50 and PSC protein families, the set of metrics appropriately classifies proteins into functional and nonfunctional groups in agreement with experiment. Furthermore, our algorithm groups synthetic variants of the disordered RAM region of the Notch receptor protein, which is important in gene expression, in reasonable accordance with classification based on experimentally measured binding constants of RAM and transcription factor. Taken together, the novel classification scheme reveals the critical role of a high-dimensional set of metrics, manifest in self-interaction maps and topology, in functional annotation of IDPs even when there is low sequence homology, providing a much-needed alternative to traditional sequence alignment tools.
Affiliation(s)
- Jonathan Huihui
- Department of Physics and Astronomy, University of Denver, Denver, Colorado
- Kingshuk Ghosh
- Department of Physics and Astronomy, University of Denver, Denver, Colorado
11
de Bruijn SE, Smits JJ, Liu C, Lanting CP, Beynon AJ, Blankevoort J, Oostrik J, Koole W, de Vrieze E, Cremers CWRJ, Cremers FPM, Roosing S, Yntema HG, Kunst HPM, Zhao B, Pennings RJE, Kremer H. A RIPOR2 in-frame deletion is a frequent and highly penetrant cause of adult-onset hearing loss. J Med Genet 2020; 58:jmedgenet-2020-106863. [PMID: 32631815 PMCID: PMC8120656 DOI: 10.1136/jmedgenet-2020-106863] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/14/2020] [Revised: 03/25/2020] [Accepted: 04/01/2020] [Indexed: 12/23/2022]
Abstract
BACKGROUND Hearing loss is one of the most prevalent disabilities worldwide, and has a significant impact on quality of life. The adult-onset type of the condition is highly heritable but the genetic causes are largely unknown, which is in contrast to childhood-onset hearing loss. METHODS Family and cohort studies included exome sequencing and characterisation of the hearing phenotype. Ex vivo protein expression addressed the functional effect of a DNA variant. RESULTS An in-frame deletion of 12 nucleotides in RIPOR2 was identified as a highly penetrant cause of adult-onset progressive hearing loss that segregated as an autosomal dominant trait in 12 families from the Netherlands. Hearing loss associated with the deletion in 63 subjects displayed variable audiometric characteristics and an average (SD) age of onset of 30.6 (14.9) years (range 0-70 years). A functional effect of the RIPOR2 variant was demonstrated by aberrant localisation of the mutant RIPOR2 in the stereocilia of cochlear hair cells and failure to rescue morphological defects in RIPOR2-deficient hair cells, in contrast to the wild-type protein. Strikingly, the RIPOR2 variant is present in 18 of 22 952 individuals not selected for hearing loss in the Southeast Netherlands. CONCLUSION Collectively, the presented data demonstrate that an inherited form of adult-onset hearing loss is relatively common, with potentially thousands of individuals at risk in the Netherlands and beyond, which makes it an attractive target for developing a (genetic) therapy.
Affiliation(s)
- Suzanne E de Bruijn
- Department of Human Genetics, Radboudumc, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboudumc, Nijmegen, The Netherlands
- Jeroen J Smits
- Donders Institute for Brain, Cognition and Behaviour, Radboudumc, Nijmegen, The Netherlands
- Department of Otorhinolaryngology, Radboudumc, Nijmegen, The Netherlands
- Chang Liu
- Department of Otolaryngology-Head and Neck Surgery, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Cornelis P Lanting
- Department of Otorhinolaryngology, Radboudumc, Nijmegen, The Netherlands
- Andy J Beynon
- Department of Otorhinolaryngology, Radboudumc, Nijmegen, The Netherlands
- Jaap Oostrik
- Donders Institute for Brain, Cognition and Behaviour, Radboudumc, Nijmegen, The Netherlands
- Department of Otorhinolaryngology, Radboudumc, Nijmegen, The Netherlands
- Wouter Koole
- Department of Human Genetics, Radboudumc, Nijmegen, The Netherlands
- Erik de Vrieze
- Donders Institute for Brain, Cognition and Behaviour, Radboudumc, Nijmegen, The Netherlands
- Department of Otorhinolaryngology, Radboudumc, Nijmegen, The Netherlands
- Cor W R J Cremers
- Department of Otorhinolaryngology, Radboudumc, Nijmegen, The Netherlands
- Frans P M Cremers
- Department of Human Genetics, Radboudumc, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboudumc, Nijmegen, The Netherlands
- Susanne Roosing
- Department of Human Genetics, Radboudumc, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboudumc, Nijmegen, The Netherlands
- Helger G Yntema
- Department of Human Genetics, Radboudumc, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboudumc, Nijmegen, The Netherlands
- Henricus P M Kunst
- Department of Otorhinolaryngology, Radboudumc, Nijmegen, The Netherlands
- Radboud Institute for Health Sciences, Radboudumc, Nijmegen, The Netherlands
- Bo Zhao
- Department of Otolaryngology-Head and Neck Surgery, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Ronald J E Pennings
- Donders Institute for Brain, Cognition and Behaviour, Radboudumc, Nijmegen, The Netherlands
- Department of Otorhinolaryngology, Radboudumc, Nijmegen, The Netherlands
- Hannie Kremer
- Department of Human Genetics, Radboudumc, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboudumc, Nijmegen, The Netherlands
- Department of Otorhinolaryngology, Radboudumc, Nijmegen, The Netherlands
12
Cohan MC, Ruff KM, Pappu RV. Information theoretic measures for quantifying sequence-ensemble relationships of intrinsically disordered proteins. Protein Eng Des Sel 2020; 32:191-202. [PMID: 31375817 PMCID: PMC7462041 DOI: 10.1093/protein/gzz014] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 06/06/2019] [Accepted: 06/19/2019] [Indexed: 01/26/2023] [Open Access]
Abstract
Intrinsically disordered proteins (IDPs) contribute to a multitude of functions. De novo design of IDPs should open the door to modulating functions and phenotypes controlled by these systems. Recent design efforts have focused on compositional biases and specific sequence patterns as the design features. Analysis of the impact of these designs on sequence-function relationships indicates that individual sequence/compositional parameters are insufficient for describing sequence-function relationships in IDPs. To remedy this problem, we have developed information theoretic measures for sequence–ensemble relationships (SERs) of IDPs. These measures rely on prior availability of statistically robust conformational ensembles derived from all atom simulations. We show that the measures we have developed are useful for comparing sequence-ensemble relationships even when sequence is poorly conserved. Based on our results, we propose that de novo designs of IDPs, guided by knowledge of their SERs, should provide improved insights into their sequence–ensemble–function relationships.
Affiliation(s)
- Megan C Cohan
- Department of Biomedical Engineering and Center for Science & Engineering of Living Systems (CSELS) Washington University in St. Louis, One Brookings Drive, Campus Box 1097, St. Louis MO, USA
- Kiersten M Ruff
- Department of Biomedical Engineering and Center for Science & Engineering of Living Systems (CSELS) Washington University in St. Louis, One Brookings Drive, Campus Box 1097, St. Louis MO, USA
- Rohit V Pappu
- Department of Biomedical Engineering and Center for Science & Engineering of Living Systems (CSELS) Washington University in St. Louis, One Brookings Drive, Campus Box 1097, St. Louis MO, USA
13
Shafee T, Bacic A, Johnson K. Evolution of Sequence-Diverse Disordered Regions in a Protein Family: Order within the Chaos. Mol Biol Evol 2020; 37:2155-2172. [DOI: 10.1093/molbev/msaa096] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/11/2022] [Open Access]
Abstract
Approaches for studying the evolution of globular proteins are now well established yet are unsuitable for disordered sequences. Our understanding of the evolution of proteins containing disordered regions therefore lags that of globular proteins, limiting our capacity to estimate their evolutionary history, classify paralogs, and identify potential sequence–function relationships. Here, we overcome these limitations by using new analytical approaches that project representations of sequence space to dissect the evolution of proteins with both ordered and disordered regions, and the correlated changes between these. We use the fasciclin-like arabinogalactan proteins (FLAs) as a model family, since they contain a variable number of globular fasciclin domains as well as several distinct types of disordered regions: proline (Pro)-rich arabinogalactan (AG) regions and longer Pro-depleted regions.
Sequence space projections of fasciclin domains from 2019 FLAs from 78 species identified distinct clusters corresponding to different types of fasciclin domains. Clusters can be similarly identified in the seemingly random Pro-rich AG and Pro-depleted disordered regions. Sequence features of the globular and disordered regions clearly correlate with one another, implying coevolution of these distinct regions, as well as with the N-linked and O-linked glycosylation motifs. We reconstruct the overall evolutionary history of the FLAs, annotated with the changing domain architectures, glycosylation motifs, number and length of AG regions, and disordered region sequence features. Mapping these features onto the functionally characterized FLAs therefore enables their sequence–function relationships to be interrogated. These findings will inform research on the abundant disordered regions in protein families from all kingdoms of life.
Affiliation(s)
- Thomas Shafee
  - Department of Animal, Plant and Soil Sciences, La Trobe Institute for Agriculture & Food, La Trobe University, Melbourne, VIC, Australia
- Antony Bacic
  - Department of Animal, Plant and Soil Sciences, La Trobe Institute for Agriculture & Food, La Trobe University, Melbourne, VIC, Australia
  - Sino-Australia Plant Cell Wall Research Centre, College of Forestry and Biotechnology, Zhejiang Agriculture and Forestry University, Lin’an, Hangzhou, China
- Kim Johnson
  - Department of Animal, Plant and Soil Sciences, La Trobe Institute for Agriculture & Food, La Trobe University, Melbourne, VIC, Australia
  - Sino-Australia Plant Cell Wall Research Centre, College of Forestry and Biotechnology, Zhejiang Agriculture and Forestry University, Lin’an, Hangzhou, China
|
14
|
Su WC, Harrison PM. Deep conservation of prion-like composition in the eukaryotic prion-former Pub1/Tia1 family and its relatives. PeerJ 2020; 8:e9023. [PMID: 32337108 PMCID: PMC7169965 DOI: 10.7717/peerj.9023] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/05/2019] [Accepted: 03/30/2020] [Indexed: 12/12/2022] Open
Abstract
Pub1 protein is an important RNA-binding protein functional in stress granule assembly in budding yeast Saccharomyces cerevisiae and, as its co-ortholog Tia1, in humans. It is unique among proteins in evidencing prion-like aggregation in both its yeast and human forms. Previously, we noted that Pub1/Tia1 was the only protein linked to human disease that has prion-like character and has demonstrated such aggregation in both species. Thus, we were motivated to probe further into the evolution of the Pub1/Tia1 family (and its close relative Nam8 and its orthologs) to gain a picture of how such a protein has evolved over deep evolutionary time since the last common ancestor of eukaryotes. Here, we discover that the prion-like composition of this protein family is deeply conserved across eukaryotes, as is the prion-like composition of its close relative Nam8/Ngr1. A sizeable minority of protein orthologs have multiple prion-like domains within their sequences (6-20% depending on criteria). The number of RNA-binding RRM domains is conserved at three copies over >86% of the Pub1 family (>71% of the Nam8 family), but proteins with just one or two RRM domains occur frequently in some clades, indicating that these are not due to annotation errors. Overall, our results indicate that a basic scaffold comprising three RNA-binding domains and at least one prion-like region has been largely conserved since the last common ancestor of eukaryotes, providing further evidence that prion-like aggregation may be a very ancient and conserved phenomenon for certain specific proteins.
Affiliation(s)
- Wan-Chun Su
  - Department of Biology, McGill University, Montreal, QC, Canada
- Paul M Harrison
  - Department of Biology, McGill University, Montreal, QC, Canada
|
15
|
Trivedi R, Nagarajaram HA. Amino acid substitution scoring matrices specific to intrinsically disordered regions in proteins. Sci Rep 2019; 9:16380. [PMID: 31704957 PMCID: PMC6841959 DOI: 10.1038/s41598-019-52532-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 07/18/2019] [Accepted: 10/15/2019] [Indexed: 01/09/2023] Open
Abstract
An amino acid substitution scoring matrix encapsulates the rates at which various amino acid residues in proteins are substituted by other amino acid residues over time. Database search methods make use of substitution scoring matrices to identify sequences with homologous relationships. However, widely used substitution scoring matrices, such as the BLOSUM series, have been developed using aligned blocks that are mostly devoid of disordered regions in proteins. Hence, these substitution scoring matrices are mostly inappropriate for homology searches involving proteins enriched with disordered regions, because disordered regions have a distinct amino acid compositional bias and are therefore expected to have undergone amino acid substitutions that are distinct from those in ordered regions. We therefore developed a novel series of substitution scoring matrices, referred to as EDSSMat, by exclusively considering the substitution frequencies of amino acids in the disordered regions of eukaryotic proteins. The newly developed matrices were tested for their ability to detect homologs of proteins enriched with disordered regions using the SSEARCH tool. The results unequivocally demonstrate that EDSSMat matrices detect more homologs than the widely used BLOSUM, PAM and other standard matrices, indicating their utility for homology searches of intrinsically disordered proteins.
Affiliation(s)
- Rakesh Trivedi
  - Laboratory of Computational Biology, Centre for DNA Fingerprinting and Diagnostics, Uppal, Hyderabad, Telangana, 500039, India
  - Graduate School, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India
- Hampapathalu Adimurthy Nagarajaram
  - Department of Systems and Computational Biology, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana, 500 046, India
  - Centre for Modelling, Simulation and Design, University of Hyderabad, Hyderabad, Telangana, 500 046, India
|
16
|
Su TY, Harrison PM. Conservation of Prion-Like Composition and Sequence in Prion-Formers and Prion-Like Proteins of Saccharomyces cerevisiae. Front Mol Biosci 2019; 6:54. [PMID: 31355208 PMCID: PMC6639077 DOI: 10.3389/fmolb.2019.00054] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 05/21/2019] [Accepted: 06/26/2019] [Indexed: 01/15/2023] Open
Abstract
Prions in eukaryotes have been linked to diseases, evolutionary capacitance, large-scale genetic control, and long-term memory formation. Prion formation and propagation have been studied extensively in the budding yeast Saccharomyces cerevisiae. Here, we have analysed the conservation of sequence and of prion-like composition for prion-forming proteins and for other prion-like proteins from S. cerevisiae, across three evolutionary levels. We discover that prion-like status is well-conserved for about half the set of prion-formers at the Saccharomycetes level, and that prion-forming domains evolve more quickly as sequences than other prion-like domains do. Such increased mutation rates may be linked to the acquisition of functional roles for prion-forming domains during the evolutionary epoch of Saccharomycetes. Domain scores for prion-like composition in S. cerevisiae are strongly correlated with scores for such composition weighted evolutionarily over the dozens of fungal species examined, indicating conservation of such prion-like status. Examples of notable prion-like proteins that are highly conserved both in sequence and prion-like composition are discussed.
Affiliation(s)
- Ting-Yi Su
  - Department of Biology, McGill University, Montreal, QC, Canada
- Paul M Harrison
  - Department of Biology, McGill University, Montreal, QC, Canada
|
17
|
Narasumani M, Harrison PM. Discerning evolutionary trends in post-translational modification and the effect of intrinsic disorder: Analysis of methylation, acetylation and ubiquitination sites in human proteins. PLoS Comput Biol 2018; 14:e1006349. [PMID: 30096183 PMCID: PMC6105011 DOI: 10.1371/journal.pcbi.1006349] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 02/26/2018] [Revised: 08/22/2018] [Accepted: 07/07/2018] [Indexed: 11/18/2022] Open
Abstract
Intrinsically disordered regions (IDRs) of proteins play significant biological functional roles despite lacking a well-defined 3D structure. For example, IDRs provide efficient housing for large numbers of post-translational modification (PTM) sites in eukaryotic proteins. Here, we study the distribution of more than 15,000 experimentally determined human methylation, acetylation and ubiquitination sites (collectively termed 'MAU' sites) in ordered and disordered regions, and analyse their conservation across 380 eukaryotic species. Conservation signals for the maintenance and novel emergence of MAU sites are examined at 11 evolutionary levels from the whole eukaryotic domain down to the ape superfamily, in both ordered and disordered regions. We discover that MAU PTM is a major driver of conservation for arginines and lysines in both ordered and disordered regions, across the 11 levels, most significantly across the mammalian clade. Conservation of human methylatable arginines is very strongly favoured for ordered regions rather than for disordered, whereas methylatable lysines are conserved in either set of regions, and conservation of acetylatable and ubiquitinatable lysines is favoured in disordered over ordered. Notably, we find evidence for the emergence of new lysine MAU sites in disordered regions of proteins in deuterostomes and mammals, and in ordered regions after the dawn of eutherians. For histones specifically, MAU sites demonstrate an idiosyncratic significant conservation pattern that is evident since the last common ancestor of mammals. Similarly, folding-on-binding (FB) regions are highly enriched for MAU sites relative to either ordered or disordered regions, with ubiquitination sites in FBs being highly conserved at all evolutionary levels back as far as mammals. 
This investigation clearly demonstrates the complex patterns of PTM evolution across the human proteome and that it is necessary to consider conservation of sequence features at multiple evolutionary levels in order not to get an incomplete or misleading picture.
|
18
|
Kinjo AR. Cooperative "folding transition" in the sequence space facilitates function-driven evolution of protein families. J Theor Biol 2018; 443:18-27. [PMID: 29355538 DOI: 10.1016/j.jtbi.2018.01.019] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/27/2017] [Revised: 01/16/2018] [Accepted: 01/17/2018] [Indexed: 12/23/2022]
Abstract
In the protein sequence space, natural proteins form clusters of families which are characterized by their unique native folds, whereas the great majority of random polypeptides are neither clustered nor foldable to unique structures. Since a given polypeptide can be either foldable or unfoldable, a kind of "folding transition" is expected at the boundary of a protein family in the sequence space. By Monte Carlo simulations of a statistical mechanical model of protein sequence alignment that coherently incorporates both short-range and long-range interactions as well as variable-length insertions to reproduce the statistics of the multiple sequence alignment of a given protein family, we demonstrate the existence of such a transition between natural-like sequences and random sequences in the sequence subspaces for 15 domain families of various folds. The transition was found to be highly cooperative and two-state-like. Furthermore, enforcing or suppressing consensus residues on a few of the well-conserved sites enhanced or diminished, respectively, the natural-like pattern formation over the entire sequence. In most families, the key sites included ligand binding sites. These results suggest that some selective pressure on the key residues, such as ligand binding activity, may cooperatively facilitate the emergence of a protein family during evolution. From a more practical aspect, the present results highlight an essential role of long-range effects in precisely defining protein families, which are absent in conventional sequence models.
Affiliation(s)
- Akira R Kinjo
  - Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita, Osaka 565-0871, Japan
|
19
|
MSX1 mutations and associated disease phenotypes: genotype-phenotype relations. Eur J Hum Genet 2016; 24:1663-1670. [PMID: 27381090 DOI: 10.1038/ejhg.2016.78] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 01/12/2016] [Revised: 05/21/2016] [Accepted: 05/26/2016] [Indexed: 02/06/2023] Open
Abstract
The Msx1 transcription factor is involved in multiple epithelial-mesenchymal interactions during vertebrate embryogenesis. It has pleiotropic effects in several tissues. In humans, MSX1 variants have been related to tooth agenesis, orofacial clefting, and nail dysplasia. We correlate all MSX1 disease-causing variants to phenotypic features to shed light on this hitherto unclear association. MSX1 truncations cause more severe phenotypes than in-frame variants. Mutations in the homeodomain always cause tooth agenesis, with or without other phenotypes, while mutations outside the homeodomain are mostly associated with non-syndromic orofacial clefts. Downstream effects can be further explored by the edgetic perturbation model. This information provides new insights for genetic diagnosis and for further functional analysis of MSX1 variants.
|