1
Peng Z, Li Z, Meng Q, Zhao B, Kurgan L. CLIP: accurate prediction of disordered linear interacting peptides from protein sequences using co-evolutionary information. Brief Bioinform 2023; 24:6858950. [PMID: 36458437 DOI: 10.1093/bib/bbac502] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 06/14/2022] [Revised: 09/30/2022] [Accepted: 10/24/2022] [Indexed: 12/04/2022]
Abstract
One of the key features of intrinsically disordered regions (IDRs) is the facilitation of protein-protein and protein-nucleic acid interactions. These disordered binding regions include molecular recognition features (MoRFs), short linear motifs (SLiMs) and longer binding domains. The vast majority of current predictors of disordered binding regions target MoRFs, with a handful of methods that predict SLiMs and disordered protein-binding domains. A new and broader class of disordered binding regions, linear interacting peptides (LIPs), was introduced recently and applied in the MobiDB resource. LIPs are segments in protein sequences that undergo a disorder-to-order transition upon binding to a protein or a nucleic acid, and they cover MoRFs, SLiMs and disordered protein-binding domains. Although current predictors of MoRFs and disordered protein-binding regions could be used to identify some LIPs, there are no dedicated sequence-based predictors of LIPs. To this end, we introduce CLIP, a new predictor of LIPs that utilizes a robust logistic regression model to combine three complementary types of inputs: co-evolutionary information derived from multiple sequence alignments, physicochemical profiles and disorder predictions. Ablation analysis suggests that the co-evolutionary information is particularly useful for this prediction and that combining the three inputs provides substantial improvements compared with using these inputs individually. Comparative empirical assessments using low-similarity test datasets reveal that CLIP secures an area under the receiver operating characteristic curve (AUC) of 0.8 and substantially improves over the results produced by the closest current tools that predict MoRFs and disordered protein-binding regions. The webserver of CLIP is freely available at http://biomine.cs.vcu.edu/servers/CLIP/ and the standalone code can be downloaded from http://yanglab.qd.sdu.edu.cn/download/CLIP/.
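The abstract describes combining three per-residue inputs with a logistic regression model and reporting performance as AUC. A minimal sketch of those two ideas, assuming each residue already has three input scores scaled to [0, 1]: the weights, bias and feature values below are made up (CLIP fits its robust logistic regression to training data and derives its inputs from alignments, physicochemical scales and disorder predictors), and `logistic_score` and `roc_auc` are hypothetical helper names, not CLIP's code.

```python
import math
from typing import Sequence


def logistic_score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Combine per-residue inputs into a propensity in (0, 1) via the logistic function."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative residue")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-residue inputs: (co-evolution, physicochemical, disorder).
residues = [(0.9, 0.6, 0.8), (0.2, 0.4, 0.3), (0.7, 0.5, 0.9), (0.1, 0.7, 0.2)]
labels = [1, 0, 1, 0]                  # 1 = residue annotated as part of a LIP (made-up)
weights, bias = (2.0, 0.5, 1.5), -2.0  # made-up; a real model learns these from data

scores = [logistic_score(r, weights, bias) for r in residues]
print(round(roc_auc(scores, labels), 2))  # prints 1.0 for this toy data
```

The pairwise AUC loop is quadratic in the number of residues; for whole test sets a rank-based (Mann-Whitney) formulation gives the same value faster.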
Affiliation(s)
- Zhenling Peng
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, 266237, China; Frontier Science Center for Nonlinear Expectations, Ministry of Education, Qingdao, 266237, China
- Zixia Li
- Center for Applied Mathematics, Tianjin University, Tianjin, 300072, China
- Qiaozhen Meng
- College of Intelligence and Computing, Tianjin University, Tianjin, 300072, China
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
2
Short Linear Motifs in Colorectal Cancer Interactome and Tumorigenesis. Cells 2022; 11:cells11233739. [PMID: 36496998 PMCID: PMC9737320 DOI: 10.3390/cells11233739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 11/16/2022] [Accepted: 11/21/2022] [Indexed: 11/25/2022]
Abstract
Colorectal tumorigenesis is driven by alterations in genes and proteins responsible for cancer initiation, progression, and invasion. This multistage process is based on a dense network of protein-protein interactions (PPIs) that become dysregulated as a result of changes in various cell signaling effectors. PPIs in signaling and regulatory networks are known to be mediated by short linear motifs (SLiMs), which are conserved contiguous regions of 3-10 amino acids within interacting protein domains. SLiMs are the minimum sequences required for modulating cellular PPI networks. Thus, several in silico approaches have been developed to predict and analyze SLiM-mediated PPIs. In this review, we focus on emerging evidence supporting a crucial role for SLiMs in driver pathways that are disrupted in colorectal cancer (CRC) tumorigenesis and in the related PPI network alterations. As a result, SLiMs, along with short peptides, are attracting the interest of researchers seeking to devise small molecules suitable for use as novel anti-CRC targeted therapies. Overall, the characterization of SLiMs mediating crucial PPIs in CRC may foster the development of more specific combined pharmacological approaches.
3
Martín M, Brunello FG, Modenutti CP, Nicola JP, Marti MA. MotSASi: Functional short linear motifs (SLiMs) prediction based on genomic single nucleotide variants and structural data. Biochimie 2022; 197:59-73. [DOI: 10.1016/j.biochi.2022.02.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2021] [Revised: 01/17/2022] [Accepted: 02/02/2022] [Indexed: 11/28/2022]
4
Pathogen Moonlighting Proteins: From Ancestral Key Metabolic Enzymes to Virulence Factors. Microorganisms 2021; 9:microorganisms9061300. [PMID: 34203698 PMCID: PMC8232316 DOI: 10.3390/microorganisms9061300] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/29/2021] [Revised: 06/02/2021] [Accepted: 06/09/2021] [Indexed: 12/22/2022]
Abstract
Moonlighting and multitasking proteins are proteins that perform two or more functions with a single polypeptide chain. A striking example of the Gain of Function (GoF) phenomenon in these proteins is that 25% of the moonlighting functions in our Multitasking Proteins Database (MultitaskProtDB-II) are related to pathogen virulence activity. Moreover, these proteins usually have a canonical function belonging to highly conserved ancestral key functions, and their moonlighting functions are often involved in inducing extracellular matrix (ECM) protein remodeling. There are three main questions in the context of moonlighting proteins in pathogen virulence: (A) Why are a high percentage of pathogen moonlighting proteins involved in virulence? (B) Why do most of the canonical functions of these moonlighting proteins belong to primary metabolism, and why are they common in many pathogen species? (C) How are these different protein sequences and structures able to bind the same set of host ECM protein targets, mainly plasminogen (PLG), and colonize host tissues? By means of an extensive bioinformatics analysis, we suggest answers and approaches to these questions. Three main ideas derive from the work: first, moonlighting proteins are not good candidates for vaccines. Second, several motifs that might be important in adhesion to the ECM were identified. Third, an overrepresentation of GO codes related to virulence was seen in moonlighting proteins.
5
Upadhyayula RS. Computational Investigation of Structural Interfaces of Protein Complexes with Short Linear Motifs. J Proteome Res 2020; 19:3254-3263. [DOI: 10.1021/acs.jproteome.0c00212] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/15/2022]
Affiliation(s)
- Raghavender Surya Upadhyayula
- Department of Biotechnology and Bioinformatics, Birla Institute of Scientific Research (BISR), Jaipur, Rajasthan 302001, India
6
Mittal A, Changani AM, Taparia S. Unique and exclusive peptide signatures directly identify intrinsically disordered proteins from sequences without structural information. J Biomol Struct Dyn 2020; 39:2885-2893. [PMID: 32295482 DOI: 10.1080/07391102.2020.1756410] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/24/2022]
Abstract
Intrinsically disordered proteins are now widely accepted to play crucial roles in biological functions. Identification of signatures of intrinsic disorder is one of the key steps towards building a proper repertoire of their occurrence in proteomes. In this work, systematic computational synthesis of a library of all 3368400 possible dipeptides, tripeptides, tetrapeptides and pentapeptides using the 20 natural amino acids allowed us to identify 36 unique tetrapeptides present exclusively in intrinsically disordered proteins and absent from the complete primary sequence space of naturally occurring structured proteins. Further, out of more than 530000 known naturally occurring primary sequences without any structural information, 1349 sequences contain the above identified unique signatures of intrinsic disorder. These sequences, with cellular functions ranging from housekeeping to metabolism to transport, more than double the number of currently known intrinsically disordered proteins. Along similar lines, we report that 26577 pentapeptide signatures exclusive to intrinsically disordered proteins, and absent from naturally occurring structured proteins, identify ∼50% of more than half a million curated protein sequences without structural information as intrinsically disordered. The results reported are a major leap forward in exploring functional manifestations of intrinsically disordered proteins. Communicated by Ramaswamy H. Sarma.
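The library size quoted above follows from simple counting: over 20 amino acids there are 20^k distinct peptides of length k, so lengths 2 to 5 give 400 + 8000 + 160000 + 3200000. A short Python check of that arithmetic, followed by a toy version of the "exclusive signature" idea (k-mers seen in disordered sequences but never in structured ones); the sequences are invented and `kmers` and `exclusive_signatures` are hypothetical helpers, not the authors' pipeline.

```python
# Distinct peptides of length k over the 20 natural amino acids: 20**k.
counts = {k: 20 ** k for k in range(2, 6)}  # dipeptides through pentapeptides
total = sum(counts.values())
print(counts)  # {2: 400, 3: 8000, 4: 160000, 5: 3200000}
print(total)   # 3368400


def kmers(seq, k):
    """All overlapping length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def exclusive_signatures(disordered, structured, k):
    """k-mers present in at least one disordered sequence and in no structured one."""
    in_disordered = set().union(*(kmers(s, k) for s in disordered))
    in_structured = set().union(*(kmers(s, k) for s in structured))
    return in_disordered - in_structured


# Invented toy sequences, for illustration only.
print(sorted(exclusive_signatures(["SPSPGSE"], ["MKVLPGSE"], 4)))  # ['PSPG', 'SPGS', 'SPSP']
```

"Exclusive" is only ever relative to the structured set being compared against; adding sequences to that set can remove signatures but never add them.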
Affiliation(s)
- Aditya Mittal
- Kusuma School of Biological Sciences, Indian Institute of Technology Delhi (IIT Delhi), New Delhi, India; Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology Delhi (IIT Delhi), New Delhi, India
- Sakshi Taparia
- Department of Mathematics (Bachelors Program in Mathematics & Computing), Indian Institute of Technology Delhi (IIT Delhi), New Delhi, India
7
Strazic Geljic I, Kucan Brlic P, Angulo G, Brizic I, Lisnic B, Jenus T, Juranic Lisnic V, Pietri GP, Engel P, Kaynan N, Zeleznjak J, Schu P, Mandelboim O, Krmpotic A, Angulo A, Jonjic S, Lenac Rovis T. Cytomegalovirus protein m154 perturbs the adaptor protein-1 compartment mediating broad-spectrum immune evasion. eLife 2020; 9:50803. [PMID: 31928630 PMCID: PMC6957316 DOI: 10.7554/elife.50803] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/02/2019] [Accepted: 01/03/2020] [Indexed: 12/21/2022]
Abstract
Cytomegaloviruses (CMVs) are ubiquitous pathogens known to employ numerous immunoevasive strategies that significantly impair the ability of the immune system to eliminate infected cells. Here, we report that a single mouse CMV (MCMV) protein, m154, downregulates multiple surface molecules involved in the activation and costimulation of immune cells. We demonstrate that m154 uses its cytoplasmic tail motif, DD, to interfere with the adaptor protein-1 (AP-1) complex, which is implicated in intracellular protein sorting and packaging. As a consequence of the perturbed AP-1 sorting, m154 promotes lysosomal degradation of several proteins involved in T cell costimulation, thus impairing the virus-specific CD8+ T cell response and virus control in vivo. Additionally, we show that HCMV infection similarly interferes with the AP-1 complex. Altogether, we identify a robust mechanism by which a single viral immunomodulatory protein targets a broad spectrum of cell surface molecules involved in the antiviral immune response.
Affiliation(s)
- Ivana Strazic Geljic
- Center for Proteomics, Faculty of Medicine, University of Rijeka, Rijeka, Croatia
- Paola Kucan Brlic
- Center for Proteomics, Faculty of Medicine, University of Rijeka, Rijeka, Croatia
- Guillem Angulo
- Immunology Unit, Department of Biomedical Sciences, Faculty of Medicine and Health Sciences, University of Barcelona, Barcelona, Spain
- Ilija Brizic
- Center for Proteomics, Faculty of Medicine, University of Rijeka, Rijeka, Croatia; Department of Histology and Embryology, Faculty of Medicine, University of Rijeka, Rijeka, Croatia
- Berislav Lisnic
- Center for Proteomics, Faculty of Medicine, University of Rijeka, Rijeka, Croatia; Department of Histology and Embryology, Faculty of Medicine, University of Rijeka, Rijeka, Croatia
- Tina Jenus
- Center for Proteomics, Faculty of Medicine, University of Rijeka, Rijeka, Croatia
- Vanda Juranic Lisnic
- Center for Proteomics, Faculty of Medicine, University of Rijeka, Rijeka, Croatia; Department of Histology and Embryology, Faculty of Medicine, University of Rijeka, Rijeka, Croatia
- Gian Pietro Pietri
- Center for Proteomics, Faculty of Medicine, University of Rijeka, Rijeka, Croatia
- Pablo Engel
- Immunology Unit, Department of Biomedical Sciences, Faculty of Medicine and Health Sciences, University of Barcelona, Barcelona, Spain; Institut d'Investigacions Biomèdiques August Pi i Sunyer, Barcelona, Spain
- Noa Kaynan
- The Lautenberg Center for General and Tumor Immunology, The BioMedical Research Institute, Hadassah Medical School, The Hebrew University, Jerusalem, Israel
- Jelena Zeleznjak
- Center for Proteomics, Faculty of Medicine, University of Rijeka, Rijeka, Croatia; Department of Histology and Embryology, Faculty of Medicine, University of Rijeka, Rijeka, Croatia
- Peter Schu
- Zentrum für Biochemie und Molekulare Zellbiologie Institut für Zellbiochemie, Georg-August-Universität Göttingen, Goettingen, Germany
- Ofer Mandelboim
- The Lautenberg Center for General and Tumor Immunology, The BioMedical Research Institute, Hadassah Medical School, The Hebrew University, Jerusalem, Israel
- Astrid Krmpotic
- Department of Histology and Embryology, Faculty of Medicine, University of Rijeka, Rijeka, Croatia
- Ana Angulo
- Immunology Unit, Department of Biomedical Sciences, Faculty of Medicine and Health Sciences, University of Barcelona, Barcelona, Spain; Institut d'Investigacions Biomèdiques August Pi i Sunyer, Barcelona, Spain
- Stipan Jonjic
- Center for Proteomics, Faculty of Medicine, University of Rijeka, Rijeka, Croatia; Department of Histology and Embryology, Faculty of Medicine, University of Rijeka, Rijeka, Croatia
- Tihana Lenac Rovis
- Center for Proteomics, Faculty of Medicine, University of Rijeka, Rijeka, Croatia; Department of Histology and Embryology, Faculty of Medicine, University of Rijeka, Rijeka, Croatia
8
Guillien M, le Maire A, Mouhand A, Bernadó P, Bourguet W, Banères JL, Sibille N. IDPs and their complexes in GPCR and nuclear receptor signaling. Prog Mol Biol Transl Sci 2020; 174:105-155. [DOI: 10.1016/bs.pmbts.2020.05.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/13/2022]
9
Lyon KF, Cai X, Young RJ, Mamun AA, Rajasekaran S, Schiller MR. Minimotif Miner 4: a million peptide minimotifs and counting. Nucleic Acids Res 2018; 46:D465-D470. [PMID: 29140456 PMCID: PMC5753208 DOI: 10.1093/nar/gkx1085] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 09/18/2017] [Accepted: 11/09/2017] [Indexed: 12/27/2022]
Abstract
Minimotif Miner (MnM) is a database and web system for analyzing short functional peptide motifs, termed minimotifs. We present an update to MnM that grows the database from ∼300 000 to >1 000 000 minimotif consensus sequences and instances. This growth comes largely from updating data from existing databases and from annotating articles that use high-throughput approaches to analyze different types of post-translational modifications. Another update maps human proteins and their minimotifs to known human variants from dbSNP build 150. MnM 4 can now be used to generate mechanistic hypotheses about how human genetic variation affects minimotifs and outcomes. One example of the utility of the combined minimotif/SNP tool is the identification of a loss-of-function missense SNP in a ubiquitylation minimotif encoded in the excision repair cross-complementing 2 (ERCC2) nucleotide excision repair gene. This SNP reaches genome-wide significance for many types of cancer, and the variant identified with MnM 4 reveals a more detailed mechanistic hypothesis concerning the role of ERCC2 in cancer. Other updates to the web system include a new architecture, with the web system and database migrated to Docker containers for better performance and management. Weblinks: minimotifminer.org and mnm.engr.uconn.edu
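Mapping variants onto minimotifs comes down to an interval-overlap test on residue coordinates. A minimal sketch, assuming motif instances and missense variants are both given as 1-based positions on the same protein isoform; the data classes, the `variants_hitting_motifs` helper and the `PROT_X`/`PROT_Y` records are hypothetical and do not reflect MnM's schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotifInstance:
    protein: str
    start: int     # 1-based, inclusive
    end: int       # 1-based, inclusive
    activity: str  # free-text label, e.g. "ubiquitylation site"


@dataclass(frozen=True)
class MissenseVariant:
    protein: str
    position: int  # 1-based residue position
    ref: str       # reference amino acid
    alt: str       # substituted amino acid


def variants_hitting_motifs(motifs, variants):
    """Return (variant, motif) pairs where a residue change falls inside a motif instance."""
    hits = []
    for v in variants:
        if v.ref == v.alt:
            continue  # no amino-acid change, so the motif sequence is untouched
        for m in motifs:
            if v.protein == m.protein and m.start <= v.position <= m.end:
                hits.append((v, m))
    return hits


# Hypothetical records for illustration only.
motifs = [MotifInstance("PROT_X", 100, 105, "ubiquitylation site"),
          MotifInstance("PROT_X", 200, 204, "phosphorylation site")]
variants = [MissenseVariant("PROT_X", 103, "K", "R"),
            MissenseVariant("PROT_X", 150, "A", "V"),
            MissenseVariant("PROT_Y", 103, "K", "R")]

for v, m in variants_hitting_motifs(motifs, variants):
    print(f"{v.protein} {v.ref}{v.position}{v.alt} falls in {m.activity} {m.start}-{m.end}")
```

The nested loop is fine for a handful of records; at database scale one would index motif instances per protein (for example with sorted intervals) instead of scanning all of them for every variant.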
Affiliation(s)
- Kenneth F Lyon
- Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada, Las Vegas, NV 89154-4004, USA
- Xingyu Cai
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-2155, USA
- Richard J Young
- Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada, Las Vegas, NV 89154-4004, USA
- Abdullah-Al Mamun
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-2155, USA
- Sanguthevar Rajasekaran
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-2155, USA
- Martin R Schiller
- Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada, Las Vegas, NV 89154-4004, USA
10
Krystkowiak I, Davey NE. SLiMSearch: a framework for proteome-wide discovery and annotation of functional modules in intrinsically disordered regions. Nucleic Acids Res 2017; 45:W464-W469. [PMID: 28387819 PMCID: PMC5570202 DOI: 10.1093/nar/gkx238] [Citation(s) in RCA: 72] [Impact Index Per Article: 14.4] [Received: 01/28/2017] [Accepted: 04/05/2017] [Indexed: 12/12/2022]
Abstract
The extensive intrinsically disordered regions of higher eukaryotic proteomes contain vast numbers of functional interaction modules known as short linear motifs (SLiMs). Here, we present SLiMSearch, a motif discovery tool that scans a motif consensus, representing the specificity determinants of a motif-binding domain, against a proteome to discover putative novel motif instances. SLiMSearch applies several distinct and complementary approaches exploiting the common properties of SLiMs to predict novel motifs. Consensus matches are annotated with overlapping sequence annotation, including feature information describing protein modular architecture, post-translational modification, structure, sequence variation and experimental characterisation of functional regions. Discriminatory motif attributes such as conservation and accessibility are also calculated. In addition, SLiMSearch provides functional enrichment and evolutionary analysis tools. The enrichment tool analyses GO terms, keywords and interacting partner enrichment to indicate possible motif function. The evolutionary tool evaluates motif taxonomic range and the conservation of motif sequence context. Consensus matches can be filtered based on motif attributes such as accessibility and taxonomic range; or by the localisation, interacting partners or ontology annotation of the peptide-containing protein. SLiMSearch supports a range of species of experimental and therapeutic relevance and is available online at http://slim.ucd.ie/slimsearch/.
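The first step SLiMSearch describes, scanning a motif consensus against a proteome, can be illustrated with regular expressions. A minimal sketch of that matching step only, assuming the consensus is written as a Python regex; the zero-width lookahead lets overlapping instances be reported, which matters for short, degenerate motifs. The `P..P` pattern (the PxxP core often cited for SH3-binding motifs), the `scan_consensus` helper and the two sequences are illustrative; SLiMSearch's conservation, accessibility, enrichment and annotation layers are not reproduced here.

```python
import re


def scan_consensus(pattern, proteome):
    """Return (protein_id, 1-based start, matched peptide) for every, possibly overlapping, match."""
    finder = re.compile(f"(?=({pattern}))")  # lookahead: matches are zero-width, so overlaps are kept
    hits = []
    for pid, seq in proteome.items():
        for m in finder.finditer(seq):
            hits.append((pid, m.start() + 1, m.group(1)))
    return hits


# Hypothetical proteome for illustration only.
proteome = {"prot1": "MAPPLPPKE", "prot2": "MKLVEA"}
print(scan_consensus("P..P", proteome))  # [('prot1', 3, 'PPLP'), ('prot1', 4, 'PLPP')]
```

A plain `re.finditer("P..P", ...)` would report only `PPLP` here, because the non-overlapping scan resumes after the first match; the lookahead form avoids silently dropping instances.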
Affiliation(s)
- Izabella Krystkowiak
- Conway Institute of Biomolecular & Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland; UCD School of Medicine & Medical Science, University College Dublin, Belfield, Dublin 4, Ireland
- Norman E Davey
- Conway Institute of Biomolecular & Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland; UCD School of Medicine & Medical Science, University College Dublin, Belfield, Dublin 4, Ireland
11
Abstract
All proteins end with a carboxyl terminus that has unique biophysical properties and is often disordered. Although there are examples of important C-termini functions, a more global role for the C-terminus is not yet established. In this review, we summarize research on C-termini, a unique region in proteins that cells exploit. Alternative splicing and proteolysis increase the diversity of cellular proteins and peptides by generating unique C-termini. The C-termini of proteins contain minimotifs, short peptides with an encoded function generally characterized as binding, posttranslational modification, or trafficking. Many of these activities are specific to minimotifs on the C-terminus. Approximately 13% of C-termini in the human proteome have a known minimotif, and the majority, if not all, of the remaining termini have conserved motifs implying a function that remains to be discovered. C-termini, their predictions, and their functions are collated in the C-terminome, Proteus, and Terminus Oriented Protein Function INferred Database (TopFIND) database/web systems. Many C-termini are well conserved, and some have a known role in health and disease. We envision that this summary of C-termini will guide future investigation of their biochemical and physiological significance.
Affiliation(s)
- Surbhi Sharma
- Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada, Las Vegas, NV, USA
- Martin R Schiller
- Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada, Las Vegas, NV, USA
12
Li Y, Maleki M, Carruthers NJ, Stemmer PM, Ngom A, Rueda L. The predictive performance of short-linear motif features in the prediction of calmodulin-binding proteins. BMC Bioinformatics 2018; 19:410. [PMID: 30453876 PMCID: PMC6245490 DOI: 10.1186/s12859-018-2378-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 04/21/2023]
Abstract
Background: The prediction of calmodulin-binding (CaM-binding) proteins plays a very important role in biology and biochemistry, because the calmodulin protein binds and regulates a multitude of protein targets affecting different cellular processes. Computational methods that can accurately identify CaM-binding proteins and CaM-binding domains would accelerate research in calcium signaling and calmodulin function. Short-linear motifs (SLiMs), on the other hand, have been effectively used as features for analyzing protein-protein interactions, though their properties have not been utilized in the prediction of CaM-binding proteins. Results: We propose a new method for the prediction of CaM-binding proteins that uses, as features for the prediction module, both the total and average scores of known and new SLiMs in protein sequences, computed with a new scoring method called sliding window scoring (SWS). A dataset of 194 manually curated human CaM-binding proteins and 193 mitochondrial proteins has been obtained and used for testing the proposed model. The motif generation tool Multiple EM for Motif Elicitation (MEME) has been used to obtain new motifs from each of the positive and negative datasets individually (the SM approach) and from the combined negative and positive datasets (the CM approach). Moreover, the wrapper criterion with random forest for feature selection (FS) has been applied, followed by classification using different algorithms such as k-nearest neighbors (k-NN), support vector machines (SVM), naive Bayes (NB) and random forest (RF). Conclusions: Our proposed method shows very good prediction results and demonstrates that information contained in SLiMs is highly relevant for predicting CaM-binding proteins. Further, three new CaM-binding motifs have been computationally selected and biologically validated in this study, and these can be used for predicting CaM-binding proteins.
Electronic supplementary material: The online version of this article (10.1186/s12859-018-2378-9) contains supplementary material, which is available to authorized users.
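The abstract does not define sliding window scoring (SWS) in detail, so the sketch below is only a generic stand-in, not the authors' method: for each motif, given as a regex, it counts matches lying entirely inside each sliding window and returns the total over all windows and the per-window average, giving two features per motif. The `window_motif_features` helper, the window width, the toy sequence and the made-up `"IQ"` motif are all hypothetical.

```python
import re


def window_motif_features(seq, motifs, window=10):
    """Hypothetical window-based motif features: for each motif regex, count the
    matches lying entirely inside each sliding window, then return the total
    count summed over windows and its average per window."""
    n_windows = max(len(seq) - window + 1, 1)  # a short sequence is treated as one window
    features = []
    for pattern in motifs:
        finder = re.compile(f"(?=({pattern}))")  # lookahead keeps overlapping matches
        spans = [(m.start(), m.start() + len(m.group(1))) for m in finder.finditer(seq)]
        total = 0
        for w_start in range(n_windows):
            w_end = w_start + window
            total += sum(1 for s, e in spans if s >= w_start and e <= w_end)
        features.extend([total, total / n_windows])
    return features


# Toy sequence and a made-up two-residue "motif"; real features would use
# curated and MEME-discovered SLiMs.
print(window_motif_features("AIQAAAIQAA", ["IQ"], window=5))  # [5, 0.8333333333333334]
```

Vectors like this, one per protein, are what a downstream classifier (the abstract lists k-NN, SVM, naive Bayes and random forest) would consume; the classifier step is not shown.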
Affiliation(s)
- Yixun Li
- School of Computer Science, University of Windsor, Windsor, Ontario, Canada
- Mina Maleki
- School of Computer Science, University of Windsor, Windsor, Ontario, Canada
- Paul M Stemmer
- Institute of Environmental Health Sciences, Wayne State University, Detroit, MI, USA
- Alioune Ngom
- School of Computer Science, University of Windsor, Windsor, Ontario, Canada
- Luis Rueda
- School of Computer Science, University of Windsor, Windsor, Ontario, Canada
13
Idrees S, Pérez-Bercoff Å, Edwards RJ. SLiM-Enrich: computational assessment of protein-protein interaction data as a source of domain-motif interactions. PeerJ 2018; 6:e5858. [PMID: 30402352 PMCID: PMC6215436 DOI: 10.7717/peerj.5858] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 03/13/2018] [Accepted: 10/02/2018] [Indexed: 01/21/2023]
Abstract
Many important cellular processes involve protein–protein interactions (PPIs) mediated by a Short Linear Motif (SLiM) in one protein interacting with a globular domain in another. Despite their significance, these domain-motif interactions (DMIs) are typically low affinity, which makes them challenging to identify by classical experimental approaches, such as affinity pulldown mass spectrometry (AP-MS) and yeast two-hybrid (Y2H). DMIs are generally underrepresented in PPI networks as a result. A number of computational methods now exist to predict SLiMs and/or DMIs from experimental interaction data but it is yet to be established how effective different PPI detection methods are for capturing these low affinity SLiM-mediated interactions. Here, we introduce a new computational pipeline (SLiM-Enrich) to assess how well a given source of PPI data captures DMIs and thus, by inference, how useful that data should be for SLiM discovery. SLiM-Enrich interrogates a PPI network for pairs of interacting proteins in which the first protein is known or predicted to interact with the second protein via a DMI. Permutation tests compare the number of known/predicted DMIs to the expected distribution if the two sets of proteins are randomly associated. This provides an estimate of DMI enrichment within the data and the false positive rate for individual DMIs. As a case study, we detect significant DMI enrichment in a high-throughput Y2H human PPI study. SLiM-Enrich analysis supports Y2H data as a source of DMIs and highlights the high false positive rates associated with naïve DMI prediction. SLiM-Enrich is available as an R Shiny app. The code is open source and available via a GNU GPL v3 license at: https://github.com/slimsuite/SLiMEnrich. A web server is available at: http://shiny.slimsuite.unsw.edu.au/SLiMEnrich/.
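The permutation test described above compares the observed number of DMI-supported interactions with counts obtained after randomly re-pairing the proteins. A minimal sketch of that idea, assuming the PPI data are (motif-side, domain-side) pairs and that the known/predicted DMIs are already expanded into a set of protein pairs; `count_dmis`, `dmi_enrichment` and all protein names are hypothetical, and the false-discovery estimate here (mean null count divided by the observed count) is a simple choice, not necessarily SLiM-Enrich's exact formula.

```python
import random


def count_dmis(pairs, dmi_pairs):
    """Number of interacting pairs that match a known/predicted domain-motif interaction."""
    return sum(1 for a, b in pairs if (a, b) in dmi_pairs)


def dmi_enrichment(pairs, dmi_pairs, n_perm=1000, seed=0):
    """Permutation test: shuffle partner assignments to build a null distribution of DMI counts."""
    rng = random.Random(seed)
    observed = count_dmis(pairs, dmi_pairs)
    motif_side = [a for a, _ in pairs]
    domain_side = [b for _, b in pairs]
    null_counts = []
    for _ in range(n_perm):
        shuffled = domain_side[:]
        rng.shuffle(shuffled)  # each protein keeps its number of pairings on its own side
        null_counts.append(count_dmis(zip(motif_side, shuffled), dmi_pairs))
    expected = sum(null_counts) / n_perm
    # Add-one correction so the empirical p-value is never exactly zero.
    p_value = (1 + sum(c >= observed for c in null_counts)) / (1 + n_perm)
    fdr = expected / observed if observed else float("nan")
    return observed, expected, p_value, fdr


# Hypothetical network: three interactions are explained by DMIs, three are not.
pairs = [("M1", "D1"), ("M2", "D2"), ("M3", "D3"), ("M4", "X1"), ("M5", "X2"), ("M6", "X3")]
dmi_pairs = {("M1", "D1"), ("M2", "D2"), ("M3", "D3")}
print(dmi_enrichment(pairs, dmi_pairs))
```

Shuffling only one column preserves how often each protein appears, so the null model controls for hub proteins that would otherwise inflate chance DMI matches.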
Affiliation(s)
- Sobia Idrees
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
- Åsa Pérez-Bercoff
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
- Richard J Edwards
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
14
Rak MA, Buehler J, Zeltzer S, Reitsma J, Molina B, Terhune S, Goodrum F. Human Cytomegalovirus UL135 Interacts with Host Adaptor Proteins To Regulate Epidermal Growth Factor Receptor and Reactivation from Latency. J Virol 2018; 92:e00919-18. [PMID: 30089695 PMCID: PMC6158428 DOI: 10.1128/jvi.00919-18] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 05/24/2018] [Accepted: 07/27/2018] [Indexed: 01/03/2023]
Abstract
Human cytomegalovirus (HCMV) is a betaherpesvirus that establishes a lifelong latent infection in its host that is marked by recurrent episodes of reactivation. The molecular mechanisms by which the virus and host regulate entry into and exit from latency remain poorly understood. We have previously reported that UL135 is critical for reactivation, functioning in part by overcoming suppressive effects of the latency determinant UL138. We have demonstrated a role for UL135 in diminishing cell surface levels of epidermal growth factor receptor (EGFR) and targeting it for turnover. The attenuation of EGFR signaling promotes HCMV reactivation in combination with cellular differentiation. In this study, we sought to define the mechanisms by which UL135 functions in regulating EGFR turnover and viral reactivation. Screens to identify proteins interacting with pUL135 identified two host adaptor proteins, CIN85 and Abi-1, with overlapping activities in regulating EGFR levels in the cell. We mapped the amino acids in pUL135 necessary for interaction with Abi-1 and CIN85 and generated recombinant viruses expressing variants of pUL135 that do not interact with CIN85 or Abi-1. These recombinant viruses replicate in fibroblasts but are defective for reactivation in an experimental model for latency using primary CD34+ hematopoietic progenitor cells (HPCs). These UL135 variants alter the trafficking of EGFR and are defective in targeting EGFR for turnover. These studies demonstrate a requirement for pUL135 interactions with Abi-1 and CIN85 for regulation of EGFR and mechanistically link the regulation of EGFR to reactivation.
IMPORTANCE Human cytomegalovirus (HCMV) establishes a lifelong latent infection in the human host. While the infection is typically asymptomatic in healthy individuals, HCMV infection poses life-threatening disease risk in immunocompromised individuals and is the leading cause of birth defects.
Understanding how HCMV controls the lifelong latent infection and reactivation of replication from latency is critical to developing strategies to control HCMV disease. Here, we identify the host factors targeted by a viral protein that is required for reactivation. We define the importance of this virus-host interaction in reactivation from latency, providing new insights into the molecular underpinnings of HCMV latency and reactivation.
Affiliation(s)
- Michael A Rak
- Department of Cellular and Molecular Medicine, University of Arizona, Tucson, Arizona, USA
| | - Jason Buehler
- BIO5 Institute, University of Arizona, Tucson, Arizona, USA
| | - Sebastian Zeltzer
- Department of Cellular and Molecular Medicine, University of Arizona, Tucson, Arizona, USA
| | - Justin Reitsma
- Department of Microbiology and Immunology, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
- Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
| | - Belen Molina
- Department of Immunobiology, University of Arizona, Tucson, Arizona, USA
| | - Scott Terhune
- Department of Microbiology and Immunology, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
- Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
| | - Felicia Goodrum
- Department of Cellular and Molecular Medicine, University of Arizona, Tucson, Arizona, USA
- BIO5 Institute, University of Arizona, Tucson, Arizona, USA
- Department of Immunobiology, University of Arizona, Tucson, Arizona, USA
- University of Arizona Center on Aging, Tucson, Arizona, USA
15
Sharma S, Young RJ, Chen J, Chen X, Oh EC, Schiller MR. Minimotifs dysfunction is pervasive in neurodegenerative disorders. Alzheimers Dement (N Y) 2018; 4:414-432. [PMID: 30225339 PMCID: PMC6139474 DOI: 10.1016/j.trci.2018.06.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Minimotifs are modular contiguous peptide sequences in proteins that are important for posttranslational modifications, binding to other molecules, and trafficking to specific subcellular compartments. Some molecular functions of proteins in cellular pathways can be predicted from minimotif consensus sequences identified through experimentation. While a role for minimotifs in regulating signal transduction and gene regulation during disease pathogenesis (such as infectious diseases and cancer) is established, the therapeutic use of minimotif mimetic drugs is limited. In this review, we discuss a general theme identifying a pervasive role of minimotifs in the pathomechanism of neurodegenerative diseases. Beyond their longstanding history in the genetics of familial neurodegeneration, minimotifs are also major players in neurotoxic protein aggregation, aberrant protein trafficking, and epigenetic regulation. Generalizing the importance of minimotifs in neurodegenerative diseases offers a new perspective for the future study of neurodegenerative mechanisms and the investigation of new therapeutics.
Affiliation(s)
- Surbhi Sharma
- Nevada Institute of Personalized Medicine, Las Vegas, NV, USA
- School of Life Sciences, Las Vegas, NV, USA
- Richard J. Young
- Nevada Institute of Personalized Medicine, Las Vegas, NV, USA
- School of Life Sciences, Las Vegas, NV, USA
- Jingchun Chen
- Nevada Institute of Personalized Medicine, Las Vegas, NV, USA
- Xiangning Chen
- Nevada Institute of Personalized Medicine, Las Vegas, NV, USA
- Department of Psychology, Las Vegas, NV, USA
- Edwin C. Oh
- Nevada Institute of Personalized Medicine, Las Vegas, NV, USA
- School of Medicine, Las Vegas, NV, USA
- Martin R. Schiller
- Nevada Institute of Personalized Medicine, Las Vegas, NV, USA
- School of Life Sciences, Las Vegas, NV, USA
- School of Medicine, Las Vegas, NV, USA
16
Sarkar D, Jana T, Saha S. LMDIPred: A web-server for prediction of linear peptide sequences binding to SH3, WW and PDZ domains. PLoS One 2018; 13:e0200430. [PMID: 30001346 PMCID: PMC6042728 DOI: 10.1371/journal.pone.0200430] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 11/10/2017] [Accepted: 06/26/2018] [Indexed: 12/29/2022]
Abstract
Protein-peptide interactions form an important subset of the total protein interaction network in the cell and play key roles in signaling and regulatory networks and in major biological processes such as cellular localization, protein degradation and immune response. In this work, we describe the LMDIPred web server, an online resource for generalized prediction of linear peptide sequences that may bind to the three most prevalent and well-studied peptide recognition modules (PRMs): SH3, WW and PDZ. We developed support vector machine (SVM)-based prediction models that achieved a maximum Matthews correlation coefficient (MCC) of 0.85 with an accuracy of 94.55% for SH3-binding peptides, an MCC of 0.90 with an accuracy of 95.82% for WW-binding peptides, and an MCC of 0.83 with an accuracy of 92.29% for PDZ-binding peptides. LMDIPred output combines predictions from these SVM models with predictions from Position-Specific Scoring Matrices (PSSMs) and from string-matching methods that use known domain-binding motif instances and regular expressions. All of these methods were evaluated using five-fold cross-validation on both balanced and unbalanced datasets and were also validated on independent datasets. LMDIPred aims to provide a preliminary bioinformatics platform for sequence-based prediction of probable binding sites for SH3, WW or PDZ domains.
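The abstract summarizes model performance with the Matthews correlation coefficient (MCC) and accuracy. As a rough illustration of what those two numbers measure, the minimal Python sketch below computes both from a binary confusion matrix. The function names and the toy labels are hypothetical and are not taken from LMDIPred or its datasets.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = binding peptide, 0 = non-binding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient in [-1, 1]; 0.0 if any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Toy labels for eight peptides (hypothetical, for illustration only)
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts, round(mcc(*counts), 3), accuracy(*counts))
```

MCC uses all four cells of the confusion matrix, which is one reason it is often reported alongside accuracy when unbalanced datasets are evaluated: a model that always predicts the majority (non-binding) class can reach high accuracy but gets an MCC of 0.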
Affiliation(s)
- Tanmoy Jana
- Bioinformatics Centre, Bose Institute, Kolkata, India
- Sudipto Saha
- Bioinformatics Centre, Bose Institute, Kolkata, India
17
A Prominent Role of the Human Cytomegalovirus UL8 Glycoprotein in Restraining Proinflammatory Cytokine Production by Myeloid Cells at Late Times during Infection. J Virol 2018; 92:JVI.02229-17. [PMID: 29467314 DOI: 10.1128/jvi.02229-17] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 12/21/2017] [Accepted: 02/14/2018] [Indexed: 01/21/2023]
Abstract
Human cytomegalovirus (HCMV) persistence in infected individuals relies on a plethora of mechanisms to efficiently reduce host immune responses. To that end, HCMV uses a variety of gene products, some of which have not been identified yet. Here we characterized the UL8 gene, which consists of two exons, sharing the first with the HCMV RL11 family member UL7. UL8 is a transmembrane protein with an N-terminal immunoglobulin (Ig)-like domain in common with UL7 but with an extended stalk and a distinctive cytoplasmic tail. The UL8 open reading frame gives rise to a heavily glycosylated protein predominantly expressed on the cell surface, from where it can be partially endocytosed and subsequently degraded. Infections with UL8-tagged viruses indicated that UL8 was synthesized with late-phase kinetics. By virtue of its highly conserved Ig-like domain, this viral protein interacted with a surface molecule present on activated neutrophils. Notably, when ectopically expressed in THP-1 myeloid cells, UL8 was able to significantly reduce the production of a variety of proinflammatory cytokines. Mutations in UL8 indicated that this functional effect was mediated by the cell surface expression of its Ig-like domain. To investigate the impact of the viral protein in the infection context, we engineered HCMVs lacking the UL8 gene and demonstrated that UL8 decreases the release of a large number of proinflammatory factors at late times after infection of THP-1 cells. Our data indicate that UL8 may play an immunosuppressive role that is key to HCMV survival in the host. IMPORTANCE HCMV is a major pathogen that causes life-threatening diseases and disabilities in infected newborns and immunocompromised individuals. Containing one of the largest genomes among all reported human viruses, HCMV encodes an impressive repertoire of gene products.
However, the functions of a large proportion of them still remain unknown, a fact that complicates the design of new therapeutic approaches to prevent or treat HCMV-associated diseases. In this report, we have conducted an extensive study of UL8, one of the previously uncharacterized HCMV open reading frames. We found that the UL8 protein is expressed at late times postinfection and utilized by HCMV to reduce the production of proinflammatory factors by infected myeloid cells. Thus, the work presented here points to a key role of UL8 as a novel HCMV immune modulator capable of restraining host antiviral defenses.
18
Erdős G, Szaniszló T, Pajkos M, Hajdu-Soltész B, Kiss B, Pál G, Nyitray L, Dosztányi Z. Novel linear motif filtering protocol reveals the role of the LC8 dynein light chain in the Hippo pathway. PLoS Comput Biol 2017; 13:e1005885. [PMID: 29240760 PMCID: PMC5746249 DOI: 10.1371/journal.pcbi.1005885] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 07/26/2017] [Revised: 12/28/2017] [Accepted: 11/20/2017] [Indexed: 01/12/2023]
Abstract
Protein-protein interactions (PPIs) formed between short linear motifs and globular domains play important roles in many regulatory and signaling processes but are highly underrepresented in current protein-protein interaction databases. These types of interactions are usually characterized by a specific binding motif that captures the key amino acids shared among the interaction partners. However, computational proteome-level identification of interaction partners based on a known motif is hindered by the huge number of randomly occurring matches from which biologically relevant motif hits need to be extracted. In this work, we established a novel bioinformatic filtering protocol to efficiently explore the interaction network of a hub protein. We introduced a novel measure that enabled the optimization of the elements and parameter settings of the pipeline, which was built from multiple sequence-based prediction methods. In addition, data collected from PPI databases and evolutionary analyses were incorporated to further increase the biological relevance of the identified motif hits. The approach was applied to the dynein light chain LC8, a ubiquitous eukaryotic hub protein that has been suggested to be involved in motor-related functions as well as in promoting the dimerization of various proteins by recognizing linear motifs in its partners. From the list of putative binding motifs collected by our protocol, several novel peptides were experimentally verified to bind LC8. Altogether, 71 potential new motif instances were identified. The expanded list of LC8 binding partners revealed the evolutionary plasticity of binding partners despite the highly conserved binding interface. It also highlighted a novel, conserved function of LC8 in the upstream regulation of the Hippo signaling pathway.
Beyond the LC8 system, our work also provides general guidelines that can be applied to explore the interaction network of other linear motif binding proteins or protein domains. Fine-tuning of many cellular processes relies on weak, transient protein-protein interactions. Such interactions often involve compact functional modules, called short linear motifs (SLiMs), that can bind to specific globular domains. SLiM-mediated interactions can carry out diverse molecular functions by targeting proteins to specific cellular locations, regulating the activity and binding preferences of proteins, or aiding the assembly of macromolecular complexes. The key to the function of SLiMs is their small size and highly flexible nature. At the same time, these properties make their experimental identification challenging. Consequently, only a small portion of SLiM-mediated interactions is currently known. This underscores the importance of novel computational methods that can reliably identify candidate sites involved in binding to linear motif binding domains. Here we present a novel bioinformatic approach that efficiently predicts new binding partners for SLiM-binding domains. We applied this method to the dynein light chain LC8, a protein that was already known to bind many partners in a wide range of organisms. With this method, we not only significantly expanded the interaction network of LC8 but also identified a novel function of LC8 in a highly important pathway controlling organ size in animals.
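The filtering idea described in this abstract (scan a proteome with a motif, then discard hits that fail additional sequence-based criteria) can be sketched in a few lines of Python. This is not the authors' pipeline: the `[KR].TQT` pattern is an assumed, simplified LC8-style anchor, and the per-residue disorder scores, the 0.5 threshold and the pre-aligned ortholog sequences are hypothetical inputs standing in for real predictor output and evolutionary data.

```python
import re
from statistics import mean

# Assumed, simplified LC8-style motif: K or R, any residue, then TQT (illustrative only)
MOTIF = r"[KR].TQT"
SCAN = re.compile(rf"(?=({MOTIF}))")  # lookahead so overlapping hits are all reported


def motif_hits(seq):
    """Return (0-based start, peptide) for every motif match in seq."""
    return [(m.start(), m.group(1)) for m in SCAN.finditer(seq)]


def filter_hits(seq, disorder, orthologs, min_disorder=0.5):
    """Keep hits that lie in predicted-disordered regions and are conserved.

    disorder  -- per-residue disorder scores in [0, 1] (hypothetical predictor output)
    orthologs -- ortholog sequences already aligned to seq, same length (hypothetical)
    """
    kept = []
    for start, pep in motif_hits(seq):
        end = start + len(pep)
        if mean(disorder[start:end]) < min_disorder:
            continue  # likely buried in a folded domain: discard
        if all(re.fullmatch(MOTIF, o[start:end]) for o in orthologs):
            kept.append((start, pep))  # motif preserved in every ortholog
    return kept


seq = "MSAKATQTGGGRSTQTPP"
disorder = [0.1] * 10 + [0.9] * 8
orthologs = ["MSAKATQTGGGRATQTPP"]
print(motif_hits(seq))
print(filter_hits(seq, disorder, orthologs))
```

Each filter removes a class of random matches; a real pipeline would use a proper multiple sequence alignment for the conservation step rather than assuming identical coordinates.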
Affiliation(s)
- Gábor Erdős
- MTA-ELTE Lendület Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary
- Tamás Szaniszló
- MTA-ELTE Lendület Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary
- Mátyás Pajkos
- MTA-ELTE Lendület Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary
- Borbála Hajdu-Soltész
- MTA-ELTE Lendület Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary
- Bence Kiss
- Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary
- Gábor Pál
- Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary
- László Nyitray
- Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary
- Zsuzsanna Dosztányi
- MTA-ELTE Lendület Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary
19
Zhao W, Zhang Y, Yang S, Hao Y, Wang Z, Duan X. Analysis of two transcript isoforms of vacuolar ATPase subunit H in mouse and zebrafish. Gene 2017; 638:66-75. [PMID: 28970149 DOI: 10.1016/j.gene.2017.09.065] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 07/06/2017] [Revised: 09/26/2017] [Accepted: 09/28/2017] [Indexed: 11/16/2022]
Abstract
ATP6V1H encodes subunit H of the vacuolar ATPase (V-ATPase) and has recently been shown to regulate osteoclast function. Alternative splicing of the ATP6V1H gene yields two isoforms, and it is not clear whether and how the two isoforms function differently. In this report, we used bioinformatics methods to compare the two isoforms across species. The distribution and abundance of the two isoforms were analyzed in eleven mouse tissues and in mouse osteoclasts using RT-PCR, Q-PCR, western blot and immunohistochemical staining. To observe in vivo biological differences between the two isoforms during development, mRNAs of the two wild-type zebrafish atp6v1h transcripts, as well as their mutant forms, were injected into zebrafish embryos. Bioinformatic analysis revealed that the two isoforms differed in many respects, especially in protein size, internal space, phosphorylation state and H-bond binding. The amounts of the two transcripts and the ratio of long to short transcript varied considerably among tissues and cell types, and osteoclasts were the only cells examined that expressed the long isoform exclusively. Selective in vivo expression of the two subunit H splice variants revealed different effects on zebrafish craniofacial development. The short isoform reduced zebrafish head size and was not fully functional compared with the long isoform. We propose that the long isoform of subunit H is necessary for normal craniofacial bone development and that the absence of the short transcript might be necessary for normal osteoclast function.
Affiliation(s)
- Wanmin Zhao
- State Key Laboratory of Military Stomatology, National Clinical Research Center for Oral Diseases, Shaanxi Key Laboratory of Oral Diseases, Department of Oral Biology, Clinic of Oral Rare and Genetic Diseases, School of Stomatology, The Fourth Military Medical University, Xi'an, 710032, People's Republic of China
- Yanli Zhang
- State Key Laboratory of Military Stomatology, National Clinical Research Center for Oral Diseases, Shaanxi Key Laboratory of Oral Diseases, Department of Oral Biology, Clinic of Oral Rare and Genetic Diseases, School of Stomatology, The Fourth Military Medical University, Xi'an, 710032, People's Republic of China
- Shaoqing Yang
- State Key Laboratory of Military Stomatology, National Clinical Research Center for Oral Diseases, Shaanxi Key Laboratory of Oral Diseases, Department of Oral Biology, Clinic of Oral Rare and Genetic Diseases, School of Stomatology, The Fourth Military Medical University, Xi'an, 710032, People's Republic of China
- Ying Hao
- State Key Laboratory of Military Stomatology, National Clinical Research Center for Oral Diseases, Shaanxi Key Laboratory of Oral Diseases, Department of Oral Biology, Clinic of Oral Rare and Genetic Diseases, School of Stomatology, The Fourth Military Medical University, Xi'an, 710032, People's Republic of China
- Zhe Wang
- State Key Laboratory of Military Stomatology, National Clinical Research Center for Oral Diseases, Shaanxi Key Laboratory of Oral Diseases, Department of Oral Biology, Clinic of Oral Rare and Genetic Diseases, School of Stomatology, The Fourth Military Medical University, Xi'an, 710032, People's Republic of China
- Xiaohong Duan
- State Key Laboratory of Military Stomatology, National Clinical Research Center for Oral Diseases, Shaanxi Key Laboratory of Oral Diseases, Department of Oral Biology, Clinic of Oral Rare and Genetic Diseases, School of Stomatology, The Fourth Military Medical University, Xi'an, 710032, People's Republic of China
20
A bioinformatics pipeline to search functional motifs within whole-proteome data: a case study of poxviruses. Virus Genes 2016; 53:173-178. [PMID: 28000080 PMCID: PMC5357487 DOI: 10.1007/s11262-016-1416-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/30/2016] [Accepted: 12/01/2016] [Indexed: 12/19/2022]
Abstract
Proteins harbor domains or short linear motifs, which facilitate their functions and interactions. Finding functional motifs in protein sequences can help predict the putative cellular roles or characteristics of hypothetical proteins. In this study, we present Shetti-Motif, an interactive tool to (i) map UniProt and PROSITE flat files, (ii) search large datasets of protein sequences (proteome-wide) for multiple pre-defined consensus patterns or experimentally validated functional motifs, and (iii) search for motifs containing repeated residues (low-complexity regions, e.g., Leu-, SR- and PEST-rich motifs). As proof of principle, eleven proteomes encoded by members of the Poxviridae family were searched with this comparative proteomics pipeline against about 100 experimentally validated functional motifs. Closely related viruses, and viruses that infect the same host cells (e.g., vaccinia and variola viruses), show similar profiles of motif-containing proteins. The motifs encoded by these viruses are correlated, which explains why poxviruses are able to interact with a wide range of host cells. In conclusion, this in silico analysis is useful for establishing datasets of potential proteins for further investigation or for comparing species.
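Searching for pre-defined consensus patterns usually means translating PROSITE-style patterns into regular expressions. The Python sketch below shows one way to do this; it is a simplified illustration, not Shetti-Motif's code, and it ignores rarer syntax such as `>` inside square brackets. The helper names `prosite_to_regex` and `find_pattern` are hypothetical.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern such as 'N-{P}-[ST]-{P}.' into a Python regex.

    Handles x (any residue), [..] (allowed), {..} (forbidden), (n) / (n,m) repeats,
    and the < / > terminal anchors. Simplified sketch: '>' inside brackets is not supported.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        if element.startswith("<"):
            parts.append("^")  # pattern must start at the N-terminus
            element = element[1:]
        at_c_term = element.endswith(">")
        if at_c_term:
            element = element[:-1]
        # Separate the residue specification from an optional repeat count
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, count = m.group(1), m.group(2)
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        parts.append(core + ("{" + count + "}" if count else ""))
        if at_c_term:
            parts.append("$")  # pattern must end at the C-terminus
    return "".join(parts)


def find_pattern(seq, pattern):
    """Return 1-based start positions of non-overlapping matches of a PROSITE pattern."""
    return [m.start() + 1 for m in re.finditer(prosite_to_regex(pattern), seq)]


print(prosite_to_regex("N-{P}-[ST]-{P}."))  # N-glycosylation site pattern
print(find_pattern("MANGSAKNPTQ", "N-{P}-[ST]-{P}."))
```

Compiling each pattern once and reusing it across a whole proteome keeps this kind of scan fast, which matters when about 100 motifs are tested against eleven proteomes.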
21
Regulation of a Spindle Positioning Factor at Kinetochores by SUMO-Targeted Ubiquitin Ligases. Dev Cell 2016; 36:415-27. [PMID: 26906737 DOI: 10.1016/j.devcel.2016.01.011] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 09/17/2015] [Revised: 12/04/2015] [Accepted: 01/14/2016] [Indexed: 12/17/2022]
Abstract
Correct function of the mitotic spindle requires balanced interplay of kinetochore and astral microtubules that mediate chromosome segregation and spindle positioning, respectively. Errors therein can cause severe defects ranging from aneuploidy to developmental disorders. Here, we describe a protein degradation pathway that functionally links astral microtubules to kinetochores via regulation of a microtubule-associated factor. We show that the yeast spindle positioning protein Kar9 localizes not only to astral but also to kinetochore microtubules, where it becomes targeted for proteasomal degradation by the SUMO-targeted ubiquitin ligases (STUbLs) Slx5-Slx8. Intriguingly, this process does not depend on preceding sumoylation of Kar9 but rather requires SUMO-dependent recruitment of STUbLs to kinetochores. Failure to degrade Kar9 leads to defects in both chromosome segregation and spindle positioning. We propose that kinetochores serve as platforms to recruit STUbLs in a SUMO-dependent manner in order to ensure correct spindle function by regulating levels of microtubule-associated proteins.
22
Abstract
Several theories for the origin of life have gained widespread acceptance, led by primordial soup, chemical evolution, metabolism first, and the RNA world. However, while new and existing theories often address a key step, there is less focus on a comprehensive abiogenic continuum leading to the last universal common ancestor. Herein, I present the "minimotif synthesis" hypothesis, which unifies select origin of life theories with new and revised steps. The hypothesis is based on first principles, on the concept of selection over long time scales, and on a stepwise progression toward complexity. The major steps are the thermodynamically driven origination of extant molecular specificity emerging from the primordial soup, leading to the rise of peptide catalysts, and a cyclic feed-forward catalytic diversification of compounds and peptides in the primordial soup. This is followed by degenerate, semi-partially conservative peptide replication to pass on catalytic knowledge to progeny protocells. At some point during this progression, the emergence of RNA and selection could drive the separation of catalytic and genetic functions, allowing peptides and proteins to permeate the catalytic space and RNA to encode higher-fidelity information transfer. Translation may have emerged from a predecessor process of RNA-template-driven organization and successive ligation of activated amino acids.
Affiliation(s)
- Martin R Schiller
- Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada, Las Vegas, Nevada, USA
23
Sharma S, Toledo O, Hedden M, Lyon KF, Brooks SB, David RP, Limtong J, Newsome JM, Novakovic N, Rajasekaran S, Thapar V, Williams SR, Schiller MR. The Functional Human C-Terminome. PLoS One 2016; 11:e0152731. [PMID: 27050421 PMCID: PMC4822787 DOI: 10.1371/journal.pone.0152731] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 11/04/2015] [Accepted: 03/18/2016] [Indexed: 11/24/2022]
Abstract
All translated proteins end with a carboxylic acid commonly called the C-terminus. Many short functional sequences (minimotifs) are located on or immediately proximal to the C-terminus. However, information about the function of protein C-termini has not been consolidated into a single source. Here, we built a new "C-terminome" database and web system focused on human proteins. Approximately 3,600 C-termini in the human proteome have a minimotif with an established molecular function. To help evaluate the function of the remaining C-termini in the human proteome, we inferred minimotifs identified by experimentation in rodent cells, predicted minimotifs based upon consensus sequence matches, and predicted novel highly repetitive sequences in C-termini. Predictions can be ranked by enrichment scores or Genomic Evolutionary Rate Profiling (GERP) scores, a measurement of evolutionary constraint. By searching the last 10 amino acids of human proteins for new anchored sequences 3-10 residues long with up to 5 degenerate positions in the consensus sequences, we have identified new consensus sequences that predict instances in the majority of human genes. All of this information is consolidated into a database that can be accessed through a C-terminome web system with search and browse functions for minimotifs and human proteins. A predicted function based on a known consensus sequence is assigned to nearly half the proteins in the human proteome. Weblink: http://cterminome.bio-toolkit.com.
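Consensus-based C-terminal prediction relies on motifs that must end exactly at the last residue, which maps naturally onto end-anchored regular expressions. The Python sketch below is illustrative only and is not the C-terminome code: the two entries in the hypothetical `C_TERMINAL_MOTIFS` table are example consensus sequences chosen for this sketch (a simplified PDZ class I-like `[ST].[VIL]` ending and the ER-retention `KDEL` signal), while the 10-residue window mirrors the search window described in the abstract.

```python
import re

# Example C-terminal consensus patterns (simplified, for illustration);
# the trailing "$" anchors each pattern to the final residue.
C_TERMINAL_MOTIFS = {
    "PDZ class I-like": re.compile(r"[ST].[VIL]$"),
    "ER retention (KDEL)": re.compile(r"KDEL$"),
}


def scan_c_terminus(protein_id, seq, window=10):
    """Return (id, motif name, 1-based start, matched residues) for C-terminal hits."""
    tail = seq[-window:]  # only the last `window` residues are searched
    offset = len(seq) - len(tail)
    hits = []
    for name, pattern in C_TERMINAL_MOTIFS.items():
        m = pattern.search(tail)
        if m:
            hits.append((protein_id, name, offset + m.start() + 1, m.group()))
    return hits


proteins = {"prot1": "M" + "A" * 11 + "ESAV", "prot2": "MKKKDEL"}
for pid, seq in proteins.items():
    print(scan_c_terminus(pid, seq))
```

Restricting the search to a short tail keeps the scan cheap across a whole proteome; ranking the resulting hits by enrichment or GERP scores would be a separate step.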
Affiliation(s)
- Surbhi Sharma
- Nevada Institute of Personalized Medicine, and School of Life Sciences, University of Nevada, Las Vegas, Nevada, United States of America
- Oniel Toledo
- Nevada Institute of Personalized Medicine, and School of Life Sciences, University of Nevada, Las Vegas, Nevada, United States of America
- Michael Hedden
- Nevada Institute of Personalized Medicine, and School of Life Sciences, University of Nevada, Las Vegas, Nevada, United States of America
- Kenneth F. Lyon
- Nevada Institute of Personalized Medicine, and School of Life Sciences, University of Nevada, Las Vegas, Nevada, United States of America
- Steven B. Brooks
- Nevada Institute of Personalized Medicine, and School of Life Sciences, University of Nevada, Las Vegas, Nevada, United States of America
- Roxanne P. David
- Nevada Institute of Personalized Medicine, and School of Life Sciences, University of Nevada, Las Vegas, Nevada, United States of America
- Justin Limtong
- Nevada Institute of Personalized Medicine, and School of Life Sciences, University of Nevada, Las Vegas, Nevada, United States of America
- Jacklyn M. Newsome
- Nevada Institute of Personalized Medicine, and School of Life Sciences, University of Nevada, Las Vegas, Nevada, United States of America
- Nemanja Novakovic
- Nevada Institute of Personalized Medicine, and School of Life Sciences, University of Nevada, Las Vegas, Nevada, United States of America
- Sanguthevar Rajasekaran
- Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut 06269–2155, United States of America
- Vishal Thapar
- Department of Pathology, Massachusetts General Hospital, Boston, Massachusetts 02114, United States of America
- Sean R. Williams
- Nevada Institute of Personalized Medicine, and School of Life Sciences, University of Nevada, Las Vegas, Nevada, United States of America
- Martin R. Schiller
- Nevada Institute of Personalized Medicine, and School of Life Sciences, University of Nevada, Las Vegas, Nevada, United States of America
24
Sobhy H. Shetti, a simple tool to parse, manipulate and search large datasets of sequences. Microb Genom 2015; 1:e000035. [PMID: 28348820 PMCID: PMC5320677 DOI: 10.1099/mgen.0.000035] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/18/2015] [Accepted: 09/21/2015] [Indexed: 11/20/2022]
Abstract
Parsing and manipulating long and/or multiple protein or gene sequences can be challenging for experimental biologists and microbiologists without prior knowledge of bioinformatics and programming. Here we present a simple, user-friendly and versatile tool to parse, manipulate and search within large datasets of long and multiple protein or gene sequences. The Shetti tool can be used to search for a sequence, species, protein/gene or pattern/motif. It can also be used to construct a universal consensus or molecular signatures for proteins based on their physical characteristics. Shetti is fast and handles large sets of long sequences efficiently. Shetti parses UniProt Knowledgebase and NCBI GenBank flat files and visualizes them as a table.
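Turning a UniProtKB flat file into a table mostly comes down to reading the two-letter line codes (`ID`, `AC`, `OS`, `SQ`) and the `//` entry terminator. The Python sketch below illustrates that idea for a handful of fields. It is not Shetti's implementation, it skips most line types, and the `SAMPLE` entry (including its identifiers) is a made-up toy record rather than real UniProt data.

```python
def parse_uniprot_flat(text):
    """Yield one dict per entry from UniProtKB flat-file text (simplified).

    Only ID, the primary accession (first AC), OS and the sequence are extracted.
    """
    entry, seq_lines, in_seq = {}, [], False
    for line in text.splitlines():
        if line.startswith("//"):  # end of the current entry
            entry["sequence"] = "".join(seq_lines)
            yield entry
            entry, seq_lines, in_seq = {}, [], False
        elif in_seq:  # sequence lines: residues in blocks separated by spaces
            seq_lines.append(line.replace(" ", ""))
        elif line.startswith("ID   "):
            entry["id"] = line[5:].split()[0]
        elif line.startswith("AC   ") and "accession" not in entry:
            entry["accession"] = line[5:].split(";")[0].strip()
        elif line.startswith("OS   "):
            entry["organism"] = (entry.get("organism", "") + " " + line[5:].strip()).strip()
        elif line.startswith("SQ   "):
            in_seq = True  # following lines hold the sequence


# Made-up toy entry in flat-file layout (not a real UniProt record)
SAMPLE = """\
ID   TOY1_HUMAN              Reviewed;          12 AA.
AC   TOY001; TOY002;
OS   Homo sapiens (Human).
SQ   SEQUENCE   12 AA;
     MKTAYIAKQR QL
//
"""

for rec in parse_uniprot_flat(SAMPLE):
    print(rec["id"], rec["accession"], rec["organism"], len(rec["sequence"]), sep="\t")
```

Writing the parser as a generator means a multi-gigabyte flat file can be streamed entry by entry instead of being loaded into memory at once.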
Affiliation(s)
- Haitham Sobhy
- Dalian Institute of Chemical Physics, CAS, Dalian, PR China
25
Lyon KF, Strong CL, Schooler SG, Young RJ, Roy N, Ozar B, Bachmeier M, Rajasekaran S, Schiller MR. Natural variability of minimotifs in 1092 people indicates that minimotifs are targets of evolution. Nucleic Acids Res 2015; 43:6399-412. [PMID: 26068475 PMCID: PMC4513861 DOI: 10.1093/nar/gkv580] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 04/10/2014] [Revised: 04/17/2015] [Accepted: 05/21/2015] [Indexed: 01/05/2023]
Abstract
Since the function of a short contiguous peptide minimotif can be introduced or eliminated by a single point mutation, these functional elements may be a source of human variation and a target of selection. We analyzed the variability of ∼300 000 minimotifs in 1092 human genomes from the 1000 Genomes Project. Most minimotifs have been conserved by purifying selection (94% are invariant), which supports important functional roles for minimotifs. Minimotifs are generally under negative selection, possessing high genomic evolutionary rate profiling (GERP) and sitewise likelihood-ratio (SLR) scores. Some are subject to neutral drift or positive selection, similar to coding regions. Most SNPs in minimotifs were common variants, but with minor allele frequencies generally <10%. This was supported by low substitution rates and few newly derived minimotifs. Several minimotif alleles showed different intercontinental and regional geographic distributions, strongly suggesting a role for minimotifs in adaptive evolution. We also note that 4% of PTM minimotif sites in histone tails were common variants, which has the potential to differentially affect DNA packaging among individuals. In conclusion, minimotifs are a source of functional genetic variation in the human population; thus, they are likely to be an important target of selection and evolution.
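Two summary statistics in this abstract, the fraction of invariant minimotifs and the minor allele frequency (MAF) of variant sites, are simple to compute once per-site allele counts are available. The Python sketch below assumes hypothetical input: for each minimotif, a list of alternate-allele counts at its sites across 2 × 1092 = 2184 chromosomes, treating every site as biallelic. It is not the authors' analysis code, and the counts are invented.

```python
N_CHROMOSOMES = 2 * 1092  # diploid genomes, two chromosomes per person


def minor_allele_frequency(alt_count, n_chromosomes=N_CHROMOSOMES):
    """MAF of a biallelic site: frequency of the less common allele."""
    alt_freq = alt_count / n_chromosomes
    return min(alt_freq, 1 - alt_freq)


def invariant_fraction(motif_alt_counts):
    """Fraction of minimotifs with no alternate allele observed at any of their sites."""
    invariant = sum(1 for counts in motif_alt_counts if all(c == 0 for c in counts))
    return invariant / len(motif_alt_counts)


# Hypothetical alternate-allele counts for four minimotifs (one entry per site)
motifs = [[0, 0, 0], [0, 131, 0], [0, 0], [0, 0, 0, 0]]
print(invariant_fraction(motifs))  # 0.75
print(minor_allele_frequency(131))  # 131 / 2184, about 0.06
```

Taking the minimum of the two allele frequencies makes MAF independent of which allele is labeled as the reference, so a site where the "alternate" allele dominates is still reported as a low-MAF site.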
Affiliation(s)
- Kenneth F Lyon
- Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV 89154-4004, USA
- Christy L Strong
- Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV 89154-4004, USA
- Steve G Schooler
- Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV 89154-4004, USA
- Richard J Young
- Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV 89154-4004, USA
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-2155, USA
- Nervik Roy
- Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV 89154-4004, USA
- Brittany Ozar
- Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV 89154-4004, USA
- Mark Bachmeier
- Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV 89154-4004, USA
- Sanguthevar Rajasekaran
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-2155, USA
- Martin R Schiller
- Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV 89154-4004, USA
26
Wong A, Gehring C, Irving HR. Conserved Functional Motifs and Homology Modeling to Predict Hidden Moonlighting Functional Sites. Front Bioeng Biotechnol 2015; 3:82. [PMID: 26106597 PMCID: PMC4460814 DOI: 10.3389/fbioe.2015.00082] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Received: 02/27/2015] [Accepted: 05/18/2015] [Indexed: 12/11/2022]
Abstract
Moonlighting functional centers within proteins can provide them with hitherto unrecognized functions. Here, we review how hidden moonlighting functional centers, which we define as binding sites that have catalytic activity or regulate protein function in a novel manner, can be identified using targeted bioinformatic searches. Functional motifs used in such searches include amino acid residues that are conserved across species, many of which have been assigned functional roles based on experimental evidence. Molecules identified in this manner in searches for cyclic mononucleotide cyclases in plants are used as examples. The strength of this computational approach is enhanced when good homology models can be developed to test the functionality of the predicted centers in silico, which, in turn, increases confidence in the ability of the identified candidates to perform the predicted functions. Computational characterization of moonlighting functional centers is not diagnostic for catalysis but serves as a rapid screening method, and it highlights testable targets from a potentially large pool of candidates for the subsequent in vitro and in vivo experiments required to confirm the functionality of the predicted moonlighting centers.
Affiliation(s)
- Aloysius Wong
- Division of Biological and Environmental Science and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Chris Gehring
- Division of Biological and Environmental Science and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Helen R Irving
- Monash Institute of Pharmaceutical Sciences, Monash University, Melbourne, VIC, Australia
27
Sarkar D, Jana T, Saha S. LMPID: a manually curated database of linear motifs mediating protein-protein interactions. Database: The Journal of Biological Databases and Curation 2015; 2015:bav014. [PMID: 25776024 PMCID: PMC4360622 DOI: 10.1093/database/bav014] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 11/13/2022]
Abstract
Linear motifs (LMs), used by a subset of all protein-protein interactions (PPIs), bind to globular receptors or domains and play an important role in signaling networks. LMPID (Linear Motif mediated Protein Interaction Database) is a manually curated database that provides comprehensive, experimentally validated information about the LMs mediating PPIs from all organisms on a single platform. About 2200 entries have been compiled by detailed manual curation of PubMed abstracts, of which about 1000 LM entries are annotated for the first time relative to the Eukaryotic Linear Motif (ELM) resource. Users can submit queries through a user-friendly search page and browse the data in alphabetical order of the bait gene names or by the domains interacting with the LM. LMPID is freely accessible at http://bicresources.jcbose.ac.in/ssaha4/lmpid and contains 1750 unique LM instances found within 1181 baits interacting with 552 prey proteins. In summary, LMPID is an attempt to enrich the existing repertoire of resources available for studying the LMs implicated in PPIs, and it may help in understanding the patterns of LMs binding to a specific domain, in developing prediction models to identify novel domain-specific LMs, and ultimately in predicting inhibitors or modulators of PPIs of interest.
Affiliation(s)
- Tanmoy Jana
- Bioinformatics Centre, Bose Institute, Kolkata, India
- Sudipto Saha
- Bioinformatics Centre, Bose Institute, Kolkata, India
28
Yu Q, Huo H, Vitter JS, Huan J, Nekrich Y. An Efficient Exact Algorithm for the Motif Stem Search Problem over Large Alphabets. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2015; 12:384-397. [PMID: 26357225 DOI: 10.1109/tcbb.2014.2361668] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/05/2023]
Abstract
In recent years, there has been increasing interest in planted (l, d) motif search (PMS), with applications to discovering significant segments in biological sequences. However, there has been little discussion of PMS over large alphabets. This paper focuses on motif stem search (MSS), which was recently introduced to search for motifs in large-alphabet inputs. A motif stem is an l-length string with some wildcards. The goal of the MSS problem is to find a set of stems that represents a superset of all (l, d) motifs present in the input sequences, and the superset is expected to be as small as possible. The three main contributions of this paper are as follows: (1) We define the motif stem representation more precisely using regular expressions. (2) We give a method for generating all possible motif stems without redundant wildcards. (3) We propose an efficient exact algorithm, called StemFinder, for solving the MSS problem. Compared with the previous MSS algorithms, StemFinder runs much faster and reports fewer stems, which represent a smaller superset of all (l, d) motifs. StemFinder is freely available at http://sites.google.com/site/feqond/stemfinder.
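The two definitions above can be made concrete with a short sketch. It is a brute-force check of the definitions only, not StemFinder: it tests whether a candidate l-mer is an (l, d) motif (an occurrence with at most d mismatches in every input sequence) and whether an l-mer is covered by a stem. The `*` wildcard symbol and the toy DNA sequences are illustrative assumptions.

```python
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif: str, seq: str, d: int) -> bool:
    """True if some length-l window of seq differs from motif in at most d positions."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))


def is_ld_motif(motif: str, seqs: Sequence[str], d: int) -> bool:
    """Planted (l, d) motif criterion: an approximate occurrence in every input sequence."""
    return all(occurs_within(motif, s, d) for s in seqs)


def stem_matches(stem: str, lmer: str, wildcard: str = "*") -> bool:
    """A stem position matches anything if it is the wildcard, else it must be identical."""
    return len(stem) == len(lmer) and all(
        c == wildcard or c == x for c, x in zip(stem, lmer)
    )


seqs = ["ACGTTGCA", "TTACGATT", "GGACCTGG"]
print(is_ld_motif("ACGT", seqs, 1))  # True
print(is_ld_motif("ACGT", seqs, 0))  # False
print(stem_matches("AC*T", "ACGT"))  # True
```

The exhaustive window scan costs O(L·l) per sequence and motif; over a large alphabet the number of candidate l-mers grows as |Σ|^l, which is the cost that stem-based methods try to avoid.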
29
Hutchins JRA. What's that gene (or protein)? Online resources for exploring functions of genes, transcripts, and proteins. Mol Biol Cell 2015; 25:1187-201. [PMID: 24723265 PMCID: PMC3982986 DOI: 10.1091/mbc.e13-10-0602] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 12/19/2022]
Abstract
The genomic era has enabled research projects that use approaches including genome-scale screens, microarray analysis, next-generation sequencing, and mass spectrometry-based proteomics to discover genes and proteins involved in biological processes. Such methods generate data sets of gene, transcript, or protein hits that researchers wish to explore to understand their properties and functions and thus their possible roles in biological systems of interest. Recent years have seen a profusion of Internet-based resources to aid this process. This review takes the viewpoint of the curious biologist wishing to explore the properties of protein-coding genes and their products, identified using genome-based technologies. Ten key questions are asked about each hit, addressing functions, phenotypes, expression, evolutionary conservation, disease association, protein structure, interactors, posttranslational modifications, and inhibitors. Answers are provided by presenting the latest publicly available resources, together with methods for hit-specific and data set-wide information retrieval, suited to any genome-based analytical technique and experimental species. The utility of these resources is demonstrated for 20 factors regulating cell proliferation. Results obtained using some of these are discussed in more depth using the p53 tumor suppressor as an example. This flexible and universally applicable approach for characterizing experimental hits helps researchers to maximize the potential of their projects for biological discovery.
Affiliation(s)
- James R A Hutchins
- Institute of Human Genetics, Centre National de la Recherche Scientifique (CNRS), 34396 Montpellier, France
30
Bhowmick P, Guharoy M, Tompa P. Bioinformatics Approaches for Predicting Disordered Protein Motifs. Advances in Experimental Medicine and Biology 2015; 870:291-318. [PMID: 26387106 DOI: 10.1007/978-3-319-20164-1_9] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 02/06/2023]
Abstract
Short, linear motifs (SLiMs) in proteins are functional microdomains consisting of contiguous residue segments along the protein sequence, typically not more than 10 consecutive amino acids in length with fewer than 5 defined positions. Many positions are 'degenerate', thus offering flexibility in terms of the amino acid types allowed at those positions. Their short length and degenerate nature confer evolutionary plasticity, meaning that SLiMs often evolve convergently. Further, SLiMs have a propensity to occur within intrinsically unstructured protein segments, and this confers versatile functionality to unstructured regions of the proteome. SLiMs mediate multiple types of protein interactions based on domain-peptide recognition and guide functions including posttranslational modifications, subcellular localization of proteins, and ligand binding. SLiMs thus behave as modular interaction units that confer versatility to protein function, and SLiM-mediated interactions are increasingly being recognized as therapeutic targets. In this chapter we start with a brief description of the properties of SLiMs and their interactions and then move on to discuss algorithms and tools, including several web-based methods, that enable the discovery of novel SLiMs (de novo motif discovery) as well as the prediction of novel occurrences of known SLiMs. Both individual amino acid sequences and sets of protein sequences can be scanned using these methods to obtain statistically overrepresented sequence patterns. Lists of putatively functional SLiMs are then assembled based on parameters such as evolutionary sequence conservation, disorder scores, structural data, gene ontology terms and other contextual information that helps to assess the functional credibility or significance of these motifs. These bioinformatics methods should certainly guide experiments aimed at motif discovery.
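Predicting new occurrences of a known SLiM is often done by scanning with a regular expression and then filtering hits by context such as disorder. The sketch below assumes a per-residue disorder score in [0, 1] from some external predictor (the scores here are made up) and uses the canonical SH3-binding core `P..P` (PxxP) purely as an example pattern; the 0.5 cutoff is an arbitrary assumption.

```python
import re
from typing import List, Tuple


def find_slim_hits(seq: str, pattern: str, disorder: List[float],
                   min_disorder: float = 0.5) -> List[Tuple[int, str]]:
    """Report regex matches of a known SLiM whose mean disorder score passes a cutoff."""
    hits = []
    # A zero-width lookahead lets overlapping matches be reported.
    for m in re.finditer(f"(?=({pattern}))", seq):
        start, match = m.start(), m.group(1)
        if not match:
            continue
        window = disorder[start:start + len(match)]
        if sum(window) / len(window) >= min_disorder:
            hits.append((start, match))
    return hits


seq = "PAAPGGGGPKLPAA"
disorder = [0.2] * 7 + [0.8] * 7  # toy per-residue disorder scores
print(find_slim_hits(seq, "P..P", disorder))  # [(8, 'PKLP')]
```

The first PxxP match lies in the low-disorder half and is discarded, illustrating how contextual filters shrink the long list of raw pattern matches.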
Affiliation(s)
- Pallab Bhowmick
- VIB Department of Structural Biology, Vrije Universiteit Brussel (VUB), Building E, Pleinlaan 2, 1050, Brussels, Belgium
- Mainak Guharoy
- VIB Department of Structural Biology, Vrije Universiteit Brussel (VUB), Building E, Pleinlaan 2, 1050, Brussels, Belgium
- Peter Tompa
- VIB Department of Structural Biology, Vrije Universiteit Brussel (VUB), Building E, Pleinlaan 2, 1050, Brussels, Belgium
- Institute of Enzymology, Research Center of Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
31
Kelil A, Dubreuil B, Levy ED, Michnick SW. Fast and accurate discovery of degenerate linear motifs in protein sequences. PLoS One 2014; 9:e106081. [PMID: 25207816 PMCID: PMC4160167 DOI: 10.1371/journal.pone.0106081] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 11/30/2013] [Accepted: 08/01/2014] [Indexed: 11/20/2022]
Abstract
Linear motifs mediate a wide variety of cellular functions, which makes their characterization in protein sequences crucial to understanding cellular systems. However, the short length and degenerate nature of linear motifs make their discovery a difficult problem. Here, we introduce MotifHound, an algorithm particularly suited for the discovery of small and degenerate linear motifs. MotifHound performs an exact and exhaustive enumeration of all motifs present in proteins of interest, including all of their degenerate forms, and scores the overrepresentation of each motif based on its occurrence in proteins of interest relative to a background (e.g., proteome) using the hypergeometric distribution. To assess MotifHound, we benchmarked it together with state-of-the-art algorithms. The benchmark consists of 11,880 sets of proteins from S. cerevisiae; in each set, we artificially spiked in one motif varying in terms of three key parameters: (i) number of occurrences, (ii) length and (iii) the number of degenerate or “wildcard” positions. The benchmark enabled the evaluation of the impact of these three properties on the performance of the different algorithms. The results showed that MotifHound and SLiMFinder were the most accurate in detecting degenerate linear motifs. Interestingly, MotifHound was 15 to 20 times faster at comparable accuracy and performed best in the discovery of highly degenerate motifs. We complemented the benchmark with an analysis of proteins experimentally shown to bind the FUS1 SH3 domain from S. cerevisiae. Using the full-length protein partners as the sole input, MotifHound recapitulated most experimentally determined motifs binding to the FUS1 SH3 domain. Moreover, these motifs exhibited properties typical of SH3-binding peptides, e.g., high intrinsic disorder and evolutionary conservation, despite the fact that none of these properties were used as prior information.
MotifHound is available (http://michnick.bcm.umontreal.ca or http://tinyurl.com/motifhound) together with the benchmark that can be used as a reference to assess future developments in motif discovery.
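The hypergeometric overrepresentation score mentioned above can be sketched as an upper-tail probability. This is a minimal illustration, not MotifHound's code; it assumes the proteins of interest are treated as a draw from the background proteome and that each protein is counted once whether it carries the motif one or several times. The counts in the example are invented.

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(M, n, N).

    M: proteins in the background, n: background proteins containing the motif,
    N: proteins in the set of interest, k: proteins of interest containing the motif.
    """
    upper = min(n, N)
    # math.comb returns 0 when asked to choose more items than exist.
    tail = sum(comb(n, i) * comb(M - n, N - i) for i in range(max(k, 0), upper + 1))
    return tail / comb(M, N)


# Hypothetical counts: 5 of 20 proteins of interest carry the motif,
# versus 50 of 6000 proteins in the background.
p = hypergeom_sf(5, 6000, 50, 20)
print(p)
```

Exact integer arithmetic via `math.comb` avoids underflow in the binomial coefficients; `scipy.stats.hypergeom.sf(k - 1, M, n, N)` would give the same tail if SciPy is available.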
Affiliation(s)
- Abdellali Kelil
- Département de Biochimie and Centre Robert-Cedergren, Bio-Informatique et Génomique, Université de Montréal, Succursale Centre-Ville, Montreal, Quebec, Canada
- Benjamin Dubreuil
- Département de Biochimie and Centre Robert-Cedergren, Bio-Informatique et Génomique, Université de Montréal, Succursale Centre-Ville, Montreal, Quebec, Canada
- Emmanuel D. Levy
- Département de Biochimie and Centre Robert-Cedergren, Bio-Informatique et Génomique, Université de Montréal, Succursale Centre-Ville, Montreal, Quebec, Canada
- Stephen W. Michnick
- Département de Biochimie and Centre Robert-Cedergren, Bio-Informatique et Génomique, Université de Montréal, Succursale Centre-Ville, Montreal, Quebec, Canada
32
van der Lee R, Buljan M, Lang B, Weatheritt RJ, Daughdrill GW, Dunker AK, Fuxreiter M, Gough J, Gsponer J, Jones D, Kim PM, Kriwacki R, Oldfield CJ, Pappu RV, Tompa P, Uversky VN, Wright P, Babu MM. Classification of intrinsically disordered regions and proteins. Chem Rev 2014; 114:6589-631. [PMID: 24773235 PMCID: PMC4095912 DOI: 10.1021/cr400525m] [Citation(s) in RCA: 1410] [Impact Index Per Article: 141.0] [Received: 09/23/2013] [Indexed: 12/11/2022]
Affiliation(s)
- Robin van der Lee
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, 6500 HB Nijmegen, The Netherlands
- Marija Buljan
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
- Benjamin Lang
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
- Robert J. Weatheritt
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
- Gary W. Daughdrill
- Department of Cell Biology, Microbiology, and Molecular Biology, University of South Florida, 3720 Spectrum Boulevard, Suite 321, Tampa, Florida 33612, United States
- A. Keith Dunker
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- Monika Fuxreiter
- MTA-DE Momentum Laboratory of Protein Dynamics, Department of Biochemistry and Molecular Biology, University of Debrecen, H-4032 Debrecen, Nagyerdei krt 98, Hungary
- Julian Gough
- Department of Computer Science, University of Bristol, The Merchant Venturers Building, Bristol BS8 1UB, United Kingdom
- Joerg Gsponer
- Department of Biochemistry and Molecular Biology, Centre for High-Throughput Biology, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- David T. Jones
- Bioinformatics Group, Department of Computer Science, University College London, London, WC1E 6BT, United Kingdom
- Philip M. Kim
- Terrence Donnelly Centre for Cellular and Biomolecular Research, Department of Molecular Genetics, and Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Richard W. Kriwacki
- Department of Structural Biology, St. Jude Children’s Research Hospital, Memphis, Tennessee 38105, United States
- Christopher J. Oldfield
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- Rohit V. Pappu
- Department of Biomedical Engineering and Center for Biological Systems Engineering, Washington University in St. Louis, St. Louis, Missouri 63130, United States
- Peter Tompa
- VIB Department of Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium
- Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Vladimir N. Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer’s Research Institute, Morsani College of Medicine, University of South Florida, Tampa, Florida 33612, United States
- Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, Moscow Region, Russia
- Peter E. Wright
- Department of Integrative Structural and Computational Biology and Skaggs Institute of Chemical Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, United States
- M. Madan Babu
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
33
Van Roey K, Uyar B, Weatheritt RJ, Dinkel H, Seiler M, Budd A, Gibson TJ, Davey NE. Short Linear Motifs: Ubiquitous and Functionally Diverse Protein Interaction Modules Directing Cell Regulation. Chem Rev 2014; 114:6733-78. [DOI: 10.1021/cr400585q] [Citation(s) in RCA: 293] [Impact Index Per Article: 29.3] [Indexed: 02/08/2023]
Affiliation(s)
- Kim Van Roey
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Bora Uyar
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Robert J. Weatheritt
- MRC Laboratory of Molecular Biology (LMB), Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, United Kingdom
- Holger Dinkel
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Markus Seiler
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Aidan Budd
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Toby J. Gibson
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Norman E. Davey
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Department of Physiology, University of California, San Francisco, San Francisco, California 94143, United States
34
Horn H, Haslam N, Jensen LJ. DoReMi: context-based prioritization of linear motif matches. PeerJ 2014; 2:e315. [PMID: 24711967 PMCID: PMC3970808 DOI: 10.7717/peerj.315] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 07/30/2013] [Accepted: 03/01/2014] [Indexed: 12/31/2022]
Abstract
Many protein domains bind to short peptide sequences, called linear motifs. Data on their sequence specificities are sparse, which is why biologists usually resort to basic pattern searches to identify new putative binding sites for experimental follow-up. Most motifs have poor specificity, so prioritizing the matches is crucial when scanning a full proteome with a pattern. Here we present a generic method to prioritize motif occurrence predictions by using cellular contextual information. The method takes two inputs: the motif occurrences and one or more of the interacting domains. The potential hits are ranked based on how strongly the context network associates them with a protein containing one of the specified domains, which increases predictive performance. The method is available through an easy-to-use web interface at doremi.jensenlab.org. We show that this approach leads to improved predictions of binding partners for PDZ domains and the SUMO binding domain. This is consistent with the earlier observation that coupling sequence motifs with network information improves kinase-specific substrate predictions.
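The ranking idea can be illustrated with a small sketch. It is not DoReMi's scoring function: it assumes the context network is an undirected weighted edge map and, as a hypothetical simplification, scores each motif-containing protein by its single strongest link to any protein carrying one of the specified domains. All protein names and weights are invented.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def rank_by_context(hits: List[str], domain_proteins: Set[str],
                    network: Dict[Edge, float]) -> List[Tuple[str, float]]:
    """Rank proteins with motif matches by their strongest network link to a domain protein."""
    def association(p: str) -> float:
        # Undirected network: look the edge up in both orientations.
        return max((network.get((p, q), network.get((q, p), 0.0))
                    for q in domain_proteins), default=0.0)
    return sorted(((p, association(p)) for p in hits),
                  key=lambda item: item[1], reverse=True)


network = {("A", "X"): 0.4, ("Y", "B"): 0.9, ("C", "Z"): 0.99}
print(rank_by_context(["A", "B", "C"], {"X", "Y"}, network))
# [('B', 0.9), ('A', 0.4), ('C', 0.0)]
```

Protein C has the strongest edge overall, but not to a domain-containing protein, so it drops to the bottom; this is the kind of reordering that context-based prioritization aims for.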
Affiliation(s)
- Heiko Horn
- NNF Center for Protein Research, University of Copenhagen, Denmark
- Niall Haslam
- Complex and Adaptive Systems Laboratory, University College Dublin, Dublin, Ireland; Conway Institute of Biomolecular and Biomedical Science, University College Dublin, Dublin, Ireland
- Lars Juhl Jensen
- NNF Center for Protein Research, University of Copenhagen, Denmark
35
Abstract
Intrinsically disordered proteins (IDPs) and IDP regions fail to form a stable structure, yet they exhibit biological activities. Their mobile flexibility and structural instability are encoded by their amino acid sequences. They recognize proteins, nucleic acids, and other types of partners; they accelerate interactions and chemical reactions between bound partners; and they help accommodate posttranslational modifications, alternative splicing, protein fusions, and insertions or deletions. Overall, IDP-associated biological activities complement those of structured proteins. Recently, there has been an explosion of studies on IDP regions and their functions, yet the discovery and investigation of these proteins have a long, mostly ignored history. Along with recent discoveries, we present several early examples and the mechanisms by which IDPs contribute to function, which we hope will encourage comprehensive discussion of IDPs and IDP regions in biochemistry textbooks. Finally, we propose future directions for IDP research.
Affiliation(s)
- Christopher J Oldfield
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202
36
Predicting binding within disordered protein regions to structurally characterised peptide-binding domains. PLoS One 2013; 8:e72838. [PMID: 24019881 PMCID: PMC3760854 DOI: 10.1371/journal.pone.0072838] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 05/21/2013] [Accepted: 07/12/2013] [Indexed: 11/19/2022]
Abstract
Disordered regions of proteins often bind to structured domains, mediating interactions within and between proteins. However, it is difficult to identify a priori the short disordered regions involved in binding. We set out to determine whether docking such peptide regions to peptide-binding domains would assist in these predictions. We assembled a redundancy-reduced dataset of SLiM (Short Linear Motif)-containing proteins from the ELM database. We selected 84 sequences that had associated PDB structures showing the SLiM bound to a protein receptor, where the SLiM was found within a 50-residue region of the protein sequence that was predicted to be disordered. First, we investigated the Vina docking scores of overlapping tripeptides from the 50-residue SLiM-containing disordered regions of the protein sequence to the corresponding PDB domain. We found only weak discrimination of docking scores between peptides involved in binding and adjacent non-binding peptides in this context (AUC 0.58). Next, we trained a bidirectional recurrent neural network (BRNN) using as input the protein sequence, predicted secondary structure, Vina docking score and predicted disorder score. The results were very promising (AUC 0.72), showing that multiple sources of information can be combined to produce results that are clearly superior to any single source. We conclude that the Vina docking score alone has only modest power to define the location of a peptide within a larger protein region known to contain it. However, combining this information with other knowledge (using machine learning methods) clearly improves the identification of peptide-binding regions within a protein sequence. This approach combining docking with machine learning is primarily a predictor of binding to peptide-binding sites, and is not intended as a predictor of specificity of binding to particular receptors.
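The evaluation above relies on two simple pieces: overlapping tripeptide windows and the area under the ROC curve (AUC). The sketch below computes AUC as the probability that a randomly chosen binding window outscores a randomly chosen non-binding one, which is equivalent to the ROC area. The affinity values are invented; the sign flip reflects that AutoDock Vina reports affinities in kcal/mol, where more negative means stronger predicted binding.

```python
from typing import List, Sequence


def tripeptides(seq: str, k: int = 3) -> List[str]:
    """All overlapping k-residue windows of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC = probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(tripeptides("MKTAY"))  # ['MKT', 'KTA', 'TAY']
# Invented Vina affinities; negate so that higher means "more likely binding".
vina = [-7.1, -6.8, -5.0, -4.2]
print(roc_auc([-v for v in vina], [1, 0, 1, 0]))  # 0.75
```

The pairwise form is O(P·N) and is meant for clarity; for large datasets a rank-based (Mann-Whitney) computation gives the same value faster.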
37
Jünger MA, Aebersold R. Mass spectrometry-driven phosphoproteomics: patterning the systems biology mosaic. Wiley Interdisciplinary Reviews: Developmental Biology 2013; 3:83-112. [PMID: 24902836 DOI: 10.1002/wdev.121] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.5] [Indexed: 12/12/2022]
Abstract
Protein phosphorylation is the best-studied posttranslational modification and plays a role in virtually every biological process. Phosphoproteomics is the analysis of protein phosphorylation on a proteome-wide scale, and mainly uses the same instrumentation and analogous strategies as conventional mass spectrometry (MS)-based proteomics. Measurements can be performed either in a discovery-type, also known as shotgun mode, or in a targeted manner which monitors a set of a priori known phosphopeptides, such as members of a signal transduction pathway, across biological samples. Here, we delineate the different experimental levels at which measures can be taken to optimize the scope, reliability, and information content of phosphoproteomic analyses. Various chromatographic and chemical protocols exist to physically enrich phosphopeptides from proteolytic digests of biological samples. Subsequent mass spectrometric analysis revolves around peptide ion fragmentation to generate sequence information and identify the backbone sequence of phosphopeptides as well as the phosphate group attachment site(s), and different modes of fragmentation like collision-induced dissociation (CID), electron transfer dissociation (ETD), and higher energy collisional dissociation (HCD) have been established for phosphopeptide analysis. Computational tools are important for the identification and quantification of phosphopeptides and mapping of phosphorylation sites, the deposition of large-scale phosphoproteome datasets in public databases, and the extraction of biologically meaningful information by data mining, integration with other data types, and descriptive or predictive modeling. Finally, we discuss how orthogonal experimental approaches can be employed to validate newly identified phosphorylation sites on a biochemical, mechanistic, and physiological level.
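Identifying a phosphopeptide rests on the fixed mass added by each phosphate group. The sketch below computes the neutral monoisotopic mass of a peptide with a given number of phosphorylations (each adds HPO3, about 79.96633 Da) and lists the S, T and Y residues that are the canonical candidate attachment sites. It uses standard monoisotopic residue masses and models no fragmentation or charge state; localizing the actual site requires fragment ions as described above. The peptide is an arbitrary example.

```python
from typing import List

# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # added once for the free peptide termini
PHOSPHO = 79.96633  # HPO3 added per phosphorylated residue


def peptide_mass(seq: str, n_phospho: int = 0) -> float:
    """Neutral monoisotopic mass of a peptide carrying n_phospho phosphate groups."""
    return sum(RESIDUE_MASS[r] for r in seq) + WATER + n_phospho * PHOSPHO


def candidate_sites(seq: str) -> List[int]:
    """0-based positions of S, T and Y, the canonical phosphorylation sites."""
    return [i for i, r in enumerate(seq) if r in "STY"]


print(candidate_sites("AKSPTYR"))  # [2, 4, 5]
print(round(peptide_mass("AKSPTYR", 1) - peptide_mass("AKSPTYR"), 5))  # 79.96633
```

A single precursor mass tells how many phosphates are present but not where; with three candidate sites and one phosphate, the fragment spectrum must discriminate between three positional isomers.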
Affiliation(s)
- Martin A Jünger
- Department of Biology, Institute of Molecular Systems Biology, Zurich, Switzerland
38
Mi T, Rajasekaran S. Efficient algorithms for biological stems search. BMC Bioinformatics 2013; 14:161. [PMID: 23679045 PMCID: PMC3679804 DOI: 10.1186/1471-2105-14-161] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/30/2012] [Accepted: 05/06/2013] [Indexed: 12/04/2022]
Abstract
BACKGROUND Motifs are significant patterns in DNA, RNA, and protein sequences, which play an important role in biological processes and functions, such as identification of open reading frames, RNA transcription, and protein binding. Several versions of the motif search problem have been studied in the literature. One such version is called the Planted Motif Search (PMS) or (l, d)-motif Search. PMS is known to be NP-complete. The time complexities of most planted motif search algorithms depend exponentially on the alphabet size. Recently, a new version of the motif search problem has been introduced by Kuksa and Pavlovic. We call this version the Motif Stems Search (MSS) problem. A motif stem is an l-mer (for some relevant value of l) with some wildcard characters and hence corresponds to a set of l-mers (without wildcards), some of which are (l, d)-motifs. Kuksa and Pavlovic have presented an efficient algorithm to find motif stems for inputs from large alphabets. Ideally, the number of stems output should be as small as possible, since the stems form a superset of the motifs. RESULTS In this paper we propose an efficient algorithm for MSS and evaluate it on both synthetic and real data. This evaluation reveals that our algorithm is much faster than Kuksa and Pavlovic's algorithm. CONCLUSIONS Our MSS algorithm outperforms the algorithm of Kuksa and Pavlovic in terms of both run time and the number of stems output. Specifically, the stems output by our algorithm form a proper (and much smaller) subset of the stems output by Kuksa and Pavlovic's algorithm.
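Why a small stem set matters can be seen by expanding a stem into the l-mers it stands for: w wildcards over an alphabet Σ represent |Σ|^w l-mers, so each extra wildcard multiplies the candidates by 20 for proteins. The sketch below expands a stem and keeps only the l-mers that are (l, d)-motifs of the inputs. It is a brute-force illustration of the definition, not the algorithm of Mi and Rajasekaran; the `*` wildcard symbol and the example inputs are assumptions.

```python
from itertools import product
from typing import List, Sequence


def expand_stem(stem: str, alphabet: str, wildcard: str = "*") -> List[str]:
    """All wildcard-free l-mers represented by a stem."""
    choices = [alphabet if c == wildcard else c for c in stem]
    return ["".join(p) for p in product(*choices)]


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def motifs_in_stem(stem: str, alphabet: str, seqs: Sequence[str], d: int,
                   wildcard: str = "*") -> List[str]:
    """Keep the l-mers of a stem that occur with at most d mismatches in every sequence."""
    l = len(stem)

    def occurs(m: str, s: str) -> bool:
        return any(hamming(m, s[i:i + l]) <= d for i in range(len(s) - l + 1))

    return [m for m in expand_stem(stem, alphabet, wildcard)
            if all(occurs(m, s) for s in seqs)]


print(expand_stem("A*G", "ACGT"))  # ['AAG', 'ACG', 'AGG', 'ATG']
print(len(expand_stem("A**", "ACDEFGHIKLMNPQRSTVWY")))  # 400
print(motifs_in_stem("A*G", "ACGT", ["TACGA", "GACGT"], 0))  # ['ACG']
```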
Affiliation(s)
- Tian Mi
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
39
Hsu WL, Oldfield CJ, Xue B, Meng J, Huang F, Romero P, Uversky VN, Dunker AK. Exploring the binding diversity of intrinsically disordered proteins involved in one-to-many binding. Protein Sci 2013; 22:258-73. [PMID: 23233352 DOI: 10.1002/pro.2207] [Citation(s) in RCA: 143] [Impact Index Per Article: 13.0] [Received: 09/27/2012] [Revised: 12/01/2012] [Accepted: 12/03/2012] [Indexed: 11/09/2022]
Abstract
Molecular recognition features (MoRFs) are intrinsically disordered protein regions that bind to partners via disorder-to-order transitions. In one-to-many binding, a single MoRF binds to two or more different partners individually. MoRF-based one-to-many protein-protein interaction (PPI) examples were collected from the Protein Data Bank, yielding 23 MoRFs bound to 2-9 partners, with all pairs of same-MoRF partners having less than 25% sequence identity. Of these, 8 MoRFs were bound to 2-9 partners having completely different folds, whereas 15 MoRFs were bound to 2-5 partners having the same folds but with low sequence identities. For both types of partner variation, backbone and side chain torsion angle rotations were used to bring about the conformational changes needed to enable close fits between a single MoRF and distinct partners. Alternative splicing events (ASEs) and posttranslational modifications (PTMs) were also found to contribute to distinct partner binding. Because ASEs and PTMs both commonly occur in disordered regions, and because both ASEs and PTMs are often tissue-specific, these data suggest that MoRFs, ASEs, and PTMs may collaborate to alter PPI networks in different cell types. These data enlarge the set of carefully studied MoRFs that use inherent flexibility and that also use ASE-based and/or PTM-based surface modifications to enable the same disordered segment to selectively associate with two or more partners. The small number of residues involved in MoRFs and in their modifications by ASEs or PTMs may simplify the evolvability of signaling network diversity.
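The dataset criterion above (all pairs of same-MoRF partners below 25% sequence identity) can be checked with a simple identity function. Percent identity has several conventions; this sketch assumes the partners are already aligned to a common length with `-` gaps, and counts identity over columns where neither sequence has a gap. The example sequences are invented.

```python
from itertools import combinations
from typing import Dict


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    cols = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not cols:
        return 0.0
    return 100.0 * sum(x == y for x, y in cols) / len(cols)


def all_pairs_below(aligned: Dict[str, str], cutoff: float = 25.0) -> bool:
    """True if every pair of partners is below the identity cutoff."""
    return all(percent_identity(aligned[p], aligned[q]) < cutoff
               for p, q in combinations(aligned, 2))


print(percent_identity("AC-GT", "ACTGA"))  # 75.0
```

Other conventions (dividing by the shorter sequence length or by the full alignment length) give different numbers for the same pair, so a cutoff such as 25% is only meaningful together with the definition used.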
Affiliation(s)
- Wei-Lun Hsu
- Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana, USA
40
Secondary structure, a missing component of sequence-based minimotif definitions. PLoS One 2012; 7:e49957. [PMID: 23236358 PMCID: PMC3517595 DOI: 10.1371/journal.pone.0049957] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 08/23/2012] [Accepted: 10/15/2012] [Indexed: 12/27/2022]
Abstract
Minimotifs are short contiguous segments of proteins that have a known biological function. The hundreds of thousands of minimotifs discovered thus far are an important part of the theoretical understanding of the specificity of protein-protein interactions, posttranslational modifications, and signal transduction that occur in cells. However, a longstanding problem is that the different abstractions of the sequence definitions do not accurately capture the specificity, despite decades of effort by many labs. We present evidence that structure is an essential component of minimotif specificity, yet is not used in minimotif definitions. Our analysis of several known minimotifs as case studies, analysis of occurrences of minimotifs in structured and disordered regions of proteins, and review of the literature support a new model for minimotif definitions that includes sequence, structure, and function.
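A definition that includes structure as well as sequence can be sketched as a regular-expression match that is accepted only when every matched residue has an allowed secondary-structure state. This is a hypothetical illustration of the idea, not the authors' model: the three-state string (H helix, E strand, C coil) would come from some structure source or predictor, and the `RR.S` pattern is an arbitrary example.

```python
import re
from typing import List


def minimotif_hits(seq: str, ss: str, pattern: str, allowed_ss: str) -> List[int]:
    """Start positions of sequence matches whose every residue has an allowed SS state."""
    if len(seq) != len(ss):
        raise ValueError("need one secondary-structure symbol per residue")
    hits = []
    # Lookahead so overlapping sequence matches are all considered.
    for m in re.finditer(f"(?=({pattern}))", seq):
        start, end = m.start(), m.start() + len(m.group(1))
        if end > start and set(ss[start:end]) <= set(allowed_ss):
            hits.append(start)
    return hits


seq = "RRASLRRASL"
ss = "CCCCCHHHHH"  # toy 3-state string: H helix, E strand, C coil
print(minimotif_hits(seq, ss, "RR.S", "C"))  # [0]
print(minimotif_hits(seq, ss, "RR.S", "H"))  # [5]
```

Both occurrences match the sequence pattern equally well; only the structural term separates them.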
41
|
Davey NE, Cowan JL, Shields DC, Gibson TJ, Coldwell MJ, Edwards RJ. SLiMPrints: conservation-based discovery of functional motif fingerprints in intrinsically disordered protein regions. Nucleic Acids Res 2012; 40:10628-41. [PMID: 22977176 PMCID: PMC3510515 DOI: 10.1093/nar/gks854] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.4] [Indexed: 12/21/2022]
Abstract
Large portions of higher eukaryotic proteomes are intrinsically disordered, and abundant evidence suggests that these unstructured regions of proteins are rich in regulatory interaction interfaces. A major class of disordered interaction interfaces is the compact and degenerate modules known as short linear motifs (SLiMs). As a result of the difficulties associated with the experimental identification and validation of SLiMs, our understanding of these modules is limited, which argues for the use of computational methods to focus experimental discovery. This article evaluates the use of evolutionary conservation as a discriminatory technique for motif discovery. A statistical framework is introduced to assess the significance of relatively conserved residues, quantifying the likelihood that a residue will have a particular level of conservation given the conservation of the surrounding residues. The framework is expanded to assess the significance of groupings of conserved residues, a metric that forms the basis of SLiMPrints (short linear motif fingerprints), a de novo motif discovery tool. SLiMPrints identifies relatively overconstrained proximal groupings of residues within intrinsically disordered regions, indicative of putatively functional motifs. Finally, the human proteome is analysed to create a set of highly conserved putative motif instances, including a novel site on translation initiation factor eIF2A that may regulate translation through binding of eIF4E.
Affiliation(s)
- Norman E Davey
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Baden-Württemberg 69117, Germany.
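The SLiMPrints abstract describes scoring a residue's conservation relative to the conservation of its surrounding residues. The sketch below illustrates that idea with a deliberately simplified z-like score: each residue is compared with the mean and standard deviation of its flanking window. This is not the probabilistic framework used by SLiMPrints. The input scores (e.g. from a multiple sequence alignment), the `window` size, and the function name are assumptions for illustration.

```python
from statistics import mean, pstdev


def relative_conservation(scores, window=10):
    """Simplified relative-conservation score (illustrative, not SLiMPrints).

    scores: per-residue conservation values (higher = more conserved).
    window: number of flanking residues taken on each side.
    Returns one value per residue: (score - neighbour mean) / neighbour SD,
    or None when there are fewer than two neighbours or no spread.
    """
    scores = list(scores)
    n = len(scores)
    result = []
    for i, s in enumerate(scores):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # Exclude the residue itself so it is compared only with its surroundings.
        neighbours = scores[lo:i] + scores[i + 1:hi]
        if len(neighbours) < 2:
            result.append(None)
            continue
        sd = pstdev(neighbours)
        result.append(None if sd == 0 else (s - mean(neighbours)) / sd)
    return result


# A single conserved residue embedded in a weakly conserved stretch stands out.
print(relative_conservation([0.2, 0.3, 0.2, 0.9, 0.3, 0.2, 0.3], window=3))
```

Excluding the residue from its own window keeps a strongly conserved position from inflating the background it is compared against.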
42
|
Gfeller D. Uncovering new aspects of protein interactions through analysis of specificity landscapes in peptide recognition domains. FEBS Lett 2012; 586:2764-72. [PMID: 22710167 DOI: 10.1016/j.febslet.2012.03.054] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 02/29/2012] [Revised: 03/27/2012] [Accepted: 03/27/2012] [Indexed: 12/20/2022]
Abstract
Protein interactions underlie all biological processes. An important class of protein interactions, often observed in signaling pathways, consists of peptide recognition domains binding short protein segments on the surface of their target proteins. Recent developments in experimental techniques have uncovered many such interactions and shed new light on their specificity. To analyze these data, novel computational methods have been introduced that can accurately describe the specificity landscape of peptide recognition domains and predict new interactions. Combining large-scale analysis of binding specificity data with structure-based modeling can further reveal new biological insights into the molecular recognition events underlying signaling pathways.
Affiliation(s)
- David Gfeller
- Swiss Institute of Bioinformatics, Quartier Sorge, Bâtiment Génopode, CH-1015 Lausanne, Switzerland.
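A common way to describe the specificity of a peptide recognition domain is a position weight matrix (PWM) learned from aligned binding peptides. Candidate peptides are then scored against it. The sketch below is a generic log-odds PWM, not the specific methods reviewed in the article. The uniform background, the pseudocount, and the example peptides are illustrative assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(peptides, pseudocount=1.0):
    """Log-odds position weight matrix from equal-length aligned peptides.

    Assumes a uniform background frequency of 1/20 per amino acid and
    only the 20 standard one-letter codes in the input.
    """
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be aligned to the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pwm = []
    for pos in range(length):
        # Pseudocounts avoid log(0) for residues never seen at this position.
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for p in peptides:
            counts[p[pos]] += 1
        total = sum(counts.values())
        pwm.append({aa: math.log2((c / total) / background) for aa, c in counts.items()})
    return pwm


def score_peptide(pwm, peptide):
    """Sum of per-position log-odds; higher means closer to the learned specificity."""
    if len(peptide) != len(pwm):
        raise ValueError("peptide length must match the matrix")
    return sum(column[aa] for column, aa in zip(pwm, peptide))


# Hypothetical set of aligned binders for an illustrative domain.
pwm = build_pwm(["ETSV", "ESTV", "KTSV", "QTAV"])
print(score_peptide(pwm, "ETSV"), score_peptide(pwm, "GGGG"))
```

A single PWM treats positions as independent. That assumption is one reason richer specificity models are used when binding data are plentiful.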
43
|
Weatheritt RJ, Luck K, Petsalaki E, Davey NE, Gibson TJ. The identification of short linear motif-mediated interfaces within the human interactome. Bioinformatics 2012; 28:976-82. [PMID: 22328783 PMCID: PMC3315716 DOI: 10.1093/bioinformatics/bts072] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Eukaryotic proteins are highly modular, containing multiple interaction interfaces that mediate binding to a network of regulators and effectors. Recent advances in high-throughput proteomics have rapidly expanded the number of known protein-protein interactions (PPIs); however, the molecular basis for the majority of these interactions remains to be elucidated. There has been a growing appreciation of the importance of a subset of these PPIs, namely those mediated by short linear motifs (SLiMs), particularly the canonical and ubiquitous SH2, SH3 and PDZ domain-binding motifs. However, these motif classes represent only a small fraction of known SLiMs and outside these examples little effort has been made, either bioinformatically or experimentally, to discover the full complement of motif instances.
RESULTS: In this article, interaction data are analysed to identify and characterize an important subset of PPIs, those involving SLiMs binding to globular domains. To do this, we introduce iELM, a method to identify interactions mediated by SLiMs and add molecular details of the interaction interfaces to both interacting proteins. The method identifies SLiM-mediated interfaces from PPI data by searching for known SLiM-domain pairs. This approach was applied to the human interactome to identify a set of high-confidence putative SLiM-mediated PPIs.
AVAILABILITY: iELM is freely available at http://elmint.embl.de
CONTACT: toby.gibson@embl.de
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- R J Weatheritt
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
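The iELM abstract says SLiM-mediated interfaces are identified from PPI data by searching for known SLiM-domain pairs. The sketch below shows the core of that search under simple assumptions. Motifs are regular expressions, domain annotations are sets of names per protein, and every orientation of each interacting pair is checked. The function name, the data layout, and the example proteins ("ProtA", "ProtB", "ProtC") are hypothetical. This is not iELM's actual implementation, which also applies further filtering.

```python
import re


def find_slim_mediated_interfaces(interactions, sequences, domains, motif_domain_pairs):
    """Annotate PPIs that a known motif-domain pair could explain (illustrative).

    interactions: iterable of (protein_a, protein_b) pairs.
    sequences: protein -> amino-acid sequence.
    domains: protein -> set of domain names it contains.
    motif_domain_pairs: list of (motif_name, regex, domain_name).
    """
    results = []
    for a, b in interactions:
        # The motif may lie in either partner, so test both orientations.
        for motif_protein, domain_protein in ((a, b), (b, a)):
            for motif_name, regex, domain_name in motif_domain_pairs:
                if domain_name not in domains.get(domain_protein, set()):
                    continue
                for m in re.finditer(regex, sequences.get(motif_protein, "")):
                    results.append({
                        "motif_protein": motif_protein,
                        "motif": motif_name,
                        "start": m.start(),
                        "end": m.end(),
                        "domain_protein": domain_protein,
                        "domain": domain_name,
                    })
    return results


hits = find_slim_mediated_interfaces(
    interactions=[("ProtA", "ProtB"), ("ProtA", "ProtC")],
    sequences={"ProtA": "MSGPKLPSE", "ProtB": "MDEEKKL", "ProtC": "MKKQQ"},
    domains={"ProtB": {"SH3"}, "ProtC": {"PDZ"}},
    motif_domain_pairs=[("PxxP (illustrative)", "P..P", "SH3")],
)
print(hits)
```

Each hit records residue positions on the motif side and the domain on the partner side. This corresponds to the "molecular details of the interaction interfaces" the abstract mentions, though in a much cruder form.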