1. Kotb H, Davey N. FaSTPACE: a fast and scalable tool for peptide alignment and consensus extraction. NAR Genom Bioinform 2024; 6:lqae103. [PMID: 39170861 PMCID: PMC11337127 DOI: 10.1093/nargab/lqae103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Revised: 07/04/2024] [Accepted: 08/05/2024] [Indexed: 08/23/2024] Open
Abstract
Several novel high-throughput experimental techniques have been developed in recent years that generate large datasets of putative biologically functional peptides. However, many of the computational tools required to process these datasets have not yet been created. In this study, we introduce FaSTPACE, a fast and scalable computational tool to rapidly align short peptides and extract enriched specificity determinants. The tool aligns peptides in a pairwise manner to produce a position-specific global similarity matrix for each peptide. Peptides are realigned in an iterative manner scoring the updated alignment based on the global similarity matrices of the peptides and updating the global similarity matrices based on the new alignment. The method then iterates until the global similarity matrices converge. Finally, an alignment and consensus motif are extracted from the resulting global similarity matrices. The tool is the first to support custom weighting for the input peptides to satisfy the pressing need to include experimental attributes encoding peptide confidence in specificity determinant extraction. FaSTPACE exhibited state-of-the-art performance and accuracy when benchmarked against similar tools on motif datasets generated using curated peptides and high-throughput data from proteomic peptide phage display. FaSTPACE is available as an open-source Python package and a web server.
Affiliation(s)
- Hazem M Kotb
  - Division of Cancer Biology, The Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, UK
- Norman E Davey
  - Division of Cancer Biology, The Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, UK

2. Maheshwari N, Jermiin LS, Cotroneo C, Gordon SV, Shields DC. Insights into the production and evolution of lantibiotics from a computational analysis of peptides associated with the lanthipeptide cyclase domain. Royal Society Open Science 2024; 11:240491. [PMID: 39021782 PMCID: PMC11251773 DOI: 10.1098/rsos.240491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Revised: 06/12/2024] [Accepted: 06/13/2024] [Indexed: 07/20/2024]
Abstract
Lanthipeptides are a large group of ribosomally encoded peptides cyclized by thioether and methylene bridges, which include the lantibiotics, lanthipeptides with antimicrobial activity. There are over 100 experimentally characterized lanthipeptides, with at least 25 distinct cyclization bridging patterns. We set out to understand the evolutionary dynamics and diversity of lanthipeptides. We identified 977 peptides in 2785 bacterial genomes from short open-reading frames encoding lanthipeptide modifiable amino acids (C, S and T) that lay chromosomally adjacent to genes encoding proteins containing the cyclase domain. These appeared to be synthesized by both known and novel enzymatic combinations. Our predictor of bridging topology suggested 36 novel-predicted topologies, including a single-cysteine topology seen in 179 lanthionine or labionin containing peptides, which were enriched for histidine. Evidence that supported the relevance of the single-cysteine containing lanthipeptide precursors included the presence of the labionin motif among single cysteine peptides that clustered with labionin-associated synthetase domains, and the leader features of experimentally defined lanthipeptides that were shared with single cysteine predictions. Evolutionary rate variation among peptide subfamilies suggests that selection pressures for functional change differ among subfamilies. Lanthipeptides that have recently evolved specific novel features may represent a richer source of potential novel antimicrobials, since their target species may have had less time to evolve resistance.
Affiliation(s)
- Nikunj Maheshwari
  - School of Medicine, University College Dublin, Dublin, Ireland
  - Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
- Lars S. Jermiin
  - School of Medicine, University College Dublin, Dublin, Ireland
  - Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
  - Research School of Biology, Australian National University, Canberra, ACT, Australia
  - School of Mathematical and Statistical Sciences, University of Galway, Galway, Ireland
- Chiara Cotroneo
  - School of Medicine, University College Dublin, Dublin, Ireland
  - Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
- Stephen V. Gordon
  - Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
  - School of Veterinary Medicine, University College Dublin, Dublin, Ireland
- Denis C. Shields
  - School of Medicine, University College Dublin, Dublin, Ireland
  - Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland

3. Ghosh D, Biswas A, Radhakrishna M. Advanced computational approaches to understand protein aggregation. Biophysics Reviews 2024; 5:021302. [PMID: 38681860 PMCID: PMC11045254 DOI: 10.1063/5.0180691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Accepted: 03/18/2024] [Indexed: 05/01/2024]
Abstract
Protein aggregation is a widespread phenomenon implicated in debilitating diseases like Alzheimer's, Parkinson's, and cataracts, presenting complex hurdles for the field of molecular biology. In this review, we explore the evolving realm of computational methods and bioinformatics tools that have revolutionized our comprehension of protein aggregation. Beginning with a discussion of the multifaceted challenges associated with understanding this process and emphasizing the critical need for precise predictive tools, we highlight how computational techniques have become indispensable for understanding protein aggregation. We focus on molecular simulations, notably molecular dynamics (MD) simulations, spanning from atomistic to coarse-grained levels, which have emerged as pivotal tools in unraveling the complex dynamics governing protein aggregation in diseases such as cataracts, Alzheimer's, and Parkinson's. MD simulations provide microscopic insights into protein interactions and the subtleties of aggregation pathways, with advanced techniques like replica exchange molecular dynamics, Metadynamics (MetaD), and umbrella sampling enhancing our understanding by probing intricate energy landscapes and transition states. We delve into specific applications of MD simulations, elucidating the chaperone mechanism underlying cataract formation using Markov state modeling and the intricate pathways and interactions driving the toxic aggregate formation in Alzheimer's and Parkinson's disease. Transitioning, we highlight how computational techniques, including bioinformatics, sequence analysis, structural data, machine learning algorithms, and artificial intelligence, have become indispensable for predicting protein aggregation propensity and locating aggregation-prone regions within protein sequences. Throughout our exploration, we underscore the symbiotic relationship between computational approaches and empirical data, which has paved the way for potential therapeutic strategies against protein aggregation-related diseases. In conclusion, this review offers a comprehensive overview of advanced computational methodologies and bioinformatics tools that have catalyzed breakthroughs in unraveling the molecular basis of protein aggregation, with significant implications for clinical interventions, standing at the intersection of computational biology and experimental research.
Affiliation(s)
- Deepshikha Ghosh
  - Department of Biological Sciences and Engineering, Indian Institute of Technology (IIT) Gandhinagar, Palaj, Gujarat 382355, India
- Anushka Biswas
  - Department of Chemical Engineering, Indian Institute of Technology (IIT) Gandhinagar, Palaj, Gujarat 382355, India

4. Pagano L, Simonetti L, Pennacchietti V, Toto A, Malagrinò F, Ivarsson Y, Gianni S. Exploring the short linear motif-mediated protein-protein interactions of CrkL through ProP-PD. Biochem Biophys Res Commun 2024; 703:149658. [PMID: 38387229 DOI: 10.1016/j.bbrc.2024.149658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Accepted: 02/06/2024] [Indexed: 02/24/2024]
Abstract
Adaptor proteins play a pivotal role in cellular signaling, mediating a multitude of protein-protein interactions critical for cellular homeostasis. Dysregulation of these interactions has been linked to the onset of various cancer pathologies and exploited by viral pathogens during host cell takeover. CrkL is an adaptor protein composed of an N-terminal SH2 domain followed by two SH3 domains that mediate interactions with diverse partners through the recognition of specific binding motifs. In this study, we employed proteomic peptide-phage display (ProP-PD) to comprehensively explore the short linear motif (SLiM)-based interactions of CrkL. Furthermore, we scrutinized how the binding affinity for selected peptides was influenced in the context of the full-length CrkL versus the isolated N-SH3 domain. Importantly, our results provided insights into SLiM-binding sites within previously reported interactors, as well as revealing novel human and viral ligands, expanding our understanding of the interactions mediated by CrkL and highlighting the significance of SLiM-based interactions in mediating adaptor protein function, with implications for cancer and viral pathologies.
Affiliation(s)
- L Pagano
  - Dipartimento di Scienze Biochimiche "A. Rossi Fanelli", Sapienza Universita di Roma, Laboratory Affiliated to Istituto Pasteur Italia - Fondazione Cenci Bolognetti, 00185, Rome, Italy
- L Simonetti
  - Department of Chemistry - BMC, Husargatan 3, 751 23, Uppsala, Sweden
- V Pennacchietti
  - Dipartimento di Scienze Biochimiche "A. Rossi Fanelli", Sapienza Universita di Roma, Laboratory Affiliated to Istituto Pasteur Italia - Fondazione Cenci Bolognetti, 00185, Rome, Italy
- A Toto
  - Dipartimento di Scienze Biochimiche "A. Rossi Fanelli", Sapienza Universita di Roma, Laboratory Affiliated to Istituto Pasteur Italia - Fondazione Cenci Bolognetti, 00185, Rome, Italy
- F Malagrinò
  - Dipartimento di Medicina clinica, sanità pubblica, scienze della vita e dell'ambiente, Università dell'Aquila, Piazzale Salvatore Tommasi 1, L'Aquila, Coppito, 67010, Italy
- Y Ivarsson
  - Department of Chemistry - BMC, Husargatan 3, 751 23, Uppsala, Sweden
- S Gianni
  - Dipartimento di Scienze Biochimiche "A. Rossi Fanelli", Sapienza Universita di Roma, Laboratory Affiliated to Istituto Pasteur Italia - Fondazione Cenci Bolognetti, 00185, Rome, Italy

5. Idrees S, Paudel KR, Sadaf T, Hansbro PM. Uncovering domain motif interactions using high-throughput protein-protein interaction detection methods. FEBS Lett 2024; 598:725-742. [PMID: 38439692 DOI: 10.1002/1873-3468.14841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 01/09/2024] [Accepted: 02/18/2024] [Indexed: 03/06/2024]
Abstract
Protein-protein interactions (PPIs) are often mediated by short linear motifs (SLiMs) in one protein and domain in another, known as domain-motif interactions (DMIs). During the past decade, SLiMs have been studied to find their role in cellular functions such as post-translational modifications, regulatory processes, protein scaffolding, cell cycle progression, cell adhesion, cell signalling and substrate selection for proteasomal degradation. This review provides a comprehensive overview of the current PPI detection techniques and resources, focusing on their relevance to capturing interactions mediated by SLiMs. We also address the challenges associated with capturing DMIs. Moreover, a case study analysing the BioGrid database as a source of DMI prediction revealed significant known DMI enrichment in different PPI detection methods. Overall, it can be said that current high-throughput PPI detection methods can be a reliable source for predicting DMIs.
Affiliation(s)
- Sobia Idrees
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia
  - Centre for Inflammation, Centenary Institute and Faculty of Science, School of Life Sciences, University of Technology Sydney, Australia
- Keshav Raj Paudel
  - Centre for Inflammation, Centenary Institute and Faculty of Science, School of Life Sciences, University of Technology Sydney, Australia
- Tayyaba Sadaf
  - Centre for Inflammation, Centenary Institute and Faculty of Science, School of Life Sciences, University of Technology Sydney, Australia
- Philip M Hansbro
  - Centre for Inflammation, Centenary Institute and Faculty of Science, School of Life Sciences, University of Technology Sydney, Australia

6. Idrees S, Paudel KR, Hansbro PM. Prediction of motif-mediated viral mimicry through the integration of host-pathogen interactions. Arch Microbiol 2024; 206:94. [PMID: 38334822 PMCID: PMC10858152 DOI: 10.1007/s00203-024-03832-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 01/01/2024] [Accepted: 01/02/2024] [Indexed: 02/10/2024]
Abstract
One of the mechanisms viruses use in hijacking host cellular machinery is mimicking Short Linear Motifs (SLiMs) in host proteins to maintain their life cycle inside host cells. Despite the escalating volume of virus-host protein-protein interactions (vhPPIs) documented in databases, the accurate prediction of molecular mimicry remains a formidable challenge due to the inherent degeneracy of SLiMs. Consequently, there is a pressing need for computational methodologies to predict new instances of viral mimicry. Our present study introduces a DMI-de-novo pipeline, revealing that vhPPIs catalogued in the VirHostNet3.0 database effectively capture domain-motif interactions (DMIs). Notably, both affinity purification coupled mass spectrometry and yeast two-hybrid assays emerged as good approaches for delineating DMIs. Furthermore, we have identified new vhPPIs mediated by SLiMs across different viruses. Importantly, the de-novo prediction strategy facilitated the recognition of several potential mimicry candidates implicated in the subversion of host cellular proteins. The insights gleaned from this research not only enhance our comprehension of the mechanisms by which viruses co-opt host cellular machinery but also pave the way for the development of novel therapeutic interventions.
Affiliation(s)
- Sobia Idrees
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
  - Centre for Inflammation, School of Life Sciences, Faculty of Science, Centenary Institute and the University of Technology Sydney, Sydney, NSW, Australia
- Keshav Raj Paudel
  - Centre for Inflammation, School of Life Sciences, Faculty of Science, Centenary Institute and the University of Technology Sydney, Sydney, NSW, Australia
- Philip M Hansbro
  - Centre for Inflammation, School of Life Sciences, Faculty of Science, Centenary Institute and the University of Technology Sydney, Sydney, NSW, Australia

7. Christie J, Anthony CM, Harish M, Mudartha D, Ud Din Farooqee SB, Venkatraman P. The interaction network of the proteasome assembly chaperone PSMD9 regulates proteostasis. FEBS J 2023; 290:5581-5604. [PMID: 37665644 DOI: 10.1111/febs.16948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Revised: 08/09/2023] [Accepted: 09/01/2023] [Indexed: 09/06/2023]
Abstract
Functional networks in cells are created by physical, genetic, and regulatory interactions. Mapping them and annotating their functions by available methods remains a challenge. We use affinity purification mass spectrometry (AP-MS) coupled with SLiMFinder to discern such a network involving 26S proteasome non-ATPase regulatory subunit 9 (PSMD9), a chaperone of proteasome assembly. Approximately 20% of proteins within the PSMD9 interactome carry a short linear motif (SLiM) of the type 'EXKK'. The binding of purified PSMD9 with the peptide sequence ERKK, proteins heterogeneous nuclear ribonucleoproteins A2/B1 (hnRNPA2B1; containing ERKK), and peroxiredoxin-6 (PRDX6; containing EAKK) provided proof of principle for this motif-driven network. The EXKK motif in the peptide primarily interacts with the coiled-coil N domain of PSMD9, a unique interaction not reported for any coiled-coil domain. PSMD9 knockout (KO) HEK293 cells experience endoplasmic reticulum (ER) stress and respond by increasing the unfolded protein response (UPR) and reducing the formation of aggresomes and lipid droplets. Trans-expression of PSMD9 in the KO cells rescues lipid droplet formation. Overexpression of PSMD9 in HEK293 cells results in reduced UPR, and increased lipid droplet and aggresome formation. The outcome argues for the prominent role of PSMD9 in maintaining proteostasis. Probable mechanisms involve the binding of PSMD9 to binding immunoglobulin protein (BIP/GRP78; containing EDKK), an endoplasmic reticulum chaperone and key regulator of the UPR, and fatty acid synthase (FASN; containing ELKK), involved in fatty acid synthesis/lipid biogenesis. We propose that PSMD9 acts as a buffer in the cellular milieu by moderating the UPR and enhancing aggresome formation to reduce stress-induced proteotoxicity. 
Akin to waves created in ponds that perpetuate to a distance, perturbing the levels of PSMD9 would cause ripples down the networks, affecting final reactions in the pathway, one of which is altered proteostasis.
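As a rough illustration of what a motif "of the type 'EXKK'" means in practice, the sketch below scans sequences for E, any single residue, then KK. It is not SLiMFinder and not the authors' pipeline: the function name `find_exkk`, the restriction of X to the 20 standard amino acids, and the toy sequence are assumptions made for this example; only the four tetrapeptides (ERKK, EAKK, EDKK, ELKK) come from the abstract.

```python
import re

# Assumption: 'X' is read as "any one of the 20 standard amino acids".
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# A lookahead is used so that overlapping occurrences are also reported.
EXKK_PATTERN = re.compile(rf"(?=(E[{AMINO_ACIDS}]KK))")


def find_exkk(sequence):
    """Return (0-based start, matched tetrapeptide) for each EXKK occurrence."""
    return [(m.start(), m.group(1)) for m in EXKK_PATTERN.finditer(sequence.upper())]


if __name__ == "__main__":
    # Motif instances named in the abstract (hnRNPA2B1, PRDX6, BIP/GRP78, FASN).
    for peptide in ["ERKK", "EAKK", "EDKK", "ELKK"]:
        print(peptide, find_exkk(peptide))

    # Hypothetical toy sequence, not a real protein.
    print(find_exkk("MSEAKKLTEEKKG"))
```

A regular expression of this kind only flags candidate sites; whether a match is a functional binding motif still has to be established experimentally, as the abstract does with purified PSMD9.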
Affiliation(s)
- Joel Christie
  - Protein Interactome Lab for Structural and Functional Biology, Advanced Centre for Treatment Research and Education in Cancer, Tata Memorial Centre, Navi Mumbai, India
  - Homi Bhabha National Institute, Mumbai, India
- C Merlyn Anthony
  - Protein Interactome Lab for Structural and Functional Biology, Advanced Centre for Treatment Research and Education in Cancer, Tata Memorial Centre, Navi Mumbai, India
  - Homi Bhabha National Institute, Mumbai, India
- Mahalakshmi Harish
  - Protein Interactome Lab for Structural and Functional Biology, Advanced Centre for Treatment Research and Education in Cancer, Tata Memorial Centre, Navi Mumbai, India
  - Homi Bhabha National Institute, Mumbai, India
- Deepti Mudartha
  - Protein Interactome Lab for Structural and Functional Biology, Advanced Centre for Treatment Research and Education in Cancer, Tata Memorial Centre, Navi Mumbai, India
- Sheikh Burhan Ud Din Farooqee
  - Protein Interactome Lab for Structural and Functional Biology, Advanced Centre for Treatment Research and Education in Cancer, Tata Memorial Centre, Navi Mumbai, India
- Prasanna Venkatraman
  - Protein Interactome Lab for Structural and Functional Biology, Advanced Centre for Treatment Research and Education in Cancer, Tata Memorial Centre, Navi Mumbai, India
  - Homi Bhabha National Institute, Mumbai, India

8. Blankenship CM, Xie J, Benz C, Wang A, Ivarsson Y, Jiang J. Motif-dependent binding on the intervening domain regulates O-GlcNAc transferase. Nat Chem Biol 2023; 19:1423-1431. [PMID: 37653170 PMCID: PMC10723112 DOI: 10.1038/s41589-023-01422-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Accepted: 08/11/2023] [Indexed: 09/02/2023]
Abstract
The modification of intracellular proteins with O-linked β-N-acetylglucosamine (O-GlcNAc) moieties is a highly dynamic process that spatiotemporally regulates nearly every important cellular program. Despite its significance, little is known about the substrate recognition and regulation modes of O-GlcNAc transferase (OGT), the primary enzyme responsible for O-GlcNAc addition. In this study, we identified the intervening domain (Int-D), a poorly understood protein fold found only in metazoan OGTs, as a specific regulator of OGT protein-protein interactions and substrate modification. Using proteomic peptide phage display (ProP-PD) coupled with structural, biochemical and cellular characterizations, we discovered a strongly enriched peptide motif, employed by the Int-D to facilitate specific O-GlcNAcylation. We further show that disruption of Int-D binding dysregulates important cellular programs, including response to nutrient deprivation and glucose metabolism. These findings illustrate a mode of OGT substrate recognition and offer key insights into the biological roles of this unique domain.
Affiliation(s)
- Connor M Blankenship
  - Pharmaceutical Sciences Division, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
- Jinshan Xie
  - Pharmaceutical Sciences Division, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
- Caroline Benz
  - Department of Chemistry - BMC, Uppsala University, Uppsala, Sweden
- Ao Wang
  - Pharmaceutical Sciences Division, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
- Ylva Ivarsson
  - Department of Chemistry - BMC, Uppsala University, Uppsala, Sweden
- Jiaoyang Jiang
  - Pharmaceutical Sciences Division, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA

9. Zhao B, Ghadermarzi S, Kurgan L. Comparative evaluation of AlphaFold2 and disorder predictors for prediction of intrinsic disorder, disorder content and fully disordered proteins. Comput Struct Biotechnol J 2023; 21:3248-3258. [PMID: 38213902 PMCID: PMC10782001 DOI: 10.1016/j.csbj.2023.06.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/28/2023] [Revised: 05/31/2023] [Accepted: 06/01/2023] [Indexed: 01/13/2024] Open
Abstract
We expand studies of AlphaFold2 (AF2) in the context of intrinsic disorder prediction by comparing it against a broad selection of 20 accurate, popular and recently released disorder predictors. We use a 25% larger benchmark dataset with 646 proteins and cover protein-level predictions of disorder content and fully disordered proteins. AF2-based disorder predictions secure a relatively high Area Under receiver operating characteristic Curve (AUC) of 0.77 and are statistically outperformed by several modern disorder predictors that secure AUCs around 0.8 with a median runtime of about 20 s compared to 1200 s for AF2. Moreover, AF2 provides modestly accurate predictions of fully disordered proteins (F1 = 0.59 vs. 0.91 for the best disorder predictor) and disorder content (mean absolute error of 0.21 vs. 0.15). AF2 also generates statistically more accurate disorder predictions for about 20% of proteins that have relatively short sequences and a few disordered regions that tend to be located at the sequence termini, and which lack disordered protein-binding regions. Interestingly, AF2 and the most accurate disorder predictors rely on deep neural networks, suggesting that these models are useful for protein structure and disorder predictions.
Affiliation(s)
- Bi Zhao
  - Genomics program, College of Public Health, University of South Florida, Tampa, FL, United States
- Sina Ghadermarzi
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States

10. Computational prediction of disordered binding regions. Comput Struct Biotechnol J 2023; 21:1487-1497. [PMID: 36851914 PMCID: PMC9957716 DOI: 10.1016/j.csbj.2023.02.018] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 10/26/2022] [Revised: 02/08/2023] [Accepted: 02/08/2023] [Indexed: 02/12/2023] Open
Abstract
One of the key features of intrinsically disordered regions (IDRs) is their ability to interact with a broad range of partner molecules. Multiple types of interacting IDRs have been identified, including molecular recognition fragments (MoRFs), short linear sequence motifs (SLiMs), and protein-, nucleic acid- and lipid-binding regions. Prediction of binding IDRs in protein sequences has been gaining momentum in recent years. We survey 38 predictors of binding IDRs that target interactions with a diverse set of partners, such as peptides, proteins, RNA, DNA and lipids. We offer a historical perspective and highlight key events that fueled efforts to develop these methods. These tools rely on a diverse range of predictive architectures that include scoring functions, regular expressions, traditional and deep machine learning and meta-models. Recent efforts focus on the development of deep neural network-based architectures and extending coverage to RNA, DNA and lipid-binding IDRs. We analyze the availability of these methods and show that providing implementations and webservers results in much higher rates of citations/use. We also make several recommendations to take advantage of modern deep network architectures, develop tools that bundle predictions of multiple and different types of binding IDRs, and work on algorithms that model structures of the resulting complexes.

11. Raghavan M, Kalantar KL, Duarte E, Teyssier N, Takahashi S, Kung AF, Rajan JV, Rek J, Tetteh KKA, Drakeley C, Ssewanyana I, Rodriguez-Barraquer I, Greenhouse B, DeRisi JL. Antibodies to repeat-containing antigens in Plasmodium falciparum are exposure-dependent and short-lived in children in natural malaria infections. eLife 2023; 12:e81401. [PMID: 36790168 PMCID: PMC10005774 DOI: 10.7554/elife.81401] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 06/26/2022] [Accepted: 02/14/2023] [Indexed: 02/16/2023] Open
Abstract
Protection against Plasmodium falciparum, which is primarily antibody-mediated, requires recurrent exposure to develop. The study of both naturally acquired limited immunity and vaccine induced protection against malaria remains critical for ongoing eradication efforts. Towards this goal, we deployed a customized P. falciparum PhIP-seq T7 phage display library containing 238,068 tiled 62-amino acid peptides, covering all known coding regions, including antigenic variants, to systematically profile antibody targets in 198 Ugandan children and adults from high and moderate transmission settings. Repeat elements - short amino acid sequences repeated within a protein - were significantly enriched in antibody targets. While breadth of responses to repeat-containing peptides was twofold higher in children living in the high versus moderate exposure setting, no such differences were observed for peptides without repeats, suggesting that antibody responses to repeat-containing regions may be more exposure dependent and/or less durable in children than responses to regions without repeats. Additionally, short motifs associated with seroreactivity were extensively shared among hundreds of antigens, potentially representing cross-reactive epitopes. PfEMP1 shared motifs with the greatest number of other antigens, partly driven by the diversity of PfEMP1 sequences. These data suggest that the large number of repeat elements and potential cross-reactive epitopes found within antigenic regions of P. falciparum could contribute to the inefficient nature of malaria immunity.
Affiliation(s)
- Madhura Raghavan
  - University of California, San Francisco, San Francisco, United States
- Elias Duarte
  - University of California, Berkeley, Berkeley, United States
- Noam Teyssier
  - University of California, San Francisco, San Francisco, United States
- Saki Takahashi
  - University of California, San Francisco, San Francisco, United States
- Andrew F Kung
  - University of California, San Francisco, San Francisco, United States
- Jayant V Rajan
  - University of California, San Francisco, San Francisco, United States
- John Rek
  - Infectious Diseases Research Collaboration, Kampala, Uganda
- Kevin KA Tetteh
  - London School of Hygiene and Tropical Medicine, London, United Kingdom
- Chris Drakeley
  - London School of Hygiene and Tropical Medicine, London, United Kingdom
- Isaac Ssewanyana
  - Infectious Diseases Research Collaboration, Kampala, Uganda
  - London School of Hygiene and Tropical Medicine, London, United Kingdom
- Isabel Rodriguez-Barraquer
  - University of California, San Francisco, San Francisco, United States
  - Chan Zuckerberg Biohub, San Francisco, United States
- Bryan Greenhouse
  - University of California, San Francisco, San Francisco, United States
  - Chan Zuckerberg Biohub, San Francisco, United States
- Joseph L DeRisi
  - University of California, San Francisco, San Francisco, United States
  - Chan Zuckerberg Biohub, San Francisco, United States

12. Han B, Ren C, Wang W, Li J, Gong X. Computational Prediction of Protein Intrinsically Disordered Region Related Interactions and Functions. Genes (Basel) 2023; 14:432. [PMID: 36833360 PMCID: PMC9956190 DOI: 10.3390/genes14020432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Revised: 02/02/2023] [Accepted: 02/05/2023] [Indexed: 02/11/2023] Open
Abstract
Intrinsically Disordered Proteins (IDPs) and Regions (IDRs) exist widely. Although without well-defined structures, they participate in many important biological processes. In addition, they are also widely related to human diseases and have become potential targets in drug discovery. However, there is a big gap between the experimental annotations related to IDPs/IDRs and their actual number. In recent decades, the computational methods related to IDPs/IDRs have been developed vigorously, including predicting IDPs/IDRs, the binding modes of IDPs/IDRs, the binding sites of IDPs/IDRs, and the molecular functions of IDPs/IDRs according to different tasks. In view of the correlation between these predictors, we have reviewed these prediction methods uniformly for the first time, summarized their computational methods and predictive performance, and discussed some problems and perspectives.
Affiliation(s)
- Bingqing Han
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
| | - Chongjiao Ren
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
| | - Wenda Wang
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
| | - Jiashan Li
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
| | - Xinqi Gong
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Beijing Academy of Intelligence, Beijing 100083, China
|
13
|
Blankenship C, Xie J, Benz C, Wang A, Ivarsson Y, Jiang J. A novel binding site on the cryptic intervening domain is a motif-dependent regulator of O-GlcNAc transferase. RESEARCH SQUARE 2023:rs.3.rs-2531412. [PMID: 36778302 PMCID: PMC9915769 DOI: 10.21203/rs.3.rs-2531412/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
The modification of intracellular proteins with O-linked β-N-acetylglucosamine (O-GlcNAc) moieties is a highly dynamic process that spatiotemporally regulates nearly every important cellular program. Despite its significance, little is known about the substrate recognition and regulation modes of O-GlcNAc transferase (OGT), the primary enzyme responsible for O-GlcNAc addition. In this study, we have identified the intervening domain (Int-D), a poorly understood protein fold found only in metazoan OGTs, as a specific regulator of OGT protein-protein interactions and substrate modification. Utilizing an innovative proteomic peptide phage display (ProP-PD) coupled with structural, biochemical, and cellular characterizations, we discovered a novel peptide motif, employed by the Int-D to facilitate specific O-GlcNAcylation. We further show that disruption of Int-D binding dysregulates important cellular programs including nutrient stress response and glucose metabolism. These findings illustrate a novel mode of OGT substrate recognition and offer the first insights into the biological roles of this unique domain.
Affiliation(s)
| | | | | | - Ao Wang
- University of Wisconsin-Madison
| | | | - Jiaoyang Jiang
- Pharmaceutical Sciences Division, School of Pharmacy, University of Wisconsin-Madison
|
14
|
Wadie B, Kleshchevnikov V, Sandaltzopoulou E, Benz C, Petsalaki E. Use of viral motif mimicry improves the proteome-wide discovery of human linear motifs. Cell Rep 2022; 39:110764. [PMID: 35508127 DOI: 10.1016/j.celrep.2022.110764] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 06/25/2021] [Revised: 02/09/2022] [Accepted: 04/08/2022] [Indexed: 12/16/2022] Open
Abstract
Linear motifs have an integral role in dynamic cell functions, including cell signaling. However, due to their small size, low complexity, and frequent mutations, identifying novel functional motifs poses a challenge. Viruses rely extensively on the molecular mimicry of cellular linear motifs. In this study, we apply systematic motif prediction combined with functional filters to identify human linear motifs convergently evolved also in viral proteins. We observe an increase in the sensitivity of motif prediction and improved enrichment in known instances. We identify >7,300 non-redundant motif instances at various confidence levels, 99 of which are supported by all functional and structural filters. Overall, we provide a pipeline to improve the identification of functional linear motifs from interactomics datasets and a comprehensive catalog of putative human motifs that can contribute to our understanding of the human domain-linear motif code and the associated mechanisms of viral interference.
Affiliation(s)
- Bishoy Wadie
- European Molecular Biology Laboratory - European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Vitalii Kleshchevnikov
- European Molecular Biology Laboratory - European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Elissavet Sandaltzopoulou
- European Molecular Biology Laboratory - European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Caroline Benz
- Department of Chemistry - BMC, Uppsala University, Uppsala, Sweden
| | - Evangelia Petsalaki
- European Molecular Biology Laboratory - European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK.
|
15
|
Ma Q, Zou K, Zhang Z, Yang F. GLTM: A Global-Local Attention LSTM Model to Locate Dimer Motif of Single-Pass Membrane Proteins. Front Genet 2022; 13:854571. [PMID: 35368690 PMCID: PMC8965067 DOI: 10.3389/fgene.2022.854571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2022] [Accepted: 02/14/2022] [Indexed: 11/13/2022] Open
Abstract
Single-pass membrane proteins, which constitute up to 50% of all transmembrane proteins, typically undergo significant conformational changes, such as forming a dimer or other oligomers, and these changes are essential for understanding the function of transmembrane proteins. Finding the key motifs of oligomers through experimental observation is a routine method used in the field to infer the potential conformations of other members of the transmembrane protein family. However, approaches based on experimental observation are costly in time and manpower; moreover, they struggle to reveal potential motifs. A proposed approach is to build an accurate and efficient transmembrane protein oligomer prediction model to screen the key motifs. In this paper, an attention-based Global-Local structure LSTM model named GLTM is proposed to predict dimers and screen potential dimer motifs. Unlike traditional motif screening based on a highly conserved sequence search frame, GLTM employs a self-attention mechanism to locate the subsequence fragments with the highest dimerization score, and it has been shown to locate most known dimer motifs well. The proposed GLTM can reach 97.5% accuracy on the benchmark dataset collected from Membranome2.0. The three characteristics of GLTM can be summarized as follows: first, the original sequence fragment is converted to a set of subsequences whose length is similar to that of known motifs, and this additional step can greatly enhance the capability of capturing motif patterns; second, to solve the problem of sample imbalance, a novel data enhancement approach combining improved one-hot encoding with random subsequence windows is proposed to improve the generalization capability of GLTM; third, position penalization is taken into account, which focuses the self-attention mechanism on specific TM fragments.
The experimental results in this paper demonstrate that the proposed GLTM has broad application prospects for locating potential oligomer motifs, and is helpful for preliminary and rapid research on the conformational change of mutants.
Affiliation(s)
- Quanchao Ma
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
| | - Kai Zou
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
| | - Zhihai Zhang
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
| | - Fan Yang
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China; Artificial Intelligence and Bioinformation Cognition Laboratory, Jiangxi Science and Technology Normal University, Nanchang, China
|
16
|
Bondos SE, Dunker AK, Uversky VN. Intrinsically disordered proteins play diverse roles in cell signaling. Cell Commun Signal 2022; 20:20. [PMID: 35177069 PMCID: PMC8851865 DOI: 10.1186/s12964-022-00821-7] [Citation(s) in RCA: 67] [Impact Index Per Article: 33.5] [Received: 09/22/2021] [Accepted: 12/11/2021] [Indexed: 11/29/2022] Open
Abstract
Signaling pathways allow cells to detect and respond to a wide variety of chemical (e.g., Ca2+ or chemokine proteins) and physical stimuli (e.g., shear stress, light). Together, these pathways form an extensive communication network that regulates basic cell activities and coordinates the function of multiple cells or tissues. The process of cell signaling imposes many demands on the proteins that comprise these pathways, including the abilities to form active and inactive states, and to engage in multiple protein interactions. Furthermore, successful signaling often requires amplifying the signal, regulating or tuning the response to the signal, and combining information sourced from multiple pathways, all while ensuring fidelity of the process. This sensitivity, adaptability, and tunability are possible, in part, due to the inclusion of intrinsically disordered regions in many proteins involved in cell signaling. The goal of this collection is to highlight the many roles of intrinsic disorder in cell signaling. Following an overview of resources that can be used to study intrinsically disordered proteins, this review highlights the critical role of intrinsically disordered proteins for signaling in widely diverse organisms (animals, plants, bacteria, fungi), in every category of cell signaling pathway (autocrine, juxtacrine, intracrine, paracrine, and endocrine) and at each stage (ligand, receptor, transducer, effector, terminator) in the cell signaling process. Thus, a cell signaling pathway cannot be fully described without understanding how intrinsically disordered protein regions contribute to its function. The ubiquitous presence of intrinsic disorder in different stages of diverse cell signaling pathways suggests that more mechanisms by which disorder modulates intra- and inter-cell signals remain to be discovered.
Affiliation(s)
- Sarah E. Bondos
- Department of Molecular and Cellular Medicine, Texas A&M Health Science Center, College Station, TX 77843 USA
| | - A. Keith Dunker
- Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN 46202 USA
| | - Vladimir N. Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer’s Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL 33612 USA
- Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center “Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences”, Pushchino, Moscow Region, Russia 142290
|
17
|
Mier P, Andrade-Navarro MA. Avoided motifs: short amino acid strings missing from protein datasets. Biol Chem 2021; 402:945-951. [PMID: 33660494 DOI: 10.1515/hsz-2020-0383] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/09/2020] [Accepted: 02/19/2021] [Indexed: 11/15/2022]
Abstract
According to the amino acid composition of natural proteins, it could be expected that all possible sequences of three or four amino acids will occur at least once in large protein datasets purely by chance. However, in some species or cellular context, specific short amino acid motifs are missing due to unknown reasons. We describe these as Avoided Motifs, short amino acid combinations missing from biological sequences. Here we identify 209 human and 154 bacterial Avoided Motifs of length four amino acids, and discuss their possible functionality according to their presence in other species. Furthermore, we determine two Avoided Motifs of length three amino acids in human proteins specifically located in the cytoplasm, and two more in secreted proteins. Our results support the hypothesis that the characterization of Avoided Motifs in particular contexts can provide us with information about functional motifs, pointing to a new approach in the use of molecular sequences for the discovery of protein function.
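The idea of an avoided motif can be illustrated with a minimal sketch: enumerate every possible k-mer over the 20 standard amino acids and subtract those observed in a sequence set. The function names and the toy sequences below are hypothetical and are not taken from the study, whose actual procedure (datasets, filtering, statistics) is not described in this abstract.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def observed_kmers(sequences, k):
    """Collect every length-k substring made only of standard residues."""
    seen = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if all(residue in AMINO_ACIDS for residue in kmer):
                seen.add(kmer)
    return seen

def avoided_kmers(sequences, k):
    """Return the sorted k-mers that never occur in the given sequences."""
    all_kmers = {"".join(p) for p in product(AMINO_ACIDS, repeat=k)}
    return sorted(all_kmers - observed_kmers(sequences, k))

# Toy input (hypothetical sequences, not data from the study).
toy_proteins = ["MKTAYIAKQR", "GAVLIMKTAY"]
missing = avoided_kmers(toy_proteins, 2)
```

On a real proteome, k = 3 or 4 gives 8,000 or 160,000 candidate strings, so the set difference stays cheap; whether a missing string is meaningful rather than a chance absence is a separate statistical question that this sketch does not address.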
Affiliation(s)
- Pablo Mier
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, D-55128 Mainz, Germany
| | - Miguel A Andrade-Navarro
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, D-55128 Mainz, Germany
|
18
|
Candelise N, Scaricamazza S, Salvatori I, Ferri A, Valle C, Manganelli V, Garofalo T, Sorice M, Misasi R. Protein Aggregation Landscape in Neurodegenerative Diseases: Clinical Relevance and Future Applications. Int J Mol Sci 2021; 22:ijms22116016. [PMID: 34199513 PMCID: PMC8199687 DOI: 10.3390/ijms22116016] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 05/09/2021] [Revised: 05/28/2021] [Accepted: 05/29/2021] [Indexed: 12/13/2022] Open
Abstract
Intrinsic disorder is a natural feature of polypeptide chains, resulting in the lack of a defined three-dimensional structure. Conformational changes in intrinsically disordered regions of a protein lead to unstable β-sheet enriched intermediates, which are stabilized by intermolecular interactions with other β-sheet enriched molecules, producing stable proteinaceous aggregates. Upon misfolding, several pathways may be undertaken depending on the composition of the amino acidic string and the surrounding environment, leading to different structures. Accumulating evidence is suggesting that the conformational state of a protein may initiate signalling pathways involved both in pathology and physiology. In this review, we will summarize the heterogeneity of structures that are produced from intrinsically disordered protein domains and highlight the routes that lead to the formation of physiological liquid droplets as well as pathogenic aggregates. The most common proteins found in aggregates in neurodegenerative diseases and their structural variability will be addressed. We will further evaluate the clinical relevance and future applications of the study of the structural heterogeneity of protein aggregates, which may aid the understanding of the phenotypic diversity observed in neurodegenerative disorders.
Affiliation(s)
- Niccolò Candelise
- Fondazione Santa Lucia IRCCS, c/o CERC, 00143 Rome, Italy; (S.S.); (I.S.); (A.F.); (C.V.)
- Institute of Translational Pharmacology, National Research Council, 00133 Rome, Italy
- Correspondence: ; Tel.: +39-338-891-2668
| | - Silvia Scaricamazza
- Fondazione Santa Lucia IRCCS, c/o CERC, 00143 Rome, Italy; (S.S.); (I.S.); (A.F.); (C.V.)
| | - Illari Salvatori
- Fondazione Santa Lucia IRCCS, c/o CERC, 00143 Rome, Italy; (S.S.); (I.S.); (A.F.); (C.V.)
- Department of Experimental Medicine, University of Rome “La Sapienza”, 00161 Rome, Italy; (V.M.); (T.G.); (M.S.); (R.M.)
| | - Alberto Ferri
- Fondazione Santa Lucia IRCCS, c/o CERC, 00143 Rome, Italy; (S.S.); (I.S.); (A.F.); (C.V.)
- Institute of Translational Pharmacology, National Research Council, 00133 Rome, Italy
| | - Cristiana Valle
- Fondazione Santa Lucia IRCCS, c/o CERC, 00143 Rome, Italy; (S.S.); (I.S.); (A.F.); (C.V.)
- Institute of Translational Pharmacology, National Research Council, 00133 Rome, Italy
| | - Valeria Manganelli
- Department of Experimental Medicine, University of Rome “La Sapienza”, 00161 Rome, Italy; (V.M.); (T.G.); (M.S.); (R.M.)
| | - Tina Garofalo
- Department of Experimental Medicine, University of Rome “La Sapienza”, 00161 Rome, Italy; (V.M.); (T.G.); (M.S.); (R.M.)
| | - Maurizio Sorice
- Department of Experimental Medicine, University of Rome “La Sapienza”, 00161 Rome, Italy; (V.M.); (T.G.); (M.S.); (R.M.)
| | - Roberta Misasi
- Department of Experimental Medicine, University of Rome “La Sapienza”, 00161 Rome, Italy; (V.M.); (T.G.); (M.S.); (R.M.)
|
19
|
Puccio S, Grillo G, Consiglio A, Soluri MF, Sblattero D, Cotella D, Santoro C, Liuni S, Bellis GD, Lugli E, Peano C, Licciulli F. InteractomeSeq: a web server for the identification and profiling of domains and epitopes from phage display and next generation sequencing data. Nucleic Acids Res 2020; 48:W200-W207. [PMID: 32402076 PMCID: PMC7319578 DOI: 10.1093/nar/gkaa363] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/15/2020] [Revised: 04/16/2020] [Accepted: 05/05/2020] [Indexed: 01/03/2023] Open
Abstract
High-Throughput Sequencing technologies are transforming many research fields, including the analysis of phage display libraries. The phage display technology coupled with deep sequencing was introduced more than a decade ago and holds the potential to circumvent the traditional laborious picking and testing of individual phage rescued clones. However, from a bioinformatics point of view, the analysis of this kind of data was always performed by adapting tools designed for other purposes, thus not considering the noise background typical of the 'interactome sequencing' approach and the heterogeneity of the data. InteractomeSeq is a web server allowing data analysis of protein domains ('domainome') or epitopes ('epitome') from either Eukaryotic or Prokaryotic genomic phage libraries generated and selected by following an Interactome sequencing approach. InteractomeSeq allows users to upload raw sequencing data and to obtain an accurate characterization of domainome/epitome profiles after setting the parameters required to tune the analysis. The release of this tool is relevant for the scientific and clinical community, because InteractomeSeq will fill an existing gap in the field of large-scale biomarkers profiling, reverse vaccinology, and structural/functional studies, thus contributing essential information for gene annotation or antigen identification. InteractomeSeq is freely available at https://InteractomeSeq.ba.itb.cnr.it/.
Affiliation(s)
- Simone Puccio
- Laboratory of Translational Immunology, Humanitas Clinical and Research Center, IRCCS, Rozzano (Milan), 20089, Italy
| | - Giorgio Grillo
- Institute for Biomedical Technologies, National Research Council, Bari 70100, Italy
| | - Arianna Consiglio
- Institute for Biomedical Technologies, National Research Council, Bari 70100, Italy
| | - Maria Felicia Soluri
- Department of Health Sciences & Center for Translational Research on Autoimmune and Allergic Disease (CAAD), Università del Piemonte Orientale, Novara 28100, Italy
| | - Daniele Sblattero
- Department of Life Sciences, University of Trieste, Trieste 34100, Italy
| | - Diego Cotella
- Department of Health Sciences & Center for Translational Research on Autoimmune and Allergic Disease (CAAD), Università del Piemonte Orientale, Novara 28100, Italy
| | - Claudio Santoro
- Department of Health Sciences & Center for Translational Research on Autoimmune and Allergic Disease (CAAD), Università del Piemonte Orientale, Novara 28100, Italy
| | - Sabino Liuni
- Institute for Biomedical Technologies, National Research Council, Bari 70100, Italy
| | - Gianluca De Bellis
- Institute for Biomedical Technologies, National Research Council, Segrate (Milan) 20090, Italy
| | - Enrico Lugli
- Laboratory of Translational Immunology, Humanitas Clinical and Research Center, IRCCS, Rozzano (Milan), 20089, Italy; Humanitas Flow Cytometry Core, Humanitas Clinical and Research Center, IRCCS, Rozzano (Milan) 20089, Italy
| | - Clelia Peano
- Institute of Genetic and Biomedical Research, UoS Milan, National Research Council, Rozzano (Milan) 20089, Italy; Genomic Unit, Humanitas Clinical and Research Center, IRCCS, Rozzano (Milan) 20089, Italy
| | - Flavio Licciulli
- Institute for Biomedical Technologies, National Research Council, Bari 70100, Italy
|
20
|
Wigington CP, Roy J, Damle NP, Yadav VK, Blikstad C, Resch E, Wong CJ, Mackay DR, Wang JT, Krystkowiak I, Bradburn DA, Tsekitsidou E, Hong SH, Kaderali MA, Xu SL, Stearns T, Gingras AC, Ullman KS, Ivarsson Y, Davey NE, Cyert MS. Systematic Discovery of Short Linear Motifs Decodes Calcineurin Phosphatase Signaling. Mol Cell 2020; 79:342-358.e12. [PMID: 32645368 DOI: 10.1016/j.molcel.2020.06.029] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 07/10/2019] [Revised: 03/24/2020] [Accepted: 05/26/2020] [Indexed: 12/17/2022]
Abstract
Short linear motifs (SLiMs) drive dynamic protein-protein interactions essential for signaling, but sequence degeneracy and low binding affinities make them difficult to identify. We harnessed unbiased systematic approaches for SLiM discovery to elucidate the regulatory network of calcineurin (CN)/PP2B, the Ca2+-activated phosphatase that recognizes LxVP and PxIxIT motifs. In vitro proteome-wide detection of CN-binding peptides, in vivo SLiM-dependent proximity labeling, and in silico modeling of motif determinants uncovered unanticipated CN interactors, including NOTCH1, which we establish as a CN substrate. Unexpectedly, CN shows SLiM-dependent proximity to centrosomal and nuclear pore complex (NPC) proteins, structures where Ca2+ signaling is largely uncharacterized. CN dephosphorylates human and yeast NPC proteins and promotes accumulation of a nuclear transport reporter, suggesting conserved NPC regulation by CN. The CN network assembled here provides a resource to investigate Ca2+ and CN signaling and demonstrates synergy between experimental and computational methods, establishing a blueprint for examining SLiM-based networks.
Affiliation(s)
| | - Jagoree Roy
- Department of Biology, Stanford University, Stanford, CA, USA
| | - Nikhil P Damle
- Department of Biology, Stanford University, Stanford, CA, USA
| | - Vikash K Yadav
- Department of Chemistry - BMC, Uppsala University, Uppsala, Sweden
| | - Cecilia Blikstad
- Department of Chemistry - BMC, Uppsala University, Uppsala, Sweden
| | - Eduard Resch
- Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Branch for Translational Medicine and Pharmacology TMP, Theodor-Stern-Kai 7, 60596 Frankfurt am Main, Germany
| | - Cassandra J Wong
- Lunenfeld-Tanenbaum Research Institute at Mount Sinai Hospital, University of Toronto, Toronto, ON, Canada
| | - Douglas R Mackay
- Department of Oncological Sciences, Huntsman Cancer Institute, University of Utah, Salt Lake City, UT, USA
| | - Jennifer T Wang
- Department of Biology, Stanford University, Stanford, CA, USA
| | - Izabella Krystkowiak
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland
| | | | | | - Su Hyun Hong
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, USA
| | - Malika Amyn Kaderali
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, USA
| | - Shou-Ling Xu
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, USA
| | - Tim Stearns
- Department of Biology, Stanford University, Stanford, CA, USA
| | - Anne-Claude Gingras
- Lunenfeld-Tanenbaum Research Institute at Mount Sinai Hospital, University of Toronto, Toronto, ON, Canada; Department of Molecular Genetics, University of Toronto, Toronto, M5S 3H7 ON, Canada
| | - Katharine S Ullman
- Department of Oncological Sciences, Huntsman Cancer Institute, University of Utah, Salt Lake City, UT, USA
| | - Ylva Ivarsson
- Department of Chemistry - BMC, Uppsala University, Uppsala, Sweden
| | - Norman E Davey
- Division of Cancer Biology, The Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, UK
| | - Martha S Cyert
- Department of Biology, Stanford University, Stanford, CA, USA.
|
21
|
Maranhão AQ, Silva HM, da Silva WMC, França RKA, De Leo TC, Dias-Baruffi M, Burtet RT, Brigido MM. Discovering Selected Antibodies From Deep-Sequenced Phage-Display Antibody Library Using ATTILA. Bioinform Biol Insights 2020; 14:1177932220915240. [PMID: 32425512 PMCID: PMC7218273 DOI: 10.1177/1177932220915240] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/25/2019] [Accepted: 03/03/2020] [Indexed: 11/20/2022] Open
Abstract
Phage display is a powerful technique to select high-affinity antibodies for different purposes, including biopharmaceuticals. Next-generation sequencing (NGS) presented itself as a robust solution, making it possible to assess billions of sequences of the variable domains from selected sublibraries. A central difficulty in handling this process is finding the selected clones. Here, we present the AutomaTed Tool For Immunoglobulin Analysis (ATTILA), a new tool to analyze and find the enriched variable domains throughout a biopanning experiment. ATTILA is a workflow that combines publicly available tools and in-house programs and scripts to find the fold-change frequency of deeply sequenced amplicons generated from selected VH and VL domains. We analyzed the same human Fab library NGS data using ATTILA in 5 different experiments, as well as on 2 biopanning experiments regarding performance, accuracy, and output. These analyses proved to be suitable to assess library variability and to list the more enriched variable domains, as ATTILA provides a report with the amino acid sequence of each identified domain, along with its complementarity-determining regions (CDRs), germline classification, and fold change. Finally, the methods employed here demonstrated a suitable manner to combine amplicon generation and NGS data analysis to discover new monoclonal antibodies (mAbs).
Affiliation(s)
- Andréa Queiroz Maranhão
- Department of Cellular Biology, Institute of Biological Science, University of Brasília, Brasília, Brazil; Instituto de Investigação em Imunologia, Instituto Nacional de Ciência e Tecnologia (iii-INCT), São Paulo, Brazil
| | - Heidi Muniz Silva
- Department of Cellular Biology, Institute of Biological Science, University of Brasília, Brasília, Brazil
| | - Waldeyr Mendes Cordeiro da Silva
- Department of Cellular Biology, Institute of Biological Science, University of Brasília, Brasília, Brazil; NEPBio, Federal Institute of Goiás, Formosa, Brazil
| | - Renato Kaylan Alves França
- Department of Cellular Biology, Institute of Biological Science, University of Brasília, Brasília, Brazil
| | - Thais Canassa De Leo
- School of Pharmaceutical Sciences of Ribeirão Preto, USP, Ribeirão Preto, Brazil
| | - Marcelo Dias-Baruffi
- School of Pharmaceutical Sciences of Ribeirão Preto, USP, Ribeirão Preto, Brazil
| | - Rafael Trindade Burtet
- Department of Cellular Biology, Institute of Biological Science, University of Brasília, Brasília, Brazil
| | - Marcelo Macedo Brigido
- Department of Cellular Biology, Institute of Biological Science, University of Brasília, Brasília, Brazil; Instituto de Investigação em Imunologia, Instituto Nacional de Ciência e Tecnologia (iii-INCT), São Paulo, Brazil
|
22
|
Abstract
Short linear motifs (SLiMs) are important mediators of interactions between intrinsically disordered regions of proteins and their interaction partners. Here, we detail instructions for the computational prediction of SLiMs in disordered protein regions, using the main tools of the SLiMSuite package: (1) SLiMProb identifies and calculates enrichment of predefined motifs in a set of proteins; (2) SLiMFinder predicts SLiMs de novo in a set of proteins, accounting for evolutionary relationships; (3) QSLiMFinder increases SLiMFinder sensitivity by focusing SLiM prediction on a specific query protein/region; (4) CompariMotif compares predicted SLiMs to known SLiMs or other SLiM predictions to identify common patterns. For each tool, command-line and online server examples are provided. Detailed notes provide additional advice on different applications of SLiMSuite, including batch running of multiple datasets and conservation masking using alignments of predicted orthologues.
|
23
|
Hraber P, O'Maille PE, Silberfarb A, Davis-Anderson K, Generous N, McMahon BH, Fair JM. Resources to Discover and Use Short Linear Motifs in Viral Proteins. Trends Biotechnol 2020; 38:113-127. [PMID: 31427097 PMCID: PMC7114124 DOI: 10.1016/j.tibtech.2019.07.004] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 04/30/2019] [Revised: 07/11/2019] [Accepted: 07/15/2019] [Indexed: 12/23/2022]
Abstract
Viral proteins evade host immune function by molecular mimicry, often achieved by short linear motifs (SLiMs) of three to ten consecutive amino acids (AAs). Motif mimicry tolerates mutations, evolves quickly to modify interactions with the host, and enables modular interactions with protein complexes. Host cells cannot easily coordinate changes to conserved motif recognition and binding interfaces under selective pressure to maintain critical signaling pathways. SLiMs offer potential for use in synthetic biology, such as better immunogens and therapies, but may also present biosecurity challenges. We survey viral uses of SLiMs to mimic host proteins, and information resources available for motif discovery. As the number of examples continues to grow, knowledge management tools are essential to help organize and compare new findings.
Affiliation(s)
- Peter Hraber
- Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
| | - Paul E O'Maille
- Biosciences Division, SRI International, 333 Ravenswood Ave, Menlo Park, CA 94025, USA
| | - Andrew Silberfarb
- Artificial Intelligence Center, SRI International, 333 Ravenswood Ave, Menlo Park, CA 94025, USA
| | - Katie Davis-Anderson
- Biosecurity and Public Health, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
| | - Nicholas Generous
- Global Security Directorate, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
| | - Benjamin H McMahon
- Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
| | - Jeanne M Fair
- Biosecurity and Public Health, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
|
24
|
A Consensus Binding Motif for the PP4 Protein Phosphatase. Mol Cell 2019; 76:953-964.e6. [PMID: 31585692 DOI: 10.1016/j.molcel.2019.08.029] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 04/10/2019] [Revised: 07/08/2019] [Accepted: 08/28/2019] [Indexed: 12/18/2022]
Abstract
Dynamic protein phosphorylation constitutes a fundamental regulatory mechanism in all organisms. Phosphoprotein phosphatase 4 (PP4) is a conserved and essential nuclear serine and threonine phosphatase. Despite the importance of PP4, general principles of substrate selection are unknown, hampering the study of signal regulation by this phosphatase. Here, we identify and thoroughly characterize a general PP4 consensus-binding motif, the FxxP motif. X-ray crystallography studies reveal that FxxP motifs bind to a conserved pocket in the PP4 regulatory subunit PPP4R3. Systems-wide in silico searches integrated with proteomic analysis of PP4 interacting proteins allow us to identify numerous FxxP motifs in proteins controlling a range of fundamental cellular processes. We identify an FxxP motif in the cohesin release factor WAPL and show that this regulates WAPL phosphorylation status and is required for efficient cohesin release. Collectively our work uncovers basic principles of PP4 specificity with broad implications for understanding phosphorylation-mediated signaling in cells.
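As a rough illustration of what an in silico search for a consensus such as FxxP involves, the sketch below scans a sequence with a regular expression in which each "x" is treated as any residue. This is an assumption made for illustration only: the abstract does not say how the authors' search was implemented or whether the wildcard positions were restricted, and the toy sequence is invented.

```python
import re

# "x" in FxxP is assumed to mean any residue: F, two wildcards, then P.
# The lookahead lets overlapping occurrences be reported as well.
FXXP = re.compile(r"(?=(F..P))")

def find_fxxp(sequence):
    """Return (1-based start, matched tetrapeptide) for every FxxP occurrence."""
    return [(m.start() + 1, m.group(1)) for m in FXXP.finditer(sequence)]

# Toy sequence (hypothetical, not a real protein).
toy = "MAFSEPKKFFAPPL"
hits = find_fxxp(toy)
# hits == [(3, "FSEP"), (9, "FFAP"), (10, "FAPP")]
```

A pattern this short matches by chance very often, which is presumably why the study combines sequence searches with proteomic and structural evidence rather than relying on pattern matches alone.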
|
25
|
Vekris A, Pilalis E, Chatziioannou A, Petry KG. A Computational Pipeline for the Extraction of Actionable Biological Information From NGS-Phage Display Experiments. Front Physiol 2019; 10:1160. [PMID: 31607941 PMCID: PMC6769401 DOI: 10.3389/fphys.2019.01160] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/23/2018] [Accepted: 08/28/2019] [Indexed: 12/20/2022] Open
Abstract
Phage Display is a powerful method for the identification of peptide binding to targets of variable complexities and tissues, from unique molecules to the internal surfaces of vessels of living organisms. Particularly for in vivo screenings, the resulting repertoires can be very complex and difficult to study with traditional approaches. Next Generation Sequencing (NGS) opened the possibility to acquire high resolution overviews of such repertoires and thus facilitates the identification of binders of interest. Additionally, the ever-increasing amount of available genome/proteome information became satisfactory regarding the identification of putative mimicked proteins, due to the large scale on which partial sequence homology is assessed. However, the subsequent production of massive data stresses the need for high-performance computational approaches in order to perform standardized and insightful molecular network analysis. Systems-level analysis is essential for efficient resolution of the underlying molecular complexity and the extraction of actionable interpretation, in terms of systemic biological processes and pathways that are systematically perturbed. In this work we introduce PepSimili, an integrated workflow tool, which performs mapping of massive peptide repertoires on whole proteomes and delivers a streamlined, systems-level biological interpretation. The tool employs modules for modeling and filtering of background noise due to random mappings and amplifies the biologically meaningful signal through coupling with BioInfoMiner, a systems interpretation tool that employs graph-theoretic methods for prioritization of systemic processes and corresponding driver genes. The current implementation exploits the Galaxy environment and is available online. A case study using public data is presented, with and without a control selection.
Affiliation(s)
- Eleftherios Pilalis
- Metabolic Engineering and Bioinformatics Program, Institute of Chemical Biology, National Hellenic Research Foundation, Athens, Greece; eNIOS Applications P.C., Athens, Greece
- Aristotelis Chatziioannou
- Metabolic Engineering and Bioinformatics Program, Institute of Chemical Biology, National Hellenic Research Foundation, Athens, Greece; eNIOS Applications P.C., Athens, Greece
|
26
|
Prytuliak R, Volkmer M, Meier M, Habermann BH. HH-MOTiF: de novo detection of short linear motifs in proteins by Hidden Markov Model comparisons. Nucleic Acids Res 2017; 45:W470-W477. [PMID: 28460141 PMCID: PMC5570144 DOI: 10.1093/nar/gkx341] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 01/27/2017] [Accepted: 04/18/2017] [Indexed: 12/18/2022] Open
Abstract
Short linear motifs (SLiMs) in proteins are self-sufficient functional sequences that specify interaction sites for other molecules and thus mediate a multitude of functions. Computational, as well as experimental biological research would significantly benefit, if SLiMs in proteins could be correctly predicted de novo with high sensitivity. However, de novo SLiM prediction is a difficult computational task. When considering recall and precision, the performances of published methods indicate remaining challenges in SLiM discovery. We have developed HH-MOTiF, a web-based method for SLiM discovery in sets of mainly unrelated proteins. HH-MOTiF makes use of evolutionary information by creating Hidden Markov Models (HMMs) for each input sequence and its closely related orthologs. HMMs are compared against each other to retrieve short stretches of homology that represent potential SLiMs. These are transformed to hierarchical structures, which we refer to as motif trees, for further processing and evaluation. Our approach allows us to identify degenerate SLiMs, while still maintaining a reasonably high precision. When considering a balanced measure for recall and precision, HH-MOTiF performs better on test data compared to other SLiM discovery methods. HH-MOTiF is freely available as a web-server at http://hh-motif.biochem.mpg.de.
Affiliation(s)
- Roman Prytuliak
- Computational Biology Group, Max Planck Institute of Biochemistry, Martinsried, Germany
- Michael Volkmer
- Computational Biology Group, Max Planck Institute of Biochemistry, Martinsried, Germany
- Markus Meier
- Research Group Quantitative Biology and Bioinformatics, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Bianca H Habermann
- Computational Biology Group, Max Planck Institute of Biochemistry, Martinsried, Germany; Computational Biology Group, Developmental Biology Institute of Marseille (IBDM) UMR 7288, CNRS, Aix Marseille Université, Marseille 13288 Cedex 9, France
|
27
|
Zarin T, Strome B, Nguyen Ba AN, Alberti S, Forman-Kay JD, Moses AM. Proteome-wide signatures of function in highly diverged intrinsically disordered regions. eLife 2019; 8:e46883. [PMID: 31264965 PMCID: PMC6634968 DOI: 10.7554/elife.46883] [Citation(s) in RCA: 102] [Impact Index Per Article: 20.4] [Received: 03/15/2019] [Accepted: 07/01/2019] [Indexed: 12/24/2022] Open
Abstract
Intrinsically disordered regions make up a large part of the proteome, but the sequence-to-function relationship in these regions is poorly understood, in part because the primary amino acid sequences of these regions are poorly conserved in alignments. Here we use an evolutionary approach to detect molecular features that are preserved in the amino acid sequences of orthologous intrinsically disordered regions. We find that most disordered regions contain multiple molecular features that are preserved, and we define these as 'evolutionary signatures' of disordered regions. We demonstrate that intrinsically disordered regions with similar evolutionary signatures can rescue function in vivo, and that groups of intrinsically disordered regions with similar evolutionary signatures are strongly enriched for functional annotations and phenotypes. We propose that evolutionary signatures can be used to predict function for many disordered regions from their amino acid sequences.
Affiliation(s)
- Taraneh Zarin
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
- Bob Strome
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
- Alex N Nguyen Ba
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
- Simon Alberti
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- Center for Molecular and Cellular Bioengineering, Biotechnology Center, Technische Universität Dresden, Dresden, Germany
- Julie D Forman-Kay
- Program in Molecular Medicine, Hospital for Sick Children, Toronto, Canada
- Department of Biochemistry, University of Toronto, Toronto, Canada
- Alan M Moses
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Canada
- Department of Computer Science, University of Toronto, Toronto, Canada
|
28
|
Sharma R, Raicar G, Tsunoda T, Patil A, Sharma A. OPAL: prediction of MoRF regions in intrinsically disordered protein sequences. Bioinformatics 2018; 34:1850-1858. [PMID: 29360926 DOI: 10.1093/bioinformatics/bty032] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 10/26/2017] [Accepted: 01/17/2018] [Indexed: 12/15/2022] Open
Abstract
Motivation: Intrinsically disordered proteins lack stable 3-dimensional structure and play a crucial role in performing various biological functions. Key to their biological function are the molecular recognition features (MoRFs) located within long disordered regions. Computationally identifying these MoRFs from disordered protein sequences is a challenging task. In this study, we present a new MoRF predictor, OPAL, to identify MoRFs in disordered protein sequences. OPAL utilizes two independent sources of information computed using different component predictors. The scores are processed and combined using a common averaging method. The first score is computed using a component MoRF predictor which utilizes composition and sequence similarity of MoRF and non-MoRF regions to detect MoRFs. The second score is calculated using half-sphere exposure (HSE), solvent accessible surface area (ASA) and backbone angle information of the disordered protein sequence, using information from the amino acid properties of flanks surrounding the MoRFs to distinguish MoRF and non-MoRF residues. Results: OPAL is evaluated using test sets that were previously used to evaluate the MoRF predictors MoRFpred, MoRFchibi and MoRFchibi-web. The results demonstrate that OPAL outperforms all the available MoRF predictors and is the most accurate predictor available for MoRF prediction. It is available at http://www.alok-ai-lab.com/tools/opal/. Contact: ashwini@hgc.jp or alok.sharma@griffith.edu.au. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ronesh Sharma
- School of Engineering and Physics, The University of the South Pacific, Suva, Fiji; School of Electrical and Electronics Engineering, Fiji National University, Suva, Fiji
- Gaurav Raicar
- School of Engineering and Physics, The University of the South Pacific, Suva, Fiji
- Tatsuhiko Tsunoda
- Laboratory of Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan; Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo 113-8510, Japan; CREST, JST, Tokyo 113-8510, Japan
- Ashwini Patil
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan
- Alok Sharma
- School of Engineering and Physics, The University of the South Pacific, Suva, Fiji; Laboratory of Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan; Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo 113-8510, Japan; CREST, JST, Tokyo 113-8510, Japan; Institute for Integrated and Intelligent Systems, Griffith University, Nathan, Brisbane, QLD, Australia
|
29
|
Li Y, Zhang Y, Li X, Yi S, Xu J. Gain-of-Function Mutations: An Emerging Advantage for Cancer Biology. Trends Biochem Sci 2019; 44:659-674. [PMID: 31047772 DOI: 10.1016/j.tibs.2019.03.009] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 11/28/2018] [Revised: 03/21/2019] [Accepted: 03/26/2019] [Indexed: 02/08/2023]
Abstract
Advances in next-generation sequencing have identified thousands of genomic variants that perturb the normal functions of proteins, further contributing to diverse phenotypic consequences in cancer. Elucidating the functional pathways altered by loss-of-function (LOF) or gain-of-function (GOF) mutations will be crucial for prioritizing cancer-causing variants and their resultant therapeutic liabilities. In this review, we highlight the fundamental function of GOF mutations and discuss the potential mechanistic effects in the context of signaling networks. We also summarize advances in experimental and computational resources, which will dramatically help with studies on the functional and phenotypic consequences of mutations. Together, systematic investigations of the function of GOF mutations will provide an important missing piece for cancer biology and precision therapy.
Affiliation(s)
- Yongsheng Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China; Department of Oncology, Dell Medical School, The University of Texas at Austin, Austin, TX 78712, USA
- Yunpeng Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Xia Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China; College of Bioinformatics, Hainan Medical University, Haikou 570100, China
- Song Yi
- Department of Oncology, Dell Medical School, The University of Texas at Austin, Austin, TX 78712, USA; Department of Biomedical Engineering, Cockrell School of Engineering, The University of Texas at Austin, Austin, TX 78712, USA
- Juan Xu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
|
30
|
Probabilistic variable-length segmentation of protein sequences for discriminative motif discovery (DiMotif) and sequence embedding (ProtVecX). Sci Rep 2019; 9:3577. [PMID: 30837494 PMCID: PMC6401088 DOI: 10.1038/s41598-019-38746-w] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 07/27/2018] [Accepted: 12/19/2018] [Indexed: 12/28/2022] Open
Abstract
In this paper, we present peptide-pair encoding (PPE), a general-purpose probabilistic segmentation of protein sequences into commonly occurring variable-length sub-sequences. The idea of PPE segmentation is inspired by the byte-pair encoding (BPE) text compression algorithm, which has recently gained popularity in subword neural machine translation. We modify this algorithm by adding a sampling framework allowing for multiple ways of segmenting a sequence. PPE segmentation steps can be learned over a large set of protein sequences (Swiss-Prot) or even a domain-specific dataset and then applied to a set of unseen sequences. This representation can be widely used as the input to any downstream machine learning tasks in protein bioinformatics. In particular, here, we introduce this representation through protein motif discovery and protein sequence embedding. (i) DiMotif: we present DiMotif as an alignment-free discriminative motif discovery method and evaluate the method for finding protein motifs in three different settings: (1) comparison of DiMotif with two existing approaches on 20 distinct motif discovery problems which are experimentally verified, (2) classification-based approach for the motifs extracted for integrins, integrin-binding proteins, and biofilm formation, and (3) in sequence pattern searching for nuclear localization signal. DiMotif, in general, obtained high recall scores, while having a comparable F1 score with other methods in the discovery of experimentally verified motifs. Having high recall suggests that DiMotif can be used for short-list creation for further experimental investigations on motifs. In the classification-based evaluation, the extracted motifs could reliably detect the integrins, integrin-binding, and biofilm formation-related proteins on a reserved set of sequences with high F1 scores. (ii) ProtVecX: we extend k-mer based protein vector (ProtVec) embedding to variable-length protein embedding using PPE sub-sequences. We show that the new method of embedding can marginally outperform ProtVec in enzyme prediction as well as toxin prediction tasks. In addition, we conclude that the embeddings are beneficial in protein classification tasks when they are combined with raw amino acid k-mer features.
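The merge-learning idea that PPE borrows from byte-pair encoding can be sketched as follows. This is a minimal, deterministic BPE-style sketch over amino-acid strings and deliberately omits the sampling framework that makes PPE probabilistic. The function names, the tie-breaking rule, and the toy peptides are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def merge_pair(tokens, left, right):
    """Replace every adjacent (left, right) token pair with the fused token."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
            out.append(left + right)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def learn_merges(sequences, num_merges):
    """Greedily learn up to num_merges merges from the most frequent adjacent pairs."""
    corpus = [list(seq) for seq in sequences]  # start from single residues
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for tokens in corpus:
            pairs.update(zip(tokens, tokens[1:]))
        if not pairs:
            break
        # Assumed tie-break: highest count first, then the lexicographically largest pair.
        (left, right), count = max(pairs.items(), key=lambda kv: (kv[1], kv[0]))
        if count < 2:
            break  # nothing recurs, so further merges would not generalise
        merges.append((left, right))
        corpus = [merge_pair(tokens, left, right) for tokens in corpus]
    return merges, corpus


# Hypothetical toy peptides sharing an RGD tripeptide.
merges, segmented = learn_merges(["RGDS", "ARGDK", "RGDRGD"], num_merges=2)
print(merges)
print(segmented)
```

On these toy peptides two merges are enough for the shared tripeptide to become a single token; learned merges of this kind would then be replayed, in order, on unseen sequences.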
|
31
|
O’Brien KT, Golla K, Kranjc T, O’Donovan D, Allen S, Maguire P, Simpson JC, O’Connell D, Moran N, Shields DC. Computational and experimental analysis of bioactive peptide linear motifs in the integrin adhesome. PLoS One 2019; 14:e0210337. [PMID: 30689642 PMCID: PMC6349357 DOI: 10.1371/journal.pone.0210337] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/28/2018] [Accepted: 12/20/2018] [Indexed: 12/15/2022] Open
Abstract
Therapeutic modulation of protein interactions is challenging, but short linear motifs (SLiMs) represent potential targets. Focal adhesions play a central role in adhesion by linking cells to the extracellular matrix. Integrins are central to this process, and many other intracellular proteins are components of the integrin adhesome. We applied a peptide network targeting approach to explore the intracellular modulation of integrin function in platelets. Firstly, we computed a platelet-relevant integrin adhesome, inferred via homology of known platelet proteins to adhesome components. We then computationally selected peptides from the set of platelet integrin adhesome cytoplasmic and membrane adjacent protein-protein interfaces. Motifs of interest in the intracellular component of the platelet integrin adhesome were identified using a predictor of SLiMs based on analysis of protein primary amino acid sequences (SLiMPred), a predictor of strongly conserved motifs within disordered protein regions (SLiMPrints), and information from the literature regarding protein interactions in the complex. We then synthesized peptides incorporating these motifs combined with cell penetrating factors (tat peptide and palmitylation for cytoplasmic and membrane proteins respectively). We tested for the platelet activating effects of the peptides, as well as their abilities to inhibit activation. Bioactivity testing revealed a number of peptides that modulated platelet function, including those derived from α-actinin (ACTN1) and syndecan (SDC4), binding to vinculin and syntenin respectively. Both chimeric peptide experiments and peptide combination experiments failed to identify strong effects, perhaps characterizing the adhesome as relatively robust against within-adhesome synergistic perturbation. We investigated in more detail peptides targeting vinculin. Combined experimental and computational evidence suggested a model in which the positively charged tat-derived cell penetrating part of the peptide contributes to bioactivity via stabilizing charge interactions with a region of the ACTN1 negatively charged surface. We conclude that some interactions in the integrin adhesome appear to be capable of modulation by short peptides, and may aid in the identification and characterization of target sites within the complex that may be useful for therapeutic modulation.
Affiliation(s)
- Kevin T. O’Brien
- School of Medicine, University College Dublin, Dublin, Ireland
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
- Kalyan Golla
- Molecular and Cellular Therapeutics, Royal College of Surgeons in Ireland, Dublin, Ireland
- Tilen Kranjc
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
- School of Biology and Environment Science, University College Dublin, Dublin, Ireland
- Darragh O’Donovan
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
- School of Biomolecular and Biomedical Science, University College Dublin, Dublin, Ireland
- Seamus Allen
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
- School of Biomolecular and Biomedical Science, University College Dublin, Dublin, Ireland
- Patricia Maguire
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
- School of Biomolecular and Biomedical Science, University College Dublin, Dublin, Ireland
- Jeremy C. Simpson
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
- School of Biology and Environment Science, University College Dublin, Dublin, Ireland
- David O’Connell
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
- School of Biomolecular and Biomedical Science, University College Dublin, Dublin, Ireland
- Niamh Moran
- Molecular and Cellular Therapeutics, Royal College of Surgeons in Ireland, Dublin, Ireland
- Denis C. Shields
- School of Medicine, University College Dublin, Dublin, Ireland
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
|
32
|
Barski M. BASILIScan: a tool for high-throughput analysis of intrinsic disorder patterns in homologous proteins. BMC Genomics 2018; 19:902. [PMID: 30537929 PMCID: PMC6290515 DOI: 10.1186/s12864-018-5322-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2018] [Accepted: 11/28/2018] [Indexed: 12/02/2022] Open
Abstract
Background: Intrinsic structural disorder is a common property of many proteins, especially in eukaryotic and virus proteomes. The tendency of some proteins or protein regions to exist in a disordered state usually precludes their structural characterisation and renders them especially difficult for experimental handling after recombinant expression. Results: A new intuitive, publicly-available computational resource, called BASILIScan, is presented here. It provides a BLAST-based search for close homologues of the protein of interest, integrated with a simultaneous prediction of intrinsic disorder together with a robust data viewer and interpreter. This allows for a quick, high-throughput screening, scoring and selection of closely-related yet highly structured homologues of the protein of interest. Comparative parallel analysis of the conservation of extended regions of disorder in multiple sequences is also offered. The use of BASILIScan and its capacity for yielding biologically applicable predictions is demonstrated. Using a high-throughput BASILIScan screen it is also shown that a large proportion of the human proteome displays homologous sequences of superior intrinsic structural order in many related species. Conclusion: Through the swift identification of intrinsically stable homologues and poorly conserved disordered regions by the BASILIScan software, the chances of successful recombinant protein expression and compatibility with downstream applications such as crystallisation can be greatly increased. Electronic supplementary material: The online version of this article (10.1186/s12864-018-5322-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Michal Barski
- Section of Virology, Department of Medicine, St Mary's Hospital, Imperial College London, London, W2 1PG, UK
|
33
|
Modulation of the aggregation of an amyloidogenic sequence by flanking-disordered region in the intrinsically disordered antigen merozoite surface protein 2. Eur Biophys J 2018; 48:99-110. [PMID: 30443712 DOI: 10.1007/s00249-018-1337-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2018] [Revised: 08/31/2018] [Accepted: 11/08/2018] [Indexed: 10/27/2022]
Abstract
The abundant Plasmodium falciparum merozoite surface protein MSP2, a potential malaria vaccine candidate, is an intrinsically disordered protein with some nascent secondary structure present in its conserved N-terminal region. This relatively ordered region has been implicated in both membrane interactions and amyloid-like aggregation of the protein, while the significance of the flanking-disordered region is unclear. In this study, we show that aggregation of the N-terminal conserved region of MSP2 is influenced in a length- and sequence-dependent fashion by the disordered central variable sequences. Intriguingly, MSP2 peptides containing the conserved region and the first five residues of the variable disordered regions aggregated more rapidly than a peptide corresponding to the conserved region alone. In contrast, MSP2 peptides extending 8 or 12 residues into the disordered region aggregated more slowly, consistent with the expected inhibitory effect of flanking-disordered sequences on the aggregation of amyloidogenic ordered sequences. Computational analyses indicated that the helical propensity of the ordered region of MSP2 was modulated by the adjacent disordered five residues in a sequence-dependent manner. Nuclear magnetic resonance and circular dichroism spectroscopic studies with synthetic peptides confirmed the computational predictions, emphasizing the correlation between aggregation propensity and conformation of the ordered region and the effects thereon of the adjacent disordered region. These results show that the effects of flanking-disordered sequences on a more ordered sequence may include enhancement of aggregation through modulation of the conformational properties of the more ordered sequence.
|
34
|
Idrees S, Pérez-Bercoff Å, Edwards RJ. SLiM-Enrich: computational assessment of protein-protein interaction data as a source of domain-motif interactions. PeerJ 2018; 6:e5858. [PMID: 30402352 PMCID: PMC6215436 DOI: 10.7717/peerj.5858] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 03/13/2018] [Accepted: 10/02/2018] [Indexed: 01/21/2023] Open
Abstract
Many important cellular processes involve protein–protein interactions (PPIs) mediated by a Short Linear Motif (SLiM) in one protein interacting with a globular domain in another. Despite their significance, these domain-motif interactions (DMIs) are typically low affinity, which makes them challenging to identify by classical experimental approaches, such as affinity pulldown mass spectrometry (AP-MS) and yeast two-hybrid (Y2H). DMIs are generally underrepresented in PPI networks as a result. A number of computational methods now exist to predict SLiMs and/or DMIs from experimental interaction data but it is yet to be established how effective different PPI detection methods are for capturing these low affinity SLiM-mediated interactions. Here, we introduce a new computational pipeline (SLiM-Enrich) to assess how well a given source of PPI data captures DMIs and thus, by inference, how useful that data should be for SLiM discovery. SLiM-Enrich interrogates a PPI network for pairs of interacting proteins in which the first protein is known or predicted to interact with the second protein via a DMI. Permutation tests compare the number of known/predicted DMIs to the expected distribution if the two sets of proteins are randomly associated. This provides an estimate of DMI enrichment within the data and the false positive rate for individual DMIs. As a case study, we detect significant DMI enrichment in a high-throughput Y2H human PPI study. SLiM-Enrich analysis supports Y2H data as a source of DMIs and highlights the high false positive rates associated with naïve DMI prediction. SLiM-Enrich is available as an R Shiny app. The code is open source and available via a GNU GPL v3 license at: https://github.com/slimsuite/SLiMEnrich. A web server is available at: http://shiny.slimsuite.unsw.edu.au/SLiMEnrich/.
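The permutation logic described in this abstract can be sketched roughly as follows. This is an illustration only, not SLiM-Enrich itself (which is an R Shiny app): the function name `dmi_enrichment`, the choice to shuffle one side of the interaction list, the add-one p-value correction, and the toy identifiers are all assumptions.

```python
import random


def dmi_enrichment(ppi_pairs, known_dmis, n_perm=1000, seed=0):
    """Compare the observed DMI count in a PPI list with randomly re-paired lists.

    ppi_pairs  : list of (motif_protein, domain_protein) interactions
    known_dmis : iterable of (motif_protein, domain_protein) pairs treated as DMIs
    Returns (observed count, mean count under permutation, empirical p-value).
    """
    rng = random.Random(seed)
    known = set(known_dmis)
    observed = sum(pair in known for pair in ppi_pairs)

    left = [a for a, _ in ppi_pairs]
    right = [b for _, b in ppi_pairs]
    at_least_observed = 0
    total_random = 0
    for _ in range(n_perm):
        shuffled = right[:]
        rng.shuffle(shuffled)  # randomly re-associate the two protein sets
        count = sum((a, b) in known for a, b in zip(left, shuffled))
        total_random += count
        at_least_observed += count >= observed

    # Add-one correction keeps the empirical p-value strictly above zero.
    p_value = (at_least_observed + 1) / (n_perm + 1)
    return observed, total_random / n_perm, p_value


# Hypothetical toy network in which every interaction is a known DMI.
toy_pairs = [("A", "x"), ("B", "y"), ("C", "z"), ("D", "w")]
observed, mean_random, p_value = dmi_enrichment(toy_pairs, toy_pairs, n_perm=500, seed=1)
print(observed, mean_random, p_value)
```

The ratio of `observed` to `mean_random` gives a crude enrichment estimate, and `mean_random / observed` is one simple way to approximate the share of observed DMIs expected by chance.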
Affiliation(s)
- Sobia Idrees
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
- Åsa Pérez-Bercoff
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
- Richard J Edwards
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
|
35
|
Sarkar D, Jana T, Saha S. LMDIPred: A web-server for prediction of linear peptide sequences binding to SH3, WW and PDZ domains. PLoS One 2018; 13:e0200430. [PMID: 30001346 PMCID: PMC6042728 DOI: 10.1371/journal.pone.0200430] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 11/10/2017] [Accepted: 06/26/2018] [Indexed: 12/29/2022] Open
Abstract
Protein-peptide interactions form an important subset of the total protein interaction network in the cell and play key roles in signaling and regulatory networks, and in major biological processes like cellular localization, protein degradation, and immune response. In this work, we have described the LMDIPred web server, an online resource for generalized prediction of linear peptide sequences that may bind to three most prevalent and well-studied peptide recognition modules (PRMs)—SH3, WW and PDZ. We have developed support vector machine (SVM)-based prediction models that achieved maximum Matthews Correlation Coefficient (MCC) of 0.85 with an accuracy of 94.55% for SH3, MCC of 0.90 with an accuracy of 95.82% for WW, and MCC of 0.83 with an accuracy of 92.29% for PDZ binding peptides. LMDIPred output combines predictions from these SVM models with predictions using Position-Specific Scoring Matrices (PSSMs) and string-matching methods using known domain-binding motif instances and regular expressions. All of these methods were evaluated using a five-fold cross-validation technique on both balanced and unbalanced datasets, and also validated on independent datasets. LMDIPred aims to provide a preliminary bioinformatics platform for sequence-based prediction of probable binding sites for SH3, WW or PDZ domains.
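Because the abstract reports both accuracy and the Matthews Correlation Coefficient, a short sketch of how the two are computed from a confusion matrix may help when reading these figures. The formulas are the standard definitions; the function names and the confusion-matrix counts are hypothetical and are not taken from the LMDIPred evaluation.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical unbalanced test set: 10 binders, 90 non-binders.
acc = accuracy(tp=5, tn=90, fp=0, fn=5)
score = mcc(tp=5, tn=90, fp=0, fn=5)
print(f"accuracy={acc:.2f}, MCC={score:.2f}")
```

In this made-up case the accuracy is 0.95 while the MCC is only about 0.69, which illustrates why reporting MCC alongside accuracy is informative on unbalanced datasets.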
Affiliation(s)
- Tanmoy Jana
- Bioinformatics Centre, Bose Institute, Kolkata, India
- Sudipto Saha
- Bioinformatics Centre, Bose Institute, Kolkata, India
|
36
|
Prytuliak R, Pfeiffer F, Habermann BH. SLALOM, a flexible method for the identification and statistical analysis of overlapping continuous sequence elements in sequence- and time-series data. BMC Bioinformatics 2018; 19:24. [PMID: 29373955 PMCID: PMC5787307 DOI: 10.1186/s12859-018-2020-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/23/2017] [Accepted: 01/08/2018] [Indexed: 12/30/2022] Open
Abstract
Background: Protein or nucleic acid sequences contain a multitude of associated annotations representing continuous sequence elements (CSEs). Comparing these CSEs is needed whenever we want to match identical annotations or integrate distinctive ones. Currently, there is no ready-to-use software available that provides comprehensive statistical readout for comparing two annotations of the same type with each other, which can be adapted to the application logic of the scientific question. Results: We have developed a method, SLALOM (for StatisticaL Analysis of Locus Overlap Method), to perform comparative analysis of sequence annotations in a highly flexible way. SLALOM implements six major operation modes and a number of additional options that can answer a variety of statistical questions about a pair of input annotations of a given sequence collection. We demonstrate the results of SLALOM on three different examples from biology and economics and compare our method to already existing software. We discuss the importance of carefully choosing the application logic to address specific scientific questions. Conclusion: SLALOM is a highly versatile, command-line based method for comparing annotations in a collection of sequences, with a statistical read-out for performance evaluation and benchmarking of predictors and gene annotation pipelines. Abstraction from sequence content even allows SLALOM to compare other kinds of positional data including, for example, data coming from time series. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2020-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Roman Prytuliak
- Computational Biology Group, Max Planck Institute of Biochemistry, Am Klopferspitz 18, 82152, Martinsried, Germany
- Friedhelm Pfeiffer
- Computational Biology Group, Max Planck Institute of Biochemistry, Am Klopferspitz 18, 82152, Martinsried, Germany
- Bianca Hermine Habermann
- Computational Biology Group, Max Planck Institute of Biochemistry, Am Klopferspitz 18, 82152, Martinsried, Germany; Computational Biology Group, Aix-Marseille University & CNRS, Developmental Biology Institute of Marseille (IBDM), UMR 7288, Parc Scientifique de Luminy, 163 Avenue de Luminy, 13009, Marseille, France
|
37
|
Sharma R, Bayarjargal M, Tsunoda T, Patil A, Sharma A. MoRFPred-plus: Computational Identification of MoRFs in Protein Sequences using Physicochemical Properties and HMM profiles. J Theor Biol 2018; 437:9-16. [DOI: 10.1016/j.jtbi.2017.10.015] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 01/25/2017] [Revised: 10/12/2017] [Accepted: 10/13/2017] [Indexed: 11/26/2022]
|
38
|
Upgrading Affinity Screening Experiments by Analysis of Next-Generation Sequencing Data. Methods Mol Biol 2017; 1701:411-424. [PMID: 29116519 DOI: 10.1007/978-1-4939-7447-4_23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/09/2023]
Abstract
Computational analysis of next-generation sequencing data (NGS; also termed deep sequencing) enables the analysis of affinity screening procedures (or biopanning experiments) at unprecedented depth and thereby improves the identification of relevant peptide or antibody ligands with desired binding or functional properties. Virtually any selection methodology employing the direct physical linkage of geno- and phenotype to select for desired properties can be leveraged by computational analysis. This article describes a concept for identifying relevant ligands by harnessing NGS data. The focus lies on improved ligand identification, and the article describes how NGS data can be structured for single-round analysis as well as for comparative analysis of multiple selection rounds. In particular, the comparative analysis opens new avenues in the field of ligand identification. The concept of computational analysis is illustrated using the software tool "AptaAnalyzer™". This intuitive tool was developed for scientists without special computer skills and makes the computational approach accessible to a broad range of users.
|
39
|
Wu CG, Chen H, Guo F, Yadav VK, Mcilwain SJ, Rowse M, Choudhary A, Lin Z, Li Y, Gu T, Zheng A, Xu Q, Lee W, Resch E, Johnson B, Day J, Ge Y, Ong IM, Burkard ME, Ivarsson Y, Xing Y. PP2A-B' holoenzyme substrate recognition, regulation and role in cytokinesis. Cell Discov 2017; 3:17027. [PMID: 28884018 PMCID: PMC5586252 DOI: 10.1038/celldisc.2017.27] [Citation(s) in RCA: 59] [Impact Index Per Article: 8.4] [Received: 03/05/2017] [Accepted: 07/12/2017] [Indexed: 12/11/2022] Open
Abstract
Protein phosphatase 2A (PP2A) is a major Ser/Thr phosphatase; it forms diverse heterotrimeric holoenzymes that counteract kinase actions. Using a peptidome that tiles the disordered regions of the human proteome, we identified proteins containing [LMFI]xx[ILV]xEx motifs that serve as interaction sites for B′-family PP2A regulatory subunits and holoenzymes. The B′-binding motifs have important roles in substrate recognition and in competitive inhibition of substrate binding. With more than 100 novel ligands identified, we confirmed that the recently identified LxxIxEx B′α-binding motifs serve as common binding sites for B′ subunits with minor variations, and that S/T phosphorylation or D/E residues at positions 2, 7, 8 and 9 of the motifs reinforce interactions. Hundreds of proteins in the human proteome harbor intrinsic or phosphorylation-responsive B′-interaction motifs, and localize at distinct cellular organelles, such as midbody, predicting kinase-facilitated recruitment of PP2A-B′ holoenzymes for tight spatiotemporal control of phosphorylation at mitosis and cytokinesis. Moreover, Polo-like kinase 1-mediated phosphorylation of Cyk4/RACGAP1, a centralspindlin component at the midbody, facilitates binding of both RhoA guanine nucleotide exchange factor (epithelial cell transforming sequence 2 (Ect2)) and PP2A-B′ that in turn dephosphorylates Cyk4 and disrupts Ect2 binding. This feedback signaling loop precisely controls RhoA activation and specifies a restricted region for cleavage furrow ingression. Our results provide a framework for further investigation of diverse signaling circuits formed by PP2A-B′ holoenzymes in various cellular processes.
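A consensus written in this notation, such as [LMFI]xx[ILV]xEx, can be read as a regular expression in which `x` stands for any residue. The Python sketch below shows one way to scan a sequence for such a pattern. The helper names `motif_to_regex` and `find_motif` and the example peptide are made up for illustration and are not taken from the study.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Translate SLiM-style notation into a regex: lowercase 'x' matches any residue."""
    return motif.replace("x", ".")


def find_motif(sequence: str, motif: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched peptide) for every occurrence, including overlaps."""
    # The lookahead makes each match zero-width, so overlapping hits are all reported.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


B56_MOTIF = "[LMFI]xx[ILV]xEx"           # consensus quoted in the abstract above
hits = find_motif("AAALSPIEEEAA", B56_MOTIF)  # toy peptide, not a real ligand
print(hits)  # [(4, 'LSPIEEE')]
```

A regular expression treats every position as equally important, so it cannot express graded preferences such as the reinforcing D/E or phosphorylated residues mentioned in the abstract; a position-specific scoring matrix would be needed for that.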
Affiliation(s)
- Cheng-Guo Wu
- McArdle Laboratory for Cancer Research, Department of Oncology, University of Wisconsin at Madison, School of Medicine and Public Health, Madison, WI, USA; Biophysics Program, University of Wisconsin at Madison, Madison, WI, USA
- Hui Chen
- McArdle Laboratory for Cancer Research, Department of Oncology, University of Wisconsin at Madison, School of Medicine and Public Health, Madison, WI, USA
- Feng Guo
- McArdle Laboratory for Cancer Research, Department of Oncology, University of Wisconsin at Madison, School of Medicine and Public Health, Madison, WI, USA
- Vikash K Yadav
- Department of Chemistry-BMC, Uppsala University, Uppsala, Sweden
- Sean J Mcilwain
- Biostatistics and Medical Informatics, Wisconsin Institutes of Medical Research, University of Wisconsin at Madison, School of Medicine and Public Health, Madison, WI, USA
- Michael Rowse
- McArdle Laboratory for Cancer Research, Department of Oncology, University of Wisconsin at Madison, School of Medicine and Public Health, Madison, WI, USA
- Alka Choudhary
- Department of Medicine, Hematology/Oncology, UW Carbone Cancer Center, University of Wisconsin at Madison, School of Medicine and Public Health, Madison, WI, USA
- Ziqing Lin
- Department of Cell and Regenerative Biology, Human Proteomic Program, School of Medicine and Public Health, Madison, WI, USA
- Yitong Li
- McArdle Laboratory for Cancer Research, Department of Oncology, University of Wisconsin at Madison, School of Medicine and Public Health, Madison, WI, USA
- Tingjia Gu
- McArdle Laboratory for Cancer Research, Department of Oncology, University of Wisconsin at Madison, School of Medicine and Public Health, Madison, WI, USA
- Aiping Zheng
- McArdle Laboratory for Cancer Research, Department of Oncology, University of Wisconsin at Madison, School of Medicine and Public Health, Madison, WI, USA
- Qingge Xu
- Department of Cell and Regenerative Biology, Human Proteomic Program, School of Medicine and Public Health, Madison, WI, USA
- Woojong Lee
- McArdle Laboratory for Cancer Research, Department of Oncology, University of Wisconsin at Madison, School of Medicine and Public Health, Madison, WI, USA
- Eduard Resch
- Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Project Group Translational Medicine and Pharmacology TMP, Frankfurt am Main, Germany
- Benjamin Johnson
- McArdle Laboratory for Cancer Research, Department of Oncology, University of Wisconsin at Madison, School of Medicine and Public Health, Madison, WI, USA
- Jenny Day
- McArdle Laboratory for Cancer Research, Department of Oncology, University of Wisconsin at Madison, School of Medicine and Public Health, Madison, WI, USA
- Ying Ge
- Department of Cell and Regenerative Biology, Human Proteomic Program, School of Medicine and Public Health, Madison, WI, USA
- Irene M Ong
- Biostatistics and Medical Informatics, Wisconsin Institutes of Medical Research, University of Wisconsin at Madison, School of Medicine and Public Health, Madison, WI, USA
- Mark E Burkard
- Department of Medicine, Hematology/Oncology, UW Carbone Cancer Center, University of Wisconsin at Madison, School of Medicine and Public Health, Madison, WI, USA
- Ylva Ivarsson
- Department of Chemistry-BMC, Uppsala University, Uppsala, Sweden
- Yongna Xing
- McArdle Laboratory for Cancer Research, Department of Oncology, University of Wisconsin at Madison, School of Medicine and Public Health, Madison, WI, USA; Biophysics Program, University of Wisconsin at Madison, Madison, WI, USA
|
40
|
Kelil A, Dubreuil B, Levy ED, Michnick SW. Exhaustive search of linear information encoding protein-peptide recognition. PLoS Comput Biol 2017; 13:e1005499. [PMID: 28426660 PMCID: PMC5417721 DOI: 10.1371/journal.pcbi.1005499] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 10/20/2015] [Revised: 05/04/2017] [Accepted: 04/04/2017] [Indexed: 11/24/2022] Open
Abstract
High-throughput in vitro methods have been extensively applied to identify linear information that encodes peptide recognition. However, these methods are limited in number of peptides, sequence variation, and length of peptides that can be explored, and often produce solutions that are not found in the cell. Despite the large number of methods developed to address these issues, the exhaustive search of linear information encoding protein-peptide recognition has so far been physically unfeasible. Here, we describe a strategy, called DALEL, for the exhaustive search of linear sequence information encoded in proteins that bind to a common partner. We applied DALEL to explore binding specificity of SH3 domains in the budding yeast Saccharomyces cerevisiae. Using only the polypeptide sequences of SH3 domain binding proteins, we succeeded in identifying the majority of known SH3 binding sites previously discovered either in vitro or in vivo. Moreover, we discovered a number of sites with both non-canonical sequences and distinct properties that may serve ancillary roles in peptide recognition. We compared DALEL to a variety of state-of-the-art algorithms in the blind identification of known binding sites of the human Grb2 SH3 domain. We also benchmarked DALEL on curated biological motifs derived from the ELM database to evaluate the effect of increasing/decreasing the enrichment of the motifs. Our strategy can be applied in conjunction with experimental data of proteins interacting with a common partner to identify binding sites among them. Our strategy can also be applied to any group of proteins of interest to identify enriched linear motifs or to exhaustively explore the space of linear information encoded in a polypeptide sequence. Finally, we have developed a webserver located at http://michnick.bcm.umontreal.ca/dalel, offering a user-friendly interface and providing different scenarios utilizing DALEL.
Here we describe the first strategy for the exhaustive search of the linear information encoding protein-peptide recognition; an approach that has previously been physically unfeasible because the combinatorial space of polypeptide sequences is too vast. The search covers the entire space of sequences with no restriction on motif length or composition, and includes all possible combinations of amino acids at distinct positions of each sequence, as well as positions with correlated preferences for amino acids.
Affiliation(s)
- Abdellali Kelil
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Department of Biochemistry and Molecular Medicine, University of Montreal, Montreal, Quebec, Canada
- Benjamin Dubreuil
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Emmanuel D. Levy
- Department of Biochemistry and Molecular Medicine, University of Montreal, Montreal, Quebec, Canada
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Stephen W. Michnick
- Department of Biochemistry and Molecular Medicine, University of Montreal, Montreal, Quebec, Canada
|
41
|
Davey NE, Seo MH, Yadav VK, Jeon J, Nim S, Krystkowiak I, Blikstad C, Dong D, Markova N, Kim PM, Ivarsson Y. Discovery of short linear motif-mediated interactions through phage display of intrinsically disordered regions of the human proteome. FEBS J 2017; 284:485-498. [PMID: 28002650 DOI: 10.1111/febs.13995] [Citation(s) in RCA: 65] [Impact Index Per Article: 9.3] [Received: 08/26/2016] [Revised: 12/04/2016] [Accepted: 12/19/2016] [Indexed: 12/29/2022]
Abstract
The intrinsically disordered regions of eukaryotic proteomes are enriched in short linear motifs (SLiMs), which are of crucial relevance for cellular signaling and protein regulation; many mediate interactions by providing binding sites for peptide-binding domains. The vast majority of SLiMs remain to be discovered, highlighting the need for experimental methods for their large-scale identification. We present a novel proteomic peptide phage display (ProP-PD) library that displays peptides representing the disordered regions of the human proteome, allowing direct large-scale interrogation of most potential binding SLiMs in the proteome. The performance of the ProP-PD library was validated through selections against SLiM-binding bait domains with distinct folds and binding preferences. The vast majority of identified binding peptides contained sequences that matched the known SLiM-binding specificities of the bait proteins. For SHANK1 PDZ, we establish a novel consensus TxF motif for its non-C-terminal ligands. The binding peptides mostly represented novel target proteins; however, several previously validated protein-protein interactions (PPIs) were also discovered. We determined the affinities between the VHS domain of GGA1 and three identified ligands to 40-130 μM through isothermal titration calorimetry, and confirmed interactions through coimmunoprecipitation using full-length proteins. Taken together, we outline a general pipeline for the design and construction of ProP-PD libraries and the analysis of ProP-PD-derived, SLiM-based PPIs. We demonstrated the method's potential to identify low affinity motif-mediated interactions for modular domains with distinct binding preferences. The approach is a highly useful complement to the current toolbox of methods for PPI discovery.
Affiliation(s)
- Norman E Davey
- Conway Institute of Biomolecular and Biomedical Sciences, University College Dublin, Ireland
- Moon-Hyeong Seo
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Canada
- Jouhyun Jeon
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Canada
- Satra Nim
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Canada
- Izabella Krystkowiak
- Conway Institute of Biomolecular and Biomedical Sciences, University College Dublin, Ireland
- Debbie Dong
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Canada
- Philip M Kim
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Canada; Department of Molecular Genetics and Department of Computer Science, University of Toronto, Canada
- Ylva Ivarsson
- Department of Chemistry - BMC, Uppsala University, Sweden
|
42
|
Czeizler E, Hirvola T, Karhu K. A graph-theoretical approach for motif discovery in protein sequences. IEEE/ACM Trans Comput Biol Bioinform 2017; 14:121-130. [PMID: 28055896 DOI: 10.1109/tcbb.2015.2511750] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/06/2023]
Abstract
Motif recognition is a challenging problem in bioinformatics due to the diversity of protein motifs. Many existing algorithms identify motifs of a given length, thus being either not applicable or not efficient when searching simultaneously for motifs of various lengths. Searching for gapped motifs, although very important, is a highly time-consuming task due to the combinatorial explosion of possible combinations implied by the consideration of long gaps. We introduce a new graph theoretical approach to identify motifs of various lengths, both with and without gaps. We compare our approach with two widely used methods: MEME and GLAM2 analyzing both the quality of the results and the required computational time. Our method provides results of a slightly higher level of quality than MEME but at a much faster rate, i.e., one eighth of MEME's query time. By using similarity indexing, we drop the query times down to an average of approximately one sixth of the ones required by GLAM2, while achieving a slightly higher level of quality of the results. More precisely, for sequence collections smaller than 50000 bytes GLAM2 is 13 times slower, while being at least as fast as our method on larger ones. The source code of our C++ implementation is freely available in GitHub: https://github.com/hirvolt1/debruijn-motif.
|
43
|
Sharma R, Kumar S, Tsunoda T, Patil A, Sharma A. Predicting MoRFs in protein sequences using HMM profiles. BMC Bioinformatics 2016; 17:504. [PMID: 28155710 PMCID: PMC5259822 DOI: 10.1186/s12859-016-1375-0] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND: Intrinsically Disordered Proteins (IDPs) lack an ordered three-dimensional structure and are enriched in various biological processes. The Molecular Recognition Features (MoRFs) are functional regions within IDPs that undergo a disorder-to-order transition on binding to a partner protein. Identifying MoRFs in IDPs using computational methods is a challenging task. METHODS: In this study, we introduce hidden Markov model (HMM) profiles to accurately identify the location of MoRFs in disordered protein sequences. Using a windowing technique, HMM profiles are utilised to extract features from protein sequences, and support vector machines (SVM) are used to calculate a propensity score for each residue. Two different SVM kernels with high noise tolerance are evaluated with a varying window size, and the scores of the SVM models are combined to generate the final propensity score to predict MoRF residues. The SVM models are designed to extract maximal information between MoRF residues, their neighboring regions (Flanks) and the remainder of the sequence (Others). RESULTS: To evaluate the proposed method, its performance was compared to that of two other MoRF predictors, MoRFpred and ANCHOR. The results show that the proposed method outperforms these two predictors. CONCLUSIONS: Using HMM profiles as a source of feature extraction, the proposed method shows an improvement in predicting MoRFs in disordered protein sequences.
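The windowing step mentioned in the methods can be illustrated with a small Python sketch: each residue is represented by the concatenated profile rows of its neighbours within a fixed-width window. This is an assumption-laden illustration, not the authors' code. The function name `window_features`, the zero padding at the sequence ends, and the list-of-lists profile format are hypothetical choices, and the resulting vectors would still need to be passed to a classifier such as an SVM.

```python
def window_features(profile, size):
    """Build one flat feature vector per residue from a sliding window.

    profile: list of per-residue feature rows (e.g. rows of an HMM profile).
    size:    odd window width, so the window is centred on the residue.
    Positions beyond the sequence ends are zero-padded (an assumption).
    """
    if size < 1 or size % 2 == 0:
        raise ValueError("window size must be a positive odd integer")
    half = size // 2
    width = len(profile[0]) if profile else 0
    pad = [0.0] * width
    padded = [pad] * half + list(profile) + [pad] * half
    features = []
    for i in range(len(profile)):
        window = padded[i:i + size]
        # Flatten the window rows into a single vector for residue i.
        features.append([value for row in window for value in row])
    return features


# Toy profile with a single feature per residue.
toy_profile = [[1.0], [2.0], [3.0]]
print(window_features(toy_profile, 3))
# [[0.0, 1.0, 2.0], [1.0, 2.0, 3.0], [2.0, 3.0, 0.0]]
```

Larger windows give the classifier more flanking context per residue at the cost of longer feature vectors, which is one reason a method might evaluate several window sizes.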
Affiliation(s)
- Ronesh Sharma
- School of Electrical and Electronics Engineering, Fiji National University, Suva, Fiji; School of Engineering and Physics, The University of the South Pacific, Suva, Fiji
- Shiu Kumar
- School of Electrical and Electronics Engineering, Fiji National University, Suva, Fiji; School of Engineering and Physics, The University of the South Pacific, Suva, Fiji
- Tatsuhiko Tsunoda
- CREST, JST, Yokohama, 230-0045, Japan; RIKEN Center for Integrative Medical Science, Yokohama, 230-0045, Japan; Medical Research Institute, Tokyo Medical and Dental University, Tokyo, 113-8510, Japan
- Ashwini Patil
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Alok Sharma
- School of Engineering and Physics, The University of the South Pacific, Suva, Fiji; CREST, JST, Yokohama, 230-0045, Japan; RIKEN Center for Integrative Medical Science, Yokohama, 230-0045, Japan; Medical Research Institute, Tokyo Medical and Dental University, Tokyo, 113-8510, Japan
|
44
|
Sze-To A, Fung S, Lee ESA, Wong AK. Prediction of Protein–Protein Interaction via co-occurring Aligned Pattern Clusters. Methods 2016; 110:26-34. [DOI: 10.1016/j.ymeth.2016.07.018] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 04/16/2016] [Revised: 06/25/2016] [Accepted: 07/26/2016] [Indexed: 10/21/2022] Open
|
45
|
Evolution of domain-peptide interactions to coadapt specificity and affinity to functional diversity. Proc Natl Acad Sci U S A 2016; 113:E3862-71. [PMID: 27317745 DOI: 10.1073/pnas.1518469113] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Indexed: 11/18/2022] Open
Abstract
Evolution of complexity in eukaryotic proteomes has arisen, in part, through emergence of modular independently folded domains mediating protein interactions via binding to short linear peptides in proteins. Over 30 years, structural properties and sequence preferences of these peptides have been extensively characterized. Less successful, however, were efforts to establish relationships between physicochemical properties and functions of domain-peptide interactions. To our knowledge, we have devised the first strategy to exhaustively explore the binding specificity of protein domain-peptide interactions. We applied the strategy to SH3 domains to determine the properties of their binding peptides starting from various experimental data. The strategy identified the majority (∼70%) of experimentally determined SH3 binding sites. We discovered mutual relationships among binding specificity, binding affinity, and structural properties and evolution of linear peptides. Remarkably, we found that these properties are also related to functional diversity, defined by depth of proteins within hierarchies of gene ontologies. Our results revealed that linear peptides evolved to coadapt specificity and affinity to functional diversity of domain-peptide interactions. Thus, domain-peptide interactions follow human-constructed gene ontologies, which suggests that our understanding of biological process hierarchies reflects the way chemical and thermodynamic properties of linear peptides and their interaction networks, in general, have evolved.
|
46
|
Brinton LT, Bauknight DK, Dasa SSK, Kelly KA. PHASTpep: Analysis Software for Discovery of Cell-Selective Peptides via Phage Display and Next-Generation Sequencing. PLoS One 2016; 11:e0155244. [PMID: 27186887 PMCID: PMC4871350 DOI: 10.1371/journal.pone.0155244] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 02/12/2016] [Accepted: 04/26/2016] [Indexed: 11/18/2022] Open
Abstract
Next-generation sequencing has enhanced the phage display process, allowing for the quantification of millions of sequences resulting from the biopanning process. In response, many valuable analysis programs focused on specificity and finding targeted motifs or consensus sequences were developed. For targeted drug delivery and molecular imaging, it is also necessary to find peptides that are selective—targeting only the cell type or tissue of interest. We present a new analysis strategy and accompanying software, PHage Analysis for Selective Targeted PEPtides (PHASTpep), which identifies highly specific and selective peptides. Using this process, we discovered and validated, both in vitro and in vivo in mice, two sequences (HTTIPKV and APPIMSV) targeted to pancreatic cancer-associated fibroblasts that escaped identification using previously existing software. Our selectivity analysis makes it possible to discover peptides that target a specific cell type and avoid other cell types, enhancing clinical translatability by circumventing complications with systemic use.
Affiliation(s)
- Lindsey T. Brinton
- Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, 22908, United States of America
- Cardiovascular Research Center, University of Virginia, Charlottesville, Virginia, 22908, United States of America
- Dustin K. Bauknight
- Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, 22908, United States of America
- Cardiovascular Research Center, University of Virginia, Charlottesville, Virginia, 22908, United States of America
- Siva Sai Krishna Dasa
- Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, 22908, United States of America
- Cardiovascular Research Center, University of Virginia, Charlottesville, Virginia, 22908, United States of America
- Kimberly A. Kelly
- Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, 22908, United States of America
- Cardiovascular Research Center, University of Virginia, Charlottesville, Virginia, 22908, United States of America
|
47
|
Yan J, Dunker AK, Uversky VN, Kurgan L. Molecular recognition features (MoRFs) in three domains of life. Mol Biosyst 2016; 12:697-710. [DOI: 10.1039/c5mb00640f] [Citation(s) in RCA: 103] [Impact Index Per Article: 12.9] [Indexed: 12/15/2022]
Abstract
MoRFs are widespread intrinsically disordered protein-binding regions that have similar abundance and amino acid composition across the three domains of life.
Affiliation(s)
- Jing Yan
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
- A. Keith Dunker
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, USA
- Indiana University School of Informatics
- Vladimir N. Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, USA
- Lukasz Kurgan
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
- Department of Computer Science
|
48
|
Olorin E, O'Brien KT, Palopoli N, Pérez-Bercoff Å, Shields DC, Edwards RJ. SLiMScape 3.x: a Cytoscape 3 app for discovery of Short Linear Motifs in protein interaction networks. F1000Res 2015; 4:477. [PMID: 26674271 PMCID: PMC4670012 DOI: 10.12688/f1000research.6773.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Accepted: 07/20/2015] [Indexed: 11/30/2022] Open
Abstract
Short linear motifs (SLiMs) are small protein sequence patterns that mediate a large number of critical protein-protein interactions, involved in processes such as complex formation, signal transduction, localisation and stabilisation. SLiMs show rapid evolutionary dynamics and are frequently the targets of molecular mimicry by pathogens. Identifying enriched sequence patterns due to convergent evolution in non-homologous proteins has proven to be a successful strategy for computational SLiM prediction. Tools of the SLiMSuite package use this strategy, using a statistical model to identify SLiM enrichment based on the evolutionary relationships, amino acid composition and predicted disorder of the input proteins. The quality of input data is critical for successful SLiM prediction. Cytoscape provides a user-friendly, interactive environment to explore interaction networks and select proteins based on common features, such as shared interaction partners. SLiMScape embeds tools of the SLiMSuite package for de novo SLiM discovery (SLiMFinder and QSLiMFinder) and identifying occurrences/enrichment of known SLiMs (SLiMProb) within this interactive framework. SLiMScape makes it easier to (1) generate high quality hypothesis-driven datasets for these tools, and (2) visualise predicted SLiM occurrences within the context of the network. To generate new predictions, users can select nodes from a protein network or provide a set of Uniprot identifiers. SLiMProb also requires additional query motif input. Jobs are then run remotely on the SLiMSuite server (http://rest.slimsuite.unsw.edu.au) for subsequent retrieval and visualisation. SLiMScape can also be used to retrieve and visualise results from jobs run directly on the server. SLiMScape and SLiMSuite are open source and freely available via GitHub under GNU licenses.
Affiliation(s)
- Emily Olorin
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia
- Kevin T O'Brien
- UCD Conway Institute of Biomolecular and Biomedical Research, School of Medicine, University College Dublin, Dublin, Ireland
- Nicolas Palopoli
- Centre for Biological Sciences, University of Southampton, Southampton, UK; Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina; Fundación Instituto Leloir, Buenos Aires, Argentina
- Åsa Pérez-Bercoff
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia
- Denis C Shields
- UCD Conway Institute of Biomolecular and Biomedical Research, School of Medicine, University College Dublin, Dublin, Ireland
- Richard J Edwards
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia; Centre for Biological Sciences, University of Southampton, Southampton, UK
|
49
|
Blikstad C, Ivarsson Y. High-throughput methods for identification of protein-protein interactions involving short linear motifs. Cell Commun Signal 2015; 13:38. [PMID: 26297553 PMCID: PMC4546347 DOI: 10.1186/s12964-015-0116-8] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 06/30/2015] [Accepted: 08/11/2015] [Indexed: 02/07/2023] Open
Abstract
Interactions between modular domains and short linear motifs (peptide stretches of 3–10 amino acids) are crucial for cell signaling. The motifs typically reside in the disordered regions of the proteome and the interactions are often transient, allowing for rapid changes in response to changing stimuli. The properties that make domain-motif interactions suitable for cell signaling also make them difficult to capture experimentally and they are therefore largely underrepresented in the known protein-protein interaction networks. Most of the knowledge on domain-motif interactions is derived from low-throughput studies, although there exist dedicated high-throughput methods for the identification of domain-motif interactions. The methods include arrays of peptides or proteins, display of peptides on phage or yeast, and yeast-two-hybrid experiments. We here provide a survey of scalable methods for domain-motif interaction profiling. These methods have frequently been applied to a limited number of ubiquitous domain families. It is now time to apply them to a broader set of peptide binding proteins, to provide a comprehensive picture of the linear motifs in the human proteome and to link them to their potential binding partners. Despite the plethora of methods, it is still a challenge for most approaches to identify interactions that rely on post-translational modification, or context-dependent or conditional interactions, suggesting directions for further method development.
Affiliation(s)
- Cecilia Blikstad
- Department of Chemistry - BMC, Husargatan 3, 751 23, Uppsala, Sweden
- Ylva Ivarsson
- Department of Chemistry - BMC, Husargatan 3, 751 23, Uppsala, Sweden.
|
50
|
Song T, Gu H. Discovering short linear protein motif based on selective training of profile hidden Markov models. J Theor Biol 2015; 377:75-84. [PMID: 25791288 DOI: 10.1016/j.jtbi.2015.03.010] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/23/2014] [Revised: 03/06/2015] [Accepted: 03/07/2015] [Indexed: 11/20/2022]
Abstract
Short linear motifs (SLiMs) are relatively conserved sequence patterns within disordered regions of proteins, typically 3-10 amino acids in length. They play an important role in mediating protein-protein interactions. Computational discovery of SLiMs has attracted increasing attention, and most existing methods are based on regular expressions and profiles. In this paper, we propose a de novo motif discovery method based on profile hidden Markov models (HMMs), which can not only provide the emission probabilities of amino acids at the defined positions of SLiMs, but also model the undefined positions. We adopted ordered region masking and relative local conservation (RLC) masking to improve the signal-to-noise ratio of the query sequences, and applied evolutionary weighting so that evolutionarily important sequences receive more attention during the selective training of profile HMMs. The experimental results show that our method and the profile-based method returned different subsets within a SLiMs dataset, and that the performance of the two approaches is equivalent on a more realistic discovery dataset. Profile HMM-based motif discovery methods complement the existing methods and provide another way to analyze SLiMs.
Affiliation(s)
- Tao Song
- Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China
- Hong Gu
- Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China.
|