1
Gong Y, Li R, Liu Y, Wang J, Cao B, Fu X, Li R, Chen DZ. MR2CPPIS: Accurate prediction of protein-protein interaction sites based on multi-scale Res2Net with coordinate attention mechanism. Comput Biol Med 2024; 176:108543. [PMID: 38744015 DOI: 10.1016/j.compbiomed.2024.108543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 04/09/2024] [Accepted: 04/28/2024] [Indexed: 05/16/2024]
Abstract
Proteins play a vital role in various biological processes and achieve their functions through protein-protein interactions (PPIs). Thus, accurate identification of PPI sites is essential. Traditional biological methods for identifying PPIs are costly, labor-intensive, and time-consuming. The development of computational prediction methods for PPI sites offers promising alternatives. Most known deep learning (DL) methods employ layer-wise multi-scale CNNs to extract features from protein sequences. However, these methods usually neglect the spatial positions and hierarchical information embedded within protein sequences, which are crucial for PPI site prediction. In this paper, we propose MR2CPPIS, a novel sequence-based DL model that uses a multi-scale Res2Net with a coordinate attention mechanism to exploit multi-scale features and enhance PPI site prediction capability. We leverage the multi-scale Res2Net to expand the receptive field of each network layer, thus capturing multi-scale information of protein sequences at a granular level. To further explore the local contextual features of each target residue, we employ a coordinate attention block to characterize the precise spatial position information, enabling the network to effectively extract long-range dependencies. We evaluate MR2CPPIS on three public benchmark datasets (Dset 72, Dset 186, and PDBset 164), achieving state-of-the-art performance. The source code is available at https://github.com/YyinGong/MR2CPPIS.
Affiliation(s)
- Yinyin Gong
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China; Hunan Engineering Research Center of Advanced Embedded Computing and Intelligent Medical Systems, Hunan University, Changsha, 410082, China
- Rui Li
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China; Hunan Engineering Research Center of Advanced Embedded Computing and Intelligent Medical Systems, Hunan University, Changsha, 410082, China
- Yan Liu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China; Hunan Engineering Research Center of Advanced Embedded Computing and Intelligent Medical Systems, Hunan University, Changsha, 410082, China
- Jilong Wang
- Peng Cheng Laboratory, Shenzhen, 518066, China
- Buwen Cao
- College of Information and Electronic Engineering, Hunan City University, Yiyang, 413002, China
- Xiangzheng Fu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Renfa Li
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China; Hunan Engineering Research Center of Advanced Embedded Computing and Intelligent Medical Systems, Hunan University, Changsha, 410082, China
- Danny Z Chen
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
2
Oriol F, Alberto M, Joachim AP, Patrick G, M BP, Ruben MF, Jaume B, Altair CH, Ferran P, Oriol G, Narcis FF, Baldo O. Structure-based learning to predict and model protein-DNA interactions and transcription-factor co-operativity in cis-regulatory elements. NAR Genom Bioinform 2024; 6:lqae068. [PMID: 38867914 PMCID: PMC11167492 DOI: 10.1093/nargab/lqae068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 04/18/2024] [Accepted: 05/23/2024] [Indexed: 06/14/2024] Open
Abstract
Transcription factor (TF) binding is a key component of genomic regulation. There are numerous high-throughput experimental methods to characterize TF-DNA binding specificities. Their application, however, is both laborious and expensive, which makes profiling all TFs challenging. For instance, the binding preferences of ∼25% of human TFs remain unknown; they have neither been determined experimentally nor inferred computationally. We introduce a structure-based learning approach to predict the binding preferences of TFs and to automate the modeling of TF regulatory complexes. We show the advantage of using our approach over the classical nearest-neighbor prediction in the limits of remote homology. Starting from a TF sequence or structure, we predict binding preferences in the form of motifs that are then used to scan a DNA sequence for occurrences. The best matches are either profiled with a binding score or collected for their subsequent modeling into a higher-order regulatory complex with DNA. Co-operativity is modeled by: (i) the co-localization of TFs and (ii) the structural modeling of protein-protein interactions between TFs and with co-factors. We have applied our approach to automatically model the interferon-β enhanceosome and the pioneering complexes of OCT4, SOX2 (or SOX11) and KLF4 with a nucleosome, which are compared with the experimentally known structures.
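Editorial illustration (not taken from the article): the abstract describes using predicted motifs to scan a DNA sequence for occurrences and keeping the best matches. The sketch below shows one common way such a scan can be done, with a log-odds score against a uniform background. The motif values, the background frequency, and the function names `log_odds`, `scan`, and `best_match` are all assumptions made for this example; the authors' scoring scheme may differ.

```python
import math

# Hypothetical 4-position motif given as per-position base probabilities (toy values).
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background frequency for every base


def log_odds(window, pwm=PWM, background=BACKGROUND):
    """Sum of log2(p / background) over the positions of one window."""
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, window))


def scan(sequence, pwm=PWM, background=BACKGROUND):
    """Score every window of motif length; return (start, window, score) tuples."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        hits.append((start, window, log_odds(window, pwm, background)))
    return hits


def best_match(sequence, pwm=PWM, background=BACKGROUND):
    """Highest-scoring window along the sequence."""
    return max(scan(sequence, pwm, background), key=lambda hit: hit[2])
```

Calling `best_match("TTACGTAA")` picks the window `ACGT` at offset 2 for this toy motif. The sketch only handles the forward strand and the bases A, C, G, and T; a practical scanner would also score the reverse complement and handle ambiguous bases.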
Affiliation(s)
- Fornes Oriol
- Centre for Molecular Medicine and Therapeutics. BC Children's Hospital Research Institute. Department of Medical Genetics. University of British Columbia, Vancouver, BC V5Z 4H4, Canada
- Meseguer Alberto
- Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Gohl Patrick
- Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Bota Patricia M
- Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Molina-Fernández Ruben
- Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Bonet Jaume
- Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Laboratory of Protein Design & Immunoengineering. School of Engineering. Ecole Polytechnique Federale de Lausanne. Lausanne 1015, Vaud, Switzerland
- Chinchilla-Hernandez Altair
- Live-Cell Structural Biology. Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Pegenaute Ferran
- Live-Cell Structural Biology. Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Gallego Oriol
- Live-Cell Structural Biology. Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Fernandez-Fuentes Narcis
- Institute of Biological, Environmental and Rural Science. Aberystwyth University, SY23 3DA Aberystwyth, UK
- Oliva Baldo
- Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
3
Roca-Martínez J, Dhondge H, Sattler M, Vranken WF. Deciphering the RRM-RNA recognition code: A computational analysis. PLoS Comput Biol 2023; 19:e1010859. [PMID: 36689472 PMCID: PMC9894542 DOI: 10.1371/journal.pcbi.1010859] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/12/2022] [Revised: 02/02/2023] [Accepted: 01/07/2023] [Indexed: 01/24/2023] Open
Abstract
RNA recognition motifs (RRM) are the most prevalent class of RNA binding domains in eucaryotes. Their RNA binding preferences have been investigated for almost two decades, and even though some RRM domains are now very well described, their RNA recognition code has remained elusive. An increasing number of experimental structures of RRM-RNA complexes has become available in recent years. Here, we perform an in-depth computational analysis to derive an RNA recognition code for canonical RRMs. We present and validate a computational scoring method to estimate the binding between an RRM and a single stranded RNA, based on structural data from a carefully curated multiple sequence alignment, which can predict RRM binding RNA sequence motifs based on the RRM protein sequence. Given the importance and prevalence of RRMs in humans and other species, this tool could help design RNA binding motifs with uses in medical or synthetic biology applications, leading towards the de novo design of RRMs with specific RNA recognition.
Affiliation(s)
- Joel Roca-Martínez
- Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
- Structural biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Michael Sattler
- Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich, Neuherberg, Germany
- Bavarian NMR Center, Department of Bioscience, School of Natural Sciences, Technical University of Munich, Garching, Germany
- Wim F. Vranken
- Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
- Structural biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
4
Banerjee A, Mazumder A, Roy J, Das J, Majumdar A, Chatterjee A, Biswas NK, Chawla Sarkar M, Das S, Dutta S, Maitra A. Emergence of a unique SARS-CoV-2 Delta sub-cluster harboring a constellation of co-appearing non-Spike mutations. J Med Virol 2023; 95:e28413. [PMID: 36541745 PMCID: PMC9878222 DOI: 10.1002/jmv.28413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2022] [Revised: 11/21/2022] [Accepted: 12/15/2022] [Indexed: 12/24/2022]
Abstract
Accumulation of diverse mutations across the structural and nonstructural genes is leading to rapid evolution of SARS-CoV-2, altering its pathogenicity. We performed whole genome sequencing of 239 SARS-CoV-2 RNA samples collected from both adult and pediatric patients across eastern India (West Bengal), during the second pandemic wave in India (April-May 2021). In addition to several common spike mutations within the Delta variant, a unique constellation of eight co-appearing non-Spike mutations was identified, which revealed a high degree of positive mutual correlation. Our results also demonstrated the dynamics of SARS-CoV-2 variants among unvaccinated pediatric patients. 41.4% of our studied Delta strains harbored this signature set of eight co-appearing non-Spike mutations and phylogenetically out-clustered other Delta sub-lineages like 21J, 21A, or 21I. This is the first report from eastern India that portrayed a landscape of co-appearing mutations in the non-Spike proteins, which might have led to the evolution of a distinct Delta subcluster. Accumulation of such mutations in SARS-CoV-2 may lead to the emergence of "vaccine-evading variants." Hence, monitoring of such non-Spike mutations will be significant in the formulation of any future vaccines against those SARS-CoV-2 variants that might evade the current vaccine-induced immunity, among both the pediatric and adult populations.
Affiliation(s)
- Anup Mazumder
- National Institute of Biomedical Genomics, Kalyani, India
- Jayita Roy
- National Institute of Biomedical Genomics, Kalyani, India
- Agniva Majumdar
- ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, India
- Saumitra Das
- National Institute of Biomedical Genomics, Kalyani, India; Department of Microbiology and Cell Biology, Indian Institute of Science, Bengaluru, India
- Shanta Dutta
- ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, India
5
Meseguer A, Årman F, Fornes O, Molina-Fernández R, Bonet J, Fernandez-Fuentes N, Oliva B. On the prediction of DNA-binding preferences of C2H2-ZF domains using structural models: application on human CTCF. NAR Genom Bioinform 2021; 2:lqaa046. [PMID: 33575598 PMCID: PMC7671317 DOI: 10.1093/nargab/lqaa046] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/13/2020] [Revised: 05/07/2020] [Accepted: 06/10/2020] [Indexed: 12/25/2022] Open
Abstract
Cys2-His2 zinc finger (C2H2-ZF) proteins are the largest family of transcription factors in human and higher metazoans. To date, the DNA-binding preferences of many members of this family remain unknown. We have developed a computational method to predict their DNA-binding preferences. We have computed theoretical position weight matrices (PWMs) of proteins composed of C2H2-ZF domains, with the only requirement being an input structure. We have predicted more than two-thirds of a single zinc-finger domain binding site for about 70% of the variants of Zif268, a classical member of this family. We have successfully matched between 60 and 90% of the binding-site motif for examples of proteins composed of three C2H2-ZF domains in JASPAR, a standard database of PWMs. The tests are used as a proof of the capacity to scan a DNA fragment and find the potential binding sites of transcription factors formed by C2H2-ZF domains. As an example, we have tested the approach to predict the DNA-binding preferences of the human chromatin binding factor CTCF. We offer a server to model the structure of a zinc-finger protein and predict its PWM.
Affiliation(s)
- Alberto Meseguer
- Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, University Pompeu Fabra, Barcelona, Catalonia 08005, Spain
- Filip Årman
- Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, University Pompeu Fabra, Barcelona, Catalonia 08005, Spain
- Oriol Fornes
- Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
- Ruben Molina-Fernández
- Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, University Pompeu Fabra, Barcelona, Catalonia 08005, Spain
- Jaume Bonet
- Laboratory of Protein Design & Immunoengineering, School of Engineering, Ecole Polytechnique Federale de Lausanne, Lausanne 1015, Vaud, Switzerland
- Narcis Fernandez-Fuentes
- Department of Biosciences, U Science Tech, Universitat de Vic-Universitat Central de Catalunya, Vic, Catalonia 08500, Spain
- Baldo Oliva
- Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, University Pompeu Fabra, Barcelona, Catalonia 08005, Spain
6
Aguirre-Plans J, Meseguer A, Molina-Fernandez R, Marín-López MA, Jumde G, Casanova K, Bonet J, Fornes O, Fernandez-Fuentes N, Oliva B. SPServer: split-statistical potentials for the analysis of protein structures and protein-protein interactions. BMC Bioinformatics 2021; 22:4. [PMID: 33407073 PMCID: PMC7788957 DOI: 10.1186/s12859-020-03770-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/30/2020] [Accepted: 09/20/2020] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND: Statistical potentials, also named knowledge-based potentials, are scoring functions derived from empirical data that can be used to evaluate the quality of protein folds and protein-protein interaction (PPI) structures. In previous works we decomposed the statistical potentials into different terms, named Split-Statistical Potentials, accounting for the type of amino acid pairs, their hydrophobicity, solvent accessibility and type of secondary structure. These potentials have been successfully used to identify near-native structures in protein structure prediction, rank protein docking poses, and predict PPI binding affinities. RESULTS: Here, we present the SPServer, a web server that applies the Split-Statistical Potentials to analyze protein folds and protein interfaces. SPServer provides global scores as well as residue/residue-pair profiles presented as score plots and maps. This level of detail allows users to: (1) identify potentially problematic regions on protein structures; (2) identify disrupting amino acid pairs in protein interfaces; and (3) compare and analyze the quality of tertiary and quaternary structural models. CONCLUSIONS: While there are many web servers that provide scoring functions to assess the quality of either protein folds or PPI structures, SPServer integrates both aspects in a unique easy-to-use web server. Moreover, the server allows local assessment of the quality of the structures and interfaces at the residue level and provides tools to compare the local assessment between structures. SERVER ADDRESS: https://sbi.upf.edu/spserver/.
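Editorial illustration (not taken from the article): a knowledge-based potential is often built by comparing how frequently a residue pair is observed in contact with how frequently it would be expected under a reference state, and taking the negative logarithm of that ratio. The sketch below shows this generic idea only. It is not the Split-Statistical Potentials used by SPServer, which additionally account for hydrophobicity, solvent accessibility, and secondary structure; the reference state, the pseudocount, and the function names `pair_potentials` and `score_interface` are assumptions for this example.

```python
import math
from collections import Counter


def pair_potentials(observed_pairs, pseudocount=1.0):
    """Return {pair: -ln(P_obs / P_ref)} for unordered residue-type pairs.

    P_ref assumes residues pair independently, in proportion to how often
    each type appears in the observed contacts (an illustrative reference state).
    """
    pair_counts = Counter(tuple(sorted(p)) for p in observed_pairs)
    type_counts = Counter(res for pair in observed_pairs for res in pair)
    total_pairs = sum(pair_counts.values())
    total_types = sum(type_counts.values())
    types = sorted(type_counts)
    n_pair_kinds = len(types) * (len(types) + 1) // 2

    potentials = {}
    for i, a in enumerate(types):
        for b in types[i:]:
            fa = type_counts[a] / total_types
            fb = type_counts[b] / total_types
            # Unordered pair of different types can arise in two orders.
            p_ref = fa * fb if a == b else 2 * fa * fb
            # Pseudocount keeps unobserved pairs from producing log(0).
            p_obs = (pair_counts[(a, b)] + pseudocount) / (
                total_pairs + pseudocount * n_pair_kinds
            )
            potentials[(a, b)] = -math.log(p_obs / p_ref)
    return potentials


def score_interface(pairs, potentials):
    """Sum pair potentials over the contacts of a model; lower is more favorable."""
    return sum(potentials[tuple(sorted(p))] for p in pairs)
```

With a toy contact list such as six L-V contacts, three K-E contacts, and one L-K contact, over-represented pairs receive negative (favorable) values and unobserved pairs receive positive ones, so a model whose interface is rich in frequently observed pairs scores lower.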
Grants
- BIO2017-85329-R (FEDER,UE) Ministerio de Economía, Industria y Competitividad, Gobierno de España
- BIO2017-83591-R (FEDER,UE) Ministerio de Economía, Industria y Competitividad, Gobierno de España
- RYC-2015-17519 Ministerio de Economía, Industria y Competitividad, Gobierno de España
- MDM-2014-0370 Ministerio de Economía, Industria y Competitividad, Gobierno de España
- FI Agència de Gestió d'Ajuts Universitaris i de Recerca
- 2017 SGR 01020 Agència de Gestió d'Ajuts Universitaris i de Recerca
- PT13/0001/0023 Instituto de Salud Carlos III
- Agència de Gestió d’Ajuts Universitaris i de Recerca
Affiliation(s)
- Joaquim Aguirre-Plans
- Structural Bioinformatics Lab, Department of Experimental and Health Science, Universitat Pompeu Fabra, 08003, Barcelona, Catalonia, Spain
- Alberto Meseguer
- Structural Bioinformatics Lab, Department of Experimental and Health Science, Universitat Pompeu Fabra, 08003, Barcelona, Catalonia, Spain
- Ruben Molina-Fernandez
- Structural Bioinformatics Lab, Department of Experimental and Health Science, Universitat Pompeu Fabra, 08003, Barcelona, Catalonia, Spain
- Manuel Alejandro Marín-López
- Structural Bioinformatics Lab, Department of Experimental and Health Science, Universitat Pompeu Fabra, 08003, Barcelona, Catalonia, Spain
- Gaurav Jumde
- Structural Bioinformatics Lab, Department of Experimental and Health Science, Universitat Pompeu Fabra, 08003, Barcelona, Catalonia, Spain
- Kevin Casanova
- Structural Bioinformatics Lab, Department of Experimental and Health Science, Universitat Pompeu Fabra, 08003, Barcelona, Catalonia, Spain
- Jaume Bonet
- Laboratory of Protein Design and Immunoengineering, School of Engineering, Ecole Polytechnique Federale de Lausanne, 1015, Lausanne, Vaud, Switzerland
- Oriol Fornes
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, V5Z 4H4, Canada
- Narcis Fernandez-Fuentes
- Department of Biosciences, U Science Tech, Universitat de Vic-Universitat Central de Catalunya, Vic 08500, Barcelona, Catalonia, Spain
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, SY23 3EB, UK
- Baldo Oliva
- Structural Bioinformatics Lab, Department of Experimental and Health Science, Universitat Pompeu Fabra, 08003, Barcelona, Catalonia, Spain
7
Lou H, Cukier RI. A maximum entropy principle approach to a joint probability model for sequences with known neighbor and next neighbor pair probabilities. Chem Phys 2020. [DOI: 10.1016/j.chemphys.2020.110872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
8
Meseguer A, Dominguez L, Bota PM, Aguirre-Plans J, Bonet J, Fernandez-Fuentes N, Oliva B. Using collections of structural models to predict changes of binding affinity caused by mutations in protein-protein interactions. Protein Sci 2020; 29:2112-2130. [PMID: 32797645 PMCID: PMC7513729 DOI: 10.1002/pro.3930] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/27/2020] [Revised: 08/04/2020] [Accepted: 08/05/2020] [Indexed: 12/24/2022]
Abstract
Protein-protein interactions (PPIs) are involved in all the molecular aspects that take place both inside and outside cells. However, determining experimentally the structure and affinity of PPIs is expensive and time consuming. Therefore, the development of computational tools, as a complement to experimental methods, is fundamental. Here, we present a computational suite, MODPIN, to model and predict the changes of binding affinity of PPIs. In this approach we use homology modeling to derive the structures of PPIs and score them using state-of-the-art scoring functions. We explore the conformational space of PPIs by generating not a single structural model but a collection of structural models with different conformations based on several templates. We apply the approach to predict the changes in free energy upon mutations and splicing variants of large datasets of PPIs to statistically quantify the quality and accuracy of the predictions. As an example, we use MODPIN to study the effect of mutations in the interaction between colicin endonuclease 9 and colicin endonuclease 2 immune protein from Escherichia coli. Finally, we have compared our results with other state-of-the-art methods.
Affiliation(s)
- Alberto Meseguer
- Structural Bioinformatics Group, Research Programme on Biomedical Informatics, Department of Experimental and Health Science, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
- Lluis Dominguez
- Integrative Biomedical Informatics Group (GRIB-IMIM), Department of Experimental and Life Sciences, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
- Patricia M. Bota
- Structural Bioinformatics Group, Research Programme on Biomedical Informatics, Department of Experimental and Health Science, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
- Department of Biosciences, Universitat de Vic-Universitat Central de Catalunya, Vic, Catalonia, Spain
- Joaquim Aguirre-Plans
- Structural Bioinformatics Group, Research Programme on Biomedical Informatics, Department of Experimental and Health Science, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
- Jaume Bonet
- Structural Bioinformatics Group, Research Programme on Biomedical Informatics, Department of Experimental and Health Science, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
- Narcis Fernandez-Fuentes
- Department of Biosciences, Universitat de Vic-Universitat Central de Catalunya, Vic, Catalonia, Spain
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, UK
- Baldo Oliva
- Structural Bioinformatics Group, Research Programme on Biomedical Informatics, Department of Experimental and Health Science, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
9
Siebenmorgen T, Zacharias M. Computational prediction of protein–protein binding affinities. Wiley Interdisciplinary Reviews: Computational Molecular Science 2019. [DOI: 10.1002/wcms.1448] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Indexed: 01/12/2023]
Affiliation(s)
- Till Siebenmorgen
- Physics Department T38, Technical University of Munich, Garching, Germany
- Martin Zacharias
- Physics Department T38, Technical University of Munich, Garching, Germany
10
Jung Y, El-Manzalawy Y, Dobbs D, Honavar VG. Partner-specific prediction of RNA-binding residues in proteins: A critical assessment. Proteins 2018; 87:198-211. [PMID: 30536635 PMCID: PMC6389706 DOI: 10.1002/prot.25639] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/01/2018] [Revised: 10/10/2018] [Accepted: 11/29/2018] [Indexed: 01/06/2023]
Abstract
RNA-protein interactions play essential roles in regulating gene expression. While some RNA-protein interactions are "specific", that is, the RNA-binding proteins preferentially bind to particular RNA sequence or structural motifs, others are "non-RNA specific." Deciphering the protein-RNA recognition code is essential for comprehending the functional implications of these interactions and for developing new therapies for many diseases. Because of the high cost of experimental determination of protein-RNA interfaces, there is a need for computational methods to identify RNA-binding residues in proteins. While most of the existing computational methods for predicting RNA-binding residues in RNA-binding proteins are oblivious to the characteristics of the partner RNA, there is growing interest in methods for partner-specific prediction of RNA binding sites in proteins. In this work, we assess the performance of two recently published partner-specific protein-RNA interface prediction tools, PS-PRIP, and PRIdictor, along with our own new tools. Specifically, we introduce a novel metric, RNA-specificity metric (RSM), for quantifying the RNA-specificity of the RNA binding residues predicted by such tools. Our results show that the RNA-binding residues predicted by previously published methods are oblivious to the characteristics of the putative RNA binding partner. Moreover, when evaluated using partner-agnostic metrics, RNA partner-specific methods are outperformed by the state-of-the-art partner-agnostic methods. We conjecture that either (a) the protein-RNA complexes in PDB are not representative of the protein-RNA interactions in nature, or (b) the current methods for partner-specific prediction of RNA-binding residues in proteins fail to account for the differences in RNA partner-specific versus partner-agnostic protein-RNA interactions, or both.
Affiliation(s)
- Yong Jung
- Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, Pennsylvania; Artificial Intelligence Research Laboratory, Pennsylvania State University, University Park, Pennsylvania; The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania
- Yasser El-Manzalawy
- Artificial Intelligence Research Laboratory, Pennsylvania State University, University Park, Pennsylvania; Clinical and Translational Sciences Institute, Pennsylvania State University, University Park, Pennsylvania; College of Information Sciences and Technology, Pennsylvania State University, Pennsylvania
- Drena Dobbs
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, Iowa; Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, Iowa
- Vasant G Honavar
- Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, Pennsylvania; Artificial Intelligence Research Laboratory, Pennsylvania State University, University Park, Pennsylvania; Institute for Cyberscience, Pennsylvania State University, University Park, Pennsylvania; Clinical and Translational Sciences Institute, Pennsylvania State University, University Park, Pennsylvania; The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania; College of Information Sciences and Technology, Pennsylvania State University, Pennsylvania
11
Marín-López MA, Planas-Iglesias J, Aguirre-Plans J, Bonet J, Garcia-Garcia J, Fernandez-Fuentes N, Oliva B. On the mechanisms of protein interactions: predicting their affinity from unbound tertiary structures. Bioinformatics 2018; 34:592-598. [PMID: 29028891 PMCID: PMC5860604 DOI: 10.1093/bioinformatics/btx616] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 11/22/2016] [Accepted: 09/26/2017] [Indexed: 12/12/2022] Open
Abstract
Motivation: The characterization of the protein–protein association mechanisms is crucial to understanding how biological processes occur. It has been previously shown that the early formation of non-specific encounters enhances the realization of the stereospecific (i.e. native) complex by reducing the dimensionality of the search process. The association rate for the formation of such a complex plays a crucial role in cell biology and depends on how the partners diffuse to be close to each other. Predicting the binding free energy of proteins provides new opportunities to modulate and control protein–protein interactions. However, existing methods require the 3D structure of the complex to predict its affinity, severely limiting their application to interactions with known structures. Results: We present a new approach that relies on the unbound protein structures and protein docking to predict protein–protein binding affinities. Through the study of the docking space (i.e. decoys), the method predicts the binding affinity of the query proteins when the actual structure of the complex itself is unknown. We tested our approach on a set of globular and soluble proteins of the newest affinity benchmark, obtaining accuracy values comparable to other state-of-the-art methods: a 0.4 correlation coefficient between the experimental and predicted values of ΔG and an error < 3 kcal/mol. Availability and implementation: The binding affinity predictor is implemented and available at http://sbi.upf.edu/BADock and https://github.com/badocksbi/BADock. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Manuel Alejandro Marín-López
- Structural Bioinformatics Lab, Department of Experimental and Health Science, Universitat Pompeu Fabra, Barcelona 08003, Spain
- Joan Planas-Iglesias
- Division of Metabolic and Vascular Health, University of Warwick, Coventry CV4 7AL, UK
- Joaquim Aguirre-Plans
- Structural Bioinformatics Lab, Department of Experimental and Health Science, Universitat Pompeu Fabra, Barcelona 08003, Spain
- Jaume Bonet
- Laboratory of Protein Design and Immunoengineering, School of Engineering, Ecole Polytechnique Federale de Lausanne, Lausanne 1015, Switzerland
- Javier Garcia-Garcia
- Structural Bioinformatics Lab, Department of Experimental and Health Science, Universitat Pompeu Fabra, Barcelona 08003, Spain
- Narcis Fernandez-Fuentes
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth SY23 3DA, UK
- Baldo Oliva
- Structural Bioinformatics Lab, Department of Experimental and Health Science, Universitat Pompeu Fabra, Barcelona 08003, Spain
12
Schneidman-Duhovny D, Khuri N, Dong GQ, Winter MB, Shifrut E, Friedman N, Craik CS, Pratt KP, Paz P, Aswad F, Sali A. Predicting CD4 T-cell epitopes based on antigen cleavage, MHCII presentation, and TCR recognition. PLoS One 2018; 13:e0206654. [PMID: 30399156 PMCID: PMC6219782 DOI: 10.1371/journal.pone.0206654] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 12/10/2017] [Accepted: 10/17/2018] [Indexed: 12/16/2022] Open
Abstract
Accurate predictions of T-cell epitopes would be useful for designing vaccines, immunotherapies for cancer and autoimmune diseases, and improved protein therapies. The humoral immune response involves uptake of antigens by antigen presenting cells (APCs), APC processing and presentation of peptides on MHC class II (pMHCII), and T-cell receptor (TCR) recognition of pMHCII complexes. Most in silico methods predict only peptide-MHCII binding, resulting in significant over-prediction of CD4 T-cell epitopes. We present a method, ITCell, for prediction of T-cell epitopes within an input protein antigen sequence for given MHCII and TCR sequences. The method integrates information about three stages of the immune response pathway: antigen cleavage, MHCII presentation, and TCR recognition. First, antigen cleavage sites are predicted based on the cleavage profiles of cathepsins S, B, and H. Second, for each 12-mer peptide in the antigen sequence we predict whether it will bind to a given MHCII, based on the scores of modeled peptide-MHCII complexes. Third, we predict whether or not any of the top scoring peptide-MHCII complexes can bind to a given TCR, based on the scores of modeled ternary peptide-MHCII-TCR complexes and the distribution of predicted cleavage sites. Our benchmarks consist of epitope predictions generated by this algorithm, checked against 20 peptide-MHCII-TCR crystal structures, as well as epitope predictions for four peptide-MHCII-TCR complexes with known epitopes and TCR sequences but without crystal structures. ITCell successfully identified the correct epitopes as one of the 20 top scoring peptides for 22 of 24 benchmark cases. To validate the method using a clinically relevant application, we utilized five factor VIII-specific TCR sequences from hemophilia A subjects who developed an immune response to factor VIII replacement therapy. The known HLA-DR1-restricted factor VIII epitope was among the six top-scoring factor VIII peptides predicted by ITCell to bind HLA-DR1 and all five TCRs. Our integrative approach is more accurate than current single-stage epitope prediction algorithms applied to the same benchmarks. It is freely available as a web server (http://salilab.org/itcell).
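The second stage described in this abstract (scoring every 12-mer peptide of the antigen and keeping the top-scoring ones) can be sketched as a sliding window followed by a ranking. This is only an illustration under stated assumptions: the function names, the antigen sequence, and `toy_score` are hypothetical, and the placeholder score stands in for the modeled peptide-MHCII complex scores, which are not reproduced here.

```python
def sliding_peptides(sequence, length=12):
    """Return (start_index, peptide) for every window of the given length."""
    return [(i, sequence[i:i + length])
            for i in range(len(sequence) - length + 1)]

def top_scoring(peptides, score_fn, k=20):
    """Keep the k best peptides; lower score = better is an assumption."""
    return sorted(peptides, key=lambda item: score_fn(item[1]))[:k]

def toy_score(peptide):
    # Placeholder only: rewards hydrophobic residues. This is NOT the
    # ITCell scoring, which relies on modeled peptide-MHCII complexes.
    return -sum(peptide.count(aa) for aa in "AVILMFWY")

antigen = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # hypothetical sequence
candidates = sliding_peptides(antigen)
best = top_scoring(candidates, toy_score, k=3)

for start, peptide in best:
    print(start, peptide, toy_score(peptide))
```

Restricting later, more expensive steps to the top-ranked peptides mirrors the abstract's third stage, where only the top scoring peptide-MHCII complexes are considered for TCR binding.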
Affiliation(s)
- Dina Schneidman-Duhovny
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, United States of America
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, United States of America
| | - Natalia Khuri
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, United States of America
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, United States of America
- Graduate Group in Biophysics, University of California at San Francisco, San Francisco, CA, United States of America
| | - Guang Qiang Dong
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, United States of America
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, United States of America
| | - Michael B. Winter
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, United States of America
| | - Eric Shifrut
- Department of Immunology, Weizmann Institute of Science, Rehovot, Israel
| | - Nir Friedman
- Department of Immunology, Weizmann Institute of Science, Rehovot, Israel
| | - Charles S. Craik
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, United States of America
- California Institute for Quantitative Biosciences (QB3), University of California, San Francisco, San Francisco, CA, United States of America
| | - Kathleen P. Pratt
- Uniformed Services University of the Health Sciences, Bethesda, MD, United States of America
| | - Pedro Paz
- Bayer HealthCare, San Francisco, CA, United States of America
| | - Fred Aswad
- Bayer HealthCare, San Francisco, CA, United States of America
| | - Andrej Sali
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, United States of America
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, United States of America
- Graduate Group in Biophysics, University of California at San Francisco, San Francisco, CA, United States of America
| |
|
13
|
Abstract
Protein-RNA interactions play an important role in many biological processes. Computational methods such as docking have been developed to complement existing biophysical and structural biology techniques. Computational prediction of protein-RNA complex structures includes two steps: generating candidate structures from the individual protein and RNA parts and scoring the generated poses to pick out the correct one. In this work, we considered three recently developed data sets of protein-RNA complexes to evaluate and improve the performance of the FFT-based rigid-body docking algorithm implemented in the ICM package. An electrostatic term describing interactions between negatively charged phosphate groups and positively charged protein residues was added to the energy function used during the docking step to take into account the greater role that electrostatic interactions play in protein-RNA complexes. Next, the docking results were used to optimize a scoring function including van der Waals, electrostatic, and solvation terms. This optimization yielded a much smaller weight for the solvation term indicating that solvation energy may be less important for the scoring of protein-RNA structures. Rescoring of the generated poses with the new scoring function led to much higher success rates, while pose clustering by contact fingerprints produced further improvements, achieving a success rate of 0.66 for the top 100 structures.
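The rescoring step in this abstract (a scoring function combining van der Waals, electrostatic, and solvation terms, evaluated by the success rate within the top N poses) can be sketched as a weighted sum followed by a ranking check. The weights, energy values, and the `near_native` flag below are hypothetical and chosen only for illustration; they are not the optimized ICM parameters, and lower score = better is an assumption.

```python
# Hypothetical weights; the small solvation weight only echoes the
# abstract's qualitative finding, not its fitted values.
WEIGHTS = {"vdw": 1.0, "elec": 0.5, "solv": 0.1}

def pose_score(terms, weights=WEIGHTS):
    """Weighted sum of energy terms for one docking pose."""
    return sum(weights[name] * terms[name] for name in weights)

def success_rate(complexes, top_n):
    """Fraction of complexes with a near-native pose among the top_n poses."""
    hits = 0
    for poses in complexes:
        ranked = sorted(poses, key=lambda pose: pose_score(pose["terms"]))
        if any(pose["near_native"] for pose in ranked[:top_n]):
            hits += 1
    return hits / len(complexes)

# Two hypothetical complexes with two poses each.
complexes = [
    [{"terms": {"vdw": -10.0, "elec": -4.0, "solv": 2.0}, "near_native": False},
     {"terms": {"vdw": -8.0, "elec": -6.0, "solv": 1.0}, "near_native": True}],
    [{"terms": {"vdw": -5.0, "elec": -1.0, "solv": 0.0}, "near_native": True},
     {"terms": {"vdw": -3.0, "elec": -1.0, "solv": 0.0}, "near_native": False}],
]

print(success_rate(complexes, top_n=1))
print(success_rate(complexes, top_n=2))
```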
Affiliation(s)
- Yelena A Arnautova
- Molsoft L.L.C., 11199 Sorrento Valley Road, S209, San Diego, California 92121, United States
| | - Ruben Abagyan
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
| | - Maxim Totrov
- Molsoft L.L.C., 11199 Sorrento Valley Road, S209, San Diego, California 92121, United States
| |
|
14
|
Chowdhury S, Zhang J, Kurgan L. In Silico Prediction and Validation of Novel RNA Binding Proteins and Residues in the Human Proteome. Proteomics 2018; 18:e1800064. [PMID: 29806170 DOI: 10.1002/pmic.201800064] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 02/27/2018] [Revised: 05/05/2018] [Indexed: 12/22/2022]
Abstract
Deciphering a complete landscape of protein-RNA interactions in the human proteome remains an elusive challenge. We computationally elucidate RNA binding proteins (RBPs) using an approach that complements previous efforts. We employ two modern complementary sequence-based methods that provide accurate predictions from the structured and the intrinsically disordered sequences, even in the absence of sequence similarity to the known RBPs. We generate and analyze putative RNA binding residues on the whole proteome scale. Using a conservative setting that ensures a low 5% false positive rate, we identify 1511 putative RBPs that include 281 known RBPs and 166 RBPs that were previously predicted. We empirically demonstrate that these overlaps are statistically significant. We also validate the putative RBPs based on two major hallmarks of their RNA binding residues: high levels of evolutionary conservation and enrichment in charged amino acids. Moreover, we show that the novel RBPs are significantly under-annotated functionally, which coincides with the fact that they were not yet found to interact with RNAs. We provide two examples of our novel putative RBPs for which there is recent evidence of their interactions with RNAs. The dataset of novel putative RBPs and RNA binding residues for future hypothesis generation is provided in the Supporting Information.
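The "conservative setting that ensures a low 5% false positive rate" mentioned in this abstract can be illustrated by picking a score cutoff from the predictor's scores on known non-binding proteins. The sketch below is an assumption about how such a cutoff could be chosen, not the authors' procedure; the function name and the example scores are hypothetical.

```python
def threshold_for_fpr(negative_scores, max_fpr=0.05):
    """Return a cutoff t such that at most max_fpr of the negatives score above t.

    A protein is then called a putative binder only if its score is > t.
    """
    ranked = sorted(negative_scores, reverse=True)
    allowed = int(max_fpr * len(ranked))  # negatives allowed above the cutoff
    return ranked[allowed]

# Hypothetical predictor scores for 100 known non-binding proteins.
negatives = list(range(100))
cutoff = threshold_for_fpr(negatives)
false_positive_rate = sum(score > cutoff for score in negatives) / len(negatives)

print(cutoff, false_positive_rate)
```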
Affiliation(s)
- Shomeek Chowdhury
- Dr. Vikram Sarabhai Institute of Cell and Molecular Biology, Maharaja Sayajirao University of Baroda, Gujarat, 390005, India
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
| | - Jian Zhang
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
- School of Computer and Information Technology, Xinyang Normal University, Xinyang, 464000, P. R. China
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
| |
|
15
|
Zhang J, Kurgan L. Review and comparative assessment of sequence-based predictors of protein-binding residues. Brief Bioinform 2017; 19:821-837. [DOI: 10.1093/bib/bbx022] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 12/05/2016] [Indexed: 12/31/2022]
Affiliation(s)
- Jian Zhang
- School of Computer and Information Technology, Xinyang Normal University
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
| |
|
16
|
Kmiecik S, Gront D, Kolinski M, Wieteska L, Dawid AE, Kolinski A. Coarse-Grained Protein Models and Their Applications. Chem Rev 2016; 116:7898-936. [DOI: 10.1021/acs.chemrev.6b00163] [Citation(s) in RCA: 555] [Impact Index Per Article: 69.4] [Indexed: 02/07/2023]
Affiliation(s)
- Sebastian Kmiecik
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
| | - Dominik Gront
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
| | - Michal Kolinski
- Bioinformatics Laboratory, Mossakowski Medical Research Center of the Polish Academy of Sciences, Pawinskiego 5, 02-106 Warsaw, Poland
| | - Lukasz Wieteska
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
- Department of Medical Biochemistry, Medical University of Lodz, Mazowiecka 6/8, 92-215 Lodz, Poland
| | | | - Andrzej Kolinski
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
| |
|
17
|
Wu Z, Hu G, Yang J, Peng Z, Uversky VN, Kurgan L. In various protein complexes, disordered protomers have large per-residue surface areas and area of protein-, DNA- and RNA-binding interfaces. FEBS Lett 2015; 589:2561-9. [DOI: 10.1016/j.febslet.2015.08.014] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 06/28/2015] [Revised: 07/31/2015] [Accepted: 08/03/2015] [Indexed: 11/28/2022]
|
18
|
Yan J, Friedrich S, Kurgan L. A comprehensive comparative review of sequence-based predictors of DNA- and RNA-binding residues. Brief Bioinform 2015; 17:88-105. [DOI: 10.1093/bib/bbv023] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Received: 12/19/2014] [Indexed: 01/07/2023]
|
19
|
Nagarajan R, Chothani SP, Ramakrishnan C, Sekijima M, Gromiha MM. Structure based approach for understanding organism specific recognition of protein-RNA complexes. Biol Direct 2015; 10:8. [PMID: 25886642 PMCID: PMC4352265 DOI: 10.1186/s13062-015-0039-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 11/19/2014] [Accepted: 02/03/2015] [Indexed: 12/11/2022]
Abstract
Background: Protein-RNA interactions perform diverse functions within the cell. Understanding the recognition mechanism of protein-RNA complexes has been a challenging task in molecular and computational biology. In earlier works, the recognition mechanisms have been studied for a specific complex or using a set of non-redundant complexes. In this work, we have constructed 18 sets of the same protein-RNA complexes belonging to different organisms from the Protein Data Bank (PDB). The similarities and differences in each set of complexes have been revealed in terms of various sequence and structure based features such as root mean square deviation, sequence homology, propensity of binding site residues, variance, conservation at binding sites, binding segments, binding motifs of amino acid residues and nucleotides, preferred amino acid-nucleotide pairs and influence of neighboring residues for binding. Results: We found that the proteins of mesophilic organisms have more binding sites than those of thermophiles and the binding propensities of amino acid residues are distinct in E. coli, H. sapiens, S. cerevisiae, thermophiles and archaea. Proteins prefer to bind with RNA using a single residue segment in all the organisms while RNA prefers to use a stretch of up to six nucleotides for binding with proteins. We have developed amino acid residue-nucleotide pair potentials for different organisms, which could be used for predicting the binding specificity. Further, molecular dynamics simulation studies on aspartyl tRNA synthetase complexed with aspartyl tRNA showed specific modes of recognition in E. coli, T. thermophilus and S. cerevisiae. Conclusion: Based on structural analysis and molecular dynamics simulations we suggest that the mode of recognition depends on the type of the organism in a protein-RNA complex. Reviewers: This article was reviewed by Sandor Pongor, Gajendra Raghava and Narayanaswamy Srinivasan. Electronic supplementary material: The online version of this article (doi:10.1186/s13062-015-0039-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Raju Nagarajan
- Department of Biotechnology, Bhupat Jyoti Metha School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, Tamilnadu, India.
| | - Sonia Pankaj Chothani
- Department of Biotechnology, Bhupat Jyoti Metha School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, Tamilnadu, India.
- Philips Research North America, 345 Scarborough Road, Briarcliff Manor, NY, 10510, USA.
| | - Chandrasekaran Ramakrishnan
- Department of Biotechnology, Bhupat Jyoti Metha School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, Tamilnadu, India.
| | - Masakazu Sekijima
- Global Scientific Information and Computing Center (GSIC), Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo, 152-8550, Japan.
| | - M Michael Gromiha
- Department of Biotechnology, Bhupat Jyoti Metha School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, Tamilnadu, India.
| |
|
20
|
Joyce AP, Zhang C, Bradley P, Havranek JJ. Structure-based modeling of protein: DNA specificity. Brief Funct Genomics 2014; 14:39-49. [PMID: 25414269 DOI: 10.1093/bfgp/elu044] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022]
Abstract
Protein:DNA interactions are essential to a range of processes that maintain and express the information encoded in the genome. Structural modeling is an approach that aims to understand these interactions at the physicochemical level. It has been proposed that structural modeling can lead to deeper understanding of the mechanisms of protein:DNA interactions, and that progress in this field can not only help to rationalize the observed specificities of DNA-binding proteins but also to allow researchers to engineer novel DNA site specificities. In this review we discuss recent developments in the structural description of protein:DNA interactions and specificity, as well as the challenges facing the field in the future.
|