1
Agarwal A, Kant S, Bahadur RP. Efficient mapping of RNA-binding residues in RNA-binding proteins using local sequence features of binding site residues in protein-RNA complexes. Proteins 2023; 91:1361-1379. [PMID: 37254800 DOI: 10.1002/prot.26528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2022] [Revised: 04/13/2023] [Accepted: 05/02/2023] [Indexed: 06/01/2023]
Abstract
Protein-RNA interactions play vital roles in a plethora of biological processes such as regulation of gene expression, protein synthesis, mRNA processing and biogenesis. Identification of RNA-binding residues (RBRs) in proteins is essential to understand RNA-mediated protein functioning, to perform site-directed mutagenesis and to develop novel targeted drug therapies. Moreover, the extensive gap between sequence and structural data restricts the identification of binding sites in unsolved structures. However, efficient use of computational methods demanding only sequence to identify binding residues can bridge this huge sequence-structure gap. In this study, we have extensively studied the protein-RNA interface in known RNA-binding proteins (RBPs). We find that the interface is highly enriched in basic and polar residues, with Gly being the most common interface neighbor. We investigated several amino acid features and developed a method to predict putative RBRs from amino acid sequence. We have implemented a balanced random forest (BRF) classifier with local residue features of protein sequences for prediction. With 5-fold cross-validation, the sequence pattern derived dipeptide composition based BRF model (DCP-BRF) resulted in an accuracy of 87.9%, specificity of 88.8%, sensitivity of 82.2%, Matthews correlation coefficient of 0.60 and AUC of 0.93, performing better than a few existing methods. We further validated our prediction model on known human RBPs through RBR prediction and could map ~54% of them. Further, knowledge of binding site preferences obtained from computational predictions combined with experimental validations of potential RNA binding sites can enhance our understanding of protein-RNA interactions. This may serve to accelerate investigations on functional roles of many novel RBPs.
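The abstract above describes a dipeptide composition feature computed from local sequence. As a rough illustration only, the following minimal sketch computes the standard 400-dimensional dipeptide composition of a sequence window. The function name `dipeptide_composition`, the example window `GRKKRG`, and the normalization by the number of valid adjacent pairs are assumptions made for this sketch; the authors' "sequence pattern derived" variant may differ.

```python
from itertools import product

# The 20 standard amino acids and all 400 ordered pairs (dipeptides).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(window: str) -> list[float]:
    """Return the fraction of each dipeptide among adjacent residue pairs.

    Hypothetical helper: pairs containing non-standard letters are skipped,
    and an empty or single-residue window yields an all-zero vector.
    """
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for first, second in zip(window, window[1:]):
        pair = first + second
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Illustrative window (not taken from the paper): five adjacent pairs,
# GR, RK, KK, KR and RG, each contributing 1/5 of the composition.
features = dipeptide_composition("GRKKRG")
```

A vector like `features` could then be passed, one row per residue window, to any classifier; the choice of window length is left open here because the abstract does not state it.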
Affiliation(s)
- Ankita Agarwal
  - School of Bio Science, Indian Institute of Technology Kharagpur, Kharagpur, India
  - Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
- Shri Kant
  - Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
- Ranjit Prasad Bahadur
  - Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
2
Solayman M, Litfin T, Singh J, Paliwal K, Zhou Y, Zhan J. Probing RNA structures and functions by solvent accessibility: an overview from experimental and computational perspectives. Brief Bioinform 2022; 23:6554125. [PMID: 35348613 PMCID: PMC9116373 DOI: 10.1093/bib/bbac112] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/13/2021] [Revised: 03/03/2022] [Accepted: 03/04/2022] [Indexed: 12/30/2022]
Abstract
Characterization of RNA structures and functions has mostly focused on 2D (secondary) and 3D (tertiary) structures. Recent advances in experimental and computational techniques for probing or predicting RNA solvent accessibility make this 1D representation of tertiary structures an increasingly attractive feature to explore. Here, we provide a survey of these recent developments, which indicate the emergence of solvent accessibility as a simple 1D property, adding to secondary and tertiary structures for investigating complex structure–function relations of RNAs.
Affiliation(s)
- Md Solayman
  - Institute for Glycomics, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
- Thomas Litfin
  - Institute for Glycomics, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
- Jaswinder Singh
  - Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
- Kuldip Paliwal
  - Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
- Yaoqi Zhou
  - Institute for Glycomics, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
  - Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
  - Peking University Shenzhen Graduate School, Shenzhen 518055, China
- Jian Zhan
  - Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
3
Zhou T, Rong J, Liu Y, Gong W, Li C. An ensemble approach to predict binding hotspots in protein-RNA interactions based on SMOTE data balancing and random grouping feature selection strategies. Bioinformatics 2022; 38:2452-2458. [PMID: 35253843 DOI: 10.1093/bioinformatics/btac138] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/17/2021] [Revised: 01/15/2022] [Accepted: 03/02/2022] [Indexed: 11/12/2022]
Abstract
MOTIVATION: The identification of binding hotspots in protein-RNA interactions is crucial for understanding their potential recognition mechanisms and for drug design. Experimental methods have many limitations, since they are usually time-consuming and labor-intensive. Thus, an effective and efficient theoretical method is urgently needed. RESULTS: Here we present SREPRHot, a method to predict hotspots, defined as the residues whose mutation to alanine generates a binding free energy change ≥ 2.0 kcal/mol, while others use a cutoff of 1.0 kcal/mol to obtain balanced datasets. To deal with the dataset imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) is used to generate minority samples and balance the dataset. Additionally, besides conventional features, we use two types of new features, residue interface propensity previously developed by us and topological features obtained using node-weighted networks, and propose an effective Random Grouping feature selection strategy combined with a two-step method to determine an optimal feature set. Finally, a stacking ensemble classifier is adopted to build our model. The results show SREPRHot achieves a good performance with SEN, MCC and AUC of 0.900, 0.557 and 0.829 on the independent testing dataset. The comparison study indicates SREPRHot shows a promising performance. AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/ChunhuaLiLab/SREPRHot. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
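The hotspot definition and the SMOTE balancing step mentioned above can be sketched as follows. This is a simplified, standard-library illustration of the general SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the SREPRHot implementation. The function names `label_hotspots` and `smote`, the neighbour count `k`, and all example values are assumptions made for this sketch.

```python
import math
import random


def label_hotspots(ddg_values, cutoff=2.0):
    """Label a residue 1 (hotspot) if its alanine-mutation binding free
    energy change, in kcal/mol, is at or above the cutoff, else 0."""
    return [1 if ddg >= cutoff else 0 for ddg in ddg_values]


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples (hypothetical helper).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Illustrative values only (not data from the paper).
labels = label_hotspots([0.4, 2.0, 3.1, 1.2])
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote(minority, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, it stays inside the region already spanned by the minority class, which is why oversampling of this kind is normally applied to the training split only.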
Affiliation(s)
- Tong Zhou
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, 100124, China
- Jie Rong
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, 100124, China
- Yang Liu
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, 100124, China
- Weikang Gong
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, 100124, China
- Chunhua Li
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, 100124, China
4
Solayman M, Litfin T, Zhou Y, Zhan J. High-throughput mapping of RNA solvent accessibility at the single-nucleotide resolution by RtcB ligation between a fixed 5'-OH-end linker and unique 3'-P-end fragments from hydroxyl radical cleavage. RNA Biol 2022; 19:1179-1189. [PMID: 36369947 PMCID: PMC9662193 DOI: 10.1080/15476286.2022.2145098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
Abstract
Given the challenges for the experimental determination of RNA tertiary structures, probing solvent accessibility has become increasingly important to gain functional insights. Among various chemical probes developed, backbone-cleaving hydroxyl radical is the only one that can provide unbiased detection of all accessible nucleotides. However, the readouts have been based on reverse transcription (RT) stop at the cleaving sites, which are prone to false positives due to PCR amplification bias, early drop-off of reverse transcriptase, and the use of random primers in RT reaction. Here, we introduced a fixed-primer method called RL-Seq by performing RtcB Ligation (RL) between a fixed 5'-OH-end linker and unique 3'-P-end fragments from hydroxyl radical cleavage prior to high-throughput sequencing. The application of this method to E. coli ribosomes confirmed its ability to accurately probe solvent accessibility with high sensitivity (low required sequencing depth) and accuracy (strong correlation to structure-derived values) at the single-nucleotide resolution. Moreover, a near-perfect correlation was found between the experiments with and without using unique molecular identifiers, indicating negligible PCR biases in RL-Seq. Further improvement of RL-Seq and its potential transcriptome-wide applications are discussed.
Affiliation(s)
- Md Solayman
  - Institute for Glycomics, Griffith University, Parklands Dr, Southport, QLD, Australia
- Thomas Litfin
  - Institute for Glycomics, Griffith University, Parklands Dr, Southport, QLD, Australia
- Yaoqi Zhou
  - Institute for Glycomics, Griffith University, Parklands Dr, Southport, QLD, Australia
  - Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen, 518055, China (contact)
- Jian Zhan
  - Institute for Glycomics, Griffith University, Parklands Dr, Southport, QLD, Australia
  - Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
5
Basu S, Bahadur RP. Conservation and coevolution determine evolvability of different classes of disordered residues in human intrinsically disordered proteins. Proteins 2021; 90:632-644. [PMID: 34626492 DOI: 10.1002/prot.26261] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/06/2021] [Revised: 10/07/2021] [Accepted: 10/07/2021] [Indexed: 12/19/2022]
Abstract
Structure, function, and evolution are interdependent properties of proteins. Diversity of protein functions arising from structural variations is a potential driving force behind protein evolvability. Intrinsically disordered proteins or regions (IDPs or IDRs) lack well-defined structure under normal physiological conditions, yet they are highly functional. Increased occurrence of IDPs in eukaryotes compared to prokaryotes indicates a strong correlation between protein evolution and disorderedness. IDPs generally have a higher evolution rate than globular proteins. Structural pliability allows IDPs to accommodate multiple mutations without affecting their functional potential. Nevertheless, how evolutionary signals vary between different classes of disordered residues (DRs) in IDPs is poorly understood. This study addresses variation of evolutionary behavior in terms of residue conservation and intra-protein coevolution among structural and functional classes of DRs in IDPs. Analyses are performed on 579 human IDPs, which are classified based on length of IDRs, interacting partners and functional classes. We find that short IDRs are less conserved than long IDRs or full IDPs. Functional classes that require flexibility and specificity to perform their activity evolve comparatively slower than others. Disorder-promoting amino acids evolve faster than order-promoting amino acids. Pro, Gly, Ile, and Phe have a unique coevolving nature, which further emphasizes their roles in IDPs. This study sheds light on evolutionary footprints in different classes of DRs from human IDPs and enhances our understanding of the structural and functional potential of IDPs.
Affiliation(s)
- Sushmita Basu
  - Computational Structural Biology Lab, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
- Ranjit Prasad Bahadur
  - Computational Structural Biology Lab, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
6
Krepl M, Damberger FF, von Schroetter C, Theler D, Pokorná P, Allain FHT, Šponer J. Recognition of N6-Methyladenosine by the YTHDC1 YTH Domain Studied by Molecular Dynamics and NMR Spectroscopy: The Role of Hydration. J Phys Chem B 2021; 125:7691-7705. [PMID: 34258996 DOI: 10.1021/acs.jpcb.1c03541] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 11/29/2022]
Abstract
The YTH domain of YTHDC1 belongs to a class of protein "readers", recognizing the N6-methyladenosine (m6A) chemical modification in mRNA. Static ensemble-averaged structures revealed details of N6-methyl recognition via a conserved aromatic cage. Here, we performed molecular dynamics (MD) simulations along with nuclear magnetic resonance (NMR) and isothermal titration calorimetry (ITC) to examine how dynamics and solvent interactions contribute to the m6A recognition and negative selectivity toward an unmethylated substrate. The structured water molecules surrounding the bound RNA and the methylated substrate's ability to exclude bulk water molecules contribute to the YTH domain's preference for m6A. Intrusions of bulk water deep into the binding pocket disrupt binding of unmethylated adenosine. The YTHDC1's preference for the 5'-Gm6A-3' motif is partially facilitated by a network of water-mediated interactions between the 2-amino group of the guanosine and residues in the m6A binding pocket. The 5'-Im6A-3' (where I is inosine) motif can be recognized too, but disruption of the water network lowers affinity. The D479A mutant also disrupts the water network and destabilizes m6A binding. Our interdisciplinary study of the YTHDC1 protein-RNA complex reveals an unusual physical mechanism by which solvent interactions contribute toward m6A recognition.
Affiliation(s)
- Miroslav Krepl
  - Institute of Biophysics of the Czech Academy of Sciences, Kralovopolska 135, 612 65 Brno, Czech Republic
- Fred Franz Damberger
  - Department of Biology, Institute of Biochemistry, ETH Zürich, 8093 Zürich, Switzerland
- Dominik Theler
  - Department of Biology, Institute of Biochemistry, ETH Zürich, 8093 Zürich, Switzerland
- Pavlína Pokorná
  - Institute of Biophysics of the Czech Academy of Sciences, Kralovopolska 135, 612 65 Brno, Czech Republic
  - National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- Frédéric H-T Allain
  - Department of Biology, Institute of Biochemistry, ETH Zürich, 8093 Zürich, Switzerland
- Jiří Šponer
  - Institute of Biophysics of the Czech Academy of Sciences, Kralovopolska 135, 612 65 Brno, Czech Republic
  - Regional Centre of Advanced Technologies and Materials, Czech Advanced Technology and Research Institute (CATRIN), Palacký University Olomouc, Olomouc 783 71, Czech Republic
7
Mei LC, Hao GF, Yang GF. Computational methods for predicting hotspots at protein-RNA interfaces. Wiley Interdiscip Rev RNA 2021; 13:e1675. [PMID: 34080311 DOI: 10.1002/wrna.1675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2021] [Revised: 05/13/2021] [Accepted: 05/14/2021] [Indexed: 11/10/2022]
Abstract
Protein-RNA interactions play essential roles in many critical biological events. A comprehensive understanding of the mechanisms underlying these interactions is helpful when studying cellular activities and therapeutic applications. Hotspots are a small portion of residues contributing much toward protein-RNA binding affinity. In pharmaceutical research, the hotspot residues are seen as the best option for designing small molecules to target proteins of therapeutic interest. With the accumulation of experimental data about protein-RNA interactions, computational methods have been produced for hotspot prediction on a large scale. In this review, we first present an overview of the existing databases for protein-RNA binding data. Furthermore, we outline the most adopted computational methods for hotspot prediction in protein-RNA interactions. Finally, we discuss the applications of hotspot prediction. This article is categorized under: RNA Interactions with Proteins and Other Molecules > Protein-RNA Recognition; RNA Interactions with Proteins and Other Molecules > Protein-RNA Interactions: Functional Implications; RNA Methods > RNA Analyses In Vitro and In Silico.
Affiliation(s)
- Long-Can Mei
  - Key Laboratory of Pesticide and Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan, China
  - International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, China
- Ge-Fei Hao
  - Key Laboratory of Pesticide and Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan, China
  - International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, China
  - State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Research and Development Center for Fine Chemicals, Guizhou University, Guiyang, China
- Guang-Fu Yang
  - Key Laboratory of Pesticide and Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan, China
  - International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, China
  - Collaborative Innovation Center of Chemical Science and Engineering, Tianjin, China
8
Sequeiros-Borja CE, Surpeta B, Brezovsky J. Recent advances in user-friendly computational tools to engineer protein function. Brief Bioinform 2021; 22:bbaa150. [PMID: 32743637 PMCID: PMC8138880 DOI: 10.1093/bib/bbaa150] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 04/24/2020] [Revised: 06/03/2020] [Accepted: 06/16/2020] [Indexed: 12/14/2022]
Abstract
Progress in technology and algorithms throughout the past decade has transformed the field of protein design and engineering. Computational approaches have become well-engrained in the processes of tailoring proteins for various biotechnological applications. Many tools and methods are developed and upgraded each year to satisfy the increasing demands and challenges of protein engineering. To help protein engineers and bioinformaticians navigate this emerging wave of dedicated software, we have critically evaluated recent additions to the toolbox regarding their application for semi-rational and rational protein engineering. These newly developed tools identify and prioritize hotspots and analyze the effects of mutations for a variety of properties, comprising ligand binding, protein-protein and protein-nucleic acid interactions, and electrostatic potential. We also discuss notable progress to target elusive protein dynamics and associated properties like ligand-transport processes and allosteric communication. Finally, we discuss several challenges these tools face and provide our perspectives on the further development of readily applicable methods to guide protein engineering efforts.
Affiliation(s)
- Carlos Eduardo Sequeiros-Borja
  - Laboratory of Biomolecular Interactions and Transport, Department of Gene Expression, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University and the International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland
- Bartłomiej Surpeta
  - Laboratory of Biomolecular Interactions and Transport, Department of Gene Expression, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University and the International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland
- Jan Brezovsky
  - Laboratory of Biomolecular Interactions and Transport, Department of Gene Expression, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University and the International Institute of Molecular and Cell Biology in Warsaw
9
Jiang Y, Liu HF, Liu R. Systematic comparison and prediction of the effects of missense mutations on protein-DNA and protein-RNA interactions. PLoS Comput Biol 2021; 17:e1008951. [PMID: 33872313 PMCID: PMC8084330 DOI: 10.1371/journal.pcbi.1008951] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/19/2021] [Revised: 04/29/2021] [Accepted: 04/08/2021] [Indexed: 12/30/2022]
Abstract
The binding affinities of protein-nucleic acid interactions could be altered due to missense mutations occurring in DNA- or RNA-binding proteins, therefore resulting in various diseases. Unfortunately, a systematic comparison and prediction of the effects of mutations on protein-DNA and protein-RNA interactions (these two mutation classes are termed MPDs and MPRs, respectively) is still lacking. Here, we demonstrated that these two classes of mutations could generate similar or different tendencies for binding free energy changes in terms of the properties of mutated residues. We then developed regression algorithms separately for MPDs and MPRs by introducing novel geometric partition-based energy features and interface-based structural features. Through feature selection and ensemble learning, similar computational frameworks that integrated energy- and nonenergy-based models were established to estimate the binding affinity changes resulting from MPDs and MPRs, but the selected features for the final models were different and therefore reflected the specificity of these two mutation classes. Furthermore, the proposed methodology was extended to the identification of mutations that significantly decreased the binding affinities. Extensive validations indicated that our algorithm generally performed better than the state-of-the-art methods on both the regression and classification tasks. The webserver and software are freely available at http://liulab.hzau.edu.cn/PEMPNI and https://github.com/hzau-liulab/PEMPNI.
Author summary
Protein-nucleic acid interactions play important roles in various cellular processes. Missense mutations occurring in DNA- or RNA-binding proteins (termed MPDs and MPRs, respectively) could change the binding affinities of these interactions. Previous studies have compared protein-DNA and protein-RNA interactions from multifaceted viewpoints, but less attention has been given to the similarities and specific differences between the effects of MPDs and MPRs and between the methodologies for predicting the affinity changes induced by the two mutation classes. Therefore, we systematically compared their impacts and demonstrated that MPDs and MPRs could have specific preferences for binding affinity changes. These observations motivated us to construct regression models separately for MPDs and MPRs by introducing novel energy and nonenergy descriptors. Although similar frameworks were developed to estimate these two categories of mutation effects, different descriptors were selected in the regression models and further revealed the specificity of mutation classes. The interplay between the energy and nonenergy modules effectively improved prediction performance. Our algorithm can also be adopted to disentangle mutations significantly decreasing binding affinities from other mutations.
Affiliation(s)
- Yao Jiang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
- Hui-Fang Liu
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
- Rong Liu
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
10
Hwang JY, Jung S, Kook TL, Rouchka EC, Bok J, Park JW. rMAPS2: an update of the RNA map analysis and plotting server for alternative splicing regulation. Nucleic Acids Res 2020; 48:W300-W306. [PMID: 32286627 PMCID: PMC7319468 DOI: 10.1093/nar/gkaa237] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 02/21/2020] [Revised: 03/26/2020] [Accepted: 04/01/2020] [Indexed: 11/17/2022]
Abstract
The rMAPS2 (RNA Map Analysis and Plotting Server 2) web server, freely available at http://rmaps.cecsresearch.org/, has provided the high-throughput sequencing data research community with curated tools for the identification of RNA binding protein sites. rMAPS2 analyzes differential alternative splicing or CLIP peak data obtained from high-throughput sequencing data analysis tools like MISO, rMATS, Piranha, PIPE-CLIP and PARalyzer, and then, graphically displays enriched RNA-binding protein target sites. The initial release of rMAPS focused only on the most common alternative splicing event, skipped exon or exon skipping. However, there was a high demand for the analysis of other major types of alternative splicing events, especially for retained intron events since this is the most common type of alternative splicing in plants, such as Arabidopsis thaliana. Here, we expanded the implementation of rMAPS2 to facilitate analyses for all five major types of alternative splicing events: skipped exon, mutually exclusive exons, alternative 5′ splice site, alternative 3′ splice site and retained intron. In addition, by employing multi-threading, rMAPS2 has vastly improved the user experience with significant reductions in running time, ∼3.5 min for the analysis of all five major alternative splicing types at once.
Affiliation(s)
- Jae Y Hwang
  - Department of Computer Science and Engineering, University of Louisville, Louisville, KY 40292, USA
- Sungbo Jung
  - Department of Computer Science and Engineering, University of Louisville, Louisville, KY 40292, USA
- Tae L Kook
  - Department of Computer Science and Engineering, University of Louisville, Louisville, KY 40292, USA
- Eric C Rouchka
  - Department of Computer Science and Engineering, University of Louisville, Louisville, KY 40292, USA
  - KBRIN Bioinformatics Core, University of Louisville, Louisville, KY 40292, USA
- Jinwoong Bok
  - Department of Anatomy, Yonsei University College of Medicine, Seoul 03722, Republic of Korea
  - Department of Otorhinolaryngology, Yonsei University College of Medicine, Seoul 03722, Republic of Korea
  - BK21 PLUS project for Medical Science, Yonsei University College of Medicine, Seoul 03722, Republic of Korea
- Juw W Park
  - Department of Computer Science and Engineering, University of Louisville, Louisville, KY 40292, USA
  - KBRIN Bioinformatics Core, University of Louisville, Louisville, KY 40292, USA
11
Zhang N, Lu H, Chen Y, Zhu Z, Yang Q, Wang S, Li M. PremPRI: Predicting the Effects of Missense Mutations on Protein-RNA Interactions. Int J Mol Sci 2020; 21:5560. [PMID: 32756481 PMCID: PMC7432928 DOI: 10.3390/ijms21155560] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 06/29/2020] [Revised: 07/28/2020] [Accepted: 07/30/2020] [Indexed: 12/23/2022]
Abstract
Protein–RNA interactions are crucial for many cellular processes, such as protein synthesis and regulation of gene expression. Missense mutations that alter protein–RNA interactions may contribute to the pathogenesis of many diseases. Here, we introduce a new computational method, PremPRI, which predicts the effects of single mutations occurring in RNA-binding proteins on protein–RNA interactions by quantitatively calculating the binding affinity changes. The multiple linear regression scoring function of PremPRI is composed of three sequence-based and eight structure-based features, and is parameterized on 248 mutations from 50 protein–RNA complexes. Our model shows good agreement between calculated and experimental values of binding affinity changes, with a Pearson correlation coefficient of 0.72 and a corresponding root-mean-square error of 0.76 kcal·mol⁻¹, outperforming three other available methods. PremPRI can be used for finding functionally important variants, understanding the molecular mechanisms, and designing new protein–RNA interaction inhibitors.
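The two agreement measures quoted above, the Pearson correlation coefficient and the root-mean-square error, can be computed as in the minimal sketch below. The function names and the example values are illustrative assumptions; none of the numbers come from the PremPRI dataset.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def rmse(predicted, observed):
    """Root-mean-square error, in the same units as the inputs."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical binding affinity changes in kcal/mol (made up for this sketch).
experimental = [0.5, 1.2, -0.3, 2.1]
predicted = [0.7, 1.0, 0.1, 1.8]
r = pearson_r(predicted, experimental)
error = rmse(predicted, experimental)
```

The two measures answer different questions: the correlation reports whether predictions rank mutations in the right order, while the RMSE reports how far off they are in absolute energy units, so a model can score well on one and poorly on the other.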
12
Zhu X, Liu L, He J, Fang T, Xiong Y, Mitchell JC. iPNHOT: a knowledge-based approach for identifying protein-nucleic acid interaction hot spots. BMC Bioinformatics 2020; 21:289. [PMID: 32631222 PMCID: PMC7336410 DOI: 10.1186/s12859-020-03636-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/26/2019] [Accepted: 06/25/2020] [Indexed: 12/20/2022]
Abstract
Background: The interaction between proteins and nucleic acids plays pivotal roles in various biological processes such as transcription, translation, and gene regulation. Hot spots are a small set of residues that contribute most to the binding affinity of a protein-nucleic acid interaction. Compared to the extensive studies of the hot spots on protein-protein interfaces, the hot spot residues within protein-nucleic acid interfaces remain less well-studied, in part because mutagenesis data for protein-nucleic acid interactions are not as abundant as those for protein-protein interactions. Results: In this study, we built a new computational model, iPNHOT, to effectively predict hot spot residues on protein-nucleic acid interfaces. One training data set and an independent test set were collected from dbAMEPNI and some recent literature, respectively. To build our model, we generated 97 different sequential and structural features and used a two-step strategy to select the relevant features. The final model was built based only on 7 features using a support vector machine (SVM). The features include two unique features such as ∆SASsa1/2 and esp3, which are newly proposed in this study. Based on the cross-validation results, our model achieved an F1 score and AUROC of 0.725 and 0.807, respectively, on the subset collected from ProNIT, compared to 0.407 and 0.670 for mCSM-NA, a state-of-the-art model to predict the thermodynamic effects of protein-nucleic acid interaction. The iPNHOT model was further tested on the independent test set, which showed that our model outperformed other methods. Conclusion: In this study, by collecting data from a recently published database, dbAMEPNI, we proposed a new model, iPNHOT, to predict hotspots on both protein-DNA and protein-RNA interfaces. The results show that our model outperforms the existing state-of-the-art models. Our model is available for users through a webserver: http://zhulab.ahu.edu.cn/iPNHOT/.
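For readers unfamiliar with the two scores compared above, the sketch below computes an F1 score from binary predictions and an AUROC from continuous scores, using only the standard library. The function names, the 0.5 decision threshold, and the four example residues are assumptions made for illustration; they are not taken from iPNHOT.

```python
def f1_score(labels, predictions):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Requires at least one positive and one
    negative label.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = hot spot, 0 = non-hot spot.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]
f1 = f1_score(labels, predictions)
auc = auroc(labels, scores)
```

F1 depends on the chosen decision threshold, whereas AUROC summarizes ranking quality over all thresholds, which is one reason abstracts such as this one report both.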
Affiliation(s)
- Xiaolei Zhu
- School of Sciences, Anhui Agricultural University, Hefei, Anhui, China.,School of Life Sciences, Anhui University, Hefei, Anhui, China.
| | - Ling Liu
- School of Life Sciences, Anhui University, Hefei, Anhui, China
| | - Jingjing He
- School of Life Sciences, Anhui University, Hefei, Anhui, China
| | - Ting Fang
- School of Life Sciences, Anhui University, Hefei, Anhui, China
| | - Yi Xiong
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Julie C Mitchell
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA.
|
13
|
Yang Z, Deng X, Liu Y, Gong W, Li C. Analyses on clustering of the conserved residues at protein-RNA interfaces and its application in binding site identification. BMC Bioinformatics 2020; 21:57. [PMID: 32066366 PMCID: PMC7027071 DOI: 10.1186/s12859-020-3398-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/19/2019] [Accepted: 02/07/2020] [Indexed: 12/26/2022]
Abstract
Background: The maintenance of protein structural stability requires cooperativity among spatially neighboring residues. Previous studies have shown that conserved residues tend to occur clustered together within enzyme active sites and protein-protein/DNA interfaces. It is possible that conserved residues form one or more local clusters in protein tertiary structures, as this can facilitate the formation of functional motifs. In this work, we systematically investigate the spatial distributions of conserved residues as well as hot spot ones within protein-RNA interfaces. Results: The analysis of 191 polypeptide chains from 160 complexes shows that the polypeptides interacting with tRNAs evolve relatively rapidly. A statistical analysis of residues in different regions shows that the interface residues are often more conserved, while the most conserved ones are those occurring at protein interiors, which maintain the stability of folded polypeptide chains. Additionally, we found that 77.8% of the interfaces have the conserved residues clustered within the entire interface regions. Applying the clustering characteristics to the identification of the real interface, there are 31.1% of cases where the real interfaces are ranked in the top 10% of 1000 randomly generated surface patches. In the conserved clusters, the preferred residues are the hydrophobic (Leu, Ile, Met), aromatic (Tyr, Phe, Trp) and, interestingly, only one positively charged residue, Arg. For the hot spot residues, 51.5% of them are situated in the conserved residue clusters, and they are largely consistent with the preferred residue types in the conserved clusters. Conclusions: The protein-RNA interface residues are often more conserved than non-interface surface ones. The conserved interface residues occur more spatially clustered relative to the entire interface residues. The high consistency between hot spot residue types and the preferred residue types in the conserved clusters has important implications for experimental alanine scanning mutagenesis studies. This work deepens the understanding of the residue organization at protein-RNA interfaces and has potential applications in the identification of binding site and hot spot residues.
Affiliation(s)
- Zhen Yang
- College of Life Science and Bioengineering, Beijing University of Technology, Beijing, 100124, China
| | - Xueqing Deng
- College of Life Science and Bioengineering, Beijing University of Technology, Beijing, 100124, China
| | - Yang Liu
- College of Life Science and Bioengineering, Beijing University of Technology, Beijing, 100124, China
| | - Weikang Gong
- College of Life Science and Bioengineering, Beijing University of Technology, Beijing, 100124, China
| | - Chunhua Li
- College of Life Science and Bioengineering, Beijing University of Technology, Beijing, 100124, China.
|
14
|
Nithin C, Mukherjee S, Bahadur RP. A structure-based model for the prediction of protein-RNA binding affinity. RNA (NEW YORK, N.Y.) 2019; 25:1628-1645. [PMID: 31395671 PMCID: PMC6859855 DOI: 10.1261/rna.071779.119] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 05/04/2019] [Accepted: 08/05/2019] [Indexed: 05/28/2023]
Abstract
Protein-RNA recognition is highly affinity-driven and regulates a wide array of cellular functions. In this study, we have curated a binding affinity data set of 40 protein-RNA complexes, for which at least one unbound partner is available in the docking benchmark. The data set covers a wide affinity range of eight orders of magnitude as well as four different structural classes. On average, we find the complexes with single-stranded RNA have the highest affinity, whereas the complexes with duplex RNA have the lowest. Nevertheless, free energy gain upon binding is the highest for the complexes with ribosomal proteins and the lowest for the complexes with tRNA, with an average of -5.7 cal/mol/Å² in the entire data set. We train regression models to predict the binding affinity from the structural and physicochemical parameters of protein-RNA interfaces. The best fit model with the lowest maximum error is provided with three interface parameters: relative hydrophobicity, conformational change upon binding and relative hydration pattern. This model has been used for predicting the binding affinity on a test data set, generated using mutated structures of yeast aspartyl-tRNA synthetase, for which experimentally determined ΔG values of 40 mutations are available. The predicted ΔGempirical values highly correlate with the experimental observations. The data set provided in this study should be useful for further development of binding affinity prediction methods. Moreover, the model developed in this study enhances our understanding of the structural basis of protein-RNA binding affinity and provides a platform to engineer protein-RNA interfaces with desired affinity.
Affiliation(s)
- Chandran Nithin
- Computational Structural Biology Lab, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
| | - Sunandan Mukherjee
- Computational Structural Biology Lab, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
| | - Ranjit Prasad Bahadur
- Computational Structural Biology Lab, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
|
15
|
Pilla SP, Bahadur RP. Residue conservation elucidates the evolution of r-proteins in ribosomal assembly and function. Int J Biol Macromol 2019; 140:323-329. [PMID: 31421176 DOI: 10.1016/j.ijbiomac.2019.08.127] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/23/2019] [Revised: 08/14/2019] [Accepted: 08/14/2019] [Indexed: 02/08/2023]
Abstract
Ribosomes are the translational machineries having two unequal subunits, small subunit (SSU) and large subunit (LSU) across all the domains of life. Origin and evolution of ribosome are encoded in its structure, and the core of the ribosome is highly conserved. Here, we have used Shannon entropy to analyze the evolution of ribosomal proteins (r-proteins) across the three domains of life. Moreover, we have analyzed the residue conservation at protein-protein (PP) and protein-RNA (PR) interfaces in SSU and LSU. Furthermore, we have studied the evolution of early, intermediate and late binding r-proteins. We show that the r-proteins of Thermus thermophilus are better conserved during the evolution. Furthermore, we find the late binders are better conserved than the early and the intermediate binders. The residues at the interior of the r-proteins are the most conserved followed by those at the interface and the solvent accessible surface. Additionally, we show that the residues at the PP interfaces are better conserved than those at the PR interfaces. However, between PR and PP interfaces, the multi-interface residues at the former are better conserved than those at the latter ones. Our findings may provide insights into the evolution of r-proteins in ribosomal assembly and function.
Affiliation(s)
- Smita P Pilla
- Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
| | - Ranjit Prasad Bahadur
- Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur 721302, India.
|
16
|
Deng L, Yang W, Liu H. PredPRBA: Prediction of Protein-RNA Binding Affinity Using Gradient Boosted Regression Trees. Front Genet 2019; 10:637. [PMID: 31428122 PMCID: PMC6688581 DOI: 10.3389/fgene.2019.00637] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 04/03/2019] [Accepted: 06/18/2019] [Indexed: 01/24/2023]
Abstract
Protein-RNA interactions play essential roles in many biological processes. Quantifying the binding affinity of protein-RNA complexes helps in understanding protein-RNA recognition mechanisms and in identifying strong binding partners. Because experimentally measured protein-RNA binding affinity data are still limited, there is a pressing demand for accurate and reliable computational approaches. In this paper, we propose a computational approach, PredPRBA, which can effectively predict protein-RNA binding affinity using gradient boosted regression trees. We build a dataset of protein-RNA binding affinity that includes 103 protein-RNA complex structures manually collected from related literature. Then, we generate 37 kinds of sequence and structural features and explore the relationship between the features and protein-RNA binding affinity. We find that the binding affinity mainly depends on the structure of RNA molecules. According to the type of RNA bound to the protein in each complex, we split the 103 protein-RNA complexes into six categories. For each category, we build a gradient boosted regression tree (GBRT) model based on the generated features. We perform a comprehensive evaluation of the proposed method on the binding affinity dataset using leave-one-out cross-validation. We show that PredPRBA achieves correlations ranging from 0.723 to 0.897 among the six categories, which is significantly better than other typical regression methods and the pioneer protein-RNA binding affinity predictor SPOT-Seq-RNA. In addition, a user-friendly web server has been developed to predict the binding affinity of protein-RNA complexes. The PredPRBA webserver is freely available at http://PredPRBA.denglab.org/.
Affiliation(s)
- Lei Deng
- School of Computer Science and Engineering, Central South University, Changsha, China.,School of Software, Xinjiang University, Urumqi, China
| | - Wenyi Yang
- School of Computer Science and Engineering, Central South University, Changsha, China
| | - Hui Liu
- Lab of Information Management, Changzhou University, Changzhou, China
|
17
|
Pires DEV, Ascher DB. mCSM-NA: predicting the effects of mutations on protein-nucleic acids interactions. Nucleic Acids Res 2017; 45:W241-W246. [PMID: 28383703 PMCID: PMC5570212 DOI: 10.1093/nar/gkx236] [Citation(s) in RCA: 85] [Impact Index Per Article: 17.0] [Received: 02/09/2017] [Accepted: 04/03/2017] [Indexed: 01/17/2023]
Abstract
Over the past two decades, several computational methods have been proposed to predict how missense mutations can affect protein structure and function, either by altering protein stability or interactions with its partners, shedding light into potential molecular mechanisms giving rise to different phenotypes. Effectively and efficiently predicting consequences of mutations on protein–nucleic acid interactions, however, remained until recently a great and unmet challenge. Here we report an updated webserver for mCSM–NA, the only scalable method we are aware of capable of quantitatively predicting the effects of mutations in protein coding regions on nucleic acid binding affinities. We have significantly enhanced the original method by including a pharmacophore modelling and information of nucleic acid properties into our graph-based signatures, considering the reverse mutation and by using a refined, more reliable data set, based on a new release of the ProNIT database, which has significantly improved the reliability and applicability of the methodology. Our new predictive model was capable of achieving a correlation coefficient of up to 0.70 on cross-validation and 0.68 on blind-tests, outperforming its previous version. The server is freely available via a user-friendly web interface at: http://structure.bioc.cam.ac.uk/mcsm_na.
Affiliation(s)
| | - David B Ascher
- Centro de Pesquisas René Rachou, Fundação Oswaldo Cruz, Brazil.,Department of Biochemistry, University of Cambridge, Cambridge, UK.,Department of Biochemistry and Molecular Biology, University of Melbourne, Melbourne, Australia
|
18
|
Deng L, Sui Y, Zhang J. XGBPRH: Prediction of Binding Hot Spots at Protein-RNA Interfaces Utilizing Extreme Gradient Boosting. Genes (Basel) 2019; 10:genes10030242. [PMID: 30901953 PMCID: PMC6471955 DOI: 10.3390/genes10030242] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 01/16/2019] [Revised: 03/14/2019] [Accepted: 03/15/2019] [Indexed: 01/24/2023]
Abstract
Hot spot residues at protein-RNA complexes are vitally important for investigating the underlying molecular recognition mechanism. Accurately identifying protein-RNA binding hot spots is critical for drug design and protein engineering. Although some progress has been made by utilizing various available features and a series of machine learning approaches, these methods are still in their infancy. In this paper, we present a new computational method named XGBPRH, which is based on an eXtreme Gradient Boosting (XGBoost) algorithm and can effectively predict hot spot residues in protein-RNA interfaces utilizing an optimal set of properties. Firstly, we download 47 protein-RNA complexes and calculate a total of 156 sequence, structure, exposure, and network features. Next, we adopt a two-step feature selection algorithm to extract a combination of 6 optimal features from these 156 features. Compared with the state-of-the-art approaches, XGBPRH achieves better performance, with an area under the ROC curve (AUC) score of 0.817 and an F1-score of 0.802 on the independent test set. Meanwhile, we also apply XGBPRH to two case studies. The results demonstrate that the method can effectively identify novel energy hot spots.
Affiliation(s)
- Lei Deng
- School of Computer Science and Engineering, Central South University, Changsha 410075, China.
| | - Yuanchao Sui
- School of Computer Science and Engineering, Central South University, Changsha 410075, China.
| | - Jingpu Zhang
- School of Computer and Data Science, Henan University of Urban Construction, Pingdingshan 467000, China.
|
19
|
Pan Y, Wang Z, Zhan W, Deng L. Computational identification of binding energy hot spots in protein-RNA complexes using an ensemble approach. Bioinformatics 2018; 34:1473-1480. [PMID: 29281004 DOI: 10.1093/bioinformatics/btx822] [Citation(s) in RCA: 72] [Impact Index Per Article: 14.4] [Received: 07/12/2017] [Accepted: 12/19/2017] [Indexed: 11/12/2022]
Abstract
Motivation: Identifying RNA-binding residues, especially energetically favored hot spots, can provide valuable clues for understanding the mechanisms and functional importance of protein-RNA interactions. Yet, the limited availability of experimentally recognized energy hot spots in protein-RNA crystal structures makes it difficult to develop empirical identification approaches. Computational prediction of RNA-binding hot spot residues is still in its infancy. Results: Here, we describe a computational method, PrabHot (Prediction of protein-RNA binding hot spots), that can effectively detect hot spot residues on protein-RNA binding interfaces using an ensemble of conceptually different machine learning classifiers. Residue interaction network features and new solvent exposure characteristics are combined and selected for classification with the Boruta algorithm. In particular, two new reference datasets (benchmark and independent) have been generated containing 107 hot spots from 47 known protein-RNA complex structures. In 10-fold cross-validation on the training dataset, PrabHot achieves promising performance, with an AUC score of 0.86 and a sensitivity of 0.78, which are significantly better than those of the pioneer RNA-binding hot spot prediction method HotSPRing. We also demonstrate the capability of our proposed method on the independent test dataset and gain a competitive advantage as a result. Availability and implementation: The PrabHot webserver is freely available at http://denglab.org/PrabHot/. Contact: leideng@csu.edu.cn. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yuliang Pan
- School of Software, Central South University, Changsha 410075, China
| | - Zixiang Wang
- School of Software, Central South University, Changsha 410075, China
| | - Weihua Zhan
- School of Electronics and Computer Science, Zhejiang Wanli University, Ningbo 315100, China
| | - Lei Deng
- School of Software, Central South University, Changsha 410075, China
- Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai 200433, China
|
20
|
Krüger DM, Neubacher S, Grossmann TN. Protein-RNA interactions: structural characteristics and hotspot amino acids. RNA (NEW YORK, N.Y.) 2018; 24:1457-1465. [PMID: 30093489 PMCID: PMC6191724 DOI: 10.1261/rna.066464.118] [Citation(s) in RCA: 48] [Impact Index Per Article: 8.0] [Received: 03/20/2018] [Accepted: 08/06/2018] [Indexed: 06/01/2023]
Abstract
Structural information about protein-RNA complexes supports the understanding of crucial recognition processes in the cell, and it can allow the development of high affinity ligands to interfere with these processes. In this respect, the identification of amino acid hotspots is particularly important. In contrast to protein-protein interactions, in silico approaches for protein-RNA interactions lag behind in their development. Herein, we report an analysis of available protein-RNA structures. We assembled a data set of 322 crystal and NMR structures and analyzed them regarding interface properties. In addition, we describe a computational alanine-scanning approach which provides interaction scores for interface amino acids, allowing the identification of potential hotspots in protein-RNA interfaces. We have made the computational approach available as an online tool, which allows interaction scores to be calculated for any structure of a protein-RNA complex by uploading atomic coordinates to the PRI HotScore web server (https://pri-hotscore.labs.vu.nl).
Affiliation(s)
- Dennis M Krüger
- Chemical Genomics Centre of the Max Planck Society, 44227 Dortmund, Germany
| | - Saskia Neubacher
- Department of Chemistry and Pharmaceutical Sciences, VU University Amsterdam, 1081 HV Amsterdam, The Netherlands
| | - Tom N Grossmann
- Chemical Genomics Centre of the Max Planck Society, 44227 Dortmund, Germany
- Department of Chemistry and Pharmaceutical Sciences, VU University Amsterdam, 1081 HV Amsterdam, The Netherlands
|
21
|
Liu L, Xiong Y, Gao H, Wei DQ, Mitchell JC, Zhu X. dbAMEPNI: a database of alanine mutagenic effects for protein-nucleic acid interactions. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2018; 2018:4959188. [PMID: 29688380 PMCID: PMC5887268 DOI: 10.1093/database/bay034] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 12/12/2017] [Accepted: 03/15/2018] [Indexed: 01/08/2023]
Abstract
Protein–nucleic acid interactions play essential roles in various biological activities such as gene regulation, transcription, DNA repair and DNA packaging. Understanding the effects of amino acid substitutions on protein–nucleic acid binding affinities can help elucidate the molecular mechanism of protein–nucleic acid recognition. Until now, no comprehensive and updated database of quantitative binding data on alanine mutagenic effects for protein–nucleic acid interactions is publicly accessible. Thus, we developed a new database of Alanine Mutagenic Effects for Protein-Nucleic Acid Interactions (dbAMEPNI). dbAMEPNI is a manually curated, literature-derived database, comprising over 577 alanine mutagenic data with experimentally determined binding affinities for protein–nucleic acid complexes. It contains several important parameters, such as dissociation constant (Kd), Gibbs free energy change (ΔΔG), experimental conditions and structural parameters of mutant residues. In addition, the database provides an extended dataset of 282 single alanine mutations with only qualitative data (or descriptive effects) of thermodynamic information. Database URL: http://zhulab.ahu.edu.cn/dbAMEPNI
Affiliation(s)
- Ling Liu
- School of Life Sciences, Anhui University, Hefei, Anhui 230601, China
| | - Yi Xiong
- State Key Laboratory of Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Hongyun Gao
- Information and Engineering College, Dalian University, Dalian 116622, Liaoning, China
| | - Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Julie C Mitchell
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA.,Department of Mathematics, University of Wisconsin-Madison, Madison, WI 53706, USA.,Oak Ridge National Laboratory, Biosciences Division, Oak Ridge, TN 37830, USA
| | - Xiaolei Zhu
- School of Life Sciences, Anhui University, Hefei, Anhui 230601, China
|
22
|
Kulandaisamy A, Srivastava A, Kumar P, Nagarajan R, Priya SB, Gromiha MM. Identification and Analysis of Key Residues in Protein-RNA Complexes. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1436-1444. [PMID: 29993582 DOI: 10.1109/tcbb.2018.2834387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Protein-RNA complexes play important roles in various biological processes. The functions of protein-RNA complexes are dictated by their interactions, binding, stability, and affinity. In this work, we have identified the key residues (KRs), which are involved in both stability and binding. We found that 42 percent of considered proteins share common binding and stabilizing residues, whereas these residues are distinct in 58 percent of the proteins. Overall, 5 percent of stabilizing and 3 percent of binding residues serve as key residues. These residues are enriched with the combination of polar, charged, aliphatic, and aromatic residues. Analysis on subclasses of protein-RNA complexes based on protein structural class, function and RNA type showed that regulatory proteins, and complexes with single stranded RNA and rRNA have appreciable number of key residues. Specifically, Arg, Tyr, and Thr are preferred in most of the subclasses of protein-RNA complexes. In addition, residues with similar chemical behavior have different preferences to be KRs, such that Arg, Tyr, Val, and Thr are preferred over Lys, Trp, Ile, and Ser, respectively. Atomic level contacts revealed that charged and polar-nonpolar contacts are dominant in enzymes, polar in structural, and nonpolar in regulatory proteins. On the other hand, polar-nonpolar contacts are enriched in all these classes of protein-RNA complexes. Further, the influence of sequence and structural features such as conservation score, surrounding hydrophobicity, solvent accessibility, secondary structure, and long-range order in key residues are also discussed. We envisage that the present study provides insights to understand the structural and functional aspects of protein-RNA complexes.
|
23
|
Nithin C, Ghosh P, Bujnicki JM. Bioinformatics Tools and Benchmarks for Computational Docking and 3D Structure Prediction of RNA-Protein Complexes. Genes (Basel) 2018; 9:genes9090432. [PMID: 30149645 PMCID: PMC6162694 DOI: 10.3390/genes9090432] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 05/14/2018] [Revised: 07/26/2018] [Accepted: 08/21/2018] [Indexed: 12/29/2022]
Abstract
RNA-protein (RNP) interactions play essential roles in many biological processes, such as regulation of co-transcriptional and post-transcriptional gene expression, RNA splicing, transport, storage and stabilization, as well as protein synthesis. An increasing number of RNP structures would aid in a better understanding of these processes. However, due to the technical difficulties associated with experimental determination of macromolecular structures by high-resolution methods, studies on RNP recognition and complex formation present significant challenges. As an alternative, computational prediction of RNP interactions can be carried out. Structural models obtained by theoretical predictive methods are, in general, less reliable than models based on experimental measurements, but they can be sufficiently accurate to be used as a basis for formulating functional hypotheses. In this article, we present an overview of computational methods for 3D structure prediction of RNP complexes. We discuss currently available methods for macromolecular docking and for scoring 3D structural models of RNP complexes in particular. Additionally, we review benchmarks that have been developed to assess the accuracy of these methods.
Affiliation(s)
- Chandran Nithin
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, ul. Ks. Trojdena 4, PL-02-109 Warsaw, Poland.
| | - Pritha Ghosh
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, ul. Ks. Trojdena 4, PL-02-109 Warsaw, Poland.
| | - Janusz M Bujnicki
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, ul. Ks. Trojdena 4, PL-02-109 Warsaw, Poland.
- Bioinformatics Laboratory, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, ul. Umultowska 89, PL-61-614 Poznan, Poland.
|
24
|
An account of solvent accessibility in protein-RNA recognition. Sci Rep 2018; 8:10546. [PMID: 30002431 PMCID: PMC6043566 DOI: 10.1038/s41598-018-28373-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 03/01/2018] [Accepted: 06/21/2018] [Indexed: 01/16/2023]
Abstract
Protein–RNA recognition often induces conformational changes in binding partners. Consequently, the solvent accessible surface area (SASA) buried in contact estimated from the co-crystal structures may differ from that calculated using their unbound forms. To evaluate the change in accessibility upon binding, we compare the SASA of 126 protein-RNA complexes between bound and unbound forms. We observe that, in the majority of cases, the interfaces of both binding partners gain accessibility upon binding, which is often associated with either large domain movements or secondary structural transitions in RNA-binding proteins (RBPs), and with binding-induced conformational changes in RNAs. In the non-interface region, the majority of RNAs lose accessibility upon binding; however, no such preference is observed for RBPs. Side chains of RBPs make the major contribution to the change in accessibility. In the case of flexible binding, we find a moderate correlation between the binding free energy and the change in accessibility at the interface. Finally, we introduce a parameter, the ratio of gain to loss of accessibility upon binding, which can be used to identify the native solution among flexible docking models. Our findings provide fundamental insights into the relationship between flexibility and solvent accessibility, and advance our understanding of binding-induced folding in protein-RNA recognition.
|
25
|
Setiawan D, Brender J, Zhang Y. Recent advances in automated protein design and its future challenges. Expert Opin Drug Discov 2018; 13:587-604. [PMID: 29695210 DOI: 10.1080/17460441.2018.1465922] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 01/07/2023]
Abstract
Introduction: Protein function is determined by protein structure, which is in turn determined by the corresponding protein sequence. If the rules that cause a protein to adopt a particular structure are understood, it should be possible to refine or even redefine the function of a protein by working backwards from the desired structure to the sequence. Automated protein design attempts to calculate the effects of mutations computationally with the goal of more radical or complex transformations than are accessible by experimental techniques. Areas covered: The authors give a brief overview of the recent methodological advances in computer-aided protein design, showing how methodological choices affect final design and how automated protein design can be used to address problems considered beyond traditional protein engineering, including the creation of novel protein scaffolds for drug development. Also, the authors address specifically the future challenges in the development of automated protein design. Expert opinion: Automated protein design holds potential as a protein engineering technique, particularly in cases where screening by combinatorial mutagenesis is problematic. Considering solubility and immunogenicity issues, automated protein design is initially more likely to make an impact as a research tool for exploring basic biology in drug discovery than in the design of protein biologics.
Affiliation(s)
- Dani Setiawan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Jeffrey Brender
- Radiation Biology Branch, Center for Cancer Research, National Cancer Institute - NIH, Bethesda, MD, USA
| | - Yang Zhang
- a Department of Computational Medicine and Bioinformatics , University of Michigan , Ann Arbor , MI , USA.,c Department of Biological Chemistry , University of Michigan , Ann Arbor , MI , USA
| |
Collapse
|
26
|
Mukherjee S, Nithin C, Divakaruni Y, Bahadur RP. Dissecting water binding sites at protein–protein interfaces: a lesson from the atomic structures in the Protein Data Bank. J Biomol Struct Dyn 2018; 37:1204-1219. [DOI: 10.1080/07391102.2018.1453379] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/12/2022]
Affiliation(s)
- Sunandan Mukherjee
- Computational Structural Biology Lab, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
- Chandran Nithin
- Computational Structural Biology Lab, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
- Yasaswi Divakaruni
- Computational Structural Biology Lab, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
- Ranjit Prasad Bahadur
- Computational Structural Biology Lab, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
|
27
|
Krepl M, Blatter M, Cléry A, Damberger FF, Allain FH, Sponer J. Structural study of the Fox-1 RRM protein hydration reveals a role for key water molecules in RRM-RNA recognition. Nucleic Acids Res 2017; 45:8046-8063. [PMID: 28505313 PMCID: PMC5737849 DOI: 10.1093/nar/gkx418] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 01/18/2017] [Revised: 04/26/2017] [Accepted: 05/02/2017] [Indexed: 01/07/2023] Open
Abstract
The Fox-1 RNA recognition motif (RRM) domain is an important member of the RRM protein family. We report a 1.8 Å X-ray structure of the free Fox-1 containing six distinct monomers. We use this and the nuclear magnetic resonance (NMR) structure of the Fox-1 protein/RNA complex for molecular dynamics (MD) analyses of the structured hydration. The individual monomers of the X-ray structure show diverse hydration patterns, however, MD excellently reproduces the most occupied hydration sites. Simulations of the protein/RNA complex show hydration consistent with the isolated protein complemented by hydration sites specific to the protein/RNA interface. MD predicts intricate hydration sites with water-binding times extending up to hundreds of nanoseconds. We characterize two of them using NMR spectroscopy, RNA binding with switchSENSE and free-energy calculations of mutant proteins. Both hydration sites are experimentally confirmed and their abolishment reduces the binding free-energy. A quantitative agreement between theory and experiment is achieved for the S155A substitution but not for the S122A mutant. The S155 hydration site is evolutionarily conserved within the RRM domains. In conclusion, MD is an effective tool for predicting and interpreting the hydration patterns of protein/RNA complexes. Hydration is not easily detectable in NMR experiments but can affect stability of protein/RNA complexes.
Affiliation(s)
- Miroslav Krepl
- Institute of Biophysics, Academy of Sciences of the Czech Republic, Kralovopolska 135, 612 65 Brno, Czech Republic
- Regional Centre of Advanced Technologies and Materials, Department of Physical Chemistry, Faculty of Science, Palacky University Olomouc, 17. listopadu 12, 771 46 Olomouc, Czech Republic
- Markus Blatter
- Institute of Molecular Biology and Biophysics, Department of Biology, ETH Zurich, CH-8093 Zurich, Switzerland
- Present address: Global Discovery Chemistry, Novartis Institute for BioMedical Research, Basel CH-4002, Switzerland
- Antoine Cléry
- Institute of Molecular Biology and Biophysics, Department of Biology, ETH Zurich, CH-8093 Zurich, Switzerland
- Fred F. Damberger
- Institute of Molecular Biology and Biophysics, Department of Biology, ETH Zurich, CH-8093 Zurich, Switzerland
- Frédéric H.T. Allain
- Institute of Molecular Biology and Biophysics, Department of Biology, ETH Zurich, CH-8093 Zurich, Switzerland
- Jiri Sponer
- Institute of Biophysics, Academy of Sciences of the Czech Republic, Kralovopolska 135, 612 65 Brno, Czech Republic
- Regional Centre of Advanced Technologies and Materials, Department of Physical Chemistry, Faculty of Science, Palacky University Olomouc, 17. listopadu 12, 771 46 Olomouc, Czech Republic
|
28
|
Spyrakis F, Ahmed MH, Bayden AS, Cozzini P, Mozzarelli A, Kellogg GE. The Roles of Water in the Protein Matrix: A Largely Untapped Resource for Drug Discovery. J Med Chem 2017; 60:6781-6827. [PMID: 28475332 DOI: 10.1021/acs.jmedchem.7b00057] [Citation(s) in RCA: 98] [Impact Index Per Article: 14.0] [Indexed: 01/01/2023]
Abstract
The value of thoroughly understanding the thermodynamics specific to a drug discovery/design study is well known. Over the past decade, the crucial roles of water molecules in protein structure, function, and dynamics have also become increasingly appreciated. This Perspective explores water in the biological environment by adopting its point of view in such phenomena. The prevailing thermodynamic models of the past, where water was seen largely in terms of an entropic gain after its displacement by a ligand, are now known to be much too simplistic. We adopt a set of terminology that describes water molecules as being "hot" and "cold", which we have defined as being easy and difficult to displace, respectively. The basis of these designations, which involve both enthalpic and entropic water contributions, are explored in several classes of biomolecules and structural motifs. The hallmarks for characterizing water molecules are examined, and computational tools for evaluating water-centric thermodynamics are reviewed. This Perspective's summary features guidelines for exploiting water molecules in drug discovery.
Affiliation(s)
- Francesca Spyrakis
- Dipartimento di Scienza e Tecnologia del Farmaco, Università degli Studi di Torino, Via Pietro Giuria 9, 10125 Torino, Italy
- Mostafa H Ahmed
- Department of Medicinal Chemistry & Institute for Structural Biology, Drug Discovery and Development, Virginia Commonwealth University, Richmond, Virginia 23298-0540, United States
- Alexander S Bayden
- CMD Bioscience, 5 Science Park, New Haven, Connecticut 06511, United States
- Pietro Cozzini
- Dipartimento di Scienze degli Alimenti e del Farmaco, Laboratorio di Modellistica Molecolare, Università degli Studi di Parma, Parco Area delle Scienze 59/A, 43121 Parma, Italy
- Andrea Mozzarelli
- Dipartimento di Scienze degli Alimenti e del Farmaco, Laboratorio di Biochimica, Università degli Studi di Parma, Parco Area delle Scienze 23/A, 43121 Parma, Italy
- Istituto di Biofisica, Consiglio Nazionale delle Ricerche, Via Moruzzi 1, 56124 Pisa, Italy
- Glen E Kellogg
- Department of Medicinal Chemistry & Institute for Structural Biology, Drug Discovery and Development, Virginia Commonwealth University, Richmond, Virginia 23298-0540, United States
|
29
|
Nithin C, Mukherjee S, Bahadur RP. A non-redundant protein-RNA docking benchmark version 2.0. Proteins 2016; 85:256-267. [PMID: 27862282 DOI: 10.1002/prot.25211] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 09/08/2016] [Revised: 10/27/2016] [Accepted: 11/08/2016] [Indexed: 12/23/2022]
Abstract
We present an updated version of the protein-RNA docking benchmark, which we first published four years back. The non-redundant protein-RNA docking benchmark version 2.0 consists of 126 test cases, a threefold increase in number compared to its previous version. The present version consists of 21 unbound-unbound cases, of which, in 12 cases, the unbound RNAs are taken from another complex. It also consists of 95 unbound-bound cases where only the protein is available in the unbound state. Besides, we introduce 10 new bound-unbound cases where only the RNA is found in the unbound state. Based on the degree of conformational change of the interface residues upon complex formation the benchmark is classified into 72 rigid-body cases, 25 semiflexible cases and 19 full flexible cases. It also covers a wide range of conformational flexibility including small side chain movement to large domain swapping in protein structures as well as flipping and restacking in RNA bases. This benchmark should provide the docking community with more test cases for evaluating rigid-body as well as flexible docking algorithms. Besides, it will also facilitate the development of new algorithms that require large number of training set. The protein-RNA docking benchmark version 2.0 can be freely downloaded from http://www.csb.iitkgp.ernet.in/applications/PRDBv2. Proteins 2017; 85:256-267. © 2016 Wiley Periodicals, Inc.
Affiliation(s)
- Chandran Nithin
- Computational Structural Biology Lab, Department of Biotechnology, Indian Institute of Technology Kharagpur, 721302, India
- Sunandan Mukherjee
- Computational Structural Biology Lab, Department of Biotechnology, Indian Institute of Technology Kharagpur, 721302, India
- Ranjit Prasad Bahadur
- Computational Structural Biology Lab, Department of Biotechnology, Indian Institute of Technology Kharagpur, 721302, India
|
30
|
Protein-RNA interactions: structural biology and computational modeling techniques. Biophys Rev 2016; 8:359-367. [PMID: 28510023 DOI: 10.1007/s12551-016-0223-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 08/05/2016] [Accepted: 09/20/2016] [Indexed: 12/30/2022] Open
Abstract
RNA-binding proteins are functionally diverse within cells, being involved in RNA-metabolism, translation, DNA damage repair, and gene regulation at both the transcriptional and post-transcriptional levels. Much has been learnt about their interactions with RNAs through structure determination techniques and computational modeling. This review gives an overview of the structural data currently available for protein-RNA complexes, and discusses the technical issues facing structural biologists working to solve their structures. The review focuses on three techniques used to solve the 3-dimensional structure of protein-RNA complexes at atomic resolution, namely X-ray crystallography, solution nuclear magnetic resonance (NMR) and cryo-electron microscopy (cryo-EM). The review then focuses on the main computational modeling techniques that use these atomic resolution data: discussing the prediction of RNA-binding sites on unbound proteins, docking proteins, and RNAs, and modeling the molecular dynamics of the systems. In conclusion, the review looks at the future directions this field of research might take.
|
31
|
Hao N, Palmer AC, Ahlgren-Berg A, Shearwin KE, Dodd IB. The role of repressor kinetics in relief of transcriptional interference between convergent promoters. Nucleic Acids Res 2016; 44:6625-38. [PMID: 27378773 PMCID: PMC5001618 DOI: 10.1093/nar/gkw600] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 08/20/2015] [Accepted: 06/22/2016] [Indexed: 01/09/2023] Open
Abstract
Transcriptional interference (TI), where transcription from a promoter is inhibited by the activity of other promoters in its vicinity on the same DNA, enables transcription factors to regulate a target promoter indirectly, inducing or relieving TI by controlling the interfering promoter. For convergent promoters, stochastic simulations indicate that relief of TI can be inhibited if the repressor at the interfering promoter has slow binding kinetics, making it either sensitive to frequent dislodgement by elongating RNA polymerases (RNAPs) from the target promoter, or able to be a strong roadblock to these RNAPs. In vivo measurements of relief of TI by CI or Cro repressors in the bacteriophage λ PR-PRE system show strong relief of TI and a lack of dislodgement and roadblocking effects, indicative of rapid CI and Cro binding kinetics. However, repression of the same λ promoter by a catalytically dead CRISPR Cas9 protein gave either compromised or no relief of TI depending on the orientation at which it binds DNA, consistent with dCas9 being a slow kinetics repressor. This analysis shows how the intrinsic properties of a repressor can be evolutionarily tuned to set the magnitude of relief of TI.
Affiliation(s)
- Nan Hao
- Discipline of Biochemistry, Department of Molecular and Cellular Biology, The University of Adelaide, Adelaide, South Australia 5005, Australia
- Adam C Palmer
- Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115, USA
- Alexandra Ahlgren-Berg
- Discipline of Biochemistry, Department of Molecular and Cellular Biology, The University of Adelaide, Adelaide, South Australia 5005, Australia
- Keith E Shearwin
- Discipline of Biochemistry, Department of Molecular and Cellular Biology, The University of Adelaide, Adelaide, South Australia 5005, Australia
- Ian B Dodd
- Discipline of Biochemistry, Department of Molecular and Cellular Biology, The University of Adelaide, Adelaide, South Australia 5005, Australia
|