1
Sagendorf JM, Mitra R, Huang J, Chen XS, Rohs R. Structure-based prediction of protein-nucleic acid binding using graph neural networks. Biophys Rev 2024; 16:297-314. [PMID: 39345796 PMCID: PMC11427629 DOI: 10.1007/s12551-024-01201-w] [Received: 04/19/2024] [Accepted: 05/28/2024] [Indexed: 10/01/2024]
Abstract
Protein-nucleic acid (PNA) binding plays critical roles in the transcription, translation, regulation, and three-dimensional organization of the genome. Structural models of proteins bound to nucleic acids (NA) provide insights into the chemical, electrostatic, and geometric properties of the protein structure that give rise to NA binding, but they are scarce relative to models of unbound proteins. We developed a deep learning approach, which we call PNAbind, for predicting PNA binding given the unbound structure of a protein. Our method uses graph neural networks to encode the spatial distribution of physicochemical and geometric properties of protein structures that are predictive of NA binding. Using global physicochemical encodings, our models predict the overall binding function of a protein; using local encodings, they predict the location of individual NA-binding residues. Our models can discriminate between DNA- and RNA-binding specificity, and we show that predictions made on computationally derived protein structures can be used to gain mechanistic understanding of the chemical and structural features that determine NA recognition. Binding site predictions were validated against benchmark datasets, achieving AUROC scores in the range of 0.92-0.95. We applied our models to the HIV-1 restriction factor APOBEC3G and showed that our model predictions are consistent with, and help explain, experimental RNA binding data. Supplementary information: The online version contains supplementary material available at 10.1007/s12551-024-01201-w.
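The AUROC values quoted above can be read as the probability that a randomly chosen NA-binding residue receives a higher score than a randomly chosen non-binding one. The sketch below computes AUROC directly from that pairwise definition (the Mann-Whitney formulation); the labels and scores are hypothetical and are not PNAbind outputs.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the Mann-Whitney probability P(score_pos > score_neg).

    Every (binding, non-binding) residue pair is compared; a win counts 1,
    a tie counts 0.5. This is O(P * N), fine for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-residue labels (1 = NA-binding) and model scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.4, 0.1]
print(round(auroc(labels, scores), 3))  # 0.889: 8 of 9 pairs ranked correctly
```

For large residue sets, a rank-based implementation (sorting once) gives the same value in O(n log n).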
Affiliation(s)
- Jared M. Sagendorf
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089 USA
- Present Address: Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158 USA
- Raktim Mitra
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089 USA
- Jiawei Huang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089 USA
- Xiaojiang S. Chen
- Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089 USA
- Department of Chemistry, University of Southern California, Los Angeles, CA 90089 USA
- Remo Rohs
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089 USA
- Department of Chemistry, University of Southern California, Los Angeles, CA 90089 USA
- Department of Physics and Astronomy, University of Southern California, Los Angeles, CA 90089 USA
- Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA 90089 USA
2
Sun C, Feng Y. EPDRNA: A Model for Identifying DNA-RNA Binding Sites in Disease-Related Proteins. Protein J 2024; 43:513-521. [PMID: 38491248 DOI: 10.1007/s10930-024-10183-3] [Accepted: 02/02/2024] [Indexed: 03/18/2024]
Abstract
Protein-DNA and protein-RNA interactions are involved in many biological processes, regulate many cellular functions, and are related to many human diseases. To understand the molecular mechanisms of protein-DNA and protein-RNA binding, it is important to identify which residues in a protein sequence bind DNA and RNA. At present, few methods specifically identify DNA- and RNA-binding sites in disease-related proteins. In this study, we therefore combined four machine learning algorithms into an ensemble classifier (EPDRNA) to predict DNA- and RNA-binding sites in disease-related proteins. The dataset used to build the model was collated from the UniProt and PDB databases, and position-specific scoring matrices (PSSM), physicochemical properties, and amino acid type were used as features. EPDRNA adopted soft voting and achieved best AUC values of 0.73 for DNA-binding sites and 0.71 for RNA-binding sites in 10-fold cross-validation on the training sets. To further verify the model's performance, we assessed EPDRNA for DNA-binding and RNA-binding site prediction on the independent test datasets. EPDRNA achieved 85% recall and 25% precision on the protein-DNA interaction independent test set, and 82% recall and 27% precision on the protein-RNA interaction independent test set. The EPDRNA webserver is freely available at http://www.s-bioinformatics.cn/epdrna .
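Soft voting, as named in the abstract, averages the class probabilities produced by the base classifiers and thresholds the average. The abstract does not say which four algorithms EPDRNA combines, so the sketch below uses made-up probabilities from four unnamed models, equal weights, and an assumed 0.5 cutoff, then computes recall and precision, the two metrics reported for the independent test sets.

```python
from typing import List, Optional, Sequence, Tuple


def soft_vote(prob_lists: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of per-residue binding probabilities across models."""
    if weights is None:
        weights = [1.0] * len(prob_lists)  # equal weights (an assumption)
    total = sum(weights)
    n_residues = len(prob_lists[0])
    return [sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
            for i in range(n_residues)]


def recall_precision(labels: Sequence[int], preds: Sequence[int]) -> Tuple[float, float]:
    """Recall = TP / (TP + FN); precision = TP / (TP + FP)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


# Binding probabilities from four hypothetical base classifiers for 5 residues.
probs = [
    [0.9, 0.2, 0.6, 0.4, 0.1],
    [0.8, 0.3, 0.5, 0.6, 0.2],
    [0.7, 0.1, 0.7, 0.5, 0.3],
    [0.6, 0.4, 0.4, 0.7, 0.2],
]
avg = soft_vote(probs)
preds = [1 if p >= 0.5 else 0 for p in avg]  # assumed 0.5 decision threshold
labels = [1, 0, 1, 0, 0]
print(preds, recall_precision(labels, preds))
```

Soft voting keeps each model's confidence, unlike hard (majority) voting, which only counts each model's final label.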
Affiliation(s)
- CanZhuang Sun
- College of Science, Inner Mongolia Agriculture University, Hohhot, 010018, People's Republic of China
- YongE Feng
- College of Science, Inner Mongolia Agriculture University, Hohhot, 010018, People's Republic of China
3
Sagendorf JM, Mitra R, Huang J, Chen XS, Rohs R. PNAbind: Structure-based prediction of protein-nucleic acid binding using graph neural networks. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.27.582387. [PMID: 38529493 PMCID: PMC10962711 DOI: 10.1101/2024.02.27.582387] [Indexed: 03/27/2024]
Abstract
The recognition and binding of nucleic acids (NAs) by proteins depend on complementary chemical, electrostatic, and geometric properties of the protein-NA binding interface. Structural models of protein-NA complexes provide insights into these properties but are scarce relative to models of unbound proteins. We present a deep learning approach, PNAbind, for predicting protein-NA binding given the apo structure of a protein. Our method uses graph neural networks to encode spatial distributions of physicochemical and geometric properties of the protein molecular surface that are predictive of NA binding. Using global physicochemical encodings, our models predict the overall binding function of a protein and can discriminate between DNA- and RNA-binding specificity. We show that such predictions made on protein structures modeled with AlphaFold2 can be used to gain mechanistic understanding of the chemical and structural features that determine NA recognition. Using local encodings, our models predict the location of NA binding sites at the level of individual binding residues. Binding site predictions were validated against benchmark datasets, achieving AUROC scores in the range of 0.92-0.95. We applied our models to the HIV-1 restriction factor APOBEC3G and showed that our predictions are consistent with experimental RNA binding data.
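The abstract does not give PNAbind's layer design. As a generic illustration of how a graph neural network can produce both local and global encodings, the sketch below runs one mean-aggregation message-passing layer (a GraphSAGE-style update, chosen here as an assumption) over a toy three-node graph. The node features are hypothetical stand-ins for surface properties such as electrostatic potential or curvature; per-node outputs correspond to local predictions, and the mean readout to a protein-level encoding.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def message_passing_layer(features: Dict[int, Vector],
                          adjacency: Dict[int, List[int]],
                          w_self: Matrix, w_neigh: Matrix) -> Dict[int, Vector]:
    """One mean-aggregation layer: h_i' = ReLU(W_self h_i + W_neigh mean_j h_j)."""
    out: Dict[int, Vector] = {}
    for node, h in features.items():
        neigh = adjacency.get(node, [])
        if neigh:
            mean = [sum(features[j][d] for j in neigh) / len(neigh)
                    for d in range(len(h))]
        else:
            mean = [0.0] * len(h)
        combined = [a + b for a, b in zip(matvec(w_self, h), matvec(w_neigh, mean))]
        out[node] = [max(0.0, x) for x in combined]  # ReLU
    return out


def mean_readout(features: Dict[int, Vector]) -> Vector:
    """Average node embeddings into one graph-level (global) vector."""
    vecs = list(features.values())
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(len(vecs[0]))]


# Toy graph: three surface nodes in a chain 0-1-2 (adjacency listed both ways).
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
adjacency = {0: [1], 1: [0, 2], 2: [1]}
identity = [[1.0, 0.0], [0.0, 1.0]]
half = [[0.5, 0.0], [0.0, 0.5]]

local = message_passing_layer(features, adjacency, identity, half)
print(local)                # per-node (local) embeddings
print(mean_readout(local))  # protein-level (global) embedding
```

In a trained model the weight matrices are learned and several layers are stacked, so each node sees an increasingly large neighborhood of the surface.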
4
Lee M. Machine learning for small interfering RNAs: a concise review of recent developments. Front Genet 2023; 14:1226336. [PMID: 37519887 PMCID: PMC10372481 DOI: 10.3389/fgene.2023.1226336] [Received: 05/24/2023] [Accepted: 07/04/2023] [Indexed: 08/01/2023]
Abstract
The advent of machine learning and its integration into small interfering RNA (siRNA) research heralds a new epoch in the field of RNA interference (RNAi). This review emphasizes the urgency and relevance of consolidating the many contributions and advances in this domain, focusing on the period 2019-2023. Given the rapid progress of deep learning technologies, a synthesis of recent research is essential for staying apprised of the state-of-the-art methods in use. The review offers a comprehensive view of the confluence of machine learning and siRNA and is intended to guide future work in this intersectional field. Our examination of these studies provides a perspective on the contemporary landscape of machine learning applications in siRNA design and function, with the aim of fostering further discourse and academic inquiry in this multifaceted domain.
5
Li K, Wu H, Yue Z, Sun Y, Xia C. A convolutional network and attention mechanism-based approach to predict protein-RNA binding residues. Comput Biol Chem 2023; 105:107901. [PMID: 37327559 DOI: 10.1016/j.compbiolchem.2023.107901] [Received: 04/13/2023] [Revised: 05/29/2023] [Accepted: 05/31/2023] [Indexed: 06/18/2023]
Abstract
Protein-RNA interactions play a key role in various cellular processes, and many experimental and computational studies have been initiated to analyze them. However, experimental determination is complex and expensive, so researchers have worked to develop efficient computational tools to detect protein-RNA binding residues. The accuracy of existing methods is limited by the features of the target and the performance of the computational models, leaving room for improvement. To improve the accurate detection of protein-RNA binding residues, we propose PBRPre, a convolutional network model based on an improved MobileNet. First, the position information of the target complex and 3-mer amino acid features are extracted, and the position-specific scoring matrix (PSSM) is enhanced with spatial neighbor smoothing and the discrete wavelet transform to fully exploit the spatial structure information of the target and enrich the feature dataset. Second, the deep learning model MobileNet is used to integrate and optimize the latent features of the target complexes; a Vision Transformer (ViT) classification layer is then introduced to mine deep-level information about the target, strengthening the model's handling of global information and improving classification accuracy. The model reaches an AUC of 0.866 on the independent test dataset, which shows that PBRPre can effectively detect protein-RNA binding residues. All datasets and source code for PBRPre are available at https://github.com/linglewu/PBRPre for academic use.
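The PSSM enhancement step can be sketched as follows. The abstract names spatial neighbor smoothing and the discrete wavelet transform but not their parameters, so this sketch assumes precomputed neighbor lists (for example from a distance cutoff on the complex structure), averages each residue's PSSM row over itself and its neighbors, and applies a single-level Haar wavelet, one common DWT choice, to a PSSM column along the sequence.

```python
import math
from typing import Dict, List, Sequence, Tuple


def haar_dwt(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT.

    Returns (approximation, detail) coefficients. Odd-length input is padded
    by repeating its last value (one of several possible padding choices).
    """
    x = list(signal)
    if len(x) % 2:
        x.append(x[-1])
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def neighbor_smooth(pssm: List[List[float]],
                    neighbors: Dict[int, List[int]]) -> List[List[float]]:
    """Replace each residue's PSSM row by the mean over itself and its spatial neighbors."""
    smoothed = []
    for i, row in enumerate(pssm):
        group = [i] + neighbors.get(i, [])
        smoothed.append([sum(pssm[j][k] for j in group) / len(group)
                         for k in range(len(row))])
    return smoothed


# Hypothetical 3-residue PSSM with 2 columns (a real PSSM has 20) and
# neighbor lists assumed to come from a distance cutoff on the structure.
pssm = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
print(neighbor_smooth(pssm, neighbors))  # [[2.0, 1.0], [3.0, 2.0], [4.0, 3.0]]

approx, detail = haar_dwt([4.0, 2.0, -1.0, 3.0])  # one PSSM column along the sequence
print(approx, detail)
```

Because the Haar transform is orthonormal, the squared coefficients preserve the signal's energy, so no information is lost; the approximation captures smooth trends and the detail captures local changes.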
Affiliation(s)
- Ke Li
- School of Information & Computer, Anhui Agricultural University, Hefei, Anhui 230036, China; Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui University, Hefei, Anhui 230601, China; Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, Anhui 230036, China.
- Hongwei Wu
- School of Information & Computer, Anhui Agricultural University, Hefei, Anhui 230036, China; Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, Anhui 230036, China
- Zhenyu Yue
- School of Information & Computer, Anhui Agricultural University, Hefei, Anhui 230036, China; Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, Anhui 230036, China
- Yu Sun
- School of Information & Computer, Anhui Agricultural University, Hefei, Anhui 230036, China; Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, Anhui 230036, China
- Chuan Xia
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, Anhui 230036, China
6
Zhang F, Li M, Zhang J, Kurgan L. HybridRNAbind: prediction of RNA interacting residues across structure-annotated and disorder-annotated proteins. Nucleic Acids Res 2023; 51:e25. [PMID: 36629262 PMCID: PMC10018345 DOI: 10.1093/nar/gkac1253] [Received: 07/14/2022] [Revised: 11/22/2022] [Accepted: 12/15/2022] [Indexed: 01/12/2023]
Abstract
Sequence-based predictors of RNA-binding residues (RBRs) are trained on either structure-annotated or disorder-annotated binding regions. A recent study of predictors of protein-binding residues shows that they are plagued by high levels of cross-predictions (protein binding residues are predicted as nucleic acid binding) and that structure-trained predictors perform poorly for disorder-annotated regions and vice versa. Consequently, we analyze a representative set of structure- and disorder-trained predictors of RBRs to comprehensively assess the quality of their predictions. Our empirical analysis, which relies on a new low-similarity benchmark dataset, reveals that structure-trained predictors of RBRs perform well for structure-annotated proteins, while disorder-trained predictors provide accurate results for disorder-annotated proteins. However, these methods work only modestly well on the opposite type of annotation, motivating the need for new solutions. Using an empirical approach, we design the HybridRNAbind meta-model, which generates accurate predictions and low amounts of cross-predictions when tested on data that combine structure- and disorder-annotated RBRs. We release this meta-model as a convenient webserver, available at https://www.csuligroup.com/hybridRNAbind/.
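Cross-prediction, as described in the abstract, means residues that bind a different partner being predicted as binding the target ligand. A minimal way to quantify it for RNA, assuming per-residue annotations and binary predictions, is to compute the fraction of residues predicted RNA-binding separately for RNA-binding residues, residues binding other partners, and non-binding residues. The rate names and toy data below are illustrative, not the exact metrics used in the HybridRNAbind study.

```python
from typing import Dict, Sequence


def rna_prediction_rates(annotations: Sequence[str],
                         predicted: Sequence[int]) -> Dict[str, float]:
    """Rates at which residues are predicted RNA-binding, by annotation class.

    annotations: per-residue label, "RNA" for RNA-binding, "none" for
    non-binding, anything else (e.g. "DNA", "protein") for other partners.
    predicted: 1 if the predictor calls the residue RNA-binding, else 0.
    """
    def rate(values):
        values = list(values)
        return sum(values) / len(values) if values else 0.0

    pairs = list(zip(annotations, predicted))
    return {
        "true_positive_rate": rate(p for a, p in pairs if a == "RNA"),
        "cross_prediction_rate": rate(p for a, p in pairs if a not in ("RNA", "none")),
        "non_binding_false_positive_rate": rate(p for a, p in pairs if a == "none"),
    }


# Hypothetical annotations and predictions for an 8-residue toy protein.
annotations = ["RNA", "RNA", "RNA", "DNA", "protein", "none", "none", "none"]
predicted = [1, 1, 0, 1, 0, 0, 0, 1]
print(rna_prediction_rates(annotations, predicted))
```

Separating the "other partner" residues from plain non-binders is what makes cross-prediction visible; pooling them into one negative class would hide it inside the overall false positive rate.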
Affiliation(s)
- Fuhao Zhang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Min Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Jian Zhang
- School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
7
Wu Z, Basu S, Wu X, Kurgan L. qNABpredict: Quick, accurate, and taxonomy-aware sequence-based prediction of content of nucleic acid binding amino acids. Protein Sci 2023; 32:e4544. [PMID: 36519304 PMCID: PMC9798252 DOI: 10.1002/pro.4544] [Received: 09/16/2022] [Revised: 12/07/2022] [Accepted: 12/08/2022] [Indexed: 12/23/2022]
Abstract
Protein sequence-based predictors of nucleic acid (NA)-binding include methods that predict NA-binding proteins and NA-binding residues. The residue-level tools produce more detail but suffer from high computational cost, since they must predict every amino acid in the input sequence and rely on multiple sequence alignments. We propose an alternative approach that predicts the content (fraction) of NA-binding residues, offering more information than protein-level prediction and much shorter runtime than residue-level tools. Our first-of-its-kind content predictor, qNABpredict, relies on a small, rationally designed, fast-to-compute feature set that represents relevant characteristics extracted from the input sequence, and on a well-parametrized support vector regression model. We provide two versions of qNABpredict: a taxonomy-agnostic model that can be used for proteins of unknown taxonomic origin, and more accurate taxonomy-aware models tailored to specific taxonomic kingdoms (archaea, bacteria, eukaryota, and viruses). Empirical tests on a low-similarity test dataset show that qNABpredict is 100 times faster and generates statistically more accurate content predictions than the content derived from the results of residue-level predictors. We also show that qNABpredict's content predictions can be used to improve the results generated by residue-level predictors. We release qNABpredict as a convenient webserver and source code at http://biomine.cs.vcu.edu/servers/qNABpredict/. This new tool should be particularly useful for predicting details of protein-NA interactions across large protein families and proteomes.
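Content here is simply the fraction of residues that bind NA, so residue-level binary predictions can be collapsed into a content value, as is done when comparing against residue-level tools. The abstract also says predicted content can improve residue-level results without saying how; the second function shows one hypothetical way, labeling the top round(content × L) scored residues as binding instead of using a fixed cutoff.

```python
from typing import List, Sequence


def content(binary: Sequence[int]) -> float:
    """Fraction of residues labeled NA-binding."""
    return sum(binary) / len(binary)


def rethreshold_by_content(scores: Sequence[float], predicted_content: float) -> List[int]:
    """Hypothetical refinement: label the round(content * L) top-scoring residues as binding.

    Python's round() uses round-half-to-even, so e.g. 2.5 rounds to 2.
    """
    k = round(predicted_content * len(scores))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(ranked[:k])
    return [1 if i in top else 0 for i in range(len(scores))]


# Hypothetical residue-level scores for a 10-residue protein.
scores = [0.9, 0.2, 0.75, 0.4, 0.6, 0.1, 0.3, 0.55, 0.05, 0.35]
fixed_threshold = [1 if s >= 0.5 else 0 for s in scores]
print(content(fixed_threshold))             # 0.4, content implied by a 0.5 cutoff
print(rethreshold_by_content(scores, 0.3))  # assumes a predicted content of 0.3
```

The design choice is that content fixes how many residues to call binding, while the residue-level scores still decide which ones.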
Affiliation(s)
- Zhonghua Wu
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Sushmita Basu
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
- Xuantai Wu
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
8
PRIP: A Protein-RNA Interface Predictor Based on Semantics of Sequences. Life (Basel) 2022; 12:life12020307. [PMID: 35207594 PMCID: PMC8879494 DOI: 10.3390/life12020307] [Received: 01/05/2022] [Revised: 01/28/2022] [Accepted: 02/04/2022] [Indexed: 01/08/2023]
Abstract
RNA–protein interactions play an indispensable role in many biological processes. Growing evidence indicates that aberrant RNA–protein interactions are associated with many serious human diseases. Precise and rapid detection of RNA–protein interactions is crucial for finding new functions and uncovering the mechanisms of these interactions. Although many methods have been presented to recognize RNA-binding sites, there is still much room to improve predictive accuracy. We present a sequence semantics-based method (called PRIP) for predicting RNA-binding interfaces. PRIP extracts semantic embeddings by pre-training Word2vec on the corpus, and extreme gradient boosting is employed to train a classifier. PRIP obtained a sensitivity (SN) of 0.73 in five-fold cross-validation and 0.67 on the independent test, outperforming state-of-the-art methods. Unlike other methods, PRIP learns the hidden relations between words in their context. Analysis of the semantic relationships implied that the semantics of some words are specific to RNA-binding interfaces. This method helps explore the mechanism of RNA–protein interactions from a semantic point of view.
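The abstract does not define PRIP's "words". A common convention in sequence-embedding work, assumed here, is to treat overlapping k-mers as words; the sketch splits a sequence into 3-mers and builds a residue-level feature by averaging the vectors of all words that cover that residue. The toy vectors stand in for embeddings that would normally be learned with a Word2vec implementation such as the one in gensim; the resulting features would then feed a classifier such as extreme gradient boosting.

```python
from typing import Dict, List


def to_kmer_words(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mer 'words'."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def residue_embedding(seq: str, pos: int, embeddings: Dict[str, List[float]],
                      dim: int, k: int = 3) -> List[float]:
    """Average the embeddings of every k-mer word that covers residue `pos`.

    Words missing from the embedding table are skipped; a zero vector is
    returned if none are found.
    """
    first = max(0, pos - k + 1)      # earliest word start that still covers pos
    last = min(pos, len(seq) - k)    # latest word start inside the sequence
    vecs = [embeddings[seq[i:i + k]] for i in range(first, last + 1)
            if seq[i:i + k] in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]


seq = "MKRAG"
print(to_kmer_words(seq))  # ['MKR', 'KRA', 'RAG']
# Toy 2-d vectors standing in for trained Word2vec embeddings.
toy = {"MKR": [1.0, 0.0], "KRA": [0.0, 1.0], "RAG": [1.0, 1.0]}
print(residue_embedding(seq, 2, toy, dim=2))  # residue 'R' is covered by all three words
```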
9
A comparative analysis of machine learning classifiers for predicting protein-binding nucleotides in RNA sequences. Comput Struct Biotechnol J 2022; 20:3195-3207. [PMID: 35832617 PMCID: PMC9249596 DOI: 10.1016/j.csbj.2022.06.036] [Received: 03/23/2022] [Revised: 06/14/2022] [Accepted: 06/14/2022] [Indexed: 11/24/2022]
Abstract
RNAs are master players in various cellular and biological processes, and RNA-protein interactions are vital for the proper functioning of cellular machinery. Knowledge of binding sites is crucial to deciphering their functional implications. RNA NC-triplet and NC-quartet features gave reasonably high performance. The random forest (RF) model outperformed other machine learning classifiers, with 85% accuracy and 0.93 AUC, and performed better than a few existing methods. An online webserver, “Nucpred”, was developed with the trained model and is freely accessible to the scientific community.
RNA-protein interactions play vital roles in driving cellular machinery. Despite their significant involvement in several biological processes, the underlying molecular mechanism of RNA-protein interactions is still elusive. This may be due to the experimental difficulty of solving co-crystallized RNA-protein complexes. The inherent flexibility of RNA molecules to adopt different conformations makes them functionally diverse, and their interactions with proteins have implications in RNA disease biology. Thus, studying binding interfaces can provide mechanistic insight into molecular function and into aberrations caused by altered interactions. Moreover, high-throughput sequencing technologies have generated far more sequence data than the structural data available for RNA-protein complexes. In such a scenario, efficient computational algorithms are required to identify the protein-binding interfaces of RNA in the absence of known structures. We investigated several machine learning classifiers and various features derived from nucleotide sequences to identify protein-binding nucleotides in RNA. We achieved the best performance with nucleotide-triplet and nucleotide-quartet feature-based random forest models, with an overall accuracy of 84.8%, sensitivity of 83.2%, specificity of 86.1%, MCC of 0.70, and AUC of 0.93. We further implemented the developed models in a user-friendly webserver, “Nucpred”, which is freely accessible at “http://www.csb.iitkgp.ac.in/applications/Nucpred/index”.
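The accuracy, sensitivity, specificity, and MCC reported above all derive from a binary confusion matrix over nucleotides (protein-binding vs. non-binding). The sketch below computes them from hypothetical counts; it is included to show the formulas, not to reproduce the Nucpred results.

```python
import math
from typing import Dict


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and MCC from a confusion matrix."""
    total = tp + tn + fp + fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # recall on binding nucleotides
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # recall on non-binding nucleotides
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical counts, not taken from the Nucpred study.
print(binary_metrics(tp=40, tn=45, fp=5, fn=10))
```

MCC uses all four cells of the matrix, which is why it is often reported alongside accuracy when binding and non-binding classes are imbalanced.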
10
Zhang J, Ghadermarzi S, Katuwawala A, Kurgan L. DNAgenie: accurate prediction of DNA-type-specific binding residues in protein sequences. Brief Bioinform 2021; 22:6355416. [PMID: 34415020 DOI: 10.1093/bib/bbab336] [Received: 03/29/2021] [Revised: 07/02/2021] [Accepted: 07/28/2021] [Indexed: 01/02/2023]
Abstract
Efforts to elucidate protein-DNA interactions at the molecular level rely in part on accurate predictions of DNA-binding residues in protein sequences. While there are over a dozen computational predictors of DNA-binding residues, they are DNA-type agnostic and significantly cross-predict residues that interact with other ligands as DNA binding. We leverage a custom-designed machine learning architecture to introduce DNAgenie, the first-of-its-kind predictor of residues that interact with A-DNA, B-DNA, and single-stranded DNA. DNAgenie uses a comprehensive physicochemical profile extracted from an input protein sequence and implements a two-step refinement process to provide accurate predictions and minimize cross-predictions. Comparative tests on an independent test dataset demonstrate that DNAgenie outperforms current methods that we adapt to predict residue-level interactions with the three DNA types. Further analysis finds that the second (refinement) step leads to a substantial reduction in cross-predictions. Empirical tests show that DNAgenie's outputs, when converted to coarse-grained protein-level predictions, compare favorably against recent tools that predict which DNA-binding proteins interact with double-stranded versus single-stranded DNA. Moreover, predictions on the sequences of the whole human proteome reveal that the results produced by DNAgenie substantially overlap with known DNA-binding proteins while also including promising leads for several hundred previously unknown putative DNA binders. These results suggest that DNAgenie is a valuable tool for the sequence-based characterization of protein functions. DNAgenie's webserver is available at http://biomine.cs.vcu.edu/servers/DNAgenie/.
Affiliation(s)
- Jian Zhang
- School of Computer and Information Technology at the Xinyang Normal University, No.237, Nanhu Road, Xinyang 464000, Henan Province, P.R. China
- Sina Ghadermarzi
- Department of Computer Science at the Virginia Commonwealth University, 401 West Main Street, Room E4225, Richmond, Virginia 23284, USA
- Akila Katuwawala
- Department of Computer Science at the Virginia Commonwealth University, 401 West Main Street, Room E4225, Richmond, Virginia 23284, USA
- Lukasz Kurgan
- Department of Computer Science at the Virginia Commonwealth University, 401 West Main Street, Room E4225, Richmond, Virginia 23284, USA
11
Chen Z, Zhao P, Li C, Li F, Xiang D, Chen YZ, Akutsu T, Daly RJ, Webb GI, Zhao Q, Kurgan L, Song J. iLearnPlus: a comprehensive and automated machine-learning platform for nucleic acid and protein sequence analysis, prediction and visualization. Nucleic Acids Res 2021; 49:e60. [PMID: 33660783 PMCID: PMC8191785 DOI: 10.1093/nar/gkab122] [Received: 10/26/2020] [Revised: 02/05/2021] [Accepted: 02/25/2021] [Indexed: 12/14/2022]
Abstract
Sequence-based analysis and prediction are fundamental bioinformatic tasks that facilitate understanding of the sequence(-structure)-function paradigm for DNAs, RNAs and proteins. Rapid accumulation of sequences requires equally pervasive development of new predictive models, which depends on the availability of effective tools that support these efforts. We introduce iLearnPlus, the first machine-learning platform with graphical- and web-based interfaces for the construction of machine-learning pipelines for analysis and predictions using nucleic acid and protein sequences. iLearnPlus provides a comprehensive set of algorithms and automates sequence-based feature extraction and analysis, construction and deployment of models, assessment of predictive performance, statistical analysis, and data visualization; all without programming. iLearnPlus includes a wide range of feature sets which encode information from the input sequences and over twenty machine-learning algorithms that cover several deep-learning approaches, outnumbering the current solutions by a wide margin. Our solution caters to experienced bioinformaticians, given the broad range of options, and biologists with no programming background, given the point-and-click interface and easy-to-follow design process. We showcase iLearnPlus with two case studies concerning prediction of long noncoding RNAs (lncRNAs) from RNA transcripts and prediction of crotonylation sites in protein chains. iLearnPlus is an open-source platform available at https://github.com/Superzchen/iLearnPlus/ with the webserver at http://ilearnplus.erc.monash.edu/.
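Sequence feature extraction of the kind iLearnPlus automates typically turns a variable-length sequence into a fixed-length vector that any classifier can consume. The sketch below implements dipeptide composition, one classic protein descriptor, in plain Python; it is an independent illustration rather than iLearnPlus code, and the handling of non-standard residues is an assumption.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def dipeptide_composition(seq: str) -> List[float]:
    """Frequency of each of the 400 ordered amino-acid pairs in a sequence.

    Overlapping pairs are counted; pairs containing non-standard letters
    (e.g. 'X') are skipped and excluded from the denominator.
    """
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in DIPEPTIDES]


vec = dipeptide_composition("ACAC")  # pairs: AC, CA, AC
print(len(vec), vec[DIPEPTIDES.index("AC")], vec[DIPEPTIDES.index("CA")])
```

Because every sequence maps to the same 400-dimensional vector, sequences of different lengths become directly comparable inputs for a downstream model.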
Affiliation(s)
- Zhen Chen
- Collaborative Innovation Center of Henan Grain Crops, Henan Agricultural University, Zhengzhou 450046, China
- Pei Zhao
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences (CAAS), Anyang 455000, China
- Chen Li
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Fuyi Li
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia; Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, Victoria 3000, Australia
- Dongxu Xiang
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Yong-Zi Chen
- Laboratory of Tumor Cell Biology, Key Laboratory of Cancer Prevention and Therapy, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300060, China
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
- Roger J Daly
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Geoffrey I Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Quanzhi Zhao
- Collaborative Innovation Center of Henan Grain Crops, Henan Agricultural University, Zhengzhou 450046, China; Key Laboratory of Rice Biology in Henan Province, Henan Agricultural University, Zhengzhou 450046, China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Jiangning Song
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
12
Oldfield CJ, Peng Z, Kurgan L. Disordered RNA-Binding Region Prediction with DisoRDPbind. Methods Mol Biol 2021; 2106:225-239. [PMID: 31889261 DOI: 10.1007/978-1-0716-0231-7_14] [Indexed: 12/12/2022]
Abstract
RNA chaperone activity is one of the many functions of intrinsically disordered regions (IDRs). IDRs function without the prerequisite of a stable structure; instead, their functions arise from structural ensembles. A common theme in IDR function is molecular recognition: IDRs mediate interactions with other proteins, RNA, and DNA. Many computational methods are available to predict IDRs from protein sequence, but relatively few predict IDR functions, and the available methods focus primarily on protein-protein interactions. DisoRDPbind was developed to predict several protein functions, including interactions with RNA. The method is available as a user-friendly web interface at http://biomine.cs.vcu.edu/servers/DisoRDPbind/ . The development and architecture of DisoRDPbind are briefly presented, and its accuracy relative to other RNA-binding residue predictors is discussed. We explain the usage of the web interface in detail and provide an example of prediction results and their interpretation. While DisoRDPbind does not identify RNA chaperones directly, we provide a case study of an RNA chaperone, the HCV core protein, as an example of the method's utility in the study of RNA chaperones.
Affiliation(s)
- Zhenling Peng
- Center for Applied Mathematics, Tianjin University, Tianjin, People's Republic of China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
13
Liu Y, Gong W, Zhao Y, Deng X, Zhang S, Li C. aPRBind: protein-RNA interface prediction by combining sequence and I-TASSER model-based structural features learned with convolutional neural networks. Bioinformatics 2021; 37:937-942. [PMID: 32821925 DOI: 10.1093/bioinformatics/btaa747] [Received: 03/07/2020] [Revised: 07/26/2020] [Accepted: 08/17/2020] [Indexed: 12/13/2022]
Abstract
MOTIVATION Protein-RNA interactions play a critical role in various biological processes. Accurate prediction of RNA-binding residues in proteins has been one of the most challenging and intriguing problems in computational biology. Existing methods still have relatively low accuracy, especially sequence-based ab initio methods. RESULTS In this work, we propose aPRBind, a convolutional neural network-based ab initio method for RNA-binding residue prediction. aPRBind is trained with sequence features and structural features (notably residue dynamics information and a residue-nucleotide propensity that we developed) extracted from structures predicted by I-TASSER. Analysis of feature contributions indicates that the sequence features are most important, followed by dynamics information, and that sequence and structural features are complementary in binding site prediction. A performance comparison with other peer methods on a benchmark dataset shows that aPRBind outperforms some state-of-the-art ab initio methods. Additionally, aPRBind gives better predictions for modeled structures with TM-score ≥ 0.5; and because the structural features are not very sensitive to the refined 3D structures, aPRBind depends only marginally on the accuracy of the structure model, which allows it to be applied to RNA-binding site prediction for modeled or unbound structures. AVAILABILITY AND IMPLEMENTATION The source code is available at https://github.com/ChunhuaLiLab/aPRbind. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
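The TM-score ≥ 0.5 threshold mentioned above refers to the template modeling score, which rewards aligned residues with small deviations using a length-dependent scale d0 = 1.24 ∛(L − 15) − 1.8 Å, where L is the target length. The sketch below evaluates the score for one fixed superposition given per-residue distances; the real TM-score additionally searches for the superposition that maximizes it, and the 0.5 Å floor on d0 is an assumption for short chains.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (in angstroms).

    The 0.5 A floor is an assumption here, used because the formula
    turns small or negative for very short chains.
    """
    if target_length <= 15:
        return 0.5
    return max(1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score_fixed(distances: Sequence[float], target_length: int) -> float:
    """TM-score for one fixed superposition.

    distances: C-alpha distances (A) of aligned residue pairs after superposition.
    The full TM-score maximizes this quantity over all superpositions.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


print(round(tm_d0(100), 2))              # 3.65
print(tm_score_fixed([0.0] * 100, 100))  # identical structures -> 1.0
```

Dividing by the target length rather than the number of aligned pairs penalizes models that align only part of the chain.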
Affiliation(s)
- Yang Liu
  - Department of Biomedical Engineering, Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Weikang Gong
  - Department of Biomedical Engineering, Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Yanpeng Zhao
  - Department of Biomedical Engineering, Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Xueqing Deng
  - Department of Biomedical Engineering, Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Shan Zhang
  - Department of Biomedical Engineering, Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Chunhua Li
  - Department of Biomedical Engineering, Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
14
Recent Advances in the Prediction of Protein Structural Classes: Feature Descriptors and Machine Learning Algorithms. Crystals 2021. [DOI: 10.3390/cryst11040324]
Abstract
In the postgenomic age, rapid growth in the number of sequence-known proteins has been accompanied by much slower growth in the number of structure-known proteins (as a result of experimental limitations), and a widening gap between the two is evident. Because protein function is linked to protein structure, successful prediction of protein structure is of significant importance in protein function identification. Foreknowledge of protein structural class can help improve protein structure prediction with significant medical and pharmaceutical implications. Thus, a fast, suitable, reliable, and reasonable computational method for protein structural class prediction has become pivotal in bioinformatics. Here, we review recent efforts in protein structural class prediction from protein sequence, with particular attention paid to new feature descriptors, which extract information from protein sequence, and the use of machine learning algorithms in both feature selection and the construction of new classification models. These new feature descriptors include amino acid composition, sequence order, physicochemical properties, multiprofile Bayes, and secondary structure-based features. Machine learning methods, such as artificial neural networks (ANNs), support vector machine (SVM), K-nearest neighbor (KNN), random forest, deep learning, and examples of their application are discussed in detail. We also present our view on possible future directions, challenges, and opportunities for the applications of machine learning algorithms for prediction of protein structural classes.
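Amino acid composition, the simplest of the sequence-derived feature descriptors listed above, is the fraction of each of the 20 standard residues in a sequence, giving a fixed-length 20-dimensional vector regardless of protein length. A minimal Python sketch follows; the function name is hypothetical, and ignoring non-standard letters such as X is one of several reasonable conventions rather than a fixed rule.

```python
from collections import Counter

# Fixed ordering of the 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the 20-dimensional amino acid composition of a protein sequence."""
    seq = sequence.upper()
    # Count only standard residues; non-standard letters (e.g. X, B, Z) are ignored.
    counts = Counter(residue for residue in seq if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    # Counter returns 0 for residues that never occur.
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

Richer descriptors such as sequence-order or physicochemical features are typically concatenated onto a vector like this before classification.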
15
Liu Y, Gong W, Yang Z, Li C. SNB-PSSM: A spatial neighbor-based PSSM used for protein-RNA binding site prediction. J Mol Recognit 2021; 34:e2887. [PMID: 33442949 DOI: 10.1002/jmr.2887] [Received: 06/21/2020] [Revised: 12/22/2020] [Accepted: 12/23/2020]
Abstract
Protein-RNA interactions play essential roles in a wide variety of biological processes. Recognition of RNA-binding residues on proteins has been a challenging problem. Most methods use the position-specific scoring matrix (PSSM), and it has been found that considering the evolutionary information of sequence-neighboring residues can improve prediction. In this work, we introduce SNB-PSSM (spatial neighbor-based PSSM), a novel method combined with a structure window scheme in which the evolutionary information of spatially neighboring residues is considered. The results show that our method consistently outperforms the standard and smoothed PSSM methods. Tested on multiple datasets, this approach performs encouragingly compared with RNABindRPlus, BindN+, PPRInt, xypan, Predict_RBP, SpaPF, PRNA, and KYG, although it is inferior to RNAProSite, RBscore, and aaRNA. In addition, since our method is not sensitive to protein structure changes, it can be applied to binding site prediction on modeled structures. The results also suggest that the evolution of binding sites is spatially cooperative. As an effective way of incorporating evolutionary information, the proposed method can be widely used for nucleic acid- and protein-binding site prediction and functional motif finding.
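The abstract does not give the exact SNB-PSSM formula. The general idea of a spatial-neighbor PSSM can be sketched as follows, assuming an unweighted average of the PSSM rows of every residue whose representative atom (for example C-alpha) lies within a distance cutoff of the target residue. The 8 Å default and the function name are hypothetical. Unlike a sequence-window "smoothed PSSM", the neighborhood here is defined in 3D, so residues that are far apart in sequence but close in space contribute to each other's features.

```python
import math


def spatial_neighbor_pssm(pssm, coords, radius=8.0):
    """Average each residue's PSSM row with the rows of its spatial neighbors.

    pssm:   list of per-residue rows (typically 20 scores each).
    coords: list of (x, y, z) representative-atom coordinates, one per residue.
    radius: neighbor cutoff in angstroms (hypothetical default, not from the paper).
    """
    n = len(pssm)
    if len(coords) != n:
        raise ValueError("pssm and coords must have one entry per residue")
    smoothed = []
    for i in range(n):
        # Residue i is always its own neighbor (distance 0), so the list is never empty.
        neighbors = [j for j in range(n) if math.dist(coords[i], coords[j]) <= radius]
        width = len(pssm[i])
        row = [sum(pssm[j][k] for j in neighbors) / len(neighbors) for k in range(width)]
        smoothed.append(row)
    return smoothed
```

The all-pairs distance loop is quadratic in protein length, which is acceptable for a sketch; a spatial index would be the natural choice for large structures.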
Affiliation(s)
- Yang Liu
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, China
- Weikang Gong
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, China
- Zhen Yang
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, China
- Chunhua Li
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, China
16
Wang K, Hu G, Wu Z, Su H, Yang J, Kurgan L. Comprehensive Survey and Comparative Assessment of RNA-Binding Residue Predictions with Analysis by RNA Type. Int J Mol Sci 2020; 21:E6879. [PMID: 32961749 PMCID: PMC7554811 DOI: 10.3390/ijms21186879] [Received: 08/16/2020] [Revised: 09/15/2020] [Accepted: 09/17/2020]
Abstract
With close to 30 sequence-based predictors of RNA-binding residues (RBRs) available, this comparative survey aims to help with understanding and selecting the appropriate tools. We discuss past reviews on this topic, survey a comprehensive collection of predictors, and comparatively assess six representative methods. We provide a novel and well-designed benchmark dataset, and we are the first to report and compare protein-level and dataset-level results and to contextualize performance to specific types of RNA. The methods considered here are well cited and rely on machine learning algorithms, occasionally combined with homology-based prediction. Empirical tests reveal that they provide relatively accurate predictions. Virtually all methods perform well for proteins that interact with rRNAs; some generate accurate predictions for mRNA, snRNA, SRP and IRES, while proteins that bind tRNAs are predicted poorly. Moreover, except for DRNApred, they confuse DNA- and RNA-binding residues. None of the six methods consistently outperforms the others when tested on individual proteins. This variable and complementary protein-level performance suggests that users should not rely on applying just the single best dataset-level predictor. We recommend that future work focus on developing approaches that facilitate protein-level selection of accurate predictors and consensus-based prediction of RBRs.
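The distinction between protein-level and dataset-level results matters because predictor scores are often not calibrated across proteins. The Python sketch below computes AUC both ways using the Mann-Whitney formulation (the fraction of positive/negative pairs ranked correctly, with ties counted as one half). The data layout, one dict per protein with hypothetical keys "name", "scores" and "labels", is an assumption for illustration; this is not the survey's evaluation code.

```python
def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def dataset_level_auc(proteins):
    """Pool residues from all proteins, then compute a single AUC."""
    scores = [s for prot in proteins for s in prot["scores"]]
    labels = [y for prot in proteins for y in prot["labels"]]
    return auc(scores, labels)


def protein_level_aucs(proteins):
    """Compute one AUC per protein; proteins lacking either class are skipped."""
    result = {}
    for prot in proteins:
        if 0 in prot["labels"] and 1 in prot["labels"]:
            result[prot["name"]] = auc(prot["scores"], prot["labels"])
    return result


proteins = [
    {"name": "A", "scores": [0.9, 0.8], "labels": [1, 0]},
    {"name": "B", "scores": [0.3, 0.2], "labels": [1, 0]},
]
```

For these two toy proteins, each binding residue is ranked perfectly within its own protein (per-protein AUC 1.0 for both), but because protein B's scores sit on a lower scale, the pooled AUC drops to 0.75. Gaps of this kind are one reason to report both levels.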
Affiliation(s)
- Kui Wang
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Gang Hu
  - School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
- Zhonghua Wu
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Hong Su
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Jianyi Yang
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
17
A deep learning framework to predict binding preference of RNA constituents on protein surface. Nat Commun 2019; 10:4941. [PMID: 31666519 PMCID: PMC6821705 DOI: 10.1038/s41467-019-12920-0] [Received: 05/10/2019] [Accepted: 10/08/2019]
Abstract
Protein-RNA interaction plays important roles in post-transcriptional regulation. However, the task of predicting these interactions given a protein structure is difficult. Here we show that, by leveraging a deep learning model NucleicNet, attributes such as binding preference of RNA backbone constituents and different bases can be predicted from local physicochemical characteristics of protein structure surface. On a diverse set of challenging RNA-binding proteins, including Fem-3-binding-factor 2, Argonaute 2 and Ribonuclease III, NucleicNet can accurately recover interaction modes discovered by structural biology experiments. Furthermore, we show that, without seeing any in vitro or in vivo assay data, NucleicNet can still achieve consistency with experiments, including RNAcompete, Immunoprecipitation Assay, and siRNA Knockdown Benchmark. NucleicNet can thus serve to provide quantitative fitness of RNA sequences for given binding pockets or to predict potential binding pockets and binding RNAs for previously unknown RNA binding proteins.
18
Pan Y, Wang Z, Zhan W, Deng L. Computational identification of binding energy hot spots in protein-RNA complexes using an ensemble approach. Bioinformatics 2019; 34:1473-1480. [PMID: 29281004 DOI: 10.1093/bioinformatics/btx822] [Received: 07/12/2017] [Accepted: 12/19/2017]
Abstract
Motivation Identifying RNA-binding residues, especially energetically favored hot spots, can provide valuable clues for understanding the mechanisms and functional importance of protein-RNA interactions. However, the limited availability of experimentally identified energy hot spots in protein-RNA crystal structures makes it difficult to develop empirical identification approaches. Computational prediction of RNA-binding hot spot residues is still in its infancy. Results Here, we describe a computational method, PrabHot (Prediction of protein-RNA binding hot spots), that can effectively detect hot spot residues on protein-RNA binding interfaces using an ensemble of conceptually different machine learning classifiers. Residue interaction network features and new solvent exposure characteristics are combined, and the features used for classification are selected with the Boruta algorithm. Two new reference datasets (benchmark and independent) were generated, containing 107 hot spots from 47 known protein-RNA complex structures. In 10-fold cross-validation on the training dataset, PrabHot achieves promising performance, with an AUC score of 0.86 and a sensitivity of 0.78, significantly better than those of HotSPRing, the pioneering RNA-binding hot spot prediction method. We also demonstrate the capability of the proposed method on the independent test dataset, where it retains a competitive advantage. Availability and implementation The PrabHot webserver is freely available at http://denglab.org/PrabHot/. Contact leideng@csu.edu.cn. Supplementary information Supplementary data are available at Bioinformatics online.
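The abstract does not state how PrabHot combines its classifiers. Soft voting, which averages the hot-spot probability reported by each ensemble member and thresholds the mean, is one common way to build such an ensemble. The Python sketch below assumes that combination rule, a 0.5 threshold, and stand-in members written as plain functions; none of these names or choices come from PrabHot itself.

```python
from typing import Callable, Sequence, Tuple

# A stand-in classifier: maps a residue's feature vector to P(hot spot).
Classifier = Callable[[Sequence[float]], float]


def soft_vote(classifiers: Sequence[Classifier], features: Sequence[float],
              threshold: float = 0.5) -> Tuple[float, bool]:
    """Average member probabilities (soft voting) and apply a decision threshold."""
    if not classifiers:
        raise ValueError("the ensemble needs at least one classifier")
    mean_prob = sum(clf(features) for clf in classifiers) / len(classifiers)
    return mean_prob, mean_prob >= threshold


# Hypothetical members standing in for conceptually different models
# (in practice, e.g., a tree ensemble, a kernel method and a linear model).
members = [lambda x: 0.9, lambda x: 0.6, lambda x: 0.3]
prob, is_hot_spot = soft_vote(members, [0.0, 1.0])
```

Soft voting lets a confident member outweigh two lukewarm ones, which hard majority voting cannot do; whether that is desirable depends on how well calibrated the members' probabilities are.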
Affiliation(s)
- Yuliang Pan
  - School of Software, Central South University, Changsha 410075, China
- Zixiang Wang
  - School of Software, Central South University, Changsha 410075, China
- Weihua Zhan
  - School of Electronics and Computer Science, Zhejiang Wanli University, Ningbo 315100, China
- Lei Deng
  - School of Software, Central South University, Changsha 410075, China
  - Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai 200433, China
19
Song J, Liu G, Wang R, Sun L, Zhang P. A novel method for predicting RNA-interacting residues in proteins using a combination of feature-based and sequence template-based methods. Biotechnol Biotechnol Equip 2019. [DOI: 10.1080/13102818.2019.1612275]
Affiliation(s)
- Jiazhi Song
  - Department of Computational Intelligence, College of Computer Science and Technology, Jilin University, Changchun, PR China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, PR China
- Guixia Liu
  - Department of Computational Intelligence, College of Computer Science and Technology, Jilin University, Changchun, PR China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, PR China
- Rongquan Wang
  - Department of Computational Intelligence, College of Computer Science and Technology, Jilin University, Changchun, PR China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, PR China
- Liyan Sun
  - Department of Computational Intelligence, College of Computer Science and Technology, Jilin University, Changchun, PR China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, PR China
- Ping Zhang
  - Department of Computational Intelligence, College of Computer Science and Technology, Jilin University, Changchun, PR China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, PR China
20
Jung Y, El-Manzalawy Y, Dobbs D, Honavar VG. Partner-specific prediction of RNA-binding residues in proteins: A critical assessment. Proteins 2018; 87:198-211. [PMID: 30536635 PMCID: PMC6389706 DOI: 10.1002/prot.25639] [Received: 05/01/2018] [Revised: 10/10/2018] [Accepted: 11/29/2018]
Abstract
RNA-protein interactions play essential roles in regulating gene expression. While some RNA-protein interactions are "specific", that is, the RNA-binding proteins preferentially bind to particular RNA sequence or structural motifs, others are "non-RNA specific." Deciphering the protein-RNA recognition code is essential for comprehending the functional implications of these interactions and for developing new therapies for many diseases. Because of the high cost of experimental determination of protein-RNA interfaces, there is a need for computational methods to identify RNA-binding residues in proteins. While most existing computational methods for predicting RNA-binding residues in RNA-binding proteins are oblivious to the characteristics of the partner RNA, there is growing interest in methods for partner-specific prediction of RNA binding sites in proteins. In this work, we assess the performance of two recently published partner-specific protein-RNA interface prediction tools, PS-PRIP and PRIdictor, along with our own new tools. Specifically, we introduce a novel metric, the RNA-specificity metric (RSM), for quantifying the RNA-specificity of the RNA binding residues predicted by such tools. Our results show that the RNA-binding residues predicted by previously published methods are oblivious to the characteristics of the putative RNA binding partner. Moreover, when evaluated using partner-agnostic metrics, RNA partner-specific methods are outperformed by the state-of-the-art partner-agnostic methods. We conjecture that either (a) the protein-RNA complexes in the PDB are not representative of the protein-RNA interactions in nature, or (b) the current methods for partner-specific prediction of RNA-binding residues in proteins fail to account for the differences between RNA partner-specific and partner-agnostic protein-RNA interactions, or both.
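What it means for a partner-specific predictor to be "oblivious" to its RNA partner can be made concrete by running it on one protein with several different RNAs and measuring how much the predicted binding residues change. The sketch below uses the mean pairwise Jaccard distance between the predicted residue sets for that purpose. It is a hypothetical illustration only and is not the RNA-specificity metric (RSM) defined in the paper, whose formula is not given in this abstract; the function names and partner labels are made up.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """1 - |A intersect B| / |A union B|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def partner_sensitivity(predictions_by_partner: dict) -> float:
    """Mean pairwise Jaccard distance between binding-residue sets predicted for
    the SAME protein paired with different RNA partners.

    0.0 means the predictions do not depend on the partner at all.
    """
    sets = [set(residues) for residues in predictions_by_partner.values()]
    pairs = list(combinations(sets, 2))
    if not pairs:
        raise ValueError("need predictions for at least two RNA partners")
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)
```

A score of 0.0 across many different RNAs would correspond to the partner-oblivious behavior the authors report for previously published methods.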
Affiliation(s)
- Yong Jung
  - Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, Pennsylvania
  - Artificial Intelligence Research Laboratory, Pennsylvania State University, University Park, Pennsylvania
  - The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania
- Yasser El-Manzalawy
  - Artificial Intelligence Research Laboratory, Pennsylvania State University, University Park, Pennsylvania
  - Clinical and Translational Sciences Institute, Pennsylvania State University, University Park, Pennsylvania
  - College of Information Sciences and Technology, Pennsylvania State University, Pennsylvania
- Drena Dobbs
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, Iowa
  - Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, Iowa
- Vasant G Honavar
  - Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, Pennsylvania
  - Artificial Intelligence Research Laboratory, Pennsylvania State University, University Park, Pennsylvania
  - Institute for Cyberscience, Pennsylvania State University, University Park, Pennsylvania
  - Clinical and Translational Sciences Institute, Pennsylvania State University, University Park, Pennsylvania
  - The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania
  - College of Information Sciences and Technology, Pennsylvania State University, Pennsylvania
21
Chen F, Sun H, Wang J, Zhu F, Liu H, Wang Z, Lei T, Li Y, Hou T. Assessing the performance of MM/PBSA and MM/GBSA methods. 8. Predicting binding free energies and poses of protein-RNA complexes. RNA 2018; 24:1183-1194. [PMID: 29930024 PMCID: PMC6097651 DOI: 10.1261/rna.065896.118] [Received: 01/29/2018] [Accepted: 06/13/2018]
Abstract
Molecular docking provides a computationally efficient way to predict the atomic structural details of protein-RNA interactions (PRI), but accurate prediction of the three-dimensional structures and binding affinities for PRI is still notoriously difficult, partly due to the unreliability of the existing scoring functions for PRI. MM/PBSA and MM/GBSA are more theoretically rigorous than most scoring functions for protein-RNA docking, but their prediction performance for protein-RNA systems remains unclear. Here, we systematically evaluated the capability of MM/PBSA and MM/GBSA to predict the binding affinities and recognize the near-native binding structures of protein-RNA systems with different solvent models and interior dielectric constants (εin). For predicting the binding affinities, the predictions given by MM/GBSA based on the minimized structures in explicit solvent and the GBGBn1 model with εin = 2 yielded the highest correlation with the experimental data. Moreover, MM/GBSA calculations based on the minimized structures in implicit solvent and the GBGBn1 model identified the near-native binding structures within the top 10 decoys for 117 of the 148 protein-RNA systems (79.1%). This performance is better than that of all docking scoring functions studied here. Therefore, MM/GBSA rescoring is an efficient way to improve the prediction capability of scoring functions for protein-RNA systems.
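For reference, the end-point free energy that MM/PBSA and MM/GBSA evaluate is usually decomposed as below. This is the standard textbook form, not a formula quoted from this paper: the molecular-mechanics term collects internal (bonded), electrostatic and van der Waals energies; the solvation term splits into a polar part from the Poisson-Boltzmann or generalized Born model (where the interior dielectric constant εin enters) and a nonpolar surface-area part. In the common single-trajectory protocol, the internal term cancels between complex and components.

$$
\Delta G_{\mathrm{bind}} = G_{\mathrm{complex}} - G_{\mathrm{protein}} - G_{\mathrm{RNA}}
\approx \Delta E_{\mathrm{MM}} + \Delta G_{\mathrm{solv}} - T\Delta S
$$

$$
\Delta E_{\mathrm{MM}} = \Delta E_{\mathrm{int}} + \Delta E_{\mathrm{ele}} + \Delta E_{\mathrm{vdW}},
\qquad
\Delta G_{\mathrm{solv}} = \Delta G_{\mathrm{PB/GB}} + \Delta G_{\mathrm{SA}}
$$

The reported pose-recognition success rate is simply 117/148 ≈ 0.791, that is, 79.1%.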
Affiliation(s)
- Fu Chen
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
  - State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang 310058, China
  - College of Life and Environmental Sciences, Shanghai Normal University, Shanghai 200234, China
- Huiyong Sun
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Junmei Wang
  - Department of Pharmaceutical Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, USA
- Feng Zhu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Hui Liu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Zhe Wang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
  - State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Tailong Lei
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Youyong Li
  - Institute of Functional Nano and Soft Materials (FUNSOM), Soochow University, Suzhou, Jiangsu 215123, China
- Tingjun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
  - State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang 310058, China
22
Chowdhury S, Zhang J, Kurgan L. In Silico Prediction and Validation of Novel RNA Binding Proteins and Residues in the Human Proteome. Proteomics 2018; 18:e1800064. [PMID: 29806170 DOI: 10.1002/pmic.201800064] [Received: 02/27/2018] [Revised: 05/05/2018]
Abstract
Deciphering a complete landscape of protein-RNA interactions in the human proteome remains an elusive challenge. We computationally elucidate RNA binding proteins (RBPs) using an approach that complements previous efforts. We employ two modern, complementary sequence-based methods that provide accurate predictions from both structured and intrinsically disordered sequences, even in the absence of sequence similarity to known RBPs. We generate and analyze putative RNA binding residues on the whole-proteome scale. Using a conservative setting that ensures a low (5%) false positive rate, we identify 1511 putative RBPs, which include 281 known RBPs and 166 RBPs that were previously predicted. We empirically demonstrate that these overlaps are statistically significant. We also validate the putative RBPs based on two major hallmarks of their RNA binding residues: high levels of evolutionary conservation and enrichment in charged amino acids. Moreover, we show that the novel RBPs are significantly under-annotated functionally, which is consistent with the fact that they have not yet been found to interact with RNA. We provide two examples of our novel putative RBPs for which there is recent evidence of interactions with RNA. The dataset of novel putative RBPs and RNA binding residues is provided in the Supporting Information for future hypothesis generation.
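A fixed false positive rate such as the 5% setting above is typically enforced by calibrating a score threshold on examples known to be negatives. A minimal Python sketch follows, assuming that higher scores mean "more likely RNA-binding" and that a prediction is positive when score >= threshold. This is a generic calibration procedure with a hypothetical function name, not necessarily the one used in the study.

```python
import math


def threshold_for_fpr(negative_scores, max_fpr=0.05):
    """Smallest threshold t such that the fraction of negatives with
    score >= t does not exceed max_fpr."""
    if not negative_scores:
        raise ValueError("need at least one negative (non-binding) score")
    ranked = sorted(negative_scores, reverse=True)
    # How many negatives may score at or above the threshold; the tiny epsilon
    # guards against products like max_fpr * n landing just below an integer.
    allowed = math.floor(max_fpr * len(ranked) + 1e-9)
    if allowed >= len(ranked):
        return -math.inf  # every negative may be called positive
    # Any t strictly above ranked[allowed] admits at most `allowed` negatives,
    # so the next representable float above it is the smallest valid threshold.
    return math.nextafter(ranked[allowed], math.inf)
```

The same threshold is then applied to proteins of unknown function; note that a lower false positive rate buys confidence in each call at the cost of sensitivity.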
Affiliation(s)
- Shomeek Chowdhury
  - Dr. Vikram Sarabhai Institute of Cell and Molecular Biology, Maharaja Sayajirao University of Baroda, Gujarat, 390005, India
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
- Jian Zhang
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang, 464000, P. R. China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
23
Deciphering RNA-Recognition Patterns of Intrinsically Disordered Proteins. Int J Mol Sci 2018; 19:ijms19061595. [PMID: 29843482 PMCID: PMC6032373 DOI: 10.3390/ijms19061595] [Received: 04/12/2018] [Revised: 05/10/2018] [Accepted: 05/16/2018]
Abstract
Intrinsically disordered regions (IDRs) and proteins (IDPs) are highly flexible owing to their lack of well-defined structures. A subset of such proteins interacts with various substrates, including RNA, frequently adopting regular structures in the final complex. In this work, we have analysed a dataset of protein–RNA complexes undergoing disorder-to-order transition (DOT) upon binding. We found that DOT regions are generally small in size (less than 3 residues) for RNA binding proteins. As in structured proteins, positively charged residues are found to interact with RNA molecules, indicating the dominance of electrostatic and cation-π interactions. However, a comparison of binding frequency shows that interface hydrophobic and aromatic residues have more interactions in DOT regions alone than in the protein as a whole. Further, DOT regions have significantly higher exposure to water than their structured counterparts. Interactions of DOT regions with RNA increase sheet formation, with minor changes in helix-forming residues. We have computed the interaction energy for amino acid–nucleotide pairs, which showed a preference for His–G, Asn–U and Ser–U at the interface of DOT regions. This study provides insights into protein–RNA interactions, and the results could also be used to develop a tool for identifying DOT regions in RNA binding proteins.
24
Tang Y, Liu D, Wang Z, Wen T, Deng L. A boosting approach for prediction of protein-RNA binding residues. BMC Bioinformatics 2017; 18:465. [PMID: 29219069 PMCID: PMC5773889 DOI: 10.1186/s12859-017-1879-2]
Abstract
Background RNA binding proteins play important roles in post-transcriptional RNA processing and transcriptional regulation. Distinguishing the RNA-binding residues in proteins is crucial for understanding how protein and RNA recognize each other and function together as a complex. Results We propose PredRBR, an effective computational approach to predict RNA-binding residues. PredRBR is built with gradient tree boosting and an optimal feature set selected from a large number of sequence and structure characteristics and two categories of structural neighborhood properties. Cross-validation experiments on the RBP170 data set show that PredRBR achieves an overall accuracy of 0.84, a sensitivity of 0.85, an MCC of 0.55 and an AUC of 0.92, which are significantly better than those of other widely used machine learning algorithms such as Support Vector Machine, Random Forest, and Adaboost. We further calculate the importance of different feature categories and find that structural neighborhood characteristics are critical in the recognition of RNA binding residues. PredRBR also yields significantly better prediction accuracy on an independent test set (RBP101) than other state-of-the-art methods. Conclusions The superior performance over existing RNA-binding residue prediction methods indicates the importance of the gradient tree boosting algorithm combined with the optimally selected features.
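The accuracy, sensitivity and MCC figures quoted above are all computed from a binary confusion matrix of binding versus non-binding residues (AUC, by contrast, is threshold-free). A minimal Python sketch of the threshold-based metrics follows; the function name and the counts used below are illustrative, not PredRBR data. MCC is the more informative of these when binding residues are a minority class, which is why it can sit well below accuracy.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("empty confusion matrix")
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on binding residues
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on non-binding residues
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}
```

For example, 85 true positives, 10 false positives, 90 true negatives and 15 false negatives give accuracy 0.875, sensitivity 0.85 and an MCC of about 0.751.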
Affiliation(s)
- Yongjun Tang
  - Department of Clinical Pharmacology, Xiangya Hospital, Central South University, 87 Xiangya Road, Changsha, 410008, China
  - Institute of Clinical Pharmacology, Hunan Key Laboratory of Pharmacogenetics, Central South University, 87 Xiangya Road, Changsha, 410008, China
  - Department of Pediatrics, Xiangya Hospital, Central South University, 87 Xiangya Road, Changsha, 410008, China
- Diwei Liu
  - School of Software, Central South University, No.22 Shaoshan South Road, Changsha, 410075, China
- Zixiang Wang
  - School of Software, Central South University, No.22 Shaoshan South Road, Changsha, 410075, China
- Ting Wen
  - School of Software, Central South University, No.22 Shaoshan South Road, Changsha, 410075, China
- Lei Deng
  - School of Software, Central South University, No.22 Shaoshan South Road, Changsha, 410075, China
25
Yan J, Kurgan L. DRNApred, fast sequence-based method that accurately predicts and discriminates DNA- and RNA-binding residues. Nucleic Acids Res 2017; 45:e84. [PMID: 28132027 PMCID: PMC5449545 DOI: 10.1093/nar/gkx059] [Received: 06/10/2016] [Accepted: 01/24/2017]
Abstract
Protein-DNA and protein-RNA interactions are part of many diverse and essential cellular functions, and yet most of them remain to be discovered and characterized. Recent research shows that sequence-based predictors of DNA-binding residues accurately find these residues but also cross-predict many RNA-binding residues as DNA-binding, and vice versa. Most of these methods are also relatively slow, prohibiting applications on the whole-genome scale. We describe a novel sequence-based method, DRNApred, which accurately and with high throughput predicts and discriminates between DNA- and RNA-binding residues. DRNApred was designed using a new dataset with both DNA- and RNA-binding proteins, regression that penalizes cross-predictions, and a novel two-layered architecture. DRNApred outperforms state-of-the-art predictors of DNA- or RNA-binding residues on a benchmark test dataset by substantially reducing cross-predictions and by predicting arguably higher-quality false positives that are located near the native binding residues. Moreover, it also predicts DNA- and RNA-binding proteins more accurately. Application to the human proteome confirms that DRNApred reduces cross-predictions among the native nucleic acid binders. Also, the novel putative DNA/RNA-binding proteins that it predicts share similar subcellular locations and residue charge profiles with the known native binding proteins. The DRNApred webserver is freely available at http://biomine.cs.vcu.edu/servers/DRNApred/.
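Cross-prediction, as described above, means a DNA-binding residue predictor also labeling native RNA-binding residues as DNA-binding (or vice versa). One simple way to quantify it is the fraction of native binding residues of the other nucleic acid type that receive a positive prediction. The Python sketch below uses that hypothetical, simplified definition for illustration, with residues identified by made-up integer positions; the evaluation measures used in the DRNApred paper may differ.

```python
def cross_prediction_rate(predicted_positive, native_other_type):
    """Fraction of residues that natively bind the OTHER nucleic acid type
    (e.g. RNA, when evaluating a DNA-binding predictor) but are predicted positive."""
    other = set(native_other_type)
    if not other:
        raise ValueError("no native residues of the other nucleic acid type")
    return len(set(predicted_positive) & other) / len(other)


# Hypothetical residue positions: predicted DNA-binding vs. native RNA-binding.
predicted_dna = {1, 2, 3, 10}
native_rna = {3, 10, 11, 12}
rate = cross_prediction_rate(predicted_dna, native_rna)
```

Here two of the four native RNA-binding residues are called DNA-binding, giving a cross-prediction rate of 0.5; a discriminating predictor aims to keep this low while preserving sensitivity on its own nucleic acid type.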
Affiliation(s)
- Jing Yan
  - Department of Electrical and Computer Engineering, University of Alberta, Edmonton T6G 2V4, Canada
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, 23284, USA
26
Zhang J, Kurgan L. Review and comparative assessment of sequence-based predictors of protein-binding residues. Brief Bioinform 2017; 19:821-837. [DOI: 10.1093/bib/bbx022] [Received: 12/05/2016]
Affiliation(s)
- Jian Zhang
  - School of Computer and Information Technology, Xinyang Normal University
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
27
Eisinga R, Heskes T, Pelzer B, Te Grotenhuis M. Exact p-values for pairwise comparison of Friedman rank sums, with application to comparing classifiers. BMC Bioinformatics 2017; 18:68. [PMID: 28122501 PMCID: PMC5267387 DOI: 10.1186/s12859-017-1486-2] [Received: 07/17/2016] [Accepted: 01/11/2017]
Abstract
Background The Friedman rank sum test is a widely used nonparametric method in computational biology. In addition to examining the overall null hypothesis of no significant difference among any of the rank sums, it is typically of interest to conduct pairwise comparison tests. Current approaches to such tests rely on large-sample approximations, due to the numerical complexity of computing the exact distribution. These approximate methods lead to inaccurate estimates in the tail of the distribution, which is most relevant for p-value calculation. Results We propose an efficient, combinatorial exact approach for calculating the probability mass distribution of the rank sum difference statistic for pairwise comparison of Friedman rank sums, and compare exact results with recommended asymptotic approximations. Whereas the chi-squared approximation performs worse than exact computation overall, other approximations, particularly the normal, perform well, except in the extreme tail. Hence exact calculation offers an improvement when small p-values occur following multiple testing correction. Exact inference also enhances the identification of significant differences whenever the observed values are close to the approximate critical value. We illustrate the proposed method in the context of biological machine learning, where Friedman rank sum difference tests are commonly used for the comparison of classifiers over multiple datasets. Conclusions We provide a computationally fast method to determine the exact p-value of the absolute rank sum difference of a pair of Friedman rank sums, making asymptotic tests obsolete. Calculation of exact p-values is easy to implement in statistical software, and the implementation in R is provided in one of the Additional files and is also available at http://www.ru.nl/publish/pages/726696/friedmanrsd.zip.
Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1486-2) contains supplementary material, which is available to authorized users.
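The following minimal sketch illustrates the exact null distribution discussed above. It assumes there are no ties within a block. Under the null hypothesis, each of the n datasets (blocks) ranks the k classifiers by a uniformly random permutation of 1..k. For two classifiers, the within-block rank difference d therefore takes each nonzero value in -(k-1)..(k-1) with probability (k - |d|)/(k(k - 1)), and the rank sum difference D is the n-fold convolution of that distribution. This brute-force convolution is an illustrative alternative; it is not the closed-form combinatorial expression or the R code released by the authors, and all function names are hypothetical. A normal approximation with Var(D) = n*k(k + 1)/6 is included for comparison.

```python
from fractions import Fraction
from math import erfc, sqrt


def single_block_pmf(k: int) -> dict[int, Fraction]:
    """Null PMF of r_i - r_j for two of k treatments ranked 1..k in one block (no ties)."""
    ordered_pairs = k * (k - 1)  # equally likely (r_i, r_j) pairs of distinct ranks
    return {
        d: Fraction(k - abs(d), ordered_pairs)
        for d in range(-(k - 1), k)
        if d != 0
    }


def rank_sum_difference_pmf(k: int, n: int) -> dict[int, Fraction]:
    """Exact null PMF of D = R_i - R_j over n independent blocks (n-fold convolution)."""
    step = single_block_pmf(k)
    pmf = {0: Fraction(1)}
    for _ in range(n):
        nxt: dict[int, Fraction] = {}
        for total, p in pmf.items():
            for d, q in step.items():
                nxt[total + d] = nxt.get(total + d, Fraction(0)) + p * q
        pmf = nxt
    return pmf


def exact_p_value(d_obs: int, k: int, n: int) -> Fraction:
    """Two-sided exact p-value P(|D| >= |d_obs|) under the null hypothesis."""
    pmf = rank_sum_difference_pmf(k, n)
    return sum((p for d, p in pmf.items() if abs(d) >= abs(d_obs)), Fraction(0))


def normal_p_value(d_obs: int, k: int, n: int) -> float:
    """Normal approximation with Var(D) = n*k*(k+1)/6, without continuity correction."""
    z = abs(d_obs) / sqrt(n * k * (k + 1) / 6)
    return erfc(z / sqrt(2))  # equals 2 * (1 - Phi(z))


if __name__ == "__main__":
    k, n = 5, 10  # hypothetical: 5 classifiers compared on 10 datasets
    for d_obs in (10, 20, 30):
        print(d_obs, float(exact_p_value(d_obs, k, n)), normal_p_value(d_obs, k, n))
```

Running the script prints, for each hypothetical observed difference, the exact two-sided p-value alongside the normal approximation. This shows where the two diverge in the tail. Exact `Fraction` arithmetic keeps very small tail probabilities free of rounding error. For large n, a floating-point convolution (or the authors' closed form) would be faster.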
Affiliation(s)
- Rob Eisinga
- Department of Social Science Research Methods, Radboud University Nijmegen, PO Box 9104, 6500 HE Nijmegen, The Netherlands
- Tom Heskes
- Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, The Netherlands
- Ben Pelzer
- Department of Social Science Research Methods, Radboud University Nijmegen, PO Box 9104, 6500 HE Nijmegen, The Netherlands
- Manfred Te Grotenhuis
- Department of Social Science Research Methods, Radboud University Nijmegen, PO Box 9104, 6500 HE Nijmegen, The Netherlands

28
Walia RR, El-Manzalawy Y, Honavar VG, Dobbs D. Sequence-Based Prediction of RNA-Binding Residues in Proteins. Methods Mol Biol 2017; 1484:205-235. [PMID: 27787829 PMCID: PMC5796408 DOI: 10.1007/978-1-4939-6406-2_15] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 05/28/2023]
Abstract
Identifying individual residues in the interfaces of protein-RNA complexes is important for understanding the molecular determinants of protein-RNA recognition and has many potential applications. Recent technical advances have led to several high-throughput experimental methods for identifying partners in protein-RNA complexes, but determining RNA-binding residues in proteins is still expensive and time-consuming. This chapter focuses on available computational methods for identifying which amino acids in an RNA-binding protein directly contact RNA. Step-by-step protocols for using three different web-based servers to predict RNA-binding residues are described. In addition, we list currently available web servers and software tools for predicting RNA-binding sites, as well as databases containing valuable information about known protein-RNA complexes, RNA-binding motifs in proteins, and protein-binding recognition sites in RNA. We emphasize sequence-based methods that can reliably identify interfacial residues without requiring structural information about either the RNA-binding protein or its RNA partner.
Affiliation(s)
- Yasser El-Manzalawy
- College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, 16802, USA
- Vasant G Honavar
- College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, 16802, USA
- Drena Dobbs
- Genetics, Development and Cell Biology Department, Iowa State University, 3112 Molecular Biology Building, Ames, IA, 50011-3650, USA

29
Atabekova AK, Pankratenko AV, Makarova SS, Lazareva EA, Owens RA, Solovyev AG, Morozov SY. Phylogenetic and functional analyses of a plant protein related to human B-cell receptor-associated proteins. Biochimie 2017; 132:28-37. [PMID: 27770627 DOI: 10.1016/j.biochi.2016.10.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 09/17/2016] [Accepted: 10/17/2016] [Indexed: 12/20/2022]
Abstract
Human B-cell receptor-associated protein BAP31 (HsBAP31) is an endoplasmic reticulum-resident protein involved in protein sorting and transport as well as pro-apoptotic signaling. Plant orthologs of HsBAP31, termed 'plant BAP-like proteins' (PBL proteins), have thus far remained unstudied. Recently, the PBL protein from Nicotiana tabacum (NtPBL) was identified as an interactor of Nt-4/1, a plant protein known to interact with plant virus movement proteins and to affect the long-distance transport of potato spindle tuber viroid (PSTVd) via the phloem. Here, we compared the sequences of PBL proteins and studied the biochemical properties of NtPBL. Analysis of a number of fully sequenced plant genomes revealed that PBL-encoding genes represent a small multigene family with up to six members per genome. Two conserved motifs were identified in the C-terminal region of PBL proteins. The NtPBL C-terminal hydrophilic region (NtPBL-C) was expressed in bacterial cells, purified, and used to analyze its RNA-binding properties in vitro. In gel shift experiments, NtPBL-C bound several tested RNAs, binding most efficiently to microRNA precursors (pre-miRNA) and less efficiently to PSTVd. Mutational analysis suggested that NtPBL-C has a composite RNA-binding site, with two conserved lysine residues in the most C-terminal region involved in binding pre-miRNA but not PSTVd RNA. Virus-mediated transient expression of NtPBL-C in plants resulted in stunting and leaf malformation, developmental abnormalities similar to those described previously for blockage of miRNA biogenesis/function. We hypothesize that the NtPBL protein represents a previously undiscovered component of the miRNA pathway.
Affiliation(s)
- Anastasia K Atabekova
- Department of Virology, Biological Faculty, Moscow State University, Moscow, 119992, Russia
- Anna V Pankratenko
- Department of Virology, Biological Faculty, Moscow State University, Moscow, 119992, Russia
- Svetlana S Makarova
- Department of Virology, Biological Faculty, Moscow State University, Moscow, 119992, Russia
- Ekaterina A Lazareva
- Department of Virology, Biological Faculty, Moscow State University, Moscow, 119992, Russia
- Robert A Owens
- Molecular Plant Pathology Laboratory, USDA-ARS, Beltsville, MD, 20705, USA
- Andrey G Solovyev
- Belozersky Institute of Physico-Chemical Biology, Moscow State University, Moscow, 119992, Russia; Sechenov First Moscow State Medical University, Institute of Molecular Medicine, Moscow, 119991, Russia
- Sergey Y Morozov
- Department of Virology, Biological Faculty, Moscow State University, Moscow, 119992, Russia; Belozersky Institute of Physico-Chemical Biology, Moscow State University, Moscow, 119992, Russia

30
Abstract
Experimental methods for identifying protein(s) bound by a specific promoter-associated RNA (paRNA) of interest can be expensive, difficult, and time-consuming. This chapter describes a general computational framework for identifying potential binding partners in RNA-protein complexes or RNA-protein interaction networks. Protocols for using three web-based tools to predict RNA-protein interaction partners are outlined. Tables are also provided that list additional web servers and software tools for predicting RNA-protein interactions, as well as databases that contain valuable information about known RNA-protein complexes and recognition sites for RNA-binding proteins. Although only one of the tools described, lncPro, was designed expressly to identify proteins that bind long noncoding RNAs (including paRNAs), all three approaches can be applied to predict potential binding partners for both coding and noncoding RNAs (ncRNAs).
Affiliation(s)
- Carla M Mann
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, 50011, USA
- Usha K Muppirala
- Genome Informatics Facility, Iowa State University, Ames, IA, 50011, USA
- Drena Dobbs
- Genetics, Development and Cell Biology Department, Iowa State University, Ames, IA, 50011, USA

31
Kunz M, Wolf B, Schulze H, Atlan D, Walles T, Walles H, Dandekar T. Non-Coding RNAs in Lung Cancer: Contribution of Bioinformatics Analysis to the Development of Non-Invasive Diagnostic Tools. Genes (Basel) 2016; 8:E8. [PMID: 28035947 PMCID: PMC5295003 DOI: 10.3390/genes8010008] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 10/16/2016] [Revised: 12/05/2016] [Accepted: 12/15/2016] [Indexed: 01/11/2023]
Abstract
Lung cancer is currently the leading cause of cancer-related mortality owing to late diagnosis and limited treatment options. Non-coding RNAs are not translated into proteins and have emerged as fundamental regulators of gene expression. Recent studies reported that microRNAs and long non-coding RNAs are involved in lung cancer development and progression. Moreover, they are emerging as promising non-invasive biomarkers for early lung cancer diagnosis. Here, we highlight their potential as biomarkers in lung cancer and show how bioinformatics can contribute to the development of non-invasive diagnostic tools. To this end, we discuss several bioinformatics algorithms and software tools for a comprehensive understanding and functional characterization of microRNAs and long non-coding RNAs.
Affiliation(s)
- Meik Kunz
- Functional Genomics and Systems Biology Group, Department of Bioinformatics, Biocenter, University of Wuerzburg, 97074 Wuerzburg, Germany
- Beat Wolf
- Functional Genomics and Systems Biology Group, Department of Bioinformatics, Biocenter, University of Wuerzburg, 97074 Wuerzburg, Germany
- University of Applied Sciences and Arts of Western Switzerland, Perolles 80, 1700 Fribourg, Switzerland
- Harald Schulze
- Institute of Experimental Biomedicine, University Hospital Wuerzburg, 97080 Wuerzburg, Germany
- David Atlan
- Phenosystems SA, 137 Rue de Tubize, 1440 Braine le Château, Belgium
- Thorsten Walles
- Department of Cardiothoracic Surgery, University Hospital of Wuerzburg, 97080 Wuerzburg, Germany
- Heike Walles
- Department of Tissue Engineering and Regenerative Medicine, University Hospital Wuerzburg, Roentgenring 11, 97070 Wuerzburg, Germany
- Translational Center Wuerzburg "Regenerative therapies in oncology and musculoskeletal disease", Wuerzburg branch of the Fraunhofer Institute for Interfacial Engineering and Biotechnology (IGB), Roentgenring 11, 97070 Wuerzburg, Germany
- Thomas Dandekar
- Functional Genomics and Systems Biology Group, Department of Bioinformatics, Biocenter, University of Wuerzburg, 97074 Wuerzburg, Germany
- BioComputing Unit, European Molecular Biology Laboratory (EMBL) Heidelberg, Meyerhofstraße 1, 69117 Heidelberg, Germany

32
Zhang X, Liu S. RBPPred: predicting RNA-binding proteins from sequence using SVM. Bioinformatics 2016; 33:854-862. [DOI: 10.1093/bioinformatics/btw730] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 06/25/2016] [Accepted: 11/16/2016] [Indexed: 11/13/2022]

33
Protein-RNA interactions: structural biology and computational modeling techniques. Biophys Rev 2016; 8:359-367. [PMID: 28510023 DOI: 10.1007/s12551-016-0223-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 08/05/2016] [Accepted: 09/20/2016] [Indexed: 12/30/2022]
Abstract
RNA-binding proteins are functionally diverse within cells, being involved in RNA metabolism, translation, DNA damage repair, and gene regulation at both the transcriptional and post-transcriptional levels. Much has been learnt about their interactions with RNAs through structure determination techniques and computational modeling. This review gives an overview of the structural data currently available for protein-RNA complexes and discusses the technical issues facing structural biologists working to solve their structures. The review focuses on three techniques used to solve the three-dimensional structure of protein-RNA complexes at atomic resolution, namely X-ray crystallography, solution nuclear magnetic resonance (NMR), and cryo-electron microscopy (cryo-EM). It then turns to the main computational modeling techniques that use these atomic-resolution data, discussing the prediction of RNA-binding sites on unbound proteins, protein-RNA docking, and molecular dynamics modeling of these systems. Finally, the review considers future directions this field of research might take.

34
Pérez-Cano L, Romero-Durana M, Fernández-Recio J. Structural and energy determinants in protein-RNA docking. Methods 2016; 118-119:163-170. [PMID: 27816523 DOI: 10.1016/j.ymeth.2016.11.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 07/20/2016] [Revised: 10/14/2016] [Accepted: 11/01/2016] [Indexed: 01/02/2023]
Abstract
Deciphering the structural and energetic determinants of protein-RNA interactions could help explain key cellular processes, such as gene expression and regulation, at the molecular level. To this end, computational methods such as docking aim to complement current biophysical and structural biology efforts. However, the few reported docking algorithms for protein-RNA interactions show limited predictive success rates. This is mainly because of incomplete sampling of the conformational space of both the protein and the RNA molecules, and because scoring functions struggle to identify the correct docking models. Here, we tested the predictive value of a variety of knowledge-based and energetic scoring functions on a recently published protein-RNA docking benchmark and developed a scoring function able to efficiently discriminate docking decoys. We first performed docking calculations with the bound conformation, which allowed us to analyze the problem under optimal conditions. We found that geometry-based terms and electrostatics were the most important scoring terms, while binding propensities and desolvation were much less relevant for scoring protein-RNA models. This contrasts with what we observed for protein-protein docking. The results also showed an interesting dependence of the predictive rates on the flexibility of the protein molecule. This dependence arises from the higher positive charge observed at flexible interfaces and provides hints for the future development of more efficient protein-RNA docking methods.
Affiliation(s)
- Laura Pérez-Cano
- Joint BSC-CRG-IRB Research Program in Computational Biology, Life Sciences Department, Barcelona Supercomputing Center (BSC), Jordi Girona 29, Barcelona 08034, Spain; Center for Neurobehavioral Genetics and Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Miguel Romero-Durana
- Joint BSC-CRG-IRB Research Program in Computational Biology, Life Sciences Department, Barcelona Supercomputing Center (BSC), Jordi Girona 29, Barcelona 08034, Spain
- Juan Fernández-Recio
- Joint BSC-CRG-IRB Research Program in Computational Biology, Life Sciences Department, Barcelona Supercomputing Center (BSC), Jordi Girona 29, Barcelona 08034, Spain

35
EL-Manzalawy Y, Abbas M, Malluhi Q, Honavar V. FastRNABindR: Fast and Accurate Prediction of Protein-RNA Interface Residues. PLoS One 2016; 11:e0158445. [PMID: 27383535 PMCID: PMC4934694 DOI: 10.1371/journal.pone.0158445] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 02/26/2016] [Accepted: 06/16/2016] [Indexed: 11/24/2022]
Abstract
A wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses, are mediated by RNA-protein interactions. However, experimental determination of the structures of protein-RNA complexes is expensive and technically challenging. Hence, a number of computational tools have been developed for predicting protein-RNA interfaces. Some of the state-of-the-art protein-RNA interface predictors rely on position-specific scoring matrix (PSSM)-based encoding of the protein sequences. The computational effort needed to generate PSSMs severely limits the practical utility of protein-RNA interface prediction servers. In this work, we experiment with two approaches, random sampling and sequence similarity reduction, for extracting a representative reference database of protein sequences from the more than 50 million protein sequences in UniRef100. Our results suggest that randomly sampled databases produce better PSSM profiles on three criteria. These are the number of hits used to generate the profile, the distance of the generated profile from the corresponding profile generated using the entire UniRef100 data, and the accuracy of the machine learning classifier trained using these profiles. Based on our results, we developed FastRNABindR, an improved version of RNABindR for predicting protein-RNA interface residues using PSSM profiles generated from 1% of the UniRef100 sequences sampled uniformly at random. To the best of our knowledge, FastRNABindR is the only online protein-RNA interface residue prediction server that requires generation of PSSM profiles for query sequences and accepts hundreds of protein sequences per submission. Our approach for determining the optimal BLAST database for a protein-RNA interface residue classification task could substantially speed up other amino acid sequence-based predictors of protein-protein and protein-DNA interfaces, and hence increase their practical utility.
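The database-reduction step described above keeps a uniformly random 1% of UniRef100 before profile generation. It can be sketched as a single streaming pass over a FASTA file. This is a minimal illustration that assumes FASTA input and uses reservoir sampling (Algorithm R) to draw exactly m records. It is not the authors' pipeline, and the function names and file handling are hypothetical.

```python
import io
import random
from typing import Iterable, Iterator, Optional, TextIO

Record = tuple[str, str]  # (header without '>', sequence)


def read_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence) pairs from a FASTA stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reservoir_sample(records: Iterable[Record], m: int,
                     seed: Optional[int] = None) -> list[Record]:
    """Uniformly sample exactly m records (or all, if fewer) in one pass (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: list[Record] = []
    for i, rec in enumerate(records):
        if i < m:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive bounds: item i is kept with prob m/(i+1)
            if j < m:
                reservoir[j] = rec
    return reservoir


def write_fasta(records: Iterable[Record], handle: TextIO, width: int = 60) -> None:
    """Write records as FASTA with fixed-width sequence lines."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")


if __name__ == "__main__":
    demo = io.StringIO(">s1\nMKV\n>s2\nGGA\nLL\n>s3\nPQR\n>s4\nWYF\n")
    out = io.StringIO()
    write_fasta(reservoir_sample(read_fasta(demo), m=2, seed=0), out)
    print(out.getvalue())
```

For a 1% sample, m would be set to roughly one hundredth of the record count, which requires knowing (or first counting) the number of sequences. Bernoulli sampling, which keeps each record independently with probability 0.01, avoids that count at the cost of a sample size that varies from run to run.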
Affiliation(s)
- Yasser EL-Manzalawy
- College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, United States of America
- Systems and Computer Engineering, Al-Azhar University, Cairo, Egypt
- Mostafa Abbas
- KINDI Center for Computing Research, College of Engineering, Qatar University, Doha, Qatar
- Qutaibah Malluhi
- KINDI Center for Computing Research, College of Engineering, Qatar University, Doha, Qatar
- Vasant Honavar
- College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, United States of America

36
Sun M, Wang X, Zou C, He Z, Liu W, Li H. Accurate prediction of RNA-binding protein residues with two discriminative structural descriptors. BMC Bioinformatics 2016; 17:231. [PMID: 27266516 PMCID: PMC4897909 DOI: 10.1186/s12859-016-1110-x] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 02/21/2016] [Accepted: 06/02/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND RNA-binding proteins participate in many important biological processes concerning RNA-mediated gene regulation, and several computational methods have recently been developed to predict the protein-RNA interactions of RNA-binding proteins. Newly developed discriminative descriptors will help improve the accuracy of these prediction methods and provide further meaningful information for researchers. RESULTS In this work, we designed two structural features (residue electrostatic surface potential and triplet interface propensity). Statistical and structural analysis of protein-RNA complexes showed that both features were powerful for identifying RNA-binding protein residues. Combining these two features with other well-performing structure- and sequence-based features, we constructed a random forest classifier to predict RNA-binding residues. The area under the receiver operating characteristic curve (AUC) of five-fold cross-validation for our method on the training set RBP195 was 0.900. On the test set RBP68, the prediction accuracy (ACC) was 0.868 and the F-score was 0.631. CONCLUSIONS The good prediction performance of our method indicates that the two newly designed descriptors are discriminative for inferring protein residues that interact with RNA. To facilitate the use of our method, we constructed a web server called RNAProSite, which implements the proposed method and is freely available at http://lilab.ecust.edu.cn/NABind .
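The AUC reported above is the area under the ROC curve. It equals the probability that a randomly chosen binding residue receives a higher classifier score than a randomly chosen non-binding residue, with ties counted as one half. The following minimal sketch implements that rank-based definition. The function name and toy scores are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
from typing import Sequence


def auc_pairwise(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as P(score_pos > score_neg) + 0.5 * P(tie); labels are 1 (binding) or 0."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical classifier scores for four residues (1 = RNA-binding).
    print(auc_pairwise([0.9, 0.3, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

AUC depends only on how scores rank positives against negatives, so it is threshold-free. The ACC and F-score quoted for RBP68, by contrast, depend on the decision threshold chosen for the classifier.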
Affiliation(s)
- Meijian Sun
- State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China
- Xia Wang
- State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China
- Chuanxin Zou
- State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China
- Zenghui He
- State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China
- Wei Liu
- State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China
- Honglin Li
- State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China

37
Freire JM, Veiga AS, de la Torre BG, Santos NC, Andreu D, Da Poian AT, Castanho MARB. Peptides as models for the structure and function of viral capsid proteins: Insights on dengue virus capsid. Biopolymers 2016; 100:325-36. [PMID: 23868207 DOI: 10.1002/bip.22266] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 03/16/2013] [Revised: 04/11/2013] [Accepted: 04/19/2013] [Indexed: 12/24/2022]
Abstract
The structural organization of viral particles is among the most astonishing examples of molecular self-assembly in nature, involving proteins, nucleic acids, and sometimes lipids. Proper assembly is essential to produce well-structured infectious virions. A great variety of structural arrangements can be found in viral particles. Nucleocapsids, for instance, may display highly ordered geometric shapes or consist of macroscopically amorphous packs of the viral genome. The alphavirus and flavivirus genera exemplify these extreme cases: alphavirus particles are structured with T = 4 icosahedral symmetry, whereas flavivirus capsids have no regular geometry. Dengue virus, a member of the Flavivirus genus, is used in this article to illustrate how viral protein-derived peptides can be used advantageously over full-length proteins to unravel the foundations of viral supramolecular assemblies. Membrane- and viral RNA-binding data for capsid protein-derived dengue virus peptides are used to explain the amorphous organization of the viral capsid. Our results combine bioinformatic and spectroscopic approaches using two- or three-component peptide and/or nucleic acid and/or lipid systems.
Affiliation(s)
- João Miguel Freire
- Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Av. Prof. Egas Moniz, 1649-028, Lisbon, Portugal

38
Wang W, Liu J, Sun L. Surface shapes and surrounding environment analysis of single- and double-stranded DNA-binding proteins in protein-DNA interface. Proteins 2016; 84:979-89. [PMID: 27038080 DOI: 10.1002/prot.25045] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 01/04/2016] [Revised: 03/15/2016] [Accepted: 03/25/2016] [Indexed: 11/12/2022]
Abstract
Protein-DNA binding is critical to many biological processes. However, the structural mechanisms underlying these interactions are not fully understood. Here, we analyzed residue shape (peak, flat, or valley) and the surrounding environment of double-stranded DNA-binding proteins (DSBs) and single-stranded DNA-binding proteins (SSBs) in protein-DNA interfaces. We found that interface shapes, hydrogen bonds, and the surrounding environment differ significantly between the two kinds of proteins. Based on these findings, we constructed a random forest (RF) classifier that distinguishes DSBs from SSBs with satisfactory performance. In conclusion, we present a novel methodology to characterize protein interfaces, which will deepen our understanding of the specificity of proteins binding to ssDNA (single-stranded DNA) or dsDNA (double-stranded DNA). Proteins 2016; 84:979-989. © 2016 Wiley Periodicals, Inc.
Affiliation(s)
- Wei Wang
- Department of Computer Science and Technology, College of Computer and Information Engineering, Henan Normal University, Xinxiang, 453007, China; Laboratory of Computation Intelligence and Information Processing, Engineering Technology Research Center for Computing Intelligence and Data Mining, Henan Province, China
- Juan Liu
- Institute of Computer Software, School of Computer, Wuhan University, Wuhan, 430072, China
- Lin Sun
- Department of Computer Science and Technology, College of Computer and Information Engineering, Henan Normal University, Xinxiang, 453007, China; Laboratory of Computation Intelligence and Information Processing, Engineering Technology Research Center for Computing Intelligence and Data Mining, Henan Province, China

39
Martins JR, Bitondi MMG. The HEX 110 Hexamerin Is a Cytoplasmic and Nucleolar Protein in the Ovaries of Apis mellifera. PLoS One 2016; 11:e0151035. [PMID: 26954256 PMCID: PMC4783013 DOI: 10.1371/journal.pone.0151035] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 09/03/2015] [Accepted: 02/10/2016] [Indexed: 11/21/2022]
Abstract
Hexamerins are insect storage proteins abundantly secreted by the larval fat body into the haemolymph. The canonical role of hexamerins is to serve as an amino acid reserve for development toward the adult stage. However, in Apis mellifera, immunofluorescence assays coupled to confocal laser-scanning microscopy, and high-throughput sequencing, have recently shown the presence of hexamerins in organs other than the fat body. These findings led us to study these proteins with the expectation of uncovering additional functions in insect development. We show here that a honeybee hexamerin, HEX 110, localizes in the cytoplasm and nucleus of ovarian cells. In the nucleus of somatic and germline cells, HEX 110 colocalized with a nucleolar protein, fibrillarin, suggesting a structural or even regulatory function in the nucleolus. RNase A treatment caused the loss of HEX 110 signals in the ovarioles, indicating that its subcellular localization depends on RNA. This was reinforced by incubating ovaries with pyronin Y, an RNA-specific dye. Together, the colocalization with fibrillarin and pyronin Y and the sensitivity to RNase highlight unprecedented roles for HEX 110 in the nucleolus, the nuclear structure harbouring the gene cluster involved in ribosomal RNA production. However, the similar patterns of HEX 110 foci distribution in the active and inactive ovaries of queens and workers preclude its association with the functional status of these organs.
Affiliation(s)
- Juliana Ramos Martins
- Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Monte Alegre, Ribeirão Preto, São Paulo, Brazil
- Márcia Maria Gentile Bitondi
- Departamento de Biologia, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Monte Alegre, Ribeirão Preto, São Paulo, Brazil

40
Muppirala U, Lewis BA, Mann CM, Dobbs D. A motif-based method for predicting interfacial residues in both the RNA and protein components of protein-RNA complexes. Pac Symp Biocomput 2016; 21:445-455. [PMID: 26776208 PMCID: PMC4721245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Efforts to predict interfacial residues in protein-RNA complexes have largely focused on predicting RNA-binding residues in proteins. Predicting protein-binding residues in RNA sequences, however, is a problem that has received relatively little attention to date. Although the value of sequence motifs for classifying and annotating protein sequences is well established, sequence motifs have not been widely applied to predicting interfacial residues in macromolecular complexes. Here, we propose a novel sequence motif-based method for "partner-specific" interfacial residue prediction. Given a specific protein-RNA pair, the goal is to simultaneously predict RNA-binding residues in the protein sequence and protein-binding residues in the RNA sequence. In 5-fold cross-validation experiments, our method, PS-PRIP, achieved 92% specificity and 61% sensitivity, with a Matthews correlation coefficient (MCC) of 0.58, in predicting RNA-binding sites in proteins. In predicting protein-binding sites in RNAs, the method achieved 69% specificity and 75% sensitivity but a low MCC of 0.13. Similar performance was obtained when PS-PRIP was tested on two independent "blind" datasets of experimentally validated protein-RNA interactions. This suggests the method should be widely applicable and valuable for identifying potential interfacial residues in protein-RNA complexes for which structural information is not available. The PS-PRIP webserver and datasets are available at: http://pridb.gdcb.iastate.edu/PSPRIP/.
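The RNA-side results above combine 69% specificity and 75% sensitivity with an MCC of only 0.13. One way such a gap can arise is class imbalance: when positives are rare, even a modest false-positive rate yields many false positives relative to true positives, and the Matthews correlation coefficient penalizes this. The minimal sketch below illustrates the effect with hypothetical class sizes, not the PS-PRIP data. It rebuilds a confusion matrix from sensitivity and specificity and computes the MCC at two levels of class balance.

```python
from math import sqrt


def confusion_from_rates(n_pos: int, n_neg: int, sensitivity: float, specificity: float):
    """Rebuild (TP, FP, TN, FN) counts from class sizes and sensitivity/specificity."""
    tp = round(sensitivity * n_pos)
    tn = round(specificity * n_neg)
    return tp, n_neg - tn, tn, n_pos - tp


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when any marginal total is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Same sensitivity (0.75) and specificity (0.69), different class balance (hypothetical).
    for n_pos, n_neg in [(1000, 1000), (100, 1900)]:
        counts = confusion_from_rates(n_pos, n_neg, 0.75, 0.69)
        print(n_pos, n_neg, counts, round(mcc(*counts), 3))
```

With these inputs, the balanced case gives an MCC of about 0.44 and the imbalanced case about 0.20, even though sensitivity and specificity are identical in both.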
MESH Headings
- Amino Acid Motifs
- Amino Acid Sequence
- Base Sequence
- Binding Sites/genetics
- Computational Biology/methods
- Computational Biology/statistics & numerical data
- Databases, Nucleic Acid/statistics & numerical data
- Databases, Protein/statistics & numerical data
- Escherichia coli Proteins/chemistry
- Escherichia coli Proteins/genetics
- Escherichia coli Proteins/metabolism
- Models, Molecular
- Protein Binding
- RNA/chemistry
- RNA/genetics
- RNA/metabolism
- RNA, Bacterial/chemistry
- RNA, Bacterial/genetics
- RNA, Bacterial/metabolism
- RNA, Ribosomal, 16S/chemistry
- RNA, Ribosomal, 16S/genetics
- RNA, Ribosomal, 16S/metabolism
- RNA-Binding Proteins/chemistry
- RNA-Binding Proteins/genetics
- RNA-Binding Proteins/metabolism
- Ribosomal Proteins/chemistry
- Ribosomal Proteins/genetics
- Ribosomal Proteins/metabolism
- Software
Affiliation(s)
- Usha Muppirala
- Genome Informatics Facility, Iowa State University, Ames, Iowa, 50011, USA
- Benjamin A Lewis
- Department of Computer Science, Truman State University, Kirksville, Missouri, 63501, USA
- Carla M Mann
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, Iowa, 50011, USA
- Drena Dobbs
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, Iowa, 50011, USA

41
Cannon JGD, Sherman RM, Wang VMY, Newman GA. Cross-species conservation of complementary amino acid-ribonucleobase interactions and their potential for ribosome-free encoding. Sci Rep 2015; 5:18054. [PMID: 26656258 PMCID: PMC4674897 DOI: 10.1038/srep18054] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/31/2015] [Accepted: 11/02/2015] [Indexed: 01/01/2023]
Abstract
The role of amino acid-RNA nucleobase interactions in the evolution of RNA translation and protein-mRNA autoregulation remains an open area of research. We describe the inference of pairwise amino acid-RNA nucleobase interaction preferences using structural data from known RNA-protein complexes. We observed significant matching between an amino acid's nucleobase affinity and corresponding codon content in both the standard genetic code and mitochondrial variants. Furthermore, we showed that knowledge of nucleobase preferences allows statistically significant prediction of protein primary sequence from mRNA using purely physicochemical information. Interestingly, ribosomal primary sequences were more accurately predicted than non-ribosomal sequences, suggesting a potential role for direct amino acid-nucleobase interactions in the genesis of amino acid-based ribosomal components. Finally, we observed matching between amino acid-nucleobase affinities and corresponding mRNA sequences in 35 evolutionarily diverse proteomes. We believe these results have important implications for the study of the evolutionary origins of the genetic code and protein-mRNA cross-regulation.
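Interaction preferences of this kind are often expressed as log-odds scores: how much more (or less) often an amino acid-nucleobase pair is seen in contact than expected if residues and bases paired independently. The sketch below illustrates that general idea under stated assumptions; it is not the authors' exact inference procedure, and the contact list is made-up toy data rather than counts from real RNA-protein structures. Each input item is assumed to be one (amino acid, nucleobase) contact extracted beforehand, for example with a distance cutoff.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def log_odds_preferences(contacts: Iterable[Pair]) -> Dict[Pair, float]:
    """Log2 ratio of observed to expected counts for each (amino acid,
    nucleobase) contact pair. The expected count assumes that amino acids and
    nucleobases pair independently, given their marginal totals. Pairs that
    were never observed are omitted (their log-odds is undefined)."""
    pair_counts = Counter(contacts)
    total = sum(pair_counts.values())
    aa_totals: Counter = Counter()
    base_totals: Counter = Counter()
    for (aa, base), n in pair_counts.items():
        aa_totals[aa] += n
        base_totals[base] += n
    scores = {}
    for (aa, base), n in pair_counts.items():
        expected = aa_totals[aa] * base_totals[base] / total
        scores[(aa, base)] = math.log2(n / expected)
    return scores


if __name__ == "__main__":
    # Made-up toy contacts: in this invented data R-G and K-U pairs are
    # over-represented relative to independence.
    toy = [("R", "G")] * 6 + [("R", "U")] * 2 + [("K", "G")] * 2 + [("K", "U")] * 6
    for pair, score in sorted(log_odds_preferences(toy).items()):
        print(pair, round(score, 3))
```

Pairs with no observed contacts are simply left out; with sparse structural data a pseudocount is usually added to every cell before taking logarithms, which this sketch does not do.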
Affiliation(s)
- John G D Cannon
- Department of Biology, Carleton College, 1 College Street, Northfield MN, 55057, United States
- Rachel M Sherman
- Department of Biology, Harvey Mudd College, 301 Platt Blvd, Claremont CA 91711, United States; Department of Computer Science, Harvey Mudd College, 301 Platt Blvd, Claremont CA 91711, United States
- Victoria M Y Wang
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge, CB2 1GA, United Kingdom
- Grace A Newman
- Department of Mathematics, Carleton College, 1 College Street, Northfield MN, 55057, United States
42
Xue LC, Dobbs D, Bonvin AMJJ, Honavar V. Computational prediction of protein interfaces: A review of data driven methods. FEBS Lett 2015; 589:3516-26. [PMID: 26460190 PMCID: PMC4655202 DOI: 10.1016/j.febslet.2015.10.003] [Citation(s) in RCA: 111] [Impact Index Per Article: 11.1] [Received: 07/06/2015] [Revised: 10/01/2015] [Accepted: 10/02/2015] [Indexed: 01/06/2023]
Abstract
Reliably pinpointing which specific amino acid residues form the interface(s) between a protein and its binding partner(s) is critical for understanding the structural and physicochemical determinants of protein recognition and binding affinity, and has wide applications in modeling and validating protein interactions predicted by high-throughput methods, in engineering proteins, and in prioritizing drug targets. Here, we review the basic concepts, principles and recent advances in computational approaches to the analysis and prediction of protein-protein interfaces. We point out caveats for objectively evaluating interface predictors, and discuss various applications of data-driven interface predictors for improving energy model-driven protein-protein docking. Finally, we stress the importance of exploiting binding partner information in reliably predicting interfaces and highlight recent advances in this emerging direction.
Affiliation(s)
- Li C Xue
- Faculty of Science - Chemistry, Bijvoet Center for Biomolecular Research, Utrecht Univ., Utrecht 3584 CH, The Netherlands.
- Drena Dobbs
- Department of Genetics, Development & Cell Biology, Iowa State Univ., Ames, IA 50011, USA; Bioinformatics & Computational Biology Program, Iowa State Univ., Ames, IA 50011, USA
- Alexandre M J J Bonvin
- Faculty of Science - Chemistry, Bijvoet Center for Biomolecular Research, Utrecht Univ., Utrecht 3584 CH, The Netherlands
- Vasant Honavar
- College of Information Sciences & Technology, Pennsylvania State Univ., University Park, PA 16802, USA; Genomics & Bioinformatics Program, Pennsylvania State Univ., University Park, PA 16802, USA; Neuroscience Program, Pennsylvania State Univ., University Park, PA 16802, USA; The Huck Institutes of the Life Sciences, Pennsylvania State Univ., University Park, PA 16802, USA; Center for Big Data Analytics & Discovery Informatics, Pennsylvania State Univ., University Park, PA 16802, USA; Institute for Cyberscience, Pennsylvania State Univ., University Park, PA 16802, USA
43
Computational Prediction of RNA-Binding Proteins and Binding Sites. Int J Mol Sci 2015; 16:26303-17. [PMID: 26540053 PMCID: PMC4661811 DOI: 10.3390/ijms161125952] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Received: 10/01/2015] [Revised: 10/20/2015] [Accepted: 10/23/2015] [Indexed: 11/19/2022] Open
Abstract
Protein–RNA interactions have vital roles in many cellular processes such as protein synthesis, sequence encoding, RNA transfer, and gene regulation at the transcriptional and post-transcriptional levels. Approximately 6%–8% of all proteins are RNA-binding proteins (RBPs). Distinguishing RBPs and identifying their binding residues are major aims of structural biology. A number of experimental methods have been developed to determine protein–RNA interactions; however, these methods are expensive, time-consuming, and labor-intensive. As an alternative, researchers have developed many computational approaches that predict RBPs and protein–RNA binding sites by combining various machine learning methods with abundant sequence and/or structural features. These approaches fall into three categories: prediction from protein sequence, prediction from protein structure, and protein–RNA docking. In this paper, we review existing studies on predicting RNA-binding sites, RBPs and complexes, covering the data sets used by different approaches, the sequence and structural features used by several predictors, classifications of prediction methods, performance comparisons, evaluation methods, and future directions.
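Among the evaluation methods such reviews compare, the area under the ROC curve (AUROC) is a widely used threshold-free metric for binding-residue predictors. A minimal sketch, assuming per-residue scores and binary binding labels are already available, computes it as the Mann-Whitney statistic: the probability that a randomly chosen binding residue outscores a randomly chosen non-binding one.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve, computed as the Mann-Whitney statistic:
    the probability that a randomly chosen positive (e.g. binding residue)
    receives a higher score than a randomly chosen negative; ties count 1/2."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    # O(n_pos * n_neg) comparisons: explicit, but too slow for large datasets
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The double loop is quadratic and meant only to make the definition explicit; a rank-based implementation using average ranks for ties (or a library routine such as scikit-learn's `roc_auc_score`) gives the same value in O(n log n).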
44
Peng Z, Kurgan L. High-throughput prediction of RNA, DNA and protein binding regions mediated by intrinsic disorder. Nucleic Acids Res 2015; 43:e121. [PMID: 26109352 PMCID: PMC4605291 DOI: 10.1093/nar/gkv585] [Citation(s) in RCA: 117] [Impact Index Per Article: 11.7] [Received: 01/23/2015] [Revised: 04/24/2015] [Accepted: 05/24/2015] [Indexed: 01/05/2023] Open
Abstract
Intrinsically disordered proteins and regions (IDPs and IDRs) lack stable 3D structure under physiological conditions in vitro, are common in eukaryotes, and facilitate interactions with RNA, DNA and proteins. Current methods for predicting IDPs and IDRs do not provide insights into their functions, except for a handful of methods that predict protein-binding regions. We report a first-of-its-kind computational method, DisoRDPbind, for high-throughput prediction of RNA-, DNA- and protein-binding residues located in IDRs from protein sequences. DisoRDPbind is implemented using a runtime-efficient multi-layered design that uses information extracted from the physicochemical properties of amino acids, sequence complexity, putative secondary structure and disorder, and sequence alignment. Empirical tests demonstrate that it provides accurate predictions that are competitive with other predictors of disorder-mediated protein-binding regions and complementary to methods that predict RNA- and DNA-binding residues annotated from crystal structures. Application to the Homo sapiens, Mus musculus, Caenorhabditis elegans and Drosophila melanogaster proteomes reveals that the RNA- and DNA-binding proteins predicted by DisoRDPbind complement and overlap with the corresponding known binding proteins collected from several sources. Also, the number of putative protein-binding regions predicted by DisoRDPbind correlates with the promiscuity of proteins in the corresponding protein-protein interaction networks. Webserver: http://biomine.ece.ualberta.ca/DisoRDPbind/.
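DisoRDPbind's inputs include sequence complexity. One common way to quantify local complexity is the Shannon entropy of the residue composition in a sliding window, similar in spirit to low-complexity filters; the sketch below computes that per residue. It illustrates the general feature type only, not DisoRDPbind's actual feature definition, and the default window of 15 residues is an arbitrary assumption.

```python
import math
from collections import Counter
from typing import List


def windowed_entropy(sequence: str, window: int = 15) -> List[float]:
    """Shannon entropy (bits) of the residue composition in a window centred
    on each position; windows are truncated at the sequence termini.
    Low values mark low-complexity stretches."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    values = []
    for i in range(len(sequence)):
        segment = sequence[max(0, i - half): i + half + 1]
        n = len(segment)
        counts = Counter(segment)
        # H = -sum p * log2(p) over residue types present in the window
        values.append(sum(-(c / n) * math.log2(c / n) for c in counts.values()))
    return values
```

A homopolymeric tract scores 0 bits, while a window in which every residue differs reaches the maximum of log2 of the window length; such per-residue values would typically be one input column among many in a predictor.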
Affiliation(s)
- Zhenling Peng
- Center for Applied Mathematics, Tianjin University, Tianjin, 300072, P.R. China; Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta, T6G 2V4, Canada
- Lukasz Kurgan
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta, T6G 2V4, Canada
45
Varadi M, Zsolyomi F, Guharoy M, Tompa P. Functional Advantages of Conserved Intrinsic Disorder in RNA-Binding Proteins. PLoS One 2015; 10:e0139731. [PMID: 26439842 PMCID: PMC4595337 DOI: 10.1371/journal.pone.0139731] [Citation(s) in RCA: 89] [Impact Index Per Article: 8.9] [Received: 07/28/2015] [Accepted: 09/15/2015] [Indexed: 11/22/2022] Open
Abstract
Proteins form large macromolecular assemblies with RNA that govern essential molecular processes. RNA-binding proteins have often been associated with conformational flexibility, yet the extent and functional implications of their intrinsic disorder have never been fully assessed. Here, through large-scale analysis of comprehensive protein sequence and structure datasets we demonstrate the prevalence of intrinsic structural disorder in RNA-binding proteins and domains. We addressed their functionality through a quantitative description of the evolutionary conservation of disordered segments involved in binding, and investigated the structural implications of flexibility in terms of conformational stability and interface formation. We conclude that the functional role of intrinsically disordered protein segments in RNA-binding is two-fold: first, these regions establish extended, conserved electrostatic interfaces with RNAs via induced fit. Second, conformational flexibility enables them to target different RNA partners, providing multi-functionality, while also ensuring specificity. These findings emphasize the functional importance of intrinsically disordered regions in RNA-binding proteins.
Affiliation(s)
- Mihaly Varadi
- Structural Biology Research Center (SBRC), Flemish Institute of Biotechnology (VIB), Brussels, Belgium; Structural Biology Brussel (SBB), Vrije Universiteit Brussel (VUB), Brussels, Belgium
- Fruzsina Zsolyomi
- Structural Biology Research Center (SBRC), Flemish Institute of Biotechnology (VIB), Brussels, Belgium; Structural Biology Brussel (SBB), Vrije Universiteit Brussel (VUB), Brussels, Belgium
- Mainak Guharoy
- Structural Biology Research Center (SBRC), Flemish Institute of Biotechnology (VIB), Brussels, Belgium; Structural Biology Brussel (SBB), Vrije Universiteit Brussel (VUB), Brussels, Belgium
- Peter Tompa
- Structural Biology Research Center (SBRC), Flemish Institute of Biotechnology (VIB), Brussels, Belgium; Structural Biology Brussel (SBB), Vrije Universiteit Brussel (VUB), Brussels, Belgium; Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
46
SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues. PLoS One 2015; 10:e0133260. [PMID: 26176857 PMCID: PMC4503397 DOI: 10.1371/journal.pone.0133260] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 03/20/2015] [Accepted: 06/25/2015] [Indexed: 11/19/2022] Open
Abstract
Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequences are becoming increasingly important. The majority of current algorithms rely on feature-based prediction, but their accuracy still needs to be improved. Here we propose a sequence-based hybrid algorithm, SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder), which merges a feature predictor, SNBRFinderF, and a template predictor, SNBRFinderT. SNBRFinderF was built using a support vector machine whose inputs include the sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with a sequence alignment algorithm based on profile hidden Markov models to capture weakly homologous templates of the query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and that SNBRFinderT can achieve performance comparable to structure-based template methods. Leveraging the complementarity between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue prediction. More importantly, the sequence-based hybrid prediction reached performance competitive with our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has clear advantages over existing sequence-based prediction algorithms. An easy-to-use web server for the algorithm is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.
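The abstract describes merging a feature-based predictor (SNBRFinderF) with a template-based one (SNBRFinderT) but does not give the combination rule. A minimal, hypothetical fusion rule is sketched below: when a template is found, per-residue probabilities from both predictors are averaged with an assumed weight; otherwise the feature-based scores are used unchanged. The function name, the `weight` parameter and the fallback behavior are illustrative assumptions, not SNBRFinder's published method.

```python
from typing import List, Optional, Sequence


def hybrid_scores(feature_scores: Sequence[float],
                  template_scores: Optional[Sequence[float]],
                  weight: float = 0.5) -> List[float]:
    """Hypothetical fusion of per-residue binding probabilities.

    feature_scores: output of a feature-based predictor, one value per residue.
    template_scores: probabilities transferred from an aligned template, or
        None if no template was found.
    weight: weight given to the template prediction (assumed, in [0, 1]).
    """
    if template_scores is None:
        # No template available: fall back to the feature-based predictor.
        return list(feature_scores)
    if len(template_scores) != len(feature_scores):
        raise ValueError("score vectors must cover the same residues")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * t + (1.0 - weight) * f
            for f, t in zip(feature_scores, template_scores)]
```

A rule of this shape lets template transfer contribute when a homolog is found while keeping full coverage for sequences without templates; in practice the weight would be tuned on held-out data.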
47
Miao Z, Westhof E. Prediction of nucleic acid binding probability in proteins: a neighboring residue network based score. Nucleic Acids Res 2015; 43:5340-51. [PMID: 25940624 PMCID: PMC4477668 DOI: 10.1093/nar/gkv446] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Received: 03/16/2015] [Revised: 04/23/2015] [Accepted: 04/24/2015] [Indexed: 11/13/2022] Open
Abstract
We describe a general binding score for predicting the nucleic acid binding probability in proteins. The score is directly derived from physicochemical and evolutionary features and integrates a residue neighboring network approach. Our process achieves stable and high accuracies on both DNA- and RNA-binding proteins and illustrates how the main driving forces for nucleic acid binding are common. Because of the effective integration of the synergetic effects of the network of neighboring residues and the fact that the prediction yields a hierarchical scoring on the protein surface, energy funnels for nucleic acid binding appear on protein surfaces, pointing to the dynamic process occurring in the binding of nucleic acids to proteins.
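The idea that a residue's binding score should reflect its structural neighbors can be illustrated with a simple smoothing step over a residue contact network. In the sketch below, residues are connected when representative atoms (for example C-alpha atoms) lie within a cutoff distance, and each residue's score is mixed with the mean score of its neighbors. The 8 Å cutoff, the mixing weight `alpha` and the function name are assumptions for illustration; this is not Miao and Westhof's scoring function.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def neighbor_smoothed_scores(scores: Sequence[float], coords: Sequence[Coord],
                             cutoff: float = 8.0, alpha: float = 0.5) -> List[float]:
    """Mix each residue's own score (weight `alpha`) with the mean score of its
    neighbors in a residue contact network (weight 1 - alpha). Two residues are
    neighbors if their representative coordinates lie within `cutoff` angstroms.
    Residues without neighbors keep their original score."""
    if len(scores) != len(coords):
        raise ValueError("one coordinate per scored residue is required")
    n = len(scores)
    # Adjacency lists of the contact network (O(n^2) distance checks).
    neighbors = [[j for j in range(n)
                  if j != i and math.dist(coords[i], coords[j]) <= cutoff]
                 for i in range(n)]
    smoothed = []
    for i in range(n):
        if neighbors[i]:
            local_mean = sum(scores[j] for j in neighbors[i]) / len(neighbors[i])
            smoothed.append(alpha * scores[i] + (1.0 - alpha) * local_mean)
        else:
            smoothed.append(scores[i])
    return smoothed
```

Applying the step repeatedly propagates information further across the surface; the hierarchical scoring and energy-funnel analysis described in the abstract are beyond this sketch.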
Affiliation(s)
- Zhichao Miao
- Architecture et Réactivité de l'ARN, Université de Strasbourg, Institut de biologie moléculaire et cellulaire du CNRS, 15 Rue Descartes, 67000 Strasbourg, France
- Eric Westhof
- Architecture et Réactivité de l'ARN, Université de Strasbourg, Institut de biologie moléculaire et cellulaire du CNRS, 15 Rue Descartes, 67000 Strasbourg, France
48
Yan J, Friedrich S, Kurgan L. A comprehensive comparative review of sequence-based predictors of DNA- and RNA-binding residues. Brief Bioinform 2015; 17:88-105. [DOI: 10.1093/bib/bbv023] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Received: 12/19/2014] [Indexed: 01/07/2023] Open
49
Xiong D, Zeng J, Gong H. RBRIdent: An algorithm for improved identification of RNA-binding residues in proteins from primary sequences. Proteins 2015; 83:1068-77. [DOI: 10.1002/prot.24806] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 01/29/2015] [Revised: 03/23/2015] [Accepted: 03/24/2015] [Indexed: 01/15/2023]
Affiliation(s)
- Dapeng Xiong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University; Beijing 100084 China
- Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University; Beijing 100084 China
- Haipeng Gong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University; Beijing 100084 China
50
Nagarajan R, Chothani SP, Ramakrishnan C, Sekijima M, Gromiha MM. Structure based approach for understanding organism specific recognition of protein-RNA complexes. Biol Direct 2015; 10:8. [PMID: 25886642 PMCID: PMC4352265 DOI: 10.1186/s13062-015-0039-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 11/19/2014] [Accepted: 02/03/2015] [Indexed: 12/11/2022] Open
Abstract
Background: Protein-RNA interactions perform diverse functions within the cell. Understanding the recognition mechanism of protein-RNA complexes has been a challenging task in molecular and computational biology. In earlier work, recognition mechanisms were studied for a single complex or for a set of non-redundant complexes. In this work, we constructed 18 sets of complexes from the Protein Data Bank (PDB), each containing the same protein-RNA complex from different organisms. We characterized the similarities and differences within each set using sequence- and structure-based features such as root mean square deviation, sequence homology, propensity of binding site residues, variance, conservation at binding sites, binding segments, binding motifs of amino acid residues and nucleotides, preferred amino acid-nucleotide pairs and the influence of neighboring residues on binding.
Results: We found that proteins from mesophilic organisms have more binding sites than those from thermophiles, and that the binding propensities of amino acid residues differ among E. coli, H. sapiens, S. cerevisiae, thermophiles and archaea. In all organisms, proteins preferentially bind RNA through single-residue segments, whereas RNA uses stretches of up to six nucleotides to bind proteins. We developed amino acid residue-nucleotide pair potentials for different organisms, which could be used to predict binding specificity. Further, molecular dynamics simulations of aspartyl tRNA synthetase complexed with aspartyl tRNA showed specific modes of recognition in E. coli, T. thermophilus and S. cerevisiae.
Conclusion: Based on structural analysis and molecular dynamics simulations, we suggest that the mode of recognition in a protein-RNA complex depends on the organism.
Reviewers: This article was reviewed by Sandor Pongor, Gajendra Raghava and Narayanaswamy Srinivasan.
Electronic supplementary material The online version of this article (doi:10.1186/s13062-015-0039-8) contains supplementary material, which is available to authorized users.
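Binding propensity of a residue type is commonly defined as its share among binding-site residues divided by its share among all residues, so values above 1 indicate enrichment at the interface. The sketch below implements that common definition, assuming binding-site labels have already been assigned (for example by a distance cutoff to RNA atoms); the normalization used in the study may differ, and the example sequence and labels in the usage note are toy values.

```python
from collections import Counter
from typing import Dict, Sequence


def binding_propensity(residues: Sequence[str],
                       is_binding: Sequence[bool]) -> Dict[str, float]:
    """Propensity of each residue type: (its fraction among binding residues)
    divided by (its fraction among all residues). Values > 1 mean the residue
    type is enriched at the binding site; 0 means it never occurs there."""
    if len(residues) != len(is_binding):
        raise ValueError("residues and is_binding must have the same length")
    all_counts = Counter(residues)
    bind_counts = Counter(r for r, b in zip(residues, is_binding) if b)
    n_all = len(residues)
    n_bind = sum(bind_counts.values())
    if n_bind == 0:
        raise ValueError("no binding residues labelled")
    return {aa: (bind_counts[aa] / n_bind) / (all_counts[aa] / n_all)
            for aa in all_counts}
```

For the toy input `binding_propensity("RRKAAG", [True, True, False, False, False, False])`, both binding residues are arginine, so R gets a propensity of 3.0 (it makes up all binding residues but only a third of the sequence) and every other residue type gets 0.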
Affiliation(s)
- Raju Nagarajan
- Department of Biotechnology, Bhupat Jyoti Metha School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, Tamilnadu, India.
- Sonia Pankaj Chothani
- Department of Biotechnology, Bhupat Jyoti Metha School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, Tamilnadu, India; Philips Research North America, 345 Scarborough Road, Briarcliff Manor, NY, 10510, USA.
- Chandrasekaran Ramakrishnan
- Department of Biotechnology, Bhupat Jyoti Metha School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, Tamilnadu, India.
- Masakazu Sekijima
- Global Scientific Information and Computing Center (GSIC), Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo, 152-8550, Japan.
- M Michael Gromiha
- Department of Biotechnology, Bhupat Jyoti Metha School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, Tamilnadu, India.