1. Luige J, Armaos A, Tartaglia GG, Ørom UAV. Predicting nuclear G-quadruplex RNA-binding proteins with roles in transcription and phase separation. Nat Commun 2024; 15:2585. [PMID: 38519458 PMCID: PMC10959947 DOI: 10.1038/s41467-024-46731-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Accepted: 03/08/2024] [Indexed: 03/25/2024] Open
Abstract
RNA-binding proteins are central for many biological processes and their characterization has demonstrated a broad range of functions as well as a wide spectrum of target structures. RNA G-quadruplexes are important regulatory elements occurring in both coding and non-coding transcripts, yet our knowledge of their structure-based interactions is at present limited. Here, using theoretical predictions and experimental approaches, we show that many chromatin-binding proteins bind to RNA G-quadruplexes, and we classify them based on their RNA G-quadruplex-binding potential. Combining experimental identification of nuclear RNA G-quadruplex-binding proteins with computational approaches, we build a prediction tool that assigns a probability score for a nuclear protein to bind RNA G-quadruplexes. We show that predicted G-quadruplex RNA-binding proteins exhibit a high degree of protein disorder and hydrophilicity and suggest involvement in both transcription and phase-separation into membrane-less organelles. Finally, we present the G4-Folded/UNfolded Nuclear Interaction Explorer System (G4-FUNNIES) for estimating RNA G4-binding propensities at http://service.tartaglialab.com/new_submission/G4FUNNIES .
Affiliation(s)
- Johanna Luige
- RNA Biology and Innovation, Institute of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Alexandros Armaos
- Centre for Human Technologies (CHT), Istituto Italiano di Tecnologia (IIT), Via Enrico Melen, 83, 16152, Genova, Italy
- Gian Gaetano Tartaglia
- Centre for Human Technologies (CHT), Istituto Italiano di Tecnologia (IIT), Via Enrico Melen, 83, 16152, Genova, Italy.
- Catalan Institution for Research and Advanced Studies ICREA Passeig Lluis Companys, 23 08010, Barcelona, Spain.
- Ulf Andersson Vang Ørom
- RNA Biology and Innovation, Institute of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark.
2. Stuehler DS, Hunter WB, Carrillo-Tarazona Y, Espitia H, Cicero JM, Bell T, Mann HR, Clarke SKV, Paris TM, Metz JL, D'Elia T, Qureshi JA, Cano LM. Wild lime psyllid Leuronota fagarae Burckhardt (Hemiptera: Psylloidea) picorna-like virus full genome annotation and classification. J Invertebr Pathol 2023; 201:107995. [PMID: 37748676 DOI: 10.1016/j.jip.2023.107995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2023] [Revised: 09/14/2023] [Accepted: 09/21/2023] [Indexed: 09/27/2023]
Abstract
Picorna-like viruses of the order Picornavirales are a poorly defined group of positive-sense, single-stranded RNA viruses that include numerous pathogens known to infect plants, animals, and insects. A new picorna-like viral species was isolated from the wild lime psyllid (WLP), Leuronota fagarae, in the state of Florida, USA, and labelled: Leuronota fagarae picorna-like virus isolate FL (LfPLV-FL). The virus was found to have homology to a picorna-like virus identified in the Asian Citrus Psyllid (ACP), Diaphorina citri, collected in the state of Florida. Computational analysis of RNA extracts from WLP adult heads identified a 10,006-nucleotide sequence encoding a 2,942 amino acid polyprotein with similar functional domain structure to polyproteins of both Dicistroviridae and Iflaviridae. Sequence comparisons of nucleic acid and amino acid translations of the conserved RNA-dependent RNA polymerase, along with the entire N-terminal nonstructural coding region, provided insight into an evolutionary relationship of LfPLV-FL to insect-infecting iflaviruses. Viruses belonging to the family Iflaviridae encode a polyprotein of around 3000 amino acids in length that is processed post-translationally to produce components necessary for replication. The classification of a novel picorna-like virus in L. fagarae, with evolutionary characteristics similar to picorna-like viruses infecting Bactericera cockerelli and D. citri, provides an opportunity to examine virus host specificity, as well as identify critical components of the virus' genome required for successful transmission, infection, and replication. This bioinformatic classification allows for further insight into a novel virus species, and aids in the research of a closely related virus of the invasive psyllid, D. citri, a major pest of Floridian citriculture. The potential use of viral pathogens as expression vectors to manage the spread of D. citri is an area that requires additional research; however, it may bring forth an effective control strategy to reduce the transmission of Candidatus Liberibacter asiaticus (CLas), the causative agent of Huanglongbing (HLB).
Affiliation(s)
- Douglas S Stuehler
- ORISE Participant, DOE/USDA, ARS, Fort Pierce, FL 34945, USA; USDA, ARS, 2001 South Rock Road, Fort Pierce, FL 34945, USA.
- Wayne B Hunter
- USDA, ARS, 2001 South Rock Road, Fort Pierce, FL 34945, USA.
- Yisel Carrillo-Tarazona
- University of Florida, IFAS, Department of Plant Pathology, Indian River Research and Education Center, Fort Pierce, FL 34945, USA.
- Hector Espitia
- University of Florida, IFAS, Department of Plant Pathology, Indian River Research and Education Center, Fort Pierce, FL 34945, USA.
- Joseph M Cicero
- University of Florida, IFAS, Department of Plant Pathology, Indian River Research and Education Center, Fort Pierce, FL 34945, USA
- Tracey Bell
- Indian River State College, Fort Pierce, FL 34949, USA.
- Hannah R Mann
- Indian River State College, Fort Pierce, FL 34949, USA
- Thomson M Paris
- ORISE Participant, DOE/USDA, ARS, Fort Pierce, FL 34945, USA; USDA, ARS, 2001 South Rock Road, Fort Pierce, FL 34945, USA.
- Jackie L Metz
- University of Florida, IFAS, Department of Plant Pathology, Indian River Research and Education Center, Fort Pierce, FL 34945, USA.
- Tom D'Elia
- Department of Biology, Indian River State College, Fort Pierce, FL 34949, USA.
- Jawwad A Qureshi
- University of Florida, Southwest Florida Research and Education Center (SWFREC), 2685 SR 29 North Immokalee, FL 34142, USA.
- Liliana M Cano
- University of Florida, IFAS, Department of Plant Pathology, Indian River Research and Education Center, Fort Pierce, FL 34945, USA.
3. Sommerauer C, Kutter C. Noncoding RNAs in liver physiology and metabolic diseases. Am J Physiol Cell Physiol 2022; 323:C1003-C1017. [PMID: 35968891 DOI: 10.1152/ajpcell.00232.2022] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/14/2022]
Abstract
The liver holds central roles in detoxification, energy metabolism and whole-body homeostasis but can develop malignant phenotypes when chronically overwhelmed with fatty acids and glucose. The global rise of metabolic-associated fatty liver disease (MAFLD) is already affecting a quarter of the global population. Pharmaceutical treatment options against different stages of MAFLD do not yet exist and several clinical trials against hepatic transcription factors and other proteins have failed. However, emerging roles of noncoding RNAs, including long (lncRNA) and short noncoding RNAs (sRNA), in various cellular processes pose exciting new avenues for treatment interventions. Actions of noncoding RNAs mostly rely on interactions with proteins, whereby the noncoding RNA fine-tunes protein function in a process termed riboregulation. The developmental stage-, disease stage- and cell type-specific nature of noncoding RNAs harbors enormous potential to precisely target certain cellular pathways in a spatio-temporally defined manner. Proteins interacting with RNAs can be categorized into canonical or non-canonical RNA binding proteins (RBPs) depending on the existence of classical RNA binding domains. Both RNA- and RBP-centric methods have generated new knowledge of the RNA-RBP interface and added an additional regulatory layer. In this review, we summarize recent advances in how RBP-lncRNA interactions and various sRNAs shape cellular physiology and the development of liver diseases such as MAFLD and hepatocellular carcinoma.
Affiliation(s)
- Christian Sommerauer
- Science for Life Laboratory, Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, Stockholm, Sweden
- Claudia Kutter
- Science for Life Laboratory, Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, Stockholm, Sweden
4. PRIP: A Protein-RNA Interface Predictor Based on Semantics of Sequences. Life (Basel) 2022; 12:life12020307. [PMID: 35207594 PMCID: PMC8879494 DOI: 10.3390/life12020307] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/05/2022] [Revised: 01/28/2022] [Accepted: 02/04/2022] [Indexed: 01/08/2023] Open
Abstract
RNA–protein interactions play an indispensable role in many biological processes. Growing evidence has indicated that aberration of the RNA–protein interaction is associated with many serious human diseases. The precise and quick detection of RNA–protein interactions is crucial to finding new functions and to uncovering the mechanism of interactions. Although many methods have been presented to recognize RNA-binding sites, there is much room left for the improvement of predictive accuracy. We present a sequence semantics-based method (called PRIP) for predicting RNA-binding interfaces. The PRIP extracted semantic embedding by pre-training the Word2vec with the corpus. Extreme gradient boosting was employed to train a classifier. The PRIP obtained a SN of 0.73 over the five-fold cross validation and a SN of 0.67 over the independent test, outperforming the state-of-the-art methods. Compared with other methods, this PRIP learned the hidden relations between words in the context. The analysis of the semantics relationship implied that the semantics of some words were specific to RNA-binding interfaces. This method is helpful to explore the mechanism of RNA–protein interactions from a semantics point of view.
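The abstract above reports performance as SN, which in this literature normally denotes sensitivity, and describes treating sequences as text for a Word2vec-style embedding. As a minimal sketch under stated assumptions (this is not the PRIP code; the function names, the k-mer length of 3, and the toy data are all hypothetical), the following shows how a protein sequence could be split into overlapping k-mer "words" and how sensitivity is computed from binary residue labels.

```python
def kmer_words(sequence, k=3):
    """Split a protein sequence into overlapping k-mer 'words'.

    The value k=3 is an illustrative assumption, not taken from the paper.
    """
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def sensitivity(y_true, y_pred):
    """Sensitivity: SN = TP / (TP + FN), with 1 marking a binding residue."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Return 0.0 when there are no positive residues to avoid dividing by zero.
    return tp / (tp + fn) if (tp + fn) else 0.0


if __name__ == "__main__":
    print(kmer_words("MKRGG"))
    print(sensitivity([1, 1, 1, 0], [1, 0, 1, 1]))
```

In a full pipeline the k-mer lists would be the "sentences" passed to an embedding model, and the resulting vectors would feed a gradient-boosted classifier; both steps are omitted here because their details are not given in the abstract.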
5. Alvarado-Marchena L, Marquez-Molins J, Martinez-Perez M, Aparicio F, Pallás V. Mapping of Functional Subdomains in the atALKBH9B m6A-Demethylase Required for Its Binding to the Viral RNA and to the Coat Protein of Alfalfa Mosaic Virus. Front Plant Sci 2021; 12:701683. [PMID: 34290728 PMCID: PMC8287571 DOI: 10.3389/fpls.2021.701683] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/28/2021] [Accepted: 06/09/2021] [Indexed: 06/01/2023]
Abstract
N6-methyladenosine (m6A) modification is a dynamically regulated RNA modification that impacts many cellular processes and pathways. This epitranscriptomic methylation relies on the participation of RNA methyltransferases (referred to as "writers") and demethylases (referred to as "erasers"), respectively. We previously demonstrated that the Arabidopsis thaliana protein atALKBH9B showed m6A-demethylase activity and interacted with the coat protein (CP) of alfalfa mosaic virus (AMV), causing a profound impact on the viral infection cycle. To dissect the functional activity of atALKBH9B in AMV infection, we performed a protein-mapping analysis to identify the putative domains required for regulating this process. In this context, the mutational analysis of the protein revealed that the residues between positions 427 and 467 are critical for in vitro binding to the AMV RNA. The atALKBH9B amino acid sequence showed intrinsically disordered regions (IDRs) located at the N-terminal part delimiting the internal AlkB-like domain and at the C-terminal part. We identified an RNA binding domain containing an RGxxxRGG motif that overlaps with the C-terminal IDR. Moreover, bimolecular fluorescent experiments allowed us to determine that residues located between 387 and 427 are critical for the interaction with the AMV CP, which should be critical for modulating the viral infection process. Finally, we observed that atALKBH9B deletions of either the N-terminal 20 residues or the C-terminal last 40 amino acids impede its accumulation in siRNA bodies. The involvement of the regions responsible for RNA and viral CP binding and those required for its localization in stress granules in the viral cycle is discussed.
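The RGxxxRGG motif mentioned above can be located in a sequence with a simple pattern search, reading each "x" as any residue. The following is a minimal sketch, not the authors' method; the function name and the toy sequence are hypothetical, and the real atALKBH9B sequence is not reproduced here.

```python
import re

# Each 'x' in RGxxxRGG is read as "any residue", so it becomes '.' in the regex.
# The lookahead lets overlapping occurrences be reported as well.
RG_MOTIF = re.compile(r"(?=(RG...RGG))")


def find_rg_motifs(sequence):
    """Return (1-based start position, matched motif) for each occurrence."""
    return [(m.start() + 1, m.group(1)) for m in RG_MOTIF.finditer(sequence)]


if __name__ == "__main__":
    toy_sequence = "MSRGAAARGGKK"  # hypothetical example sequence
    print(find_rg_motifs(toy_sequence))
```

Positions are reported 1-based to match the residue numbering convention used in the abstract.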
6. Dettori LG, Torrejon D, Chakraborty A, Dutta A, Mohamed M, Papp C, Kuznetsov VA, Sung P, Feng W, Bah A. A Tale of Loops and Tails: The Role of Intrinsically Disordered Protein Regions in R-Loop Recognition and Phase Separation. Front Mol Biosci 2021; 8:691694. [PMID: 34179096 PMCID: PMC8222781 DOI: 10.3389/fmolb.2021.691694] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 04/06/2021] [Accepted: 05/14/2021] [Indexed: 11/13/2022] Open
Abstract
R-loops are non-canonical, three-stranded nucleic acid structures composed of a DNA:RNA hybrid, a displaced single-stranded (ss)DNA, and a trailing ssRNA overhang. R-loops perform critical biological functions under both normal and disease conditions. To elucidate their cellular functions, we need to understand the mechanisms underlying R-loop formation, recognition, signaling, and resolution. Previous high-throughput screens identified multiple proteins that bind R-loops, with many of these proteins containing folded nucleic acid processing and binding domains that prevent (e.g., topoisomerases), resolve (e.g., helicases, nucleases), or recognize (e.g., KH, RRMs) R-loops. However, a significant number of these R-loop interacting Enzyme and Reader proteins also contain long stretches of intrinsically disordered regions (IDRs). The precise molecular and structural mechanisms by which the folded domains and IDRs synergize to recognize and process R-loops or modulate R-loop-mediated signaling have not been fully explored. While studying one such modular R-loop Reader, the Fragile X Protein (FMRP), we unexpectedly discovered that the C-terminal IDR (C-IDR) of FMRP is the predominant R-loop binding site, with the three N-terminal KH domains recognizing the trailing ssRNA overhang. Interestingly, the C-IDR of FMRP has recently been shown to undergo spontaneous Liquid-Liquid Phase Separation (LLPS) assembly by itself or in complex with another non-canonical nucleic acid structure, RNA G-quadruplex. Furthermore, we have recently shown that FMRP can suppress persistent R-loops that form during transcription, a process that is also enhanced by LLPS via the assembly of membraneless transcription factories. These exciting findings prompted us to explore the role of IDRs in R-loop processing and signaling proteins through a comprehensive bioinformatics and computational biology study. 
Here, we evaluated IDR prevalence, sequence composition and LLPS propensity for the known R-loop interactome. We observed that, like FMRP, the majority of the R-loop interactome, especially Readers, contains long IDRs that are highly enriched in low complexity sequences with biased amino acid composition, suggesting that these IDRs could directly interact with R-loops, rather than being “mere flexible linkers” connecting the “functional folded enzyme or binding domains”. Furthermore, our analysis shows that several proteins in the R-loop interactome are either predicted to or have been experimentally demonstrated to undergo LLPS or are known to be associated with phase separated membraneless organelles. Thus, our overall results present a thought-provoking hypothesis that IDRs in the R-loop interactome can provide a functional link between R-loop recognition via direct binding and downstream signaling through the assembly of LLPS-mediated membrane-less R-loop foci. The absence or dysregulation of the function of IDR-enriched R-loop interactors can potentially lead to severe genomic defects, such as the widespread R-loop-mediated DNA double strand breaks that we recently observed in Fragile X patient-derived cells.
Affiliation(s)
- Leonardo G Dettori
- Department of Biochemistry and Molecular Biology, SUNY Upstate Medical University, Syracuse, NY, United States
- Diego Torrejon
- Department of Biochemistry and Molecular Biology, SUNY Upstate Medical University, Syracuse, NY, United States
- Arijita Chakraborty
- Department of Biochemistry and Molecular Biology, SUNY Upstate Medical University, Syracuse, NY, United States
- Arijit Dutta
- Department of Biochemistry and Structural Biology, University of Texas Health San Antonio, San Antonio, TX, United States
- Mohamed Mohamed
- Department of Biochemistry and Molecular Biology, SUNY Upstate Medical University, Syracuse, NY, United States
- Csaba Papp
- Department of Biochemistry and Molecular Biology, SUNY Upstate Medical University, Syracuse, NY, United States.,Department of Urology, SUNY Upstate Medical University, Syracuse, NY, United States
- Vladimir A Kuznetsov
- Department of Urology, SUNY Upstate Medical University, Syracuse, NY, United States.,Bioinformatics Institute, ASTAR Biomedical Institutes, Singapore, Singapore
- Patrick Sung
- Department of Biochemistry and Structural Biology, University of Texas Health San Antonio, San Antonio, TX, United States
- Wenyi Feng
- Department of Biochemistry and Molecular Biology, SUNY Upstate Medical University, Syracuse, NY, United States
- Alaji Bah
- Department of Biochemistry and Molecular Biology, SUNY Upstate Medical University, Syracuse, NY, United States
7. Yang C, Ding Y, Meng Q, Tang J, Guo F. Granular multiple kernel learning for identifying RNA-binding protein residues via integrating sequence and structure information. Neural Comput Appl 2021. [DOI: 10.1007/s00521-020-05573-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
8. Zhang J, Chen Q, Liu B. NCBRPred: predicting nucleic acid binding residues in proteins based on multilabel learning. Brief Bioinform 2021; 22:6102667. [PMID: 33454744 DOI: 10.1093/bib/bbaa397] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 10/01/2020] [Revised: 11/05/2020] [Accepted: 12/03/2020] [Indexed: 01/01/2023] Open
Abstract
The interactions between proteins and nucleic acid sequences play many important roles in gene expression and some cellular activities. Accurate prediction of the nucleic acid binding residues in proteins will facilitate the research of the protein functions, gene expression, drug design, etc. In this regard, several computational methods have been proposed to predict the nucleic acid binding residues in proteins. However, these methods cannot satisfactorily measure the global interactions among the residues along the protein. Furthermore, these methods suffer from the cross-prediction problem, and new strategies should be explored to solve it. In this study, a new computational method called NCBRPred was proposed to predict the nucleic acid binding residues based on the multilabel sequence labeling model. NCBRPred uses bidirectional Gated Recurrent Units (BiGRUs) to capture the global interactions among the residues, and treats this task as a multilabel learning task. Experimental results on three widely used benchmark datasets and an independent dataset showed that NCBRPred achieved higher predictive results with lower cross-prediction, outperforming 10 existing state-of-the-art predictors. The web-server and a stand-alone package of NCBRPred are freely available at http://bliulab.net/NCBRPred. It is anticipated that NCBRPred will become a very useful tool for identifying nucleic acid binding residues.
Affiliation(s)
- Jun Zhang
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
- Qingcai Chen
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
9. Bartas M, Červeň J, Guziurová S, Slychko K, Pečinka P. Amino Acid Composition in Various Types of Nucleic Acid-Binding Proteins. Int J Mol Sci 2021; 22:ijms22020922. [PMID: 33477647 PMCID: PMC7831508 DOI: 10.3390/ijms22020922] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/29/2020] [Revised: 01/15/2021] [Accepted: 01/16/2021] [Indexed: 12/20/2022] Open
Abstract
Nucleic acid-binding proteins are traditionally divided into two categories: those with the ability to bind DNA and those with the ability to bind RNA. In the light of new knowledge, such categorizing should be overcome because a large proportion of proteins can bind both DNA and RNA. Other, even more important features of nucleic acid-binding proteins are so-called sequence or structure specificities. Proteins able to bind nucleic acids in a sequence-specific manner usually contain one or more of the well-defined structural motifs (zinc-fingers, leucine zipper, helix-turn-helix, or helix-loop-helix). In contrast, many proteins do not recognize nucleic acid sequence but rather local DNA or RNA structures (G-quadruplexes, i-motifs, triplexes, cruciforms, left-handed DNA/RNA form, and others). Finally, there are also proteins recognizing both sequence and local structural properties of nucleic acids (e.g., the famous tumor suppressor p53). In this mini-review, we aim to summarize current knowledge about the amino acid composition of various types of nucleic acid-binding proteins with a special focus on significant enrichment and/or depletion in each category.
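Amino acid composition and enrichment, the quantities this mini-review is concerned with, can be illustrated with a short calculation. The sketch below is an assumption-laden example rather than the procedure used in the review: the function names, the pseudocount, and the use of a log2 ratio against a background composition are illustrative choices.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequence):
    """Fraction of each standard amino acid in a sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return {a: 0.0 for a in AMINO_ACIDS}
    return {a: counts[a] / total for a in AMINO_ACIDS}


def log2_enrichment(foreground, background, pseudocount=1e-6):
    """log2 ratio of two fractions; positive means enriched, negative depleted.

    The pseudocount is an illustrative choice to avoid taking log of zero.
    """
    return math.log2((foreground + pseudocount) / (background + pseudocount))


if __name__ == "__main__":
    fg = composition("RRGGRGG")   # hypothetical binding-region fragment
    bg = composition("ACDEFGHIKLMNPQRSTVWY")  # hypothetical uniform background
    print(round(log2_enrichment(fg["R"], bg["R"]), 2))
```

A statistical test across many proteins would be needed before calling an enrichment significant; this example only shows the per-residue arithmetic.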
10. Hou L, Wei Y, Lin Y, Wang X, Lai Y, Yin M, Chen Y, Guo X, Wu S, Zhu Y, Yuan J, Tariq M, Li N, Sun H, Wang H, Zhang X, Chen J, Bao X, Jauch R. Concurrent binding to DNA and RNA facilitates the pluripotency reprogramming activity of Sox2. Nucleic Acids Res 2020; 48:3869-3887. [PMID: 32016422 PMCID: PMC7144947 DOI: 10.1093/nar/gkaa067] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 08/02/2019] [Revised: 01/16/2020] [Accepted: 01/22/2020] [Indexed: 02/03/2023] Open
Abstract
Some transcription factors that specifically bind double-stranded DNA appear to also function as RNA-binding proteins. Here, we demonstrate that the transcription factor Sox2 is able to directly bind RNA in vitro as well as in mouse and human cells. Sox2 targets RNA via a 60-amino-acid RNA binding motif (RBM) positioned C-terminally of the DNA binding high mobility group (HMG) box. Sox2 can associate with RNA and DNA simultaneously to form ternary RNA/Sox2/DNA complexes. Deletion of the RBM does not affect selection of target genes but mitigates binding to pluripotency related transcripts, switches exon usage and impairs the reprogramming of somatic cells to a pluripotent state. Our findings designate Sox2 as a multi-functional factor that associates with RNA whilst binding to cognate DNA sequences, suggesting that it may co-transcriptionally regulate RNA metabolism during somatic cell reprogramming.
Affiliation(s)
- Linlin Hou
- Department of Biochemistry, Molecular Cancer Research Center, School of Medicine, Sun Yat-Sen University, Guangzhou/Shenzhen, China.,CAS Key Laboratory of Regenerative Biology, Joint School of Life Sciences, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences and Guangzhou Medical University, Guangzhou 511436, China.,Genome Regulation Laboratory, Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Yuanjie Wei
- Guangzhou Regenerative Medicine and Health Guangdong Laboratory, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Yingying Lin
- Department of Biochemistry, Molecular Cancer Research Center, School of Medicine, Sun Yat-Sen University, Guangzhou/Shenzhen, China.,Laboratory of RNA Molecular Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Xiwei Wang
- Laboratory of RNA Molecular Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Yiwei Lai
- CAS Key Laboratory of Regenerative Biology, Joint School of Life Sciences, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences and Guangzhou Medical University, Guangzhou 511436, China.,Laboratory of RNA, Chromatin, and Human Disease, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Menghui Yin
- CAS Key Laboratory of Regenerative Biology, Joint School of Life Sciences, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences and Guangzhou Medical University, Guangzhou 511436, China
- Yanpu Chen
- CAS Key Laboratory of Regenerative Biology, Joint School of Life Sciences, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences and Guangzhou Medical University, Guangzhou 511436, China.,Genome Regulation Laboratory, Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China.,Max Planck Institute for Heart and Lung Research, 61231 Bad Nauheim, Germany
- Xiangpeng Guo
- CAS Key Laboratory of Regenerative Biology, Joint School of Life Sciences, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences and Guangzhou Medical University, Guangzhou 511436, China.,Laboratory of RNA, Chromatin, and Human Disease, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Senbin Wu
- Laboratory of RNA Molecular Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Jie Yuan
- Department of Chemical Pathology, Li Ka Shing Institute of Health Sciences, Prince of Wales Hospital, The Chinese University of Hong Kong, Hong Kong, China
- Muqddas Tariq
- CAS Key Laboratory of Regenerative Biology, Joint School of Life Sciences, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences and Guangzhou Medical University, Guangzhou 511436, China.,Laboratory of RNA, Chromatin, and Human Disease, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Na Li
- CAS Key Laboratory of Regenerative Biology, Joint School of Life Sciences, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences and Guangzhou Medical University, Guangzhou 511436, China.,Laboratory of RNA, Chromatin, and Human Disease, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Hao Sun
- Department of Chemical Pathology, Li Ka Shing Institute of Health Sciences, Prince of Wales Hospital, The Chinese University of Hong Kong, Hong Kong, China
- Huating Wang
- Department of Orthopaedics and Traumatology, Li Ka Shing Institute of Health Sciences, Prince of Wales Hospital, The Chinese University of Hong Kong, Hong Kong, China
- Xiaofei Zhang
- Guangzhou Regenerative Medicine and Health Guangdong Laboratory, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China.,CAS Key Laboratory of Regenerative Biology, Hefei Institute of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Jiekai Chen
- CAS Key Laboratory of Regenerative Biology, Joint School of Life Sciences, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences and Guangzhou Medical University, Guangzhou 511436, China.,Guangzhou Regenerative Medicine and Health Guangdong Laboratory, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Xichen Bao
- CAS Key Laboratory of Regenerative Biology, Joint School of Life Sciences, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences and Guangzhou Medical University, Guangzhou 511436, China.,Guangzhou Regenerative Medicine and Health Guangdong Laboratory, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China.,Laboratory of RNA Molecular Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Ralf Jauch
- CAS Key Laboratory of Regenerative Biology, Joint School of Life Sciences, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences and Guangzhou Medical University, Guangzhou 511436, China.,Genome Regulation Laboratory, Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China.,School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
11. Wang W, Langlois R, Langlois M, Genchev GZ, Wang X, Lu H. Functional Site Discovery From Incomplete Training Data: A Case Study With Nucleic Acid-Binding Proteins. Front Genet 2019; 10:729. [PMID: 31543893 PMCID: PMC6729729 DOI: 10.3389/fgene.2019.00729] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/21/2018] [Accepted: 07/11/2019] [Indexed: 12/27/2022] Open
Abstract
Function annotation efforts provide a foundation to our understanding of cellular processes and the functioning of the living cell. This motivates high-throughput computational methods to characterize new protein members of a particular function. Research work has focused on discriminative machine-learning methods, which promise to make efficient, de novo predictions of protein function. Furthermore, available function annotation exists predominantly for individual proteins rather than residues of which only a subset is necessary for the conveyance of a particular function. This limits discriminative approaches to predicting functions for which there is sufficient residue-level annotation, e.g., identification of DNA-binding proteins or where an excellent global representation can be divined. Complete understanding of the various functions of proteins requires discovery and functional annotation at the residue level. Herein, we cast this problem into the setting of multiple-instance learning, which only requires knowledge of the protein’s function yet identifies functionally relevant residues and need not rely on homology. We developed a new multiple-instance learning algorithm derived from AdaBoost and benchmarked this algorithm against two well-studied protein function prediction tasks: annotating proteins that bind DNA and RNA. This algorithm outperforms certain previous approaches in annotating protein function while identifying functionally relevant residues involved in binding both DNA and RNA, and on one protein-DNA benchmark, it achieves near perfect classification.
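The multiple-instance setting described above treats a protein as a "bag" of residues with only a protein-level label. The sketch below illustrates the standard multiple-instance assumption (a bag is positive if at least one instance is positive) using max aggregation; it is not the authors' AdaBoost-derived algorithm, and the function names, the 0.5 threshold, and the toy scores are hypothetical.

```python
def bag_score(residue_scores):
    """Protein-level score as the maximum residue-level score.

    An empty protein gets a score of 0.0.
    """
    return max(residue_scores, default=0.0)


def predict_protein(residue_scores, threshold=0.5):
    """Return (protein_is_binder, indices of residues at or above threshold).

    The threshold of 0.5 is an illustrative assumption.
    """
    is_binder = bag_score(residue_scores) >= threshold
    relevant = [i for i, s in enumerate(residue_scores) if s >= threshold]
    return is_binder, relevant


if __name__ == "__main__":
    toy_scores = [0.1, 0.9, 0.2, 0.7]  # hypothetical per-residue scores
    print(predict_protein(toy_scores))
```

The same residue scores that decide the protein-level label also point to candidate functional residues, which is the property the abstract highlights.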
Affiliation(s)
- Wenchuan Wang
- SJTU-Yale Joint Center for Biostatistics and Data Science, Department of Bioinformatics and Biostatistics, College of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Robert Langlois
- Department of Bioengineering and Department of Computer Science, University of Illinois at Chicago, Chicago, IL, United States
| | - Marina Langlois
- Department of Bioengineering and Department of Computer Science, University of Illinois at Chicago, Chicago, IL, United States
| | - Georgi Z Genchev
- SJTU-Yale Joint Center for Biostatistics and Data Science, Department of Bioinformatics and Biostatistics, College of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, China; Department of Bioengineering and Department of Computer Science, University of Illinois at Chicago, Chicago, IL, United States; Bulgarian Institute for Genomics and Precision Medicine, Sofia, Bulgaria
| | - Xiaolei Wang
- SJTU-Yale Joint Center for Biostatistics and Data Science, Department of Bioinformatics and Biostatistics, College of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, China; Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
| | - Hui Lu
- SJTU-Yale Joint Center for Biostatistics and Data Science, Department of Bioinformatics and Biostatistics, College of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, China; Department of Bioengineering and Department of Computer Science, University of Illinois at Chicago, Chicago, IL, United States; Center for Biomedical Informatics, Shanghai Children's Hospital, Shanghai, China
|
12
|
Chen J, Kuhn LA. Deciphering the three-domain architecture in schlafens and the structures and roles of human schlafen12 and serpinB12 in transcriptional regulation. J Mol Graph Model 2019; 90:59-76. [PMID: 31026779 PMCID: PMC6657700 DOI: 10.1016/j.jmgm.2019.04.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/15/2018] [Revised: 04/03/2019] [Accepted: 04/05/2019] [Indexed: 12/22/2022]
Abstract
Schlafen proteins are important in cell differentiation and defense against viruses, and yet this family of vertebrate proteins is just beginning to be understood at the molecular level. Here, the three-dimensional architecture and molecular interfaces of human schlafen12 (hSLFN12), which promotes intestinal stem cell differentiation, are analyzed by sequence conservation and structural modeling in light of the functions of its homologs and binding partners. Our analysis shows that the schlafen or divergent AAA ATPase domain described in the N-terminal region of schlafens in databases and the literature is a misannotation. This N-terminal region is conclusively an AlbA_2 DNA/RNA binding domain, forming the conserved core of schlafens and their sequence homologs from bacteria through mammals. Group III schlafens additionally contain a AAA NTPase domain in their C-terminal helicase region. In hSLFN12, we have uncovered a domain matching rho GTPases, which directly follows the AlbA_2 domain in all group II-III schlafens. Potential roles for the GTPase-like domain include antiviral activity and cytoskeletal interactions that contribute to nucleocytoplasmic shuttling and cell polarization during differentiation. Based on features conserved with rSlfn13, the AlbA_2 region in hSLFN12 is likely to bind RNA, possibly as a ribonuclease. We hypothesize that RNA binding by hSLFN12 contributes to an RNA-induced transcriptional silencing/E3 ligase complex, given the functions of hSLFN12's partners, SUV39H1, JMJD6, and PDLIM7. hSLFN12's partner hSerpinB12 may contribute to heterochromatin formation, based on its homology to MENT, or directly regulate transcription via its binding to RNA polymerase II. The analysis presented here provides clear architectural and transcriptional regulation hypotheses to guide experimental design for hSLFN12 and the thousands of schlafens that share its motifs.
Affiliation(s)
- Jiaxing Chen
- Protein Structural Analysis and Design Lab, Department of Biochemistry and Molecular Biology, Michigan State University, 603 Wilson Road, East Lansing, MI, 48824-1319, USA
| | - Leslie A Kuhn
- Protein Structural Analysis and Design Lab, Department of Biochemistry and Molecular Biology, Michigan State University, 603 Wilson Road, East Lansing, MI, 48824-1319, USA.
|
13
|
Ma X, Guo J, Sun X. Prediction of microRNA-binding residues in protein using a Laplacian support vector machine based on sequence information. J Bioinform Comput Biol 2018; 16:1840009. [DOI: 10.1142/s0219720018400097] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
The identification of microRNA (miRNA)-binding protein residues significantly impacts several research areas, including gene regulation and expression. We propose a method, PmiRBR, which combines a novel hybrid feature with the Laplacian support vector machine (LapSVM) algorithm to predict miRNA-binding residues in protein sequences. The hybrid feature is constituted by secondary structure, conservation scores, and a novel feature, which includes evolutionary information combined with the physicochemical properties of amino acids. Performance comparisons of the various features indicate that our novel feature contributes the most to prediction improvement. Our results demonstrate that PmiRBR can achieve 85.96% overall accuracy, with 43.89% sensitivity and 90.56% specificity. PmiRBR significantly outperforms other approaches at miRNA-binding residue prediction.
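The three figures quoted above (high overall accuracy, low sensitivity, high specificity) are typical of a strongly imbalanced residue set. As a worked check, assuming all three metrics were computed on the same set of residues, accuracy is a weighted average of sensitivity and specificity, so the implied fraction of binding residues can be solved for directly. The function name below is illustrative and does not come from the paper.

```python
def implied_positive_fraction(accuracy, sensitivity, specificity):
    """Solve accuracy = p * sensitivity + (1 - p) * specificity for p."""
    if sensitivity == specificity:
        raise ValueError("positive fraction is undetermined when sensitivity equals specificity")
    return (specificity - accuracy) / (specificity - sensitivity)

# Values reported in the abstract, expressed as fractions.
frac = implied_positive_fraction(0.8596, 0.4389, 0.9056)
print(f"implied fraction of binding residues: {frac:.3f}")
```

Under that assumption, roughly one residue in ten is a binding residue, which explains why overall accuracy stays close to specificity even though fewer than half of the true binding residues are recovered.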
Affiliation(s)
- Xin Ma
- School of Science, Nanjing Audit University, Nanjing 211815, P. R. China
| | - Jing Guo
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, P. R. China
| | - Xiao Sun
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, P. R. China
|
14
|
Chowdhury S, Zhang J, Kurgan L. In Silico Prediction and Validation of Novel RNA Binding Proteins and Residues in the Human Proteome. Proteomics 2018; 18:e1800064. [PMID: 29806170 DOI: 10.1002/pmic.201800064] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 02/27/2018] [Revised: 05/05/2018] [Indexed: 12/22/2022]
Abstract
Deciphering a complete landscape of protein-RNA interactions in the human proteome remains an elusive challenge. We computationally elucidate RNA binding proteins (RBPs) using an approach that complements previous efforts. We employ two modern complementary sequence-based methods that provide accurate predictions from the structured and the intrinsically disordered sequences, even in the absence of sequence similarity to the known RBPs. We generate and analyze putative RNA binding residues on the whole proteome scale. Using a conservative setting that ensures low, 5% false positive rate, we identify 1511 putative RBPs that include 281 known RBPs and 166 RBPs that were previously predicted. We empirically demonstrate that these overlaps are statistically significant. We also validate the putative RBPs based on two major hallmarks of their RNA binding residues: high levels of evolutionary conservation and enrichment in charged amino acids. Moreover, we show that the novel RBPs are significantly under-annotated functionally which coincides with the fact that they were not yet found to interact with RNAs. We provide two examples of our novel putative RBPs for which there is recent evidence of their interactions with RNAs. The dataset of novel putative RBPs and RNA binding residues for the future hypothesis generation is provided in the Supporting Information.
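The "conservative setting" mentioned above, a 5% false positive rate, amounts to choosing a score cutoff from proteins assumed not to bind RNA. The sketch below illustrates that general procedure only; it is not the authors' pipeline, and the function name and example scores are hypothetical.

```python
def threshold_for_fpr(negative_scores, max_fpr=0.05):
    """Return a cutoff such that at most max_fpr of the negatives score strictly above it.

    Assumes a non-empty list and 0 <= max_fpr < 1.
    """
    ranked = sorted(negative_scores, reverse=True)
    allowed = int(max_fpr * len(ranked))  # negatives permitted above the cutoff
    return ranked[allowed]                # call a protein positive only if score > cutoff

# Hypothetical predictor scores for 20 proteins assumed not to bind RNA.
negatives = [i / 20 for i in range(20)]   # 0.00, 0.05, ..., 0.95
cutoff = threshold_for_fpr(negatives, max_fpr=0.05)
false_positives = sum(score > cutoff for score in negatives)
print(cutoff, false_positives / len(negatives))
```

A stricter cutoff lowers the false positive rate at the cost of missing weaker true binders, which is the trade-off a conservative setting accepts.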
Affiliation(s)
- Shomeek Chowdhury
- Dr. Vikram Sarabhai Institute of Cell and Molecular Biology, Maharaja Sayajirao University of Baroda, Gujarat, 390005, India; Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
| | - Jian Zhang
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA; School of Computer and Information Technology, Xinyang Normal University, Xinyang, 464000, P. R. China
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
|
15
|
Shen WJ, Cui W, Chen D, Zhang J, Xu J. RPiRLS: Quantitative Predictions of RNA Interacting with Any Protein of Known Sequence. Molecules 2018; 23:540. [PMID: 29495575 PMCID: PMC6017498 DOI: 10.3390/molecules23030540] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 02/06/2018] [Revised: 02/24/2018] [Accepted: 02/25/2018] [Indexed: 02/05/2023] Open
Abstract
RNA-protein interactions (RPIs) have critical roles in numerous fundamental biological processes, such as post-transcriptional gene regulation, viral assembly, cellular defence and protein synthesis. As the amount of available experimental RNA-protein binding data has increased rapidly due to high-throughput sequencing methods, it is now possible to measure and understand RNA-protein interactions by computational methods. In this study, we integrate a sequence-based derived kernel with regularized least squares to perform prediction. The derived kernel exploits the contextual information around an amino acid or a nucleic acid as well as the repetitive conserved motif information. We propose a novel machine learning method, called RPiRLS, to predict the interaction between any RNA and protein of known sequences. For the RPiRLS classifier, each protein sequence comprises up to 20 diverse amino acids, but for the RPiRLS-7G classifier, each protein sequence is represented by using 7-letter reduced alphabets based on their physicochemical properties. We evaluated both methods on a number of benchmark data sets and compared their performances with two newly developed and state-of-the-art methods, RPI-Pred and IPMiner. On the non-redundant benchmark test sets extracted from the PRIDB, the RPiRLS method outperformed RPI-Pred and IPMiner in terms of accuracy, specificity and sensitivity. Further, RPiRLS achieved an accuracy of 92% on the prediction of lncRNA-protein interactions. The proposed method can also be extended to construct RNA-protein interaction networks. The RPiRLS web server is freely available at http://bmc.med.stu.edu.cn/RPiRLS.
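The 7-letter reduced alphabet mentioned above can be illustrated with a small encoder. The abstract does not list the seven groups, so this sketch assumes the widely used conjoint-triad grouping of amino acids by dipole and side-chain volume; whether RPiRLS-7G uses exactly these classes is an assumption, and the names below are illustrative.

```python
# Assumed grouping (conjoint-triad classes); not confirmed by the abstract.
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: str(i + 1) for i, group in enumerate(GROUPS) for aa in group}

def reduce_alphabet(sequence, unknown="X"):
    """Map a protein sequence onto class labels 1-7; non-standard residues become `unknown`."""
    return "".join(AA_TO_CLASS.get(aa, unknown) for aa in sequence.upper())

print(reduce_alphabet("MKRDC"))
```

Collapsing 20 residue types into 7 classes shrinks the feature space a sequence kernel has to cover, at the cost of treating physicochemically similar residues as identical.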
Affiliation(s)
- Wen-Jun Shen
- Department of Bioinformatics, Shantou University Medical College, Shantou 515000, Guangdong, China.
| | - Wenjuan Cui
- Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China.
| | - Danze Chen
- Department of Bioinformatics, Shantou University Medical College, Shantou 515000, Guangdong, China.
| | - Jieming Zhang
- Department of Bioinformatics, Shantou University Medical College, Shantou 515000, Guangdong, China.
| | - Jianzhen Xu
- Department of Bioinformatics, Shantou University Medical College, Shantou 515000, Guangdong, China.
|
16
|
Tang Y, Liu D, Wang Z, Wen T, Deng L. A boosting approach for prediction of protein-RNA binding residues. BMC Bioinformatics 2017; 18:465. [PMID: 29219069 PMCID: PMC5773889 DOI: 10.1186/s12859-017-1879-2] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Indexed: 01/18/2023] Open
Abstract
Background: RNA binding proteins play important roles in post-transcriptional RNA processing and transcriptional regulation. Distinguishing the RNA-binding residues in proteins is crucial for understanding how protein and RNA recognize each other and function together as a complex. Results: We propose PredRBR, an effective computational approach to predict RNA-binding residues. PredRBR is built with gradient tree boosting and an optimal feature set selected from a large number of sequence and structure characteristics and two categories of structural neighborhood properties. Cross-validation experiments on the RBP170 data set show that PredRBR achieves an overall accuracy of 0.84, a sensitivity of 0.85, MCC of 0.55 and AUC of 0.92, which are significantly better than those of other widely used machine learning algorithms such as Support Vector Machine, Random Forest, and Adaboost. We further calculate the feature importance of different feature categories and find that structural neighborhood characteristics are critical in the recognition of RNA binding residues. Also, PredRBR yields significantly better prediction accuracy on an independent test set (RBP101) in comparison with other state-of-the-art methods. Conclusions: The superior performance over existing RNA-binding residue prediction methods indicates the importance of the gradient tree boosting algorithm combined with the optimal selected features.
Affiliation(s)
- Yongjun Tang
- Department of Clinical Pharmacology, Xiangya Hospital, Central South University, 87 Xiangya Road, Changsha, 410008, China; Institute of Clinical Pharmacology, Hunan Key Laboratory of Pharmacogenetics, Central South University, 87 Xiangya Road, Changsha, 410008, China; Department of Pediatrics, Xiangya Hospital, Central South University, 87 Xiangya Road, Changsha, 410008, China
| | - Diwei Liu
- School of Software, Central South University, No.22 Shaoshan South Road, Changsha, 410075, China
| | - Zixiang Wang
- School of Software, Central South University, No.22 Shaoshan South Road, Changsha, 410075, China
| | - Ting Wen
- School of Software, Central South University, No.22 Shaoshan South Road, Changsha, 410075, China
| | - Lei Deng
- School of Software, Central South University, No.22 Shaoshan South Road, Changsha, 410075, China.
|
17
|
Yan J, Kurgan L. DRNApred, fast sequence-based method that accurately predicts and discriminates DNA- and RNA-binding residues. Nucleic Acids Res 2017; 45:e84. [PMID: 28132027 PMCID: PMC5449545 DOI: 10.1093/nar/gkx059] [Citation(s) in RCA: 72] [Impact Index Per Article: 10.3] [Received: 06/10/2016] [Accepted: 01/24/2017] [Indexed: 01/18/2023] Open
Abstract
Protein-DNA and protein-RNA interactions are part of many diverse and essential cellular functions and yet most of them remain to be discovered and characterized. Recent research shows that sequence-based predictors of DNA-binding residues accurately find these residues but also cross-predict many RNA-binding residues as DNA-binding, and vice versa. Most of these methods are also relatively slow, prohibiting applications on the whole-genome scale. We describe a novel sequence-based method, DRNApred, which accurately and in high-throughput predicts and discriminates between DNA- and RNA-binding residues. DRNApred was designed using a new dataset with both DNA- and RNA-binding proteins, regression that penalizes cross-predictions, and a novel two-layered architecture. DRNApred outperforms state-of-the-art predictors of DNA- or RNA-binding residues on a benchmark test dataset by substantially reducing the cross predictions and predicting arguably higher quality false positives that are located nearby the native binding residues. Moreover, it also more accurately predicts the DNA- and RNA-binding proteins. Application on the human proteome confirms that DRNApred reduces the cross predictions among the native nucleic acid binders. Also, novel putative DNA/RNA-binding proteins that it predicts share similar subcellular locations and residue charge profiles with the known native binding proteins. Webserver of DRNApred is freely available at http://biomine.cs.vcu.edu/servers/DRNApred/.
Affiliation(s)
- Jing Yan
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton T6G 2V4, Canada
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, 23284, USA
|
18
|
Zhou J, Lu Q, Xu R, He Y, Wang H. EL_PSSM-RT: DNA-binding residue prediction by integrating ensemble learning with PSSM Relation Transformation. BMC Bioinformatics 2017; 18:379. [PMID: 28851273 PMCID: PMC5576297 DOI: 10.1186/s12859-017-1792-8] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 03/07/2017] [Accepted: 08/15/2017] [Indexed: 11/23/2022] Open
Abstract
Background: Prediction of DNA-binding residues is important for understanding the protein-DNA recognition mechanism. Many computational methods have been proposed for the prediction, but most of them do not consider the relationships of evolutionary information between residues. Results: In this paper, we first propose a novel residue encoding method, referred to as the Position Specific Score Matrix (PSSM) Relation Transformation (PSSM-RT), to encode residues by utilizing the relationships of evolutionary information between residues. PDNA-62 and PDNA-224 are used to evaluate PSSM-RT and two existing PSSM encoding methods by five-fold cross-validation. Performance evaluations indicate that PSSM-RT is more effective than previous methods. This validates the point that the relationship of evolutionary information between residues is indeed useful in DNA-binding residue prediction. An ensemble learning classifier (EL_PSSM-RT) is also proposed by combining an ensemble learning model and PSSM-RT to better handle the imbalance between binding and non-binding residues in datasets. EL_PSSM-RT is evaluated by five-fold cross-validation using PDNA-62 and PDNA-224 as well as two independent datasets, TS-72 and TS-61. Performance comparisons with existing predictors on the four datasets demonstrate that EL_PSSM-RT is the best-performing method among all the predicting methods, with improvements of 0.02–0.07 for MCC, 4.18–21.47% for ST and 0.013–0.131 for AUC. Furthermore, we analyze the importance of the pair-relationships extracted by PSSM-RT, and the results validate the usefulness of PSSM-RT for encoding DNA-binding residues. Conclusions: We propose a novel method for the prediction of DNA-binding residues with the inclusion of the relationship of evolutionary information and ensemble learning. Performance evaluation shows that the relationship of evolutionary information between residues is indeed useful in DNA-binding residue prediction and ensemble learning can be used to address the data imbalance issue between binding and non-binding residues. A web service of EL_PSSM-RT (http://hlt.hitsz.edu.cn:8080/PSSM-RT_SVM/) is provided for free access to the biological research community.
Affiliation(s)
- Jiyun Zhou
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen, Guangdong, 518055, China; Department of Computing, the Hong Kong Polytechnic University, Kowloon, Hong Kong
| | - Qin Lu
- Department of Computing, the Hong Kong Polytechnic University, Kowloon, Hong Kong
| | - Ruifeng Xu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen, Guangdong, 518055, China; Shenzhen Engineering Laboratory of Performance Robots at Digital Stage, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China
| | - Yulan He
- School of Engineering and Applied Science, Aston University, Birmingham, UK
| | - Hongpeng Wang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen, Guangdong, 518055, China
|
19
|
Cheng Z, Huang K, Wang Y, Liu H, Guan J, Zhou S. Selecting high-quality negative samples for effectively predicting protein-RNA interactions. BMC Systems Biology 2017; 11:9. [PMID: 28361676 PMCID: PMC5374704 DOI: 10.1186/s12918-017-0390-8] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 11/24/2022]
Abstract
Background: The identification of Protein-RNA Interactions (PRIs) is important to understanding cell activities. Recently, several machine learning-based methods have been developed for identifying PRIs. However, the performance of these methods is unsatisfactory. One major reason is that they usually use unreliable negative samples in the training process. Methods: To boost the performance of PRI prediction, we propose a novel method to generate reliable negative samples. Concretely, we first collect the known PRIs as positive samples for generating positive sets. For each positive set, we construct two corresponding negative sets, one by our method and the other by random selection. Each positive set is combined with a negative set to form a dataset for model training and performance evaluation. Consequently, we get 18 datasets of different species and different ratios of negative samples to positive samples. Second, sequence-based features are extracted to represent each of the PRIs and protein-RNA pairs in the datasets. A filter-based method is employed to cut down the dimensionality of feature vectors and reduce computational cost. Finally, the performance of support vector machine (SVM), random forest (RF) and naive Bayes (NB) is evaluated on the generated 18 datasets. Results: Extensive experiments show that, compared with using randomly generated negative samples, all classifiers achieve substantial performance improvement by using negative samples selected by our method. The improvements on accuracy and geometric mean for the SVM classifier, the RF classifier and the NB classifier are as high as 204.5 and 68.7%, 174.5 and 53.9%, 80.9 and 54.3%, respectively. Conclusion: Our method is useful for the identification of PRIs.
Affiliation(s)
- Zhanzhan Cheng
- School of Computer Science, Fudan University, Handan Road, Shanghai, 200433, China
| | - Kai Huang
- School of Computer Science, Fudan University, Handan Road, Shanghai, 200433, China
| | - Yang Wang
- School of Computer Science, Jiangxi Normal University, Nanchang, 330022, China
| | - Hui Liu
- The Bioinformatics Lab at Changzhou NO. 7 People's Hospital, Changzhou, Jiangsu, 213011, China; Lab of Information Management, Changzhou University, Changzhou, 213164, China
| | - Jihong Guan
- Department of Computer Science and Technology, Tongji University, Shanghai, 201804, China
| | - Shuigeng Zhou
- School of Computer Science, Fudan University, Handan Road, Shanghai, 200433, China; The Bioinformatics Lab at Changzhou NO. 7 People's Hospital, Changzhou, Jiangsu, 213011, China
|
20
|
Using 3dRPC for RNA-protein complex structure prediction. Biophysics Reports 2017; 2:95-99. [PMID: 28317012 PMCID: PMC5334405 DOI: 10.1007/s41048-017-0034-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 07/19/2016] [Accepted: 01/05/2017] [Indexed: 02/07/2023] Open
Abstract
3dRPC is a computational method designed for three-dimensional RNA–protein complex structure prediction. Starting from a protein structure and an RNA structure, 3dRPC first generates presumptive complex structures by RPDOCK and then evaluates the structures by RPRANK. RPDOCK is an FFT-based docking algorithm that takes features of RNA–protein interactions into consideration, and RPRANK is a knowledge-based potential using root mean square deviation as a measure. Here we give a detailed description of the usage of 3dRPC. The source code is available at http://biophy.hust.edu.cn/3dRPC.html.
|
21
|
Walia RR, El-Manzalawy Y, Honavar VG, Dobbs D. Sequence-Based Prediction of RNA-Binding Residues in Proteins. Methods Mol Biol 2017; 1484:205-235. [PMID: 27787829 PMCID: PMC5796408 DOI: 10.1007/978-1-4939-6406-2_15] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 05/28/2023]
Abstract
Identifying individual residues in the interfaces of protein-RNA complexes is important for understanding the molecular determinants of protein-RNA recognition and has many potential applications. Recent technical advances have led to several high-throughput experimental methods for identifying partners in protein-RNA complexes, but determining RNA-binding residues in proteins is still expensive and time-consuming. This chapter focuses on available computational methods for identifying which amino acids in an RNA-binding protein participate directly in contacting RNA. Step-by-step protocols for using three different web-based servers to predict RNA-binding residues are described. In addition, currently available web servers and software tools for predicting RNA-binding sites, as well as databases that contain valuable information about known protein-RNA complexes, RNA-binding motifs in proteins, and protein-binding recognition sites in RNA are provided. We emphasize sequence-based methods that can reliably identify interfacial residues without the requirement for structural information regarding either the RNA-binding protein or its RNA partner.
Affiliation(s)
| | - Yasser El-Manzalawy
- College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, 16802, USA
| | - Vasant G Honavar
- College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, 16802, USA
| | - Drena Dobbs
- Genetics, Development and Cell Biology Department, Iowa State University, 3112 Molecular Biology Building, Ames, IA, 50011-3650, USA.
|
22
|
Abstract
Protein–RNA interactions play important roles in biological systems. Searching for regular patterns in protein–RNA binding interfaces is important for understanding how protein and RNA recognize each other and bind to form a complex. Herein, we present a graph-mining method for discovering biological patterns in protein–RNA interfaces. We represented known protein–RNA interfaces using graphs and then discovered graph patterns enriched in the interfaces. Comparison of the discovered graph patterns with UniProt annotations showed that the graph patterns had a significant overlap with residue sites that had been proven crucial for RNA binding by experimental methods. Using 200 patterns as input features, a support vector machine method was able to classify protein surface patches into RNA-binding sites and non-RNA-binding sites with 84.0% accuracy and 88.9% precision. We built a simple scoring function that calculated the total number of the graph patterns that occurred in a protein–RNA interface. That scoring function was able to discriminate near-native protein–RNA complexes from docking decoys with a performance comparable with that of a state-of-the-art complex scoring function. Our work also revealed possible patterns that might be important for binding affinity.
Affiliation(s)
- Wen Cheng
- Department of Computer Science, North Dakota State University, Fargo, North Dakota
| | - Changhui Yan
- Department of Computer Science, North Dakota State University, Fargo, North Dakota
|
23
|
|
24
|
Zhou J, Xu R, He Y, Lu Q, Wang H, Kong B. PDNAsite: Identification of DNA-binding Site from Protein Sequence by Incorporating Spatial and Sequence Context. Sci Rep 2016; 6:27653. [PMID: 27282833 PMCID: PMC4901350 DOI: 10.1038/srep27653] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 12/02/2015] [Accepted: 05/18/2016] [Indexed: 02/01/2023] Open
Abstract
Protein-DNA interactions are involved in many fundamental biological processes essential for cellular function. Most of the existing computational approaches employed only the sequence context of the target residue for its prediction. In the present study, for each target residue, we applied both the spatial context and the sequence context to construct the feature space. Subsequently, Latent Semantic Analysis (LSA) was applied to remove the redundancies in the feature space. Finally, a predictor (PDNAsite) was developed through the integration of the support vector machines (SVM) classifier and ensemble learning. Results on the PDNA-62 and the PDNA-224 datasets demonstrate that features extracted from spatial context provide more information than those from sequence context, and that combining them gives a further performance gain. An analysis of the number of binding sites in the spatial context of the target site indicates that the interactions between binding sites next to each other are important for protein-DNA recognition and their binding ability. The comparison between our proposed PDNAsite method and the existing methods indicates that PDNAsite outperforms most of the existing methods and is a useful tool for DNA-binding site identification. A web server of our predictor (http://hlt.hitsz.edu.cn:8080/PDNAsite/) is freely accessible to the biological research community.
Affiliation(s)
- Jiyun Zhou
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China; Department of Computing, the Hong Kong Polytechnic University, Hong Kong
| | - Ruifeng Xu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China; Shenzhen Engineering Laboratory of Performance Robots at Digital Stage, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China
| | - Yulan He
- School of Engineering and Applied Science, Aston University, UK
| | - Qin Lu
- Department of Computing, the Hong Kong Polytechnic University, Hong Kong
| | - Hongpeng Wang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
| | - Bing Kong
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
|
25
|
Sun M, Wang X, Zou C, He Z, Liu W, Li H. Accurate prediction of RNA-binding protein residues with two discriminative structural descriptors. BMC Bioinformatics 2016; 17:231. [PMID: 27266516 PMCID: PMC4897909 DOI: 10.1186/s12859-016-1110-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 02/21/2016] [Accepted: 06/02/2016] [Indexed: 11/10/2022] Open
Abstract
Background: RNA-binding proteins participate in many important biological processes concerning RNA-mediated gene regulation, and several computational methods have been recently developed to predict the protein-RNA interactions of RNA-binding proteins. Newly developed discriminative descriptors will help to improve the prediction accuracy of these prediction methods and provide further meaningful information for researchers. Results: In this work, we designed two structural features (residue electrostatic surface potential and triplet interface propensity). According to the statistical and structural analysis of protein-RNA complexes, the two features were powerful for identifying RNA-binding protein residues. Using these two features and other excellent structure- and sequence-based features, a random forest classifier was constructed to predict RNA-binding residues. The area under the receiver operating characteristic curve (AUC) of five-fold cross-validation for our method on training set RBP195 was 0.900, and when applied to the test set RBP68, the prediction accuracy (ACC) was 0.868, and the F-score was 0.631. Conclusions: The good prediction performance of our method revealed that the two newly designed descriptors could be discriminative for inferring protein residues interacting with RNAs. To facilitate the use of our method, a web server called RNAProSite, which implements the proposed method, was constructed and is freely available at http://lilab.ecust.edu.cn/NABind.
Affiliation(s)
- Meijian Sun
- State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China
| | - Xia Wang
- State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China
| | - Chuanxin Zou
- State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China
| | - Zenghui He
- State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China
| | - Wei Liu
- State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China
| | - Honglin Li
- State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China.
| |
|
26
|
Wang C, Uversky VN, Kurgan L. Disordered nucleiome: Abundance of intrinsic disorder in the DNA- and RNA-binding proteins in 1121 species from Eukaryota, Bacteria and Archaea. Proteomics 2016; 16:1486-98. [DOI: 10.1002/pmic.201500177] [Citation(s) in RCA: 70] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2015] [Revised: 02/26/2016] [Accepted: 03/29/2016] [Indexed: 12/12/2022]
Affiliation(s)
- Chen Wang
- Department of Computer Science; Virginia Commonwealth University; Richmond VA USA
- Department of Electrical and Computer Engineering; University of Alberta; Edmonton Canada
| | - Vladimir N. Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute; Morsani College of Medicine; University of South Florida; Tampa FL USA
- Institute for Biological Instrumentation; Russian Academy of Sciences; Pushchino Moscow Region Russian Federation
- Department of Biology; Faculty of Science; King Abdulaziz University; Jeddah Kingdom of Saudi Arabia
| | - Lukasz Kurgan
- Department of Computer Science; Virginia Commonwealth University; Richmond VA USA
- Department of Electrical and Computer Engineering; University of Alberta; Edmonton Canada
| |
|
27
|
Abstract
Protein-RNA interactions play important roles in a wide variety of cellular processes, ranging from transcriptional and posttranscriptional regulation of genes to host defense against pathogens. In this chapter we present the computational approach catRAPID to predict protein-RNA interactions and discuss how it could be used to find trends in ribonucleoprotein networks. We envisage that the combination of computational and experimental approaches will be crucial to unravel the role of coding and noncoding RNAs in protein networks.
|
28
|
Klus P, Ponti RD, Livi CM, Tartaglia GG. Protein aggregation, structural disorder and RNA-binding ability: a new approach for physico-chemical and gene ontology classification of multiple datasets. BMC Genomics 2015; 16:1071. [PMID: 26673865 PMCID: PMC4681139 DOI: 10.1186/s12864-015-2280-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2015] [Accepted: 12/08/2015] [Indexed: 01/27/2023] Open
Abstract
Background Comparison between multiple protein datasets requires the choice of an appropriate reference system and a number of variables to describe their differences. Here we introduce an innovative approach to discriminate multiple protein datasets (multiCM) and to measure enrichments in gene ontology terms (cleverGO) using semantic similarities. Results We illustrate the power of our approach by investigating the links between RNA-binding ability and other protein features, such as structural disorder and aggregation, in S. cerevisiae, C. elegans, M. musculus and H. sapiens. Our results are in striking agreement with available experimental evidence and unravel features that are key to understanding the mechanisms regulating cellular homeostasis. Conclusions In an intuitive way, multiCM and cleverGO provide accurate classifications of physico-chemical features and annotations of biological processes, molecular functions and cellular components, which is extremely useful for the discovery and characterization of new trends in protein datasets. multiCM and cleverGO can be freely accessed on the Web at http://www.tartaglialab.com/cs_multi/submission and http://www.tartaglialab.com/GO_analyser/universal. Each of the pages contains links to the corresponding documentation and tutorial. Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-2280-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Petr Klus
- Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08003, Barcelona, Spain
| | - Riccardo Delli Ponti
- Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08003, Barcelona, Spain
| | - Carmen Maria Livi
- Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08003, Barcelona, Spain
| | - Gian Gaetano Tartaglia
- Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08003, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), 23 Passeig Lluís Companys, 08010, Barcelona, Spain
| |
|
29
|
Computational Prediction of RNA-Binding Proteins and Binding Sites. Int J Mol Sci 2015; 16:26303-17. [PMID: 26540053 PMCID: PMC4661811 DOI: 10.3390/ijms161125952] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2015] [Revised: 10/20/2015] [Accepted: 10/23/2015] [Indexed: 11/19/2022] Open
Abstract
Protein–RNA interactions have vital roles in many cellular processes such as protein synthesis, sequence encoding, RNA transfer, and gene regulation at the transcriptional and post-transcriptional levels. Approximately 6%–8% of all proteins are RNA-binding proteins (RBPs). Distinguishing these RBPs or their binding residues is a major aim of structural biology. Previously, a number of experimental methods were developed for the determination of protein–RNA interactions. However, these experimental methods are expensive, time-consuming, and labor-intensive. Alternatively, researchers have developed many computational approaches to predict RBPs and protein–RNA binding sites by combining various machine learning methods with abundant sequence and/or structural features. There are three kinds of computational approaches: prediction from protein sequence, prediction from protein structure, and protein–RNA docking. In this paper, we review all existing studies of predictions of RNA-binding sites and RBPs and complexes, including data sets used in different approaches, sequence and structural features used in several predictors, prediction method classifications, performance comparisons, evaluation methods, and future directions.
|
30
|
Sequence-Based Prediction of RNA-Binding Proteins Using Random Forest with Minimum Redundancy Maximum Relevance Feature Selection. BIOMED RESEARCH INTERNATIONAL 2015; 2015:425810. [PMID: 26543860 PMCID: PMC4620426 DOI: 10.1155/2015/425810] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/24/2015] [Accepted: 09/21/2015] [Indexed: 11/17/2022]
Abstract
The prediction of RNA-binding proteins is one of the most challenging problems in computational biology. Although some studies have investigated this problem, the accuracy of prediction is still not sufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests with the minimum redundancy maximum relevance (mRMR) method, followed by incremental feature selection (IFS). We incorporated conjoint triad features and three novel features: binding propensity (BP), nonbinding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features have important roles in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved the best performance (86.62% accuracy and 0.737 Matthews correlation coefficient). High prediction accuracy and successful prediction performance suggested that our method can be a useful approach to identify RNA-binding proteins from sequence information.
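The conjoint triad features mentioned in this abstract can be sketched as follows. The sketch assumes the commonly used seven-class amino acid grouping and a 343-dimensional vector of triad frequencies; the function name `conjoint_triad`, the normalization by window count, and the handling of non-standard residues are illustrative assumptions, since the abstract does not specify them.

```python
from itertools import product

# Seven amino acid classes commonly used for conjoint triad encoding
# (assumed here; the abstract does not list the grouping).
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(GROUPS) for aa in group}

# All 7 * 7 * 7 = 343 ordered class triads, each mapped to a vector index.
TRIADS = list(product(range(len(GROUPS)), repeat=3))
TRIAD_INDEX = {triad: i for i, triad in enumerate(TRIADS)}


def conjoint_triad(sequence):
    """Return a 343-dimensional vector of class-triad frequencies.

    Windows containing a residue outside the 20 standard amino acids
    are skipped. Frequencies are normalized by the number of counted
    windows (a simplification chosen for this sketch).
    """
    classes = [AA_TO_CLASS.get(aa) for aa in sequence.upper()]
    counts = [0] * len(TRIADS)
    windows = 0
    for i in range(len(classes) - 2):
        triad = tuple(classes[i:i + 3])
        if None in triad:
            continue
        counts[TRIAD_INDEX[triad]] += 1
        windows += 1
    if windows == 0:
        return [0.0] * len(TRIADS)
    return [c / windows for c in counts]


# Hypothetical six-residue peptide, for illustration only.
example_vector = conjoint_triad("MKRAGV")
```

A vector like this would typically be concatenated with other descriptors before feature selection and classification.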
|
31
|
Arginine 112 is involved in HCV translation modulation by NS5A domain I. Biochem Biophys Res Commun 2015; 465:95-100. [DOI: 10.1016/j.bbrc.2015.07.136] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2015] [Accepted: 07/28/2015] [Indexed: 01/08/2023]
|
32
|
Ren H, Shen Y. RNA-binding residues prediction using structural features. BMC Bioinformatics 2015; 16:249. [PMID: 26254826 PMCID: PMC4529986 DOI: 10.1186/s12859-015-0691-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2015] [Accepted: 07/31/2015] [Indexed: 01/25/2023] Open
Abstract
Background RNA-protein complexes play an essential role in many biological processes. To explore potential functions of RNA-protein complexes, it’s important to identify RNA-binding residues in proteins. Results In this work, we propose a set of new structural features for RNA-binding residue prediction. A set of template patches are first extracted from RNA-binding interfaces. To construct structural features for a residue, we compare its surrounding patches with each template patch and use the accumulated distances as its structural features. These new features provide sufficient structural information of surrounding surface of a residue and they can be used to measure the structural similarity between the surface surrounding two residues. The new structural features, together with other sequence features, are used to predict RNA-binding residues using ensemble learning technique. Conclusions The experimental results reveal the effectiveness of the proposed structural features. In addition, the clustering results on template patches exhibit distinct structural patterns of RNA-binding sites, although the sequences of template patches in the same cluster are not conserved. We speculate that RNAs may have structure preferences when binding with proteins.
Affiliation(s)
- Huizhu Ren
- 2011 Collaborative Innovation Center of Tianjin for Medical Epigenetics, Key Laboratory of Hormones and Development (Ministry of Health), Metabolic Diseases Hospital & Tianjin Institute of Endocrinology, Tianjin Medical University, Tianjin, 300070, China.
| | - Ying Shen
- School of Software Engineering, Tongji University, Shanghai, 201804, China
- Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information, Ministry of Education, Nanjing University of Science and Technology, Nanjing, 210094, P.R. China
| |
|
33
|
Kim HH, Lee SJ, Gardiner AS, Perrone-Bizzozero NI, Yoo S. Different motif requirements for the localization zipcode element of β-actin mRNA binding by HuD and ZBP1. Nucleic Acids Res 2015; 43:7432-46. [PMID: 26152301 PMCID: PMC4551932 DOI: 10.1093/nar/gkv699] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2015] [Accepted: 06/29/2015] [Indexed: 11/13/2022] Open
Abstract
Interactions of RNA-binding proteins (RBPs) with their target transcripts are essential for regulating gene expression at the posttranscriptional level including mRNA export/localization, stability, and translation. ZBP1 and HuD are RBPs that play pivotal roles in mRNA transport and local translational control in neuronal processes. While HuD possesses three RNA recognition motifs (RRMs), ZBP1 contains two RRMs and four K homology (KH) domains that either increase target specificity or provide a multi-target binding capability. Here we used isolated cis-element sequences of the target mRNA to examine directly protein-RNA interactions in cell-free systems. We found that both ZBP1 and HuD bind the zipcode element in rat β-actin mRNA's 3' UTR. Differences between HuD and ZBP1 were observed in their binding preference to the element. HuD showed a binding preference for U-rich sequence. In contrast, ZBP1 binding to the zipcode RNA depended more on the structural level, as it required the proper spatial organization of a stem-loop that is mainly determined by the U-rich element juxtaposed to the 3' end of a 5'-ACACCC-3' motif. On the basis of this work, we propose that ZBP1 and HuD bind to overlapping sites in the β-actin zipcode, but they recognize different features of this target sequence.
Affiliation(s)
- Hak Hee Kim
- Nemours Biomedical Research, Alfred I. duPont Hosp. for Children, Wilmington, DE 19803, USA
| | - Seung Joon Lee
- Department of Biological Sciences, University of South Carolina, Columbia, SC 29208, USA
| | - Amy S Gardiner
- Department of Neuroscience, University of New Mexico School of Medicine, Albuquerque, NM 87131, USA
| | - Nora I Perrone-Bizzozero
- Department of Neuroscience, University of New Mexico School of Medicine, Albuquerque, NM 87131, USA
| | - Soonmoon Yoo
- Nemours Biomedical Research, Alfred I. duPont Hosp. for Children, Wilmington, DE 19803, USA
| |
|
34
|
Miao Z, Westhof E. Prediction of nucleic acid binding probability in proteins: a neighboring residue network based score. Nucleic Acids Res 2015; 43:5340-51. [PMID: 25940624 PMCID: PMC4477668 DOI: 10.1093/nar/gkv446] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2015] [Revised: 04/23/2015] [Accepted: 04/24/2015] [Indexed: 11/13/2022] Open
Abstract
We describe a general binding score for predicting the nucleic acid binding probability in proteins. The score is directly derived from physicochemical and evolutionary features and integrates a residue neighboring network approach. Our process achieves stable and high accuracies on both DNA- and RNA-binding proteins and illustrates how the main driving forces for nucleic acid binding are common. Because of the effective integration of the synergetic effects of the network of neighboring residues and the fact that the prediction yields a hierarchical scoring on the protein surface, energy funnels for nucleic acid binding appear on protein surfaces, pointing to the dynamic process occurring in the binding of nucleic acids to proteins.
Affiliation(s)
- Zhichao Miao
- Architecture et Réactivité de l'ARN, Université de Strasbourg, Institut de biologie moléculaire et cellulaire du CNRS, 15 Rue Descartes, 67000 Strasbourg, France
| | - Eric Westhof
- Architecture et Réactivité de l'ARN, Université de Strasbourg, Institut de biologie moléculaire et cellulaire du CNRS, 15 Rue Descartes, 67000 Strasbourg, France
| |
|
35
|
Tuvshinjargal N, Lee W, Park B, Han K. Predicting protein-binding RNA nucleotides with consideration of binding partners. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2015; 120:3-15. [PMID: 25907142 DOI: 10.1016/j.cmpb.2015.03.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/21/2014] [Revised: 03/30/2015] [Accepted: 03/30/2015] [Indexed: 06/04/2023]
Abstract
In recent years several computational methods have been developed to predict RNA-binding sites in proteins. Most of these methods do not consider interacting partners of a protein, so they predict the same RNA-binding sites for a given protein sequence even if the protein binds to different RNAs. Unlike the problem of predicting RNA-binding sites in proteins, the problem of predicting protein-binding sites in RNA has received little attention, mainly because it is much more difficult and shows a lower accuracy on average. In our previous study, we developed a method that predicts protein-binding nucleotides from an RNA sequence. In an effort to improve the prediction accuracy and usefulness of the previous method, we developed a new method that uses both RNA and protein sequence data. In this study, we identified effective features of RNA and protein molecules and developed a new support vector machine (SVM) model to predict protein-binding nucleotides from RNA and protein sequence data. The new model that used both protein and RNA sequence data achieved a sensitivity of 86.5%, a specificity of 86.2%, a positive predictive value (PPV) of 72.6%, a negative predictive value (NPV) of 93.8% and a Matthews correlation coefficient (MCC) of 0.69 in a 10-fold cross-validation; it achieved a sensitivity of 58.8%, a specificity of 87.4%, a PPV of 65.1%, an NPV of 84.2% and an MCC of 0.48 in independent testing. For comparative purposes, we built another prediction model that used RNA sequence data alone and ran it on the same dataset. In a 10-fold cross-validation it achieved a sensitivity of 85.7%, a specificity of 80.5%, a PPV of 67.7%, an NPV of 92.2% and an MCC of 0.63; in independent testing it achieved a sensitivity of 67.7%, a specificity of 78.8%, a PPV of 57.6%, an NPV of 85.2% and an MCC of 0.45. In both cross-validation and independent testing, the new model that used both RNA and protein sequences showed a better performance than the model that used RNA sequence data alone in most performance measures. To the best of our knowledge, this is the first sequence-based prediction of protein-binding nucleotides in RNA which considers the binding partner of RNA. The new model will provide valuable information for designing biochemical experiments to find putative protein-binding sites in RNA with unknown structure.
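The five measures reported in this abstract (sensitivity, specificity, PPV, NPV and MCC) all derive from the four cells of a binary confusion matrix. The following sketch shows the standard definitions. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the cited study.

```python
import math


def _ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion-matrix counts.

    tp/fn: binding nucleotides predicted as binding / non-binding.
    tn/fp: non-binding nucleotides predicted as non-binding / binding.
    """
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": _ratio(tp, tp + fn),  # recall on true binders
        "specificity": _ratio(tn, tn + fp),  # recall on true non-binders
        "ppv": _ratio(tp, tp + fp),          # precision of positive calls
        "npv": _ratio(tn, tn + fn),          # precision of negative calls
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts, for illustration only.
example_metrics = binary_metrics(tp=40, fp=10, tn=90, fn=10)
```

Because MCC uses all four cells, it remains informative when binding nucleotides are much rarer than non-binding ones, which is one reason it is often reported alongside sensitivity and specificity.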
Affiliation(s)
| | - Wook Lee
- Department of Computer Science and Engineering, Inha University, Incheon, South Korea
| | - Byungkyu Park
- Department of Computer Science and Engineering, Inha University, Incheon, South Korea
| | - Kyungsook Han
- Department of Computer Science and Engineering, Inha University, Incheon, South Korea.
| |
|
36
|
Cheng Z, Zhou S, Guan J. Computationally predicting protein-RNA interactions using only positive and unlabeled examples. J Bioinform Comput Biol 2015; 13:1541005. [DOI: 10.1142/s021972001541005x] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Protein–RNA interactions (PRIs) are important in a wide variety of cellular processes, ranging from transcriptional and post-transcriptional regulation of gene expression to the active defense of the host against viruses. With the development of high-throughput technology, large amounts of PRI information are available for computationally predicting unknown PRIs. In recent years, a number of computational methods for predicting PRIs have been developed in the literature, which usually artificially construct negative samples based on verified nonredundant datasets of PRIs to train classifiers. However, such negative samples are not real negative samples, and some may even be unknown positive samples. Consequently, the classifiers trained with such training datasets cannot achieve satisfactory prediction performance. In this paper, we propose a novel method, PRIPU, that employs a biased support vector machine (SVM) for predicting Protein-RNA Interactions using only Positive and Unlabeled examples. To the best of our knowledge, this is the first work that predicts PRIs using only positive and unlabeled samples. We first collect known PRIs as our benchmark datasets and extract sequence-based features to represent each PRI. To reduce the dimension of feature vectors and lower computational cost, we select a subset of features by a filter-based feature selection method. Then, biased-SVM is employed to train prediction models with different PRI datasets. To evaluate the new method, we also propose a new performance measure called explicit positive recall (EPR), which is specifically suitable for the task of learning from positive and unlabeled data. Experimental results over three datasets show that our method not only outperforms four existing methods, but is also able to predict unknown PRIs. Source code, datasets and related documents of PRIPU are available at: http://admis.fudan.edu.cn/projects/pripu.htm .
Affiliation(s)
- Zhanzhan Cheng
- Shanghai Key Lab of Intelligent Information Processing and School of Computer Science, Fudan University, 220 Handan Road, Shanghai 200433, China
| | - Shuigeng Zhou
- Shanghai Key Lab of Intelligent Information Processing and School of Computer Science, Fudan University, 220 Handan Road, Shanghai 200433, China
| | - Jihong Guan
- Department of Computer Science and Technology, Tongji University, 4800 Cao'an Road, Shanghai 201804, China
| |
|
37
|
Pérez-Cano L, Fernández-Recio J. Dissection and prediction of RNA-binding sites on proteins. Biomol Concepts 2015; 1:345-55. [PMID: 25962008 DOI: 10.1515/bmc.2010.037] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
RNA-binding proteins are involved in many important regulatory processes in cells and their study is essential for a complete understanding of living organisms. They show a large variability from both structural and functional points of view. However, several recent studies performed on protein-RNA crystal structures have revealed interesting common properties. RNA-binding sites usually constitute patches of positively charged or polar residues that make most of the specific and non-specific contacts with RNA. Negatively charged or aliphatic residues are less frequent at protein-RNA interfaces, although they can also be found either forming aliphatic and positive-negative pairs in protein RNA-binding sites or contacting RNA through their main chains. Aromatic residues found within these interfaces are usually involved in specific base recognition at RNA single-strand regions. This specific recognition, in combination with structural complementarity, represents the key source for specificity in protein-RNA association. From all this knowledge, a variety of computational methods for prediction of RNA-binding sites have been developed based either on protein sequence or on protein structure. Some reported methods are really successful in the identification of RNA-binding proteins or the prediction of RNA-binding sites. Given the growing interest in the field, all these studies and prediction methods will undoubtedly contribute to the identification and comprehension of protein-RNA interactions.
|
38
|
Yan J, Friedrich S, Kurgan L. A comprehensive comparative review of sequence-based predictors of DNA- and RNA-binding residues. Brief Bioinform 2015; 17:88-105. [DOI: 10.1093/bib/bbv023] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2014] [Indexed: 01/07/2023] Open
|
39
|
Xiong D, Zeng J, Gong H. RBRIdent: An algorithm for improved identification of RNA-binding residues in proteins from primary sequences. Proteins 2015; 83:1068-77. [DOI: 10.1002/prot.24806] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2015] [Revised: 03/23/2015] [Accepted: 03/24/2015] [Indexed: 01/15/2023]
Affiliation(s)
- Dapeng Xiong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University; Beijing 100084 China
| | - Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University; Beijing 100084 China
| | - Haipeng Gong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University; Beijing 100084 China
| |
|
40
|
Prediction of Protein-RNA Interactions Using Sequence and Structure Descriptors. ACTA ACUST UNITED AC 2015. [DOI: 10.1016/j.ifacol.2015.12.090] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]
Funding note: This work was partially supported by the National Natural Science Foundation of China (NSFC) Grant No. 31100949, the Scientific Research Foundation for the Returned Overseas Chinese Scholars, Ministry of Education of China, the Fundamental Research Funds of Shandong University Grant No. 2014TB006, University of Rochester Center for AIDS Research Grant P30 AI078498 (NIH/NIAID) and NIH R01 Grant GM100788-01.
|
41
|
Tiwari AK, Srivastava R. A survey of computational intelligence techniques in protein function prediction. INTERNATIONAL JOURNAL OF PROTEOMICS 2014; 2014:845479. [PMID: 25574395 PMCID: PMC4276698 DOI: 10.1155/2014/845479] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Received: 09/10/2014] [Revised: 10/31/2014] [Accepted: 11/07/2014] [Indexed: 02/08/2023]
Abstract
In the past, the number of known but uncharacterized proteins grew massively with the advancement of high-throughput microarray technologies. Protein function prediction is among the most challenging problems in bioinformatics. Homology-based approaches were previously used to predict protein function, but they failed when a new protein differed from previously characterized ones. Therefore, to alleviate the problems associated with traditional homology-based approaches, numerous computational intelligence techniques have been proposed in the recent past. This paper presents a comprehensive, state-of-the-art review of computational intelligence techniques for protein function prediction using sequence, structure, protein-protein interaction network, and gene expression data. These techniques are used in a wide range of applications, such as prediction of DNA- and RNA-binding sites, subcellular localization, enzyme functions, signal peptides, catalytic residues, nuclear/G-protein coupled receptors, membrane proteins, and pathway analysis from gene expression datasets. This paper also summarizes the results obtained by many researchers who applied computational intelligence techniques with appropriate datasets to improve prediction performance. The summary shows that ensemble classifiers and integration of multiple heterogeneous data are useful for protein function prediction.
Affiliation(s)
- Arvind Kumar Tiwari
- Department of Computer Science & Engineering, Indian Institute of Technology (BHU), Varanasi 221005, India
| | - Rajeev Srivastava
- Department of Computer Science & Engineering, Indian Institute of Technology (BHU), Varanasi 221005, India
| |
|
42
|
Park B, Kim H, Han K. DBBP: database of binding pairs in protein-nucleic acid interactions. BMC Bioinformatics 2014; 15 Suppl 15:S5. [PMID: 25474259 PMCID: PMC4271565 DOI: 10.1186/1471-2105-15-s15-s5] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2023] Open
Abstract
Background Interaction of proteins with other molecules plays an important role in many biological activities. As many structures of protein-DNA complexes and protein-RNA complexes have been determined in the past years, several databases have been constructed to provide structure data of the complexes. However, the information on the binding sites between proteins and nucleic acids is not readily available from the structure data since the data consists mostly of the three-dimensional coordinates of the atoms in the complexes. Results We analyzed the huge amount of structure data for the hydrogen bonding interactions between proteins and nucleic acids and developed a database called DBBP (DataBase of Binding Pairs in protein-nucleic acid interactions, http://bclab.inha.ac.kr/dbbp). DBBP contains 44,955 hydrogen bonds (H-bonds) of protein-DNA interactions and 77,947 H-bonds of protein-RNA interactions. Conclusions Analysis of the huge amount of structure data of protein-nucleic acid complexes is labor-intensive, yet provides useful information for studying protein-nucleic acid interactions. DBBP provides the detailed information of hydrogen-bonding interactions between proteins and nucleic acids at various levels from the atomic level to the residue level. The binding information can be used as a valuable resource for developing a computational method aiming at predicting new binding sites in proteins or nucleic acids.
|
43
|
Li S, Yamashita K, Amada KM, Standley DM. Quantifying sequence and structural features of protein-RNA interactions. Nucleic Acids Res 2014; 42:10086-98. [PMID: 25063293 PMCID: PMC4150784 DOI: 10.1093/nar/gku681] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/03/2022] Open
Abstract
Increasing awareness of the importance of protein–RNA interactions has motivated many approaches to predict residue-level RNA binding sites in proteins based on sequence or structural characteristics. Sequence-based predictors are usually high in sensitivity but low in specificity; conversely structure-based predictors tend to have high specificity, but lower sensitivity. Here we quantified the contribution of both sequence- and structure-based features as indicators of RNA-binding propensity using a machine-learning approach. In order to capture structural information for proteins without a known structure, we used homology modeling to extract the relevant structural features. Several novel and modified features enhanced the accuracy of residue-level RNA-binding propensity beyond what has been reported previously, including by meta-prediction servers. These features include: hidden Markov model-based evolutionary conservation, surface deformations based on the Laplacian norm formalism, and relative solvent accessibility partitioned into backbone and side chain contributions. We constructed a web server called aaRNA that implements the proposed method and demonstrate its use in identifying putative RNA binding sites.
Affiliation(s)
- Songling Li
- Laboratory of Systems Immunology, WPI Immunology Frontier Research Center, Osaka University, Osaka 565-0871, Japan
| | - Kazuo Yamashita
- Laboratory of Systems Immunology, WPI Immunology Frontier Research Center, Osaka University, Osaka 565-0871, Japan
| | - Karlou Mar Amada
- Laboratory of Systems Immunology, WPI Immunology Frontier Research Center, Osaka University, Osaka 565-0871, Japan
| | - Daron M Standley
- Laboratory of Systems Immunology, WPI Immunology Frontier Research Center, Osaka University, Osaka 565-0871, Japan
|
44
|
Li Y, Chen YY, Wang F, Xu ZS, Jiang Q, Xiong AS. Isolation and characterization of the Agvip1 gene and response to abiotic and metal ions stresses in three celery cultivars. Mol Biol Rep 2014; 41:6003-11. [DOI: 10.1007/s11033-014-3478-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/05/2014] [Accepted: 06/14/2014] [Indexed: 10/25/2022]
|
45
|
Flach K, Ramminger E, Hilbrich I, Arsalan-Werner A, Albrecht F, Herrmann L, Goedert M, Arendt T, Holzer M. Axotrophin/MARCH7 acts as an E3 ubiquitin ligase and ubiquitinates tau protein in vitro impairing microtubule binding. Biochim Biophys Acta Mol Basis Dis 2014; 1842:1527-38. [PMID: 24905733 PMCID: PMC4311138 DOI: 10.1016/j.bbadis.2014.05.029] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 11/14/2013] [Revised: 05/05/2014] [Accepted: 05/28/2014] [Indexed: 12/11/2022]
Abstract
Tau is the major microtubule-associated protein in neurons involved in microtubule stabilization in the axonal compartment. Changes in tau gene expression, alternative splicing and posttranslational modification regulate tau function and in tauopathies can result in tau mislocalization and dysfunction, causing tau aggregation and cell death. To uncover proteins involved in the development of tauopathies, a yeast two-hybrid system was used to screen for tau-interacting proteins. We show that axotrophin/MARCH7, a RING-variant domain containing protein with similarity to E3 ubiquitin ligases interacts with tau. We defined the tau binding domain to amino acids 552–682 of axotrophin comprising the RING-variant domain. Co-immunoprecipitation and co-localization confirmed the specificity of the interaction. Intracellular localization of axotrophin is determined by an N-terminal nuclear targeting signal and a C-terminal nuclear export signal. In AD brain nuclear localization is lost and axotrophin is rather associated with neurofibrillary tangles. We find here that tau becomes mono-ubiquitinated by recombinant tau-interacting RING-variant domain, which diminishes its microtubule-binding. In vitro ubiquitination of four-repeat tau results in incorporation of up to four ubiquitin molecules compared to two molecules in three-repeat tau. In summary, we present a novel tau modification occurring preferentially on 4-repeat tau protein which modifies microtubule-binding and may impact on the pathogenesis of tauopathies. We search for tau-interacting proteins using a cytotrap yeast two-hybrid assay. MARCH7 was identified as a tau-binding protein and confirmed by several methods. Recombinant MARCH7 Ring-variant domain uses Ubc5 for E3 self-ubiquitinating activity. MARCH7 Ring-variant domain mono-ubiquitinates tau protein at multiple sites including the microtubule-binding domain. Mono-ubiquitination of tau protein diminishes its microtubule-binding.
Affiliation(s)
- Katharina Flach
- Paul Flechsig Institute of Brain Research, Department of Molecular and Cellular Mechanisms of Neurodegeneration, University of Leipzig, 04109 Leipzig, Germany
| | - Ellen Ramminger
- Paul Flechsig Institute of Brain Research, Department of Molecular and Cellular Mechanisms of Neurodegeneration, University of Leipzig, 04109 Leipzig, Germany
| | - Isabel Hilbrich
- Paul Flechsig Institute of Brain Research, Department of Molecular and Cellular Mechanisms of Neurodegeneration, University of Leipzig, 04109 Leipzig, Germany
| | - Annika Arsalan-Werner
- Paul Flechsig Institute of Brain Research, Department of Molecular and Cellular Mechanisms of Neurodegeneration, University of Leipzig, 04109 Leipzig, Germany
| | - Franziska Albrecht
- Paul Flechsig Institute of Brain Research, Department of Molecular and Cellular Mechanisms of Neurodegeneration, University of Leipzig, 04109 Leipzig, Germany
| | - Lydia Herrmann
- Paul Flechsig Institute of Brain Research, Department of Molecular and Cellular Mechanisms of Neurodegeneration, University of Leipzig, 04109 Leipzig, Germany
| | - Michel Goedert
- MRC, Laboratory of Molecular Biology, Neurobiology Division, Francis Crick Avenue, Cambridge CB2 0QH, UK
| | - Thomas Arendt
- Paul Flechsig Institute of Brain Research, Department of Molecular and Cellular Mechanisms of Neurodegeneration, University of Leipzig, 04109 Leipzig, Germany
| | - Max Holzer
- Paul Flechsig Institute of Brain Research, Department of Molecular and Cellular Mechanisms of Neurodegeneration, University of Leipzig, 04109 Leipzig, Germany.
|
46
|
Walia RR, Xue LC, Wilkins K, El-Manzalawy Y, Dobbs D, Honavar V. RNABindRPlus: a predictor that combines machine learning and sequence homology-based methods to improve the reliability of predicted RNA-binding residues in proteins. PLoS One 2014; 9:e97725. [PMID: 24846307 PMCID: PMC4028231 DOI: 10.1371/journal.pone.0097725] [Citation(s) in RCA: 83] [Impact Index Per Article: 8.3] [Received: 03/04/2014] [Accepted: 04/08/2014] [Indexed: 01/18/2023] Open
Abstract
Protein-RNA interactions are central to essential cellular processes such as protein synthesis and regulation of gene expression and play roles in human infectious and genetic diseases. Reliable identification of protein-RNA interfaces is critical for understanding the structural bases and functional implications of such interactions and for developing effective approaches to rational drug design. Sequence-based computational methods offer a viable, cost-effective way to identify putative RNA-binding residues in RNA-binding proteins. Here we report two novel approaches: (i) HomPRIP, a sequence homology-based method for predicting RNA-binding sites in proteins; (ii) RNABindRPlus, a new method that combines predictions from HomPRIP with those from an optimized Support Vector Machine (SVM) classifier trained on a benchmark dataset of 198 RNA-binding proteins. Although highly reliable, HomPRIP cannot make predictions for the unaligned parts of query proteins and its coverage is limited by the availability of close sequence homologs of the query protein with experimentally determined RNA-binding sites. RNABindRPlus overcomes these limitations. We compared the performance of HomPRIP and RNABindRPlus with that of several state-of-the-art predictors on two test sets, RB44 and RB111. On a subset of proteins for which homologs with experimentally determined interfaces could be reliably identified, HomPRIP outperformed all other methods achieving an MCC of 0.63 on RB44 and 0.83 on RB111. RNABindRPlus was able to predict RNA-binding residues of all proteins in both test sets, achieving an MCC of 0.55 and 0.37, respectively, and outperforming all other methods, including those that make use of structure-derived features of proteins. More importantly, RNABindRPlus outperforms all other methods for any choice of tradeoff between precision and recall. 
An important advantage of both HomPRIP and RNABindRPlus is that they rely on readily available sequence and sequence-derived features of RNA-binding proteins. A webserver implementation of both methods is freely available at http://einstein.cs.iastate.edu/RNABindRPlus/.
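The abstract above reports performance as the Matthews correlation coefficient (MCC). For readers unfamiliar with the metric, here is a minimal sketch of how MCC is computed from per-residue binary predictions. The helper names `confusion_counts` and `mcc` are illustrative, and returning 0.0 when the denominator is zero is a common convention assumed here rather than something stated in the abstract.

```python
import math

def confusion_counts(predicted, actual):
    """Count TP, FP, TN, FN for binary labels (1 = RNA-binding, 0 = non-binding)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    tn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    return tp, fp, tn, fn

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; ranges from -1 to +1."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention assumed when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator

# Toy per-residue labels, invented for illustration.
predicted = [1, 1, 0, 0]
actual = [1, 0, 0, 1]
score = mcc(*confusion_counts(predicted, actual))
```

Unlike accuracy, MCC stays informative when binding residues are a small minority of all residues, which is the usual situation for interface prediction.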
Affiliation(s)
- Rasna R. Walia
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, Iowa, United States of America
- Department of Computer Science, Iowa State University, Ames, Iowa, United States of America
| | - Li C. Xue
- College of Information Sciences and Technology, Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Katherine Wilkins
- Department of Plant Pathology and Plant-Microbe Biology, Cornell University, Ithaca, New York, United States of America
- Graduate Field of Computational Biology, Cornell University, Ithaca, New York, United States of America
| | - Yasser El-Manzalawy
- Department of Systems and Computer Engineering, Al-Azhar University, Cairo, Egypt
| | - Drena Dobbs
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, Iowa, United States of America
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, Iowa, United States of America
| | - Vasant Honavar
- College of Information Sciences and Technology, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, Pennsylvania, United States of America
- The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, United States of America
|
47
|
Livi CM, Blanzieri E. Protein-specific prediction of mRNA binding using RNA sequences, binding motifs and predicted secondary structures. BMC Bioinformatics 2014; 15:123. [PMID: 24780077 PMCID: PMC4098778 DOI: 10.1186/1471-2105-15-123] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 10/18/2013] [Accepted: 04/16/2014] [Indexed: 12/14/2022] Open
Abstract
Background: RNA-binding proteins interact with specific RNA molecules to regulate important cellular processes. It is therefore necessary to identify the RNA interaction partners in order to understand the precise functions of such proteins. Protein-RNA interactions are typically characterized using in vivo and in vitro experiments but these may not detect all binding partners. Therefore, computational methods that capture the protein-dependent nature of such binding interactions could help to predict potential binding partners in silico. Results: We have developed three methods to predict whether an RNA can interact with a particular RNA-binding protein using support vector machines and different features based on the sequence (the Oli method), the motif score (the OliMo method) and the secondary structure (the OliMoSS method). We applied these approaches to different experimentally-derived datasets and compared the predictions with RNAcontext and RPISeq. Oli outperformed OliMoSS and RPISeq, confirming our protein-specific predictions and suggesting that tetranucleotide frequencies are appropriate discriminative features. Oli and RNAcontext were the most competitive methods in terms of the area under curve. A precision-recall curve analysis achieved higher precision values for Oli. On a second experimental dataset including real negative binding information, Oli outperformed RNAcontext with a precision of 0.73 vs. 0.59. Conclusions: Our experiments showed that features based on primary sequence information are sufficiently discriminating to predict specific RNA-protein interactions. Sequence motifs and secondary structure information were not necessary to improve these predictions. Finally we confirmed that protein-specific experimental data concerning RNA-protein interactions are valuable sources of information that can be used for the efficient training of models for in silico predictions. The scripts are available upon request to the corresponding author.
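The abstract names tetranucleotide frequencies as the discriminative sequence features. A minimal sketch of such a feature vector follows, assuming overlapping windows, an ACGU alphabet with T mapped to U, and skipping windows that contain ambiguous bases. These preprocessing choices and the function name are assumptions; the abstract does not describe the authors' exact encoding, and the resulting 256-dimensional vector would still have to be passed to a classifier such as an SVM.

```python
from itertools import product

ALPHABET = "ACGU"
# All 4^4 = 256 tetranucleotides in a fixed order, so vectors are comparable.
KMERS = ["".join(p) for p in product(ALPHABET, repeat=4)]

def tetranucleotide_frequencies(sequence: str) -> list:
    """Return the relative frequency of each tetranucleotide in an RNA sequence."""
    sequence = sequence.upper().replace("T", "U")
    counts = dict.fromkeys(KMERS, 0)
    total = 0
    for i in range(len(sequence) - 3):
        kmer = sequence[i:i + 4]
        if kmer in counts:  # windows with ambiguous bases (e.g. N) are skipped
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[k] / total for k in KMERS]

features = tetranucleotide_frequencies("ACGUACGU")
```

For the eight-nucleotide toy sequence there are five overlapping windows, two of which are `ACGU`.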
Affiliation(s)
- Carmen M Livi
- Department of Information Engineering and Computer Science, University of Trento, Via Sommarive 5, Trento, Italy.
|
48
|
Fang C, Noguchi T, Yamana H. Simplified sequence-based method for ATP-binding prediction using contextual local evolutionary conservation. Algorithms Mol Biol 2014; 9:7. [PMID: 24618258 PMCID: PMC3995811 DOI: 10.1186/1748-7188-9-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 10/31/2013] [Accepted: 03/05/2014] [Indexed: 12/23/2022] Open
Abstract
Background: Identifying ligand-binding sites is a key step to annotate the protein functions and to find applications in drug design. Now, many sequence-based methods adopted various predicted results from other classifiers, such as predicted secondary structure, predicted solvent accessibility and predicted disorder probabilities, to combine with position-specific scoring matrix (PSSM) as input for binding sites prediction. These predicted features not only easily result in high-dimensional feature space, but also greatly increased the complexity of algorithms. Moreover, the performances of these predictors are also largely influenced by the other classifiers. Results: In order to verify that conservation is the most powerful attribute in identifying ligand-binding sites, and to show the importance of revising PSSM to match the detailed conservation pattern of functional site in prediction, we have analyzed the Adenosine-5'-triphosphate (ATP) ligand as an example, and proposed a simple method for ATP-binding sites prediction, named as CLCLpred (Contextual Local evolutionary Conservation-based method for Ligand-binding prediction). Our method employed no predicted results from other classifiers as input; all used features were extracted from PSSM only. We tested our method on 2 separate data sets. Experimental results showed that, comparing with other 9 existing methods on the same data sets, our method achieved the best performance. Conclusions: This study demonstrates that: 1) exploiting the signal from the detailed conservation pattern of residues will largely facilitate the prediction of protein functional sites; and 2) the local evolutionary conservation enables accurate prediction of ATP-binding sites directly from protein sequence.
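Features "extracted from PSSM only" are commonly built by taking a window of PSSM rows around each residue. The sketch below shows that generic windowing step only. The half-width of 3, the zero padding at sequence ends, and the function name `window_features` are assumptions for illustration; the abstract does not specify how CLCLpred revises the PSSM or which window it uses.

```python
def window_features(pssm, center, half_width=3):
    """Concatenate PSSM rows in a window centred on one residue.

    pssm is a list of rows, one per residue, each a list of column scores.
    Positions beyond either end of the sequence are zero-padded (assumption).
    """
    n_columns = len(pssm[0])
    features = []
    for i in range(center - half_width, center + half_width + 1):
        if 0 <= i < len(pssm):
            features.extend(pssm[i])
        else:
            features.extend([0] * n_columns)
    return features

# Toy 3-residue, 2-column matrix; a real PSSM has 20 columns per residue.
toy_pssm = [[1, 2], [3, 4], [5, 6]]
first_residue = window_features(toy_pssm, center=0, half_width=1)
```

Each residue then yields a fixed-length vector of `(2 * half_width + 1) * n_columns` values, regardless of its position in the sequence.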
|
49
|
Klus P, Bolognesi B, Agostini F, Marchese D, Zanzoni A, Tartaglia GG. The cleverSuite approach for protein characterization: predictions of structural properties, solubility, chaperone requirements and RNA-binding abilities. Bioinformatics 2014; 30:1601-8. [PMID: 24493033 PMCID: PMC4029037 DOI: 10.1093/bioinformatics/btu074] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Indexed: 12/11/2022]
Abstract
Motivation: The recent shift towards high-throughput screening is posing new challenges for the interpretation of experimental results. Here we propose the cleverSuite approach for large-scale characterization of protein groups. Description: The central part of the cleverSuite is the cleverMachine (CM), an algorithm that performs statistics on protein sequences by comparing their physico-chemical propensities. The second element is called cleverClassifier and builds on top of the models generated by the CM to allow classification of new datasets. Results: We applied the cleverSuite to predict secondary structure properties, solubility, chaperone requirements and RNA-binding abilities. Using cross-validation and independent datasets, the cleverSuite reproduces experimental findings with great accuracy and provides models that can be used for future investigations. Availability: The intuitive interface for dataset exploration, analysis and prediction is available at http://s.tartaglialab.com/clever_suite. Contact: gian.tartaglia@crg.es Supplementary information: Supplementary data are available at Bioinformatics online.
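To make "comparing physico-chemical propensities" between protein groups concrete, the sketch below scores sequences with the Kyte-Doolittle hydropathy scale and compares the group means. Choosing hydropathy as the propensity, using a plain difference of means as the statistic, and the function names are all assumptions for illustration; the abstract does not describe which scales or tests the cleverMachine applies.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hydropathy(sequence):
    """Average hydropathy of a sequence; non-standard residues are ignored."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    return sum(values) / len(values) if values else 0.0

def compare_sets(set_a, set_b):
    """Difference in mean hydropathy between two non-empty lists of sequences."""
    mean_a = sum(mean_hydropathy(s) for s in set_a) / len(set_a)
    mean_b = sum(mean_hydropathy(s) for s in set_b) / len(set_b)
    return mean_a - mean_b

# Toy sequences invented for illustration.
difference = compare_sets(["III"], ["RRR"])
```

A positive difference means the first set is on average more hydrophobic than the second; a real analysis would repeat this over many scales and test the differences for significance.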
Affiliation(s)
- Petr Klus
- Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88 and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
| | - Benedetta Bolognesi
- Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88 and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
| | - Federico Agostini
- Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88 and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
| | - Domenica Marchese
- Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88 and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
| | - Andreas Zanzoni
- Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88 and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
| | - Gian Gaetano Tartaglia
- Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88 and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
|
50
|
Incorporating significant amino acid pairs and protein domains to predict RNA splicing-related proteins with functional roles. J Comput Aided Mol Des 2014; 28:49-60. [PMID: 24442949 DOI: 10.1007/s10822-014-9706-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/12/2013] [Accepted: 01/07/2014] [Indexed: 12/20/2022]
Abstract
Machinery of pre-mRNA splicing is carried out through the interaction of RNA sequence elements and a variety of RNA splicing-related proteins (SRPs) (e.g. spliceosome and splicing factors). Alternative splicing, which is an important post-transcriptional regulation in eukaryotes, gives rise to multiple mature mRNA isoforms, which encodes proteins with functional diversities. However, the regulation of RNA splicing is not yet fully elucidated, partly because SRPs have not yet been exhaustively identified and the experimental identification is labor-intensive. Therefore, we are motivated to design a new method for identifying SRPs with their functional roles in the regulation of RNA splicing. The experimentally verified SRPs were manually curated from research articles. According to the functional annotation of Splicing Related Gene Database, the collected SRPs were further categorized into four functional groups including small nuclear Ribonucleoprotein, Splicing Factor, Splicing Regulation Factor and Novel Spliceosome Protein. The composition of amino acid pairs indicates that there are remarkable differences among four functional groups of SRPs. Then, support vector machines (SVMs) were utilized to learn the predictive models for identifying SRPs as well as their functional roles. The cross-validation evaluation presents that the SVM models trained with significant amino acid pairs and functional domains could provide a better predictive performance. In addition, the independent testing demonstrates that the proposed method could accurately identify SRPs in mammals/plants as well as effectively distinguish between SRPs and RNA-binding proteins. This investigation provides a practical means to identifying potential SRPs and a perspective for exploring the regulation of RNA splicing.
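The composition of amino acid pairs mentioned above can be turned into a fixed-length feature vector for an SVM. A minimal sketch follows, assuming adjacent (ungapped) residue pairs over the 20 standard amino acids and relative frequencies as values. The function name and these choices are assumptions; the abstract does not say how pairs were defined or how the significant ones were selected.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered pairs in a fixed order.
PAIRS = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]

def pair_composition(sequence):
    """Return the relative frequency of each adjacent amino acid pair."""
    sequence = sequence.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for first, second in zip(sequence, sequence[1:]):
        pair = first + second
        if pair in counts:  # pairs containing non-standard residues are skipped
            counts[pair] += 1
            total += 1
    return {pair: (counts[pair] / total if total else 0.0) for pair in PAIRS}

composition = pair_composition("AAAC")
```

The toy sequence contains three adjacent pairs (`AA`, `AA`, `AC`), so `AA` accounts for two thirds of the composition and `AC` for one third.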
|