1
Manea I, Casian M, Hosu-Stancioiu O, de-Los-Santos-Álvarez N, Lobo-Castañón MJ, Cristea C. A review on magnetic beads-based SELEX technologies: Applications from small to large target molecules. Anal Chim Acta 2024; 1297:342325. [PMID: 38438246 DOI: 10.1016/j.aca.2024.342325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Revised: 01/18/2024] [Accepted: 02/01/2024] [Indexed: 03/06/2024]
Abstract
This review summarizes the stepwise strategy and key points for magnetic beads (MBs)-based aptamer selection, which is suitable for isolating aptamers against small and large molecules via systematic evolution of ligands by exponential enrichment (SELEX). Particularities, if any, are discussed according to target size. Examples targeting small molecules (<1000 Da), such as xenobiotics, toxins, pesticides, herbicides, illegal additives and hormones, and large targets such as proteins (biomarkers, pathogens) are discussed and presented in tabular form. Of special interest are the latest advances in more efficient alternatives based on novel instrumentation, materials or microelectronics, such as fluorescence MBs-SELEX or microfluidic chip system-assisted MBs-SELEX. Limitations and perspectives of MBs-SELEX are also reviewed. Taken together, this review aims to provide practical insights into MBs-SELEX technologies and their ability to screen multiple potential aptamers against targets ranging from small to large molecules.
Affiliation(s)
- Ioana Manea
- Department of Analytical Chemistry, Faculty of Pharmacy, "Iuliu Haţieganu" University of Medicine and Pharmacy, 4 Pasteur Street, 400349, Cluj-Napoca, Romania
- Magdolna Casian
- Department of Analytical Chemistry, Faculty of Pharmacy, "Iuliu Haţieganu" University of Medicine and Pharmacy, 4 Pasteur Street, 400349, Cluj-Napoca, Romania; Departamento de Química Física y Analítica, Universidad de Oviedo, Av. Julián Clavería 8, 33006, Oviedo, Spain
- Oana Hosu-Stancioiu
- Department of Analytical Chemistry, Faculty of Pharmacy, "Iuliu Haţieganu" University of Medicine and Pharmacy, 4 Pasteur Street, 400349, Cluj-Napoca, Romania
- Noemí de-Los-Santos-Álvarez
- Departamento de Química Física y Analítica, Universidad de Oviedo, Av. Julián Clavería 8, 33006, Oviedo, Spain; Instituto de Investigación Sanitaria del Principado de Asturias (ISPA), Av. de Roma s/n, 33011, Oviedo, Spain
- María Jesús Lobo-Castañón
- Departamento de Química Física y Analítica, Universidad de Oviedo, Av. Julián Clavería 8, 33006, Oviedo, Spain; Instituto de Investigación Sanitaria del Principado de Asturias (ISPA), Av. de Roma s/n, 33011, Oviedo, Spain
- Cecilia Cristea
- Department of Analytical Chemistry, Faculty of Pharmacy, "Iuliu Haţieganu" University of Medicine and Pharmacy, 4 Pasteur Street, 400349, Cluj-Napoca, Romania

2
Basu S, Zhao B, Biró B, Faraggi E, Gsponer J, Hu G, Kloczkowski A, Malhis N, Mirdita M, Söding J, Steinegger M, Wang D, Wang K, Xu D, Zhang J, Kurgan L. DescribePROT in 2023: more, higher-quality and experimental annotations and improved data download options. Nucleic Acids Res 2024; 52:D426-D433. [PMID: 37933852 PMCID: PMC10767971 DOI: 10.1093/nar/gkad985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 10/12/2023] [Accepted: 10/16/2023] [Indexed: 11/08/2023]
Abstract
The DescribePROT database of amino acid-level descriptors of protein structures and functions has been substantially expanded since its release in 2020. This expansion includes a substantial increase in the size, scope, and quality of the underlying data, the addition of experimental structural information, the inclusion of new data download options, and an upgraded graphical interface. DescribePROT currently covers 19 structural and functional descriptors for proteins in 273 reference proteomes, generated by 11 accurate and complementary predictive tools. Users can search our resource in multiple ways, interact with the data using the graphical interface, and download data at various scales, including individual proteins, entire proteomes, and the whole database. The annotations in DescribePROT are useful for a broad spectrum of studies, including investigations of protein structure and function, development and validation of predictive tools, and efforts to understand the molecular underpinnings of diseases and to develop therapeutics. DescribePROT can be freely accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.
Affiliation(s)
- Sushmita Basu
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Bi Zhao
- Genomics Program, College of Public Health, University of South Florida, Tampa, FL, USA
- Bálint Biró
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Department of Animal Biotechnology, Hungarian University of Agriculture and Life Sciences, Gödöllő, Hungary
- Eshel Faraggi
- Physics Department, Indiana University, Indianapolis, IN, USA
- Jörg Gsponer
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Gang Hu
- School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, P.R. China
- Andrzej Kloczkowski
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, USA
- Nawar Malhis
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Milot Mirdita
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Johannes Söding
- Quantitative and Computational Biology, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- Martin Steinegger
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Institute of Molecular Biology & Genetics, Seoul National University, Seoul, Republic of Korea
- Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
- Duolin Wang
- Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, USA
- Kui Wang
- School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, P.R. China
- Dong Xu
- Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, USA
- Jian Zhang
- School of Computer and Information Technology, Xinyang Normal University, Xinyang, P.R. China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA

3
Song J, Kurgan L. Availability of web servers significantly boosts citations rates of bioinformatics methods for protein function and disorder prediction. Bioinform Adv 2023; 3:vbad184. [PMID: 38146538 PMCID: PMC10749743 DOI: 10.1093/bioadv/vbad184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Revised: 12/08/2023] [Accepted: 12/15/2023] [Indexed: 12/27/2023]
Abstract
Motivation: Development of bioinformatics methods is a long, complex and resource-hungry process. Hundreds of these tools have been released. While some methods are highly cited and used, many suffer from relatively low citation rates. We empirically analyze a large collection of recently released methods in three diverse protein function and disorder prediction areas to identify key factors that contribute to increased citations. Results: We show that provision of a working web server significantly boosts citation rates. On average, methods with working web servers generate three times as many citations as tools that are available only as source code, have neither code nor a server, or are no longer available. This observation holds consistently across different research areas and publication years. We also find that differences in predictive performance are unlikely to impact citation rates. Overall, our empirical results suggest that a relatively low-cost investment in the provision and long-term support of web servers would substantially increase the impact of bioinformatics tools.
Affiliation(s)
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC 3800, Australia
- Monash Data Futures Institute, Monash University, Clayton, VIC 3800, Australia
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, United States

4
Chen J, Gu Z, Lai L, Pei J. In silico protein function prediction: the rise of machine learning-based approaches. Med Rev (2021) 2023; 3:487-510. [PMID: 38282798 PMCID: PMC10808870 DOI: 10.1515/mr-2023-0038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 10/11/2023] [Indexed: 01/30/2024]
Abstract
Proteins function as integral actors in essential life processes, making protein research a fundamental domain with the potential to propel advances in pharmaceuticals and disease investigation. Within protein research, there is a pressing demand to uncover protein functions and untangle their intricate mechanistic underpinnings. Because of the high costs and limited throughput inherent in experimental investigations, computational models offer a promising alternative to accelerate protein function annotation. In recent years, protein pre-training models have shown noteworthy advances across multiple prediction tasks. This progress highlights a notable prospect for effectively tackling the intricate downstream task of protein function prediction. In this review, we elucidate the historical evolution and research paradigms of computational methods for predicting protein function. Subsequently, we summarize the progress in protein and molecule representation as well as feature extraction techniques. Furthermore, we assess the performance of machine learning-based algorithms across various objectives in protein function prediction, thereby offering a comprehensive perspective on the progress within this field.
Affiliation(s)
- Jiaxiao Chen
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Zhonghui Gu
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Luhua Lai
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Research Unit of Drug Design Method, Chinese Academy of Medical Sciences (2021RU014), Beijing, China
- Jianfeng Pei
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Research Unit of Drug Design Method, Chinese Academy of Medical Sciences (2021RU014), Beijing, China

5
Zhu H, Yang Y, Wang Y, Wang F, Huang Y, Chang Y, Wong KC, Li X. Dynamic characterization and interpretation for protein-RNA interactions across diverse cellular conditions using HDRNet. Nat Commun 2023; 14:6824. [PMID: 37884495 PMCID: PMC10603054 DOI: 10.1038/s41467-023-42547-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/23/2023] [Accepted: 10/13/2023] [Indexed: 10/28/2023]
Abstract
RNA-binding proteins (RBPs) play crucial roles in the regulation of gene expression, and understanding the interactions between RNAs and RBPs in distinct cellular conditions forms the basis for comprehending the underlying RNA function. However, current computational methods struggle to cross-predict RNA-protein binding events across diverse cell lines and tissue contexts. Here, we develop HDRNet, an end-to-end deep learning-based framework to precisely predict dynamic RBP binding events under diverse cellular conditions. Our results demonstrate that HDRNet can accurately and efficiently identify binding sites, particularly for dynamic prediction, outperforming other state-of-the-art models on 261 linear RNA datasets from both eCLIP and CLIP-seq, supplemented with additional tissue data. Moreover, we conduct motif and interpretation analyses to provide fresh insights into the pathological mechanisms underlying RNA-RBP interactions from various perspectives. Our functional genomic analysis further explores gene-human disease associations, uncovering previously uncharacterized observations for a broad range of genetic disorders.
Affiliation(s)
- Haoran Zhu
- School of Artificial Intelligence, Jilin University, 130012, Changchun, China
- Yuning Yang
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Yunhe Wang
- School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
- Fuzhou Wang
- Department of Computer Science, City University of Hong Kong, Hong Kong, Hong Kong SAR
- Yujian Huang
- College of Computer Science and Cyber Security, Chengdu University of Technology, 610059, Chengdu, China
- Yi Chang
- School of Artificial Intelligence, Jilin University, 130012, Changchun, China
- Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Hong Kong, Hong Kong SAR
- Xiangtao Li
- School of Artificial Intelligence, Jilin University, 130012, Changchun, China

6
Agarwal A, Kant S, Bahadur RP. Efficient mapping of RNA-binding residues in RNA-binding proteins using local sequence features of binding site residues in protein-RNA complexes. Proteins 2023; 91:1361-1379. [PMID: 37254800 DOI: 10.1002/prot.26528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2022] [Revised: 04/13/2023] [Accepted: 05/02/2023] [Indexed: 06/01/2023]
Abstract
Protein-RNA interactions play vital roles in a plethora of biological processes such as regulation of gene expression, protein synthesis, mRNA processing and biogenesis. Identification of RNA-binding residues (RBRs) in proteins is essential to understand RNA-mediated protein function, to perform site-directed mutagenesis and to develop novel targeted drug therapies. Moreover, the extensive gap between sequence and structural data restricts the identification of binding sites in unsolved structures. However, efficient computational methods that require only the sequence to identify binding residues can bridge this huge sequence-structure gap. In this study, we have extensively studied the protein-RNA interface in known RNA-binding proteins (RBPs). We find that the interface is highly enriched in basic and polar residues, with Gly being the most common interface neighbor. We investigated several amino acid features and developed a method to predict putative RBRs from amino acid sequence. We have implemented a balanced random forest (BRF) classifier with local residue features of protein sequences for prediction. With 5-fold cross-validation, the sequence pattern derived dipeptide composition based BRF model (DCP-BRF) achieved an accuracy of 87.9%, specificity of 88.8%, sensitivity of 82.2%, Matthews correlation coefficient of 0.60 and AUC of 0.93, performing better than a few existing methods. We further validated our prediction model on known human RBPs through RBR prediction and could map ~54% of them. Further, knowledge of binding site preferences obtained from computational predictions, combined with experimental validation of potential RNA binding sites, can enhance our understanding of protein-RNA interactions. This may serve to accelerate investigations of the functional roles of many novel RBPs.
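The abstract names a dipeptide composition (DPC) computed from local sequence features as the input to the balanced random forest. As a rough illustration only, the sketch below computes the standard 400-dimensional dipeptide composition of a sequence window centered on one residue. The window half-width and the function names are hypothetical choices, and the paper's "sequence pattern derived" DPC may differ from this plain formulation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides ("AA", "AC", ..., "YY") and their positions in the vector.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
DIPEPTIDE_INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def local_window(seq: str, center: int, half_width: int = 7) -> str:
    """Subsequence around position `center` (0-based), clipped at the sequence ends.

    The default half-width of 7 is a hypothetical choice, not taken from the paper.
    """
    start = max(0, center - half_width)
    end = min(len(seq), center + half_width + 1)
    return seq[start:end]


def dipeptide_composition(fragment: str) -> list[float]:
    """Relative frequency of each of the 400 dipeptides in `fragment`.

    Overlapping pairs are counted; pairs containing non-standard residues
    (for example 'X') are skipped.
    """
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for a, b in zip(fragment, fragment[1:]):
        idx = DIPEPTIDE_INDEX.get(a + b)
        if idx is not None:
            counts[idx] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


# Features for residue 3 (the 'R' in "MSKRGRGG") with a +/-2 window, i.e. "SKRGR":
# pairs SK, KR, RG, GR each get frequency 0.25.
features = dipeptide_composition(local_window("MSKRGRGG", center=3, half_width=2))
```

One such vector per residue, paired with a binding/non-binding label, would form a training matrix for an imbalance-aware classifier; imbalanced-learn's `BalancedRandomForestClassifier` is one readily available implementation of a balanced random forest, though the source does not say which implementation the authors used.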
Affiliation(s)
- Ankita Agarwal
- School of Bio Science, Indian Institute of Technology Kharagpur, Kharagpur, India
- Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
- Shri Kant
- Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
- Ranjit Prasad Bahadur
- Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India

7
Zhang F, Li M, Zhang J, Kurgan L. HybridRNAbind: prediction of RNA interacting residues across structure-annotated and disorder-annotated proteins. Nucleic Acids Res 2023; 51:e25. [PMID: 36629262 PMCID: PMC10018345 DOI: 10.1093/nar/gkac1253] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/14/2022] [Revised: 11/22/2022] [Accepted: 12/15/2022] [Indexed: 01/12/2023]
Abstract
The sequence-based predictors of RNA-binding residues (RBRs) are trained on either structure-annotated or disorder-annotated binding regions. A recent study of predictors of protein-binding residues shows that they are plagued by high levels of cross-predictions (protein-binding residues are predicted as nucleic acid-binding) and that structure-trained predictors perform poorly for the disorder-annotated regions and vice versa. Consequently, we analyze a representative set of the structure- and disorder-trained predictors of RBRs to comprehensively assess the quality of their predictions. Our empirical analysis, which relies on a new low-similarity benchmark dataset, reveals that the structure-trained predictors of RBRs perform well for the structure-annotated proteins, while the disorder-trained predictors provide accurate results for the disorder-annotated proteins. However, these methods work only modestly well on the opposite types of annotations, motivating the need for new solutions. Using an empirical approach, we design the HybridRNAbind meta-model, which generates accurate predictions and low amounts of cross-predictions when tested on data that combine structure- and disorder-annotated RBRs. We release this meta-model as a convenient webserver, available at https://www.csuligroup.com/hybridRNAbind/.
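To make the notion of cross-prediction concrete, the sketch below compares binary RNA-binding predictions with per-residue annotations and reports two rates: the fraction of RNA-binding residues predicted as RNA-binding, and the fraction of residues that bind other partners (here DNA or protein) that are nevertheless predicted as RNA-binding. The label strings and function names are hypothetical, and the exact definitions used in the HybridRNAbind study may differ.

```python
from typing import Optional, Sequence

OTHER_PARTNERS = {"DNA", "protein"}  # hypothetical labels for non-RNA binding residues


def _rate(hits: int, total: int) -> Optional[float]:
    """hits / total, or None when the group contains no residues."""
    return hits / total if total else None


def rna_binding_rates(annotations: Sequence[str], predicted_rna: Sequence[bool]):
    """Return (sensitivity on RNA-binding residues, cross-prediction rate).

    annotations: one label per residue, e.g. "RNA", "DNA", "protein" or "none".
    predicted_rna: one bool per residue, True if predicted as RNA-binding.
    """
    if len(annotations) != len(predicted_rna):
        raise ValueError("annotations and predictions must have the same length")
    rna_total = rna_hits = other_total = other_hits = 0
    for label, is_predicted in zip(annotations, predicted_rna):
        if label == "RNA":
            rna_total += 1
            rna_hits += is_predicted  # bool counts as 0 or 1
        elif label in OTHER_PARTNERS:
            other_total += 1
            other_hits += is_predicted
    return _rate(rna_hits, rna_total), _rate(other_hits, other_total)


labels = ["RNA", "RNA", "protein", "protein", "DNA", "none"]
preds = [True, False, True, False, False, False]
sensitivity, cross_rate = rna_binding_rates(labels, preds)
# sensitivity == 0.5 (1 of 2 RNA-binding residues)
# cross_rate == 1/3 (1 of 3 residues binding DNA or protein)
```

Under this reading, a useful predictor keeps the first rate high while keeping the second low, which is the trade-off the abstract describes as "accurate predictions and low amounts of cross-predictions".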
Affiliation(s)
- Fuhao Zhang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Min Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Jian Zhang
- School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA

8
Agarwal A, Bahadur RP. Modular architecture and functional annotation of human RNA-binding proteins containing RNA recognition motif. Biochimie 2023; 209:116-130. [PMID: 36716848 DOI: 10.1016/j.biochi.2023.01.017] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/30/2022] [Revised: 01/09/2023] [Accepted: 01/23/2023] [Indexed: 01/28/2023]
Abstract
RNA-binding proteins (RBPs) are structurally and functionally diverse macromolecules with significant involvement in several post-transcriptional gene regulatory processes and human diseases. The RNA recognition motif (RRM) is one of the most abundant RNA-binding domains in human RBPs. The unique modular architecture of each RRM-containing RBP is crucial for its diverse target recognition and function. A genome-wide study of these structurally conserved and functionally diverse domains can enhance our understanding of their functional implications. In this study, the modular architecture of RRM-containing RBPs in the human proteome is identified and systematically analysed. We observe that 30% of human RBPs with RNA-binding function contain RRMs, either in single or multiple repeats (up to a maximum of six) or together with other domains. Zinc fingers are the most frequent co-occurring domain partners of RRMs. Human RRM-containing RBPs mostly belong to the RNA metabolism class of proteins and are significantly enriched in two functional pathways, the spliceosome and mRNA surveillance. Various human diseases are associated with 18% of the RRM-containing RBPs. RBPs containing a single RRM are highly enriched in disordered regions. Gene Ontology (GO) molecular functions including poly(A), poly(U) and miRNA binding are highly depleted in RBPs with a single RRM, indicating the significance of the modular nature of RRMs for specific functions. The current study reports all the possible domain architectures of RRM-containing human RBPs and their functional enrichment. Understanding domain architectures, and how they confer specificity and new functionalities on RBPs, can help in re-designing modular RRM-containing RBPs with re-engineered functions.
Affiliation(s)
- Ankita Agarwal
- School of Bio Science, Indian Institute of Technology Kharagpur, Kharagpur 721302, India; Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
- Ranjit Prasad Bahadur
- Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur 721302, India

9
Wu Z, Basu S, Wu X, Kurgan L. qNABpredict: Quick, accurate, and taxonomy-aware sequence-based prediction of content of nucleic acid binding amino acids. Protein Sci 2023; 32:e4544. [PMID: 36519304 PMCID: PMC9798252 DOI: 10.1002/pro.4544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Revised: 12/07/2022] [Accepted: 12/08/2022] [Indexed: 12/23/2022]
Abstract
Protein sequence-based predictors of nucleic acid (NA)-binding include methods that predict NA-binding proteins and NA-binding residues. The residue-level tools provide more detail but suffer from high computational cost, since they must predict every amino acid in the input sequence and rely on multiple sequence alignments. We propose an alternative approach that predicts the content (fraction) of NA-binding residues, offering more information than protein-level prediction and a much shorter runtime than the residue-level tools. Our first-of-its-kind content predictor, qNABpredict, relies on a small, rationally designed and fast-to-compute feature set that represents relevant characteristics extracted from the input sequence, and on a well-parametrized support vector regression model. We provide two versions of qNABpredict: a taxonomy-agnostic model that can be used for proteins of unknown taxonomic origin, and more accurate taxonomy-aware models tailored to specific taxonomic kingdoms (archaea, bacteria, eukaryota, and viruses). Empirical tests on a low-similarity test dataset show that qNABpredict is 100 times faster and generates statistically more accurate content predictions than the content extracted from results produced by the residue-level predictors. We also show that qNABpredict's content predictions can be used to improve results generated by the residue-level predictors. We release qNABpredict as a convenient webserver and source code at http://biomine.cs.vcu.edu/servers/qNABpredict/. This new tool should be particularly useful to predict details of protein-NA interactions for large protein families and proteomes.
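The "content" that qNABpredict targets is the fraction of a protein's residues that bind nucleic acids. qNABpredict estimates it directly with support vector regression on its own feature set; the sketch below shows only the target quantity and how a content value can be extracted from residue-level outputs, which is the baseline the abstract compares against. The 0.5 threshold and the function names are assumptions for illustration.

```python
from typing import Sequence


def binding_content(residue_labels: Sequence[int]) -> float:
    """Fraction of residues labelled 1 (nucleic-acid binding) in one protein."""
    if not residue_labels:
        raise ValueError("protein has no residues")
    return sum(residue_labels) / len(residue_labels)


def content_from_residue_scores(scores: Sequence[float], threshold: float = 0.5) -> float:
    """Binarize residue-level propensities at `threshold`, then compute content.

    The 0.5 cut-off is an assumed default; individual residue-level
    predictors may recommend their own thresholds.
    """
    return binding_content([1 if s >= threshold else 0 for s in scores])


true_content = binding_content([0, 1, 1, 0, 0, 0, 1, 0])              # 3 of 8 residues
extracted = content_from_residue_scores([0.1, 0.7, 0.5, 0.2, 0.9])  # 3 of 5 at or above 0.5
error = abs(extracted - true_content)
```

Scoring content predictors by a per-protein error such as `error` above is one natural choice; the source does not state which error measure the authors used.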
Affiliation(s)
- Zhonghua Wu
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Sushmita Basu
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
- Xuantai Wu
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA

10
Wang Z, Dai Q, Song J, Duan X, Yang H, Yang Z. Predicting RBP Binding Sites of RNA With High-Order Encoding Features and CNN-BLSTM Hybrid Model. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:2409-2419. [PMID: 34038367 DOI: 10.1109/tcbb.2021.3083930] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/12/2023]
Abstract
RNA-binding proteins (RBPs) are extensively involved in various cellular regulatory processes through their interactions with RNAs. Capturing RBP binding preferences is fundamental for revealing the pathogenesis of complex diseases. Many experimental detection techniques are still time-consuming and labor-intensive; therefore, it is indispensable to develop a computational method with convincing accuracy. In this study, we proposed a CNN-BLSTM hybrid deep learning framework, named DeepDW, for predicting RBP binding sites on RNAs with high-order encoding features of RNA sequence and secondary structure. The high-order encoding strategy was used to characterize the dependencies among adjacent nucleotides. In the CNN-BLSTM hybrid model, DeepDW first employed two 1-D convolutional neural networks (CNNs) to learn local features from the high-order encoded matrices of RNA sequence and structure separately, and then applied two bidirectional long short-term memory networks (BLSTMs) to capture global information at a higher level. Moreover, a series of experiments was carried out on 31 public datasets to evaluate the proposed framework, and DeepDW outperformed state-of-the-art methods. The results indicate that the combination of the high-order encoding method and the CNN-BLSTM hybrid model has advantages in identifying RBP-RNA binding sites.
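The high-order encoding can be read as a k-mer one-hot scheme: instead of one 4-dimensional vector per nucleotide, each position gets a 4^k-dimensional vector marking the k-mer that starts there, so dependencies between neighboring nucleotides are explicit in the input to the 1-D CNN. The sketch below assumes this common formulation; DeepDW's exact encoding (and its secondary-structure counterpart) may differ, and the function name is hypothetical.

```python
from itertools import product


def high_order_one_hot(seq: str, k: int = 2, alphabet: str = "ACGU") -> list[list[int]]:
    """Encode an RNA sequence as a (len(seq) - k + 1) x len(alphabet)**k one-hot matrix.

    Row i has a single 1 in the column of the k-mer seq[i:i+k], so each row
    captures k adjacent nucleotides jointly; k = 1 gives ordinary one-hot encoding.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    column = {kmer: j for j, kmer in enumerate(kmers)}
    seq = seq.upper()
    if "U" in alphabet:
        seq = seq.replace("T", "U")  # accept DNA-style input for an RNA alphabet
    matrix = []
    for i in range(len(seq) - k + 1):
        row = [0] * len(kmers)
        j = column.get(seq[i:i + k])
        if j is not None:  # k-mers containing N or other symbols stay all-zero
            row[j] = 1
        matrix.append(row)
    return matrix


encoded = high_order_one_hot("ACGU", k=2)
# 3 rows ("AC", "CG", "GU") x 16 columns; "AC" -> column 1, "CG" -> 6, "GU" -> 11.
```

The resulting matrix, one row per position, is the kind of input a 1-D convolution slides along; a secondary-structure string could be encoded the same way over a structure alphabet, again as an assumption rather than the paper's documented procedure.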

11
Peng X, Wang X, Guo Y, Ge Z, Li F, Gao X, Song J. RBP-TSTL is a two-stage transfer learning framework for genome-scale prediction of RNA-binding proteins. Brief Bioinform 2022; 23:6596984. [PMID: 35649392 PMCID: PMC9294422 DOI: 10.1093/bib/bbac215] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/17/2022] [Revised: 04/25/2022] [Accepted: 05/06/2022] [Indexed: 11/27/2022]
Abstract
RNA-binding proteins (RBPs) are critical for the post-transcriptional control of RNAs and play vital roles in a myriad of biological processes, such as RNA localization and gene regulation. Therefore, computational methods that are capable of accurately identifying RBPs are highly desirable and have important implications for biomedical and biotechnological applications. Here, we propose a two-stage deep transfer learning-based framework, termed RBP-TSTL, for accurate prediction of RBPs. In the first stage, knowledge from a self-supervised pre-trained model was extracted as feature embeddings and used to represent the protein sequences, while in the second stage, a customized deep learning model was initialized on an annotated pre-training RBP dataset before being fine-tuned on each corresponding target species dataset. This two-stage transfer learning framework enables the RBP-TSTL model to be trained effectively and improves its prediction performance. Extensive benchmarking of the RBP-TSTL models trained using the features generated by the self-supervised pre-trained model, against other models trained using hand-crafted encoding features, demonstrated the effectiveness of the proposed two-stage knowledge transfer strategy based on self-supervised pre-trained models. Using the best-performing RBP-TSTL models, we further conducted genome-scale RBP predictions for Homo sapiens, Arabidopsis thaliana, Escherichia coli, and Salmonella and established a computational compendium containing all the predicted putative RBP candidates. We anticipate that the proposed RBP-TSTL approach will be explored as a useful tool for the characterization of RNA-binding proteins and exploration of their sequence–structure–function relationships.
Affiliation(s)
- Xinxin Peng
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia; Monash Data Futures Institute, Monash University, Melbourne, Victoria 3800, Australia
- Xiaoyu Wang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia; Monash Data Futures Institute, Monash University, Melbourne, Victoria 3800, Australia
- Yuming Guo
- Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria 3004, Australia
- Zongyuan Ge
- Monash e-Research Centre and Faculty of Engineering, Monash University, Melbourne, VIC 3800, Australia
- Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia; Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, Victoria 3000, Australia; College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Xin Gao
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia; KAUST Computational Bioscience Research Center, King Abdullah University of Science and Technology
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia; Monash Data Futures Institute, Monash University, Melbourne, Victoria 3800, Australia

12
Ribonomics Approaches to Identify RBPome in Plants and Other Eukaryotes: Current Progress and Future Prospects. Int J Mol Sci 2022; 23:ijms23115923. [PMID: 35682602 PMCID: PMC9180120 DOI: 10.3390/ijms23115923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2022] [Revised: 05/16/2022] [Accepted: 05/20/2022] [Indexed: 02/01/2023]
Abstract
RNA-binding proteins (RBPs) form complex interactions with RNA to regulate the cell’s activities, including cell development and disease resistance. RNA-binding proteome (RBPome) studies aim to profile and characterize the RNAs and proteins that interact with each other to carry out biological functions. Generally, RNA-centric and protein-centric ribonomic approaches have been successfully developed to profile the RBPome in different organisms, including plants and animals. Further, a growing number of novel methods first devised and applied in mammals, such as RNA-interactome capture (RIC) and orthogonal organic phase separation (OOPS), have shown great potential to unravel the RBPome in plants. Despite the development of various robust and state-of-the-art ribonomics techniques, genome-wide RBP identification and characterization studies in plants remain relatively fewer than those in other eukaryotes, indicating that ribonomics techniques have great opportunities in unraveling and characterizing RNA–protein interactions in plant species. Here, we review all the available approaches for analyzing RBPs in living organisms. Additionally, we summarize the transcriptome-wide approaches to characterize both the coding and non-coding RBPs in plants and the promising use of the RBPome for boosting agriculture.
13
Biró B, Zhao B, Kurgan L. Complementarity of the residue-level protein function and structure predictions in human proteins. Comput Struct Biotechnol J 2022; 20:2223-2234. [PMID: 35615015 PMCID: PMC9118482 DOI: 10.1016/j.csbj.2022.05.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2022] [Revised: 05/02/2022] [Accepted: 05/02/2022] [Indexed: 11/24/2022]
Abstract
Sequence-based predictors of residue-level protein function and structure cover a broad spectrum of characteristics, including intrinsic disorder, secondary structure, solvent accessibility and binding to nucleic acids. They have been catalogued and evaluated in numerous surveys and assessments. However, methods focusing on a given characteristic are studied separately from predictors of other characteristics, even though they are typically applied to the same proteins. We fill this void by studying the complementarity of a representative collection of methods that target different predictions, using a large, taxonomically consistent, low-similarity dataset of human proteins. First, we bridge the gap between the communities that develop structure-trained and disorder-trained predictors of binding residues. Motivated by a recent study of protein-binding residue predictions, we empirically find that combining structure-trained and disorder-trained predictors of DNA-binding and RNA-binding residues leads to substantial improvements in predictive quality. Second, we investigate whether diverse predictors generate results that accurately reproduce the relations between secondary structure, solvent accessibility, interaction sites, and intrinsic disorder that are present in the experimental data. Our empirical analysis concludes that predictions accurately reflect all combinations of these relations. Altogether, this study provides unique insights that support combining results produced by diverse residue-level predictors of protein function and structure.
Affiliation(s)
- Bálint Biró
- Institute of Genetics and Biotechnology, Hungarian University of Agriculture and Life Sciences, Gödöllő, Hungary
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
14
Liu J, Gu Q, Du W, Feng Z, Zhang Q, Tian Y, Luo K, Gong Q, Tian X. Nucleolar RNA in action: Ultrastructure revealed during protein translation through a terpyridyl manganese(II) complex. Biosens Bioelectron 2022; 203:114058. [DOI: 10.1016/j.bios.2022.114058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2021] [Revised: 12/21/2021] [Accepted: 01/28/2022] [Indexed: 11/02/2022]
15
Chalupová E, Vaculík O, Poláček J, Jozefov F, Majtner T, Alexiou P. ENNGene: an Easy Neural Network model building tool for Genomics. BMC Genomics 2022; 23:248. [PMID: 35361122 PMCID: PMC8973509 DOI: 10.1186/s12864-022-08414-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/05/2021] [Accepted: 02/23/2022] [Indexed: 11/17/2022]
Abstract
Background: The recent big data revolution in Genomics, coupled with the emergence of Deep Learning (DL) as a set of powerful machine learning methods, has shifted the standard practices of machine learning for Genomics. Even though DL methods such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are becoming widespread in Genomics, developing and training such models is beyond the skills of most researchers in the field.
Results: Here we present ENNGene, an Easy Neural Network model building tool for Genomics. This tool simplifies the training of custom CNN or hybrid CNN-RNN models on genomic data via an easy-to-use Graphical User Interface. ENNGene allows multiple input branches, including sequence, evolutionary conservation, and secondary structure, and performs all the necessary preprocessing steps, accepting simple inputs such as genomic coordinates. The network architecture is selected and fully customized by the user, from the number and types of layers to each layer's precise set-up. ENNGene then handles all steps of training and evaluating the model, exporting useful outputs such as multi-class ROC and precision-recall curve plots or TensorBoard log files. To facilitate interpretation of the predicted results, we deploy Integrated Gradients, providing the user with a graphical representation of the attribution level of each input position. To showcase ENNGene, we train multiple models on the RBP24 dataset, quickly reaching the state of the art while improving performance on more than half of the proteins by including the evolutionary conservation score and tuning the network per protein.
Conclusions: As the role of DL in big data analysis in the near future is indisputable, it is important to make it available to a broader range of researchers. We believe that an easy-to-use tool such as ENNGene can allow Genomics researchers without a background in Computational Sciences to harness the power of DL to gain better insights into, and extract important information from, the large amounts of data available in the field.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-022-08414-x.
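ENNGene explains predictions with Integrated Gradients (IG), which attributes a model output F(x) to each input feature i by integrating the gradient along the straight path from a baseline x' to x: IG_i = (x_i - x'_i) * ∫₀¹ ∂F(x' + α(x - x'))/∂x_i dα. The minimal sketch below approximates that integral with a midpoint Riemann sum and estimates gradients by central finite differences on an arbitrary Python function. It is not ENNGene's implementation, which operates on trained neural networks; the toy model, the all-zero baseline and the step count are assumptions chosen for illustration.

```python
from typing import Callable, List, Sequence


def integrated_gradients(
    f: Callable[[List[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
    steps: int = 50,
    eps: float = 1e-5,
) -> List[float]:
    """Approximate IG_i = (x_i - x'_i) * integral_0^1 dF/dx_i(x' + a*(x - x')) da.

    The integral uses a midpoint Riemann sum with `steps` points; each partial
    derivative is a central finite difference with step `eps`.
    """
    grad_sums = [0.0] * len(x)
    for s in range(steps):
        alpha = (s + 0.5) / steps  # midpoint of the s-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(len(x)):
            up = list(point)
            up[i] += eps
            down = list(point)
            down[i] -= eps
            grad_sums[i] += (f(up) - f(down)) / (2 * eps)
    # Average gradient along the path, scaled by the input's distance from the baseline.
    return [(xi - b) * g / steps for xi, b, g in zip(x, baseline, grad_sums)]


def toy_model(v: List[float]) -> float:
    """Hypothetical stand-in for a trained network's output score."""
    return v[0] ** 2 + 3 * v[1]


attr = integrated_gradients(toy_model, [2.0, 1.0], [0.0, 0.0])
print([round(a, 4) for a in attr])  # [4.0, 3.0]
```

The attributions sum to F(x) - F(x') = 7, the completeness property that ties IG scores to the change in model output. The baseline choice matters in practice, and the all-zero baseline used here is only an assumption.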
Affiliation(s)
- Eliška Chalupová
- Faculty of Science, National Centre for Biomolecular Research, Masaryk University, Brno, Czechia; Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
- Ondřej Vaculík
- Faculty of Science, National Centre for Biomolecular Research, Masaryk University, Brno, Czechia; Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
- Jakub Poláček
- Faculty of Informatics, Masaryk University, Brno, Czechia
- Filip Jozefov
- Faculty of Informatics, Masaryk University, Brno, Czechia
- Tomáš Majtner
- Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
- Panagiotis Alexiou
- Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
16
Wei J, Chen S, Zong L, Gao X, Li Y. Protein-RNA interaction prediction with deep learning: structure matters. Brief Bioinform 2022; 23:bbab540. [PMID: 34929730 PMCID: PMC8790951 DOI: 10.1093/bib/bbab540] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 09/29/2021] [Revised: 11/14/2021] [Accepted: 11/22/2021] [Indexed: 12/11/2022]
Abstract
Protein-RNA interactions are vital to a variety of cellular activities. Both experimental and computational techniques have been developed to study these interactions. Because of the limitations of earlier databases, especially the lack of protein structure data, most existing computational methods rely heavily on sequence data, and only a small portion of them use structural information. Recently, AlphaFold has revolutionized protein science and biology more broadly, so protein-RNA interaction prediction can be expected to advance significantly in the coming years as well. In this work, we give a thorough review of this field, surveying both the binding site and binding preference prediction problems and covering the commonly used datasets, features and models. We also point out the potential challenges and opportunities in this field. This survey summarizes the past development of the RNA-binding protein-RNA interaction field and foresees its future development in the post-AlphaFold era.
Affiliation(s)
- Junkang Wei
- Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), 999077, Hong Kong SAR, China
- Siyuan Chen
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia
- Licheng Zong
- Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), 999077, Hong Kong SAR, China
- Xin Gao
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia
- Yu Li
- Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), 999077, Hong Kong SAR, China
- The CUHK Shenzhen Research Institute, Hi-Tech Park, 518057, Shenzhen, China
17
Niu M, Wu J, Zou Q, Liu Z, Xu L. rBPDL: Predicting RNA-Binding Proteins Using Deep Learning. IEEE J Biomed Health Inform 2021; 25:3668-3676. [PMID: 33780344 DOI: 10.1109/jbhi.2021.3069259] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/10/2022]
Abstract
RNA-binding proteins (RBPs) are powerful, wide-ranging regulators that play important roles in cell development, differentiation, metabolism, health and disease. Predicting RBPs provides valuable guidance for biologists. Although experimental methods for identifying RBPs have made great progress, they are time-consuming and inflexible. Therefore, we developed a network model, rBPDL, that combines a convolutional neural network and long short-term memory for multilabel classification of RBPs. Moreover, to achieve better prediction results, we used a voting algorithm for ensemble learning of the model. We compared rBPDL with state-of-the-art methods and found that rBPDL significantly improved identification performance on the RBP68 dataset, with a macro-Area Under Curve (AUC), micro-AUC, and weighted AUC of 0.936, 0.962, and 0.946, respectively. Furthermore, through AUC statistical analysis of RBP domains, we analyzed the performance of rBPDL and found that RBPs sharing the same domain were identified with similar performance. In addition, we analyzed the performance preferences and physicochemical properties of the amino acids in binding proteins and explored the characteristics that affect binding using the RBP86 dataset.
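rBPDL is scored with macro-, micro- and weighted AUC, three ways of summarizing a multi-label ROC analysis. The sketch below assumes the usual definitions (those behind scikit-learn's `average='macro'`, `'micro'` and `'weighted'` options for `roc_auc_score`) and computes all three in plain Python: macro averages the per-label AUCs equally, weighted averages them by each label's number of positives, and micro pools every (protein, label) pair into one binary problem. The toy labels and scores are invented and unrelated to RBP68.

```python
from typing import List, Sequence, Tuple


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count as 1/2). O(P*N): fine for toy data only."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def multilabel_aucs(Y: List[List[int]], S: List[List[float]]) -> Tuple[float, float, float]:
    """Return (macro, micro, weighted) AUC. Y[i][k] = 1 if protein i has label k;
    S[i][k] is the model's score for that pair."""
    n_labels = len(Y[0])
    per_label, support = [], []
    for k in range(n_labels):
        col_y = [row[k] for row in Y]
        col_s = [row[k] for row in S]
        per_label.append(binary_auc(col_y, col_s))
        support.append(sum(col_y))  # number of positives for label k
    macro = sum(per_label) / n_labels  # every label counts equally
    weighted = sum(a * w for a, w in zip(per_label, support)) / sum(support)
    # Micro: pool all (protein, label) pairs into one binary problem.
    micro = binary_auc([y for row in Y for y in row], [s for row in S for s in row])
    return macro, micro, weighted


# Hypothetical toy data: 4 proteins x 2 labels (not taken from RBP68).
Y = [[1, 0], [0, 1], [1, 1], [0, 1]]
S = [[0.9, 0.6], [0.3, 0.8], [0.6, 0.4], [0.1, 0.5]]
macro, micro, weighted = multilabel_aucs(Y, S)
print(f"{macro:.4f} {micro:.4f} {weighted:.4f}")  # 0.6667 0.8333 0.6000
```

Label 1 has more positives and a lower AUC, so the weighted average falls below the macro average; micro-averaging also ranks scores across labels, so it can differ substantially from both per-label summaries.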
18
Mishra A, Khanal R, Kabir WU, Hoque T. AIRBP: Accurate identification of RNA-binding proteins using machine learning techniques. Artif Intell Med 2021; 113:102034. [PMID: 33685590 DOI: 10.1016/j.artmed.2021.102034] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 04/11/2020] [Revised: 01/19/2021] [Accepted: 02/09/2021] [Indexed: 12/25/2022]
Abstract
Identification of RNA-binding proteins (RBPs), which bind to ribonucleic acid molecules, is an important problem in Computational Biology and Bioinformatics. Identifying RBPs is indispensable because they play crucial roles in the post-transcriptional control of RNAs and in RNA metabolism, and have diverse roles in biological processes such as splicing, mRNA stabilization, mRNA localization, translation, RNA synthesis, folding-unfolding, modification, processing, and degradation. Existing experimental techniques for identifying RBPs are time-consuming and expensive. Therefore, identifying RBPs directly from sequence using computational methods can help annotate RBPs and assist experimental design efficiently. In this work, we present a method called AIRBP, which uses an advanced machine learning technique called stacking to predict RBPs from features extracted from evolutionary information, physicochemical properties, and disordered properties. Moreover, AIRBP uses the majority vote of RBPPred, DeepRBPPred, and the stacking model to predict RBPs. The results show that AIRBP attains an Accuracy (ACC), Balanced Accuracy (BACC), F1-score, and Matthews Correlation Coefficient (MCC) of 95.84%, 94.71%, 0.928, and 0.899, respectively, on the training dataset using 10-fold cross-validation (CV). Further evaluation of AIRBP on independent test sets shows that it achieves ACC, BACC, F1-score, and MCC of 94.36%, 94.28%, 0.897, and 0.860 for the human test set; 91.25%, 93.00%, 0.896, and 0.835 for the S. cerevisiae test set; and 90.60%, 90.41%, 0.934, and 0.775 for the A. thaliana test set, respectively. These results indicate that AIRBP outperforms the existing Deep- and TriPepSVM methods.
Therefore, the better-performing AIRBP can be useful for accurate identification and annotation of RBPs directly from sequence and can help gain valuable insight for treating critical diseases. Availability: code and data are available at http://cs.uno.edu/~tamjid/Software/AIRBP/code_data.zip.
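AIRBP's final call is a majority vote over three predictors (RBPPred, DeepRBPPred and its stacking model), and its results are reported as ACC, BACC, F1 and MCC. The minimal sketch below is an illustration, not AIRBP's code: it shows a per-sample majority vote over binary predictions and the standard confusion-matrix formulas for those four metrics. The labels and the three prediction lists are hypothetical placeholders, not outputs of the real tools.

```python
import math
from typing import Dict, List, Sequence


def majority_vote(*predictions: Sequence[int]) -> List[int]:
    """Per-sample majority over an odd number of binary (0/1) predictors."""
    if len(predictions) % 2 == 0:
        raise ValueError("use an odd number of voters so there are no ties")
    return [1 if 2 * sum(votes) > len(votes) else 0 for votes in zip(*predictions)]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """ACC, BACC, F1 and MCC from the binary confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sens = tp / (tp + fn) if tp + fn else 0.0  # recall on RBPs
    spec = tn / (tn + fp) if tn + fp else 0.0  # recall on non-RBPs
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": (tp + tn) / len(pairs), "BACC": (sens + spec) / 2, "F1": f1, "MCC": mcc}


# Hypothetical labels (1 = RBP) and predictions from three hypothetical voters.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
predictor_a = [1, 1, 0, 0, 0, 1, 0, 1]
predictor_b = [1, 0, 1, 0, 1, 0, 0, 1]
predictor_c = [1, 1, 1, 0, 0, 0, 1, 0]

print(majority_vote(predictor_a, predictor_b, predictor_c) == y_true)  # True
print(binary_metrics(y_true, predictor_a))
# {'ACC': 0.75, 'BACC': 0.75, 'F1': 0.75, 'MCC': 0.5}
```

In this toy case each voter makes two mistakes, on different samples, so the vote recovers every label. Real predictors tend to make correlated errors, so the gain from voting may be much smaller.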
Affiliation(s)
- Avdesh Mishra
- Department of Electrical Engineering and Computer Science, Texas A&M University-Kingsville, Kingsville, TX, USA
- Reecha Khanal
- Department of Computer Science, University of New Orleans, New Orleans, LA, USA
- Wasi Ul Kabir
- Department of Computer Science, University of New Orleans, New Orleans, LA, USA
- Tamjidul Hoque
- Department of Computer Science, University of New Orleans, New Orleans, LA, USA
19
Zhao B, Katuwawala A, Oldfield CJ, Dunker AK, Faraggi E, Gsponer J, Kloczkowski A, Malhis N, Mirdita M, Obradovic Z, Söding J, Steinegger M, Zhou Y, Kurgan L. DescribePROT: database of amino acid-level protein structure and function predictions. Nucleic Acids Res 2021; 49:D298-D308. [PMID: 33119734 PMCID: PMC7778963 DOI: 10.1093/nar/gkaa931] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 08/14/2020] [Revised: 09/11/2020] [Accepted: 10/05/2020] [Indexed: 12/30/2022]
Abstract
We present DescribePROT, a database of predicted amino acid-level descriptors of protein structure and function. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position-specific scoring matrices, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs, and interactions with proteins, DNA and RNAs. Users can search DescribePROT by amino acid sequence, UniProt accession number, or entry name. The pre-computed results are made available instantaneously. The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors, and can also be downloaded in structured formats at the protein, proteome and whole-database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.
Affiliation(s)
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Akila Katuwawala
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- A Keith Dunker
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
- Eshel Faraggi
- Battelle Center for Mathematical Medicine at the Nationwide Children's Hospital, and Department of Pediatrics, The Ohio State University, Columbus, OH, USA
- Jörg Gsponer
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
- Andrzej Kloczkowski
- Battelle Center for Mathematical Medicine at the Nationwide Children's Hospital, and Department of Pediatrics, The Ohio State University, Columbus, OH, USA
- Nawar Malhis
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
- Milot Mirdita
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Zoran Obradovic
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
- Johannes Söding
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Martin Steinegger
- School of Biological Sciences and Institute of Molecular Biology & Genetics, Seoul National University, Seoul, Republic of Korea
- Yaoqi Zhou
- Institute for Glycomics, Griffith University, Gold Coast, Queensland, Australia
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
20
Li K, Guo ZW, Zhai XM, Yang XX, Wu YS, Liu TC. RBPTD: a database of cancer-related RNA-binding proteins in humans. Database: The Journal of Biological Databases and Curation 2020; 2020:5734253. [PMID: 32047888 PMCID: PMC7012770 DOI: 10.1093/database/baz156] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/11/2019] [Revised: 12/05/2019] [Accepted: 12/23/2019] [Indexed: 12/12/2022]
Abstract
RNA-binding proteins (RBPs) play important roles in regulating the expression of genes involved in human physiological and pathological processes, especially in cancers. Many RBPs have been found to be dysregulated in cancers; however, there was no tool that incorporated high-throughput data from different dimensions to systematically identify cancer-related RBPs and to explore the causes of their abnormalities and their potential functions. Therefore, we developed a database named RBPTD to identify cancer-related RBPs in humans and systematically explore their functions and abnormalities by integrating different types of data, including gene expression profiles, prognosis data and DNA copy number variation (CNV), across 28 cancers. We found a total of 454 significantly differentially expressed RBPs, 1970 RBPs with significant prognostic value, and 53 dysregulated RBPs correlated with CNV abnormality. The functions of 26 cancer-related RBPs were explored by analysing high-throughput RNA sequencing data obtained by crosslinking immunoprecipitation, and the functions of the remaining RBPs were predicted by calculating their correlation coefficients with other genes. Finally, we developed RBPTD to let users explore the functions and abnormalities of cancer-related RBPs and to improve our understanding of their roles in tumorigenesis. Database URL: http://www.rbptd.com
Affiliation(s)
- Kun Li
- Institute of Antibody Engineering, School of Laboratory Medicine and Biotechnology, Southern Medical University, 1838 N. Guangzhou Ave, Guangzhou, 510515, China
- Zhi-Wei Guo
- Institute of Antibody Engineering, School of Laboratory Medicine and Biotechnology, Southern Medical University, 1838 N. Guangzhou Ave, Guangzhou, 510515, China
- Xiang-Ming Zhai
- Institute of Antibody Engineering, School of Laboratory Medicine and Biotechnology, Southern Medical University, 1838 N. Guangzhou Ave, Guangzhou, 510515, China
- Xue-Xi Yang
- Institute of Antibody Engineering, School of Laboratory Medicine and Biotechnology, Southern Medical University, 1838 N. Guangzhou Ave, Guangzhou, 510515, China
- Ying-Song Wu
- Institute of Antibody Engineering, School of Laboratory Medicine and Biotechnology, Southern Medical University, 1838 N. Guangzhou Ave, Guangzhou, 510515, China
- Tian-Cai Liu
- Institute of Antibody Engineering, School of Laboratory Medicine and Biotechnology, Southern Medical University, 1838 N. Guangzhou Ave, Guangzhou, 510515, China
21
Navien TN, Thevendran R, Hamdani HY, Tang TH, Citartan M. In silico molecular docking in DNA aptamer development. Biochimie 2020; 180:54-67. [PMID: 33086095 DOI: 10.1016/j.biochi.2020.10.005] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 07/24/2020] [Revised: 09/23/2020] [Accepted: 10/14/2020] [Indexed: 12/21/2022]
Abstract
Aptamers are single-stranded DNA or RNA oligonucleotides generated by SELEX that exhibit binding affinity and specificity for a wide variety of target molecules. DNA aptamers are much more stable than RNA aptamers and are therefore widely adopted in a number of applications, especially diagnostics. The tediousness and rigor of certain SELEX steps have intensified efforts to combine in silico molecular docking approaches with in vitro SELEX procedures when developing DNA aptamers. Inspired by these endeavors, we present an overview of in silico molecular docking approaches in DNA aptamer generation, detailing the stepwise procedures and shedding some light on the various software tools used. The in silico maturation strategy and the limitations of in silico approaches are also underscored.
Affiliation(s)
- Tholasi Nadhan Navien
- Advanced Medical & Dental Institute (AMDI), Universiti Sains Malaysia, Bertam, 13200, Kepala Batas, Penang, Malaysia
- Ramesh Thevendran
- Advanced Medical & Dental Institute (AMDI), Universiti Sains Malaysia, Bertam, 13200, Kepala Batas, Penang, Malaysia
- Hazrina Yusof Hamdani
- Advanced Medical & Dental Institute (AMDI), Universiti Sains Malaysia, Bertam, 13200, Kepala Batas, Penang, Malaysia
- Thean-Hock Tang
- Advanced Medical & Dental Institute (AMDI), Universiti Sains Malaysia, Bertam, 13200, Kepala Batas, Penang, Malaysia
- Marimuthu Citartan
- Advanced Medical & Dental Institute (AMDI), Universiti Sains Malaysia, Bertam, 13200, Kepala Batas, Penang, Malaysia
22
PRIME-3D2D is a 3D2D model to predict binding sites of protein-RNA interaction. Commun Biol 2020; 3:384. [PMID: 32678300 PMCID: PMC7366699 DOI: 10.1038/s42003-020-1114-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/19/2019] [Accepted: 06/29/2020] [Indexed: 11/08/2022]
Abstract
Protein–RNA interactions participate in many biological processes, so studying them can help us understand the functions of proteins and RNAs. Although protein–RNA 3D3D models such as PRIME are useful for building 3D structural complexes, they cannot be used genome-wide because RNA 3D structures are lacking. To take full advantage of RNA secondary structures revealed by high-throughput sequencing, we present PRIME-3D2D to predict binding sites of protein–RNA interactions. PRIME-3D2D is almost as good as PRIME at modeling protein–RNA complexes. PRIME-3D2D can be used to predict binding sites on PDB data (MCC = 0.75/0.70 for binding sites in protein/RNA) and transcriptome-wide (MCC = 0.285 for binding sites in RNA). Tests on PDB and yeast transcriptome-wide data show that PRIME-3D2D performs better than other binding-site predictors. PRIME-3D2D can therefore be used to predict binding sites both on PDB structures and genome-wide, and it is freely available. Xie et al. report a new computational method, PRIME-3D2D, that predicts binding sites of protein–RNA interactions by considering protein 3D structure and RNA 2D structure. It is freely available, performs better than other binding-site predictors, and is as good as PRIME at modeling protein–RNA complexes.
23
Song J, Tian S, Yu L, Xing Y, Yang Q, Duan X, Dai Q. AC-Caps: Attention Based Capsule Network for Predicting RBP Binding Sites of LncRNA. Interdiscip Sci 2020; 12:414-423. [PMID: 32572768 DOI: 10.1007/s12539-020-00379-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/11/2020] [Revised: 05/18/2020] [Accepted: 05/30/2020] [Indexed: 01/03/2023]
Abstract
Long non-coding RNAs (lncRNAs) are non-coding RNAs longer than 200 nucleotides that do not encode proteins. LncRNAs play key roles in many biological processes. Studying RNA-binding protein (RBP) binding sites on lncRNAs helps to reveal epigenetic and post-transcriptional mechanisms, to explore the physiological and pathological processes of cancer, and to discover new therapeutic breakthroughs. To improve the recognition rate of RBP binding sites and reduce experimental time and cost, many computational methods based on domain knowledge have emerged to predict RBP binding sites. However, these prediction methods treat nucleotides independently and do not take nucleotide statistics into account. In this paper, we use a high-order statistics-based encoding scheme and feed the encoded lncRNA sequences into a hybrid deep learning architecture named AC-Caps. It consists of a joint processing layer (composed of an attention mechanism and a convolutional neural network) and a capsule network. The AC-Caps model was evaluated using 31 independent experimental datasets from 12 lncRNA-binding proteins. In experiments, our method achieves excellent performance, with an average area under the curve (AUC) of 0.967 and an average accuracy (ACC) of 92.5%; these are 0.014 and 2.3% higher than HOCCNNLB, 0.261 and 28.9% higher than iDeepS, and 0.189 and 21.8% higher than DeepBind, respectively. The results show that AC-Caps can reliably process large-scale RBP binding-site data on lncRNAs and outperforms existing deep-learning models. The source code of AC-Caps and the datasets used in this paper are available at https://github.com/JinmiaoS/AC-Caps.
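The abstract does not spell out AC-Caps's high-order encoding. One widely used high-order scheme for RBP binding-site models is one-hot encoding of overlapping k-mers, so that each row carries information about k consecutive nucleotides rather than a single base. The sketch below assumes that scheme; the function names and the choice of k are illustrative, and AC-Caps's actual encoding may differ.

```python
from itertools import product
from typing import Dict, List

NUCLEOTIDES = "ACGU"


def kmer_index(k: int) -> Dict[str, int]:
    """Map each of the 4**k possible k-mers to a column index (AA..A, AA..C, ...)."""
    return {"".join(p): i for i, p in enumerate(product(NUCLEOTIDES, repeat=k))}


def high_order_one_hot(seq: str, k: int = 3) -> List[List[int]]:
    """Encode an RNA sequence as (len(seq) - k + 1) rows of 4**k one-hot columns,
    one row per overlapping k-mer.

    Assumption: this k-mer one-hot scheme stands in for AC-Caps's unspecified
    high-order encoding. T is treated as U; k-mers containing other symbols
    (e.g. N) become all-zero rows.
    """
    seq = seq.upper().replace("T", "U")
    index = kmer_index(k)
    rows = []
    for i in range(len(seq) - k + 1):
        row = [0] * len(index)
        col = index.get(seq[i:i + k])
        if col is not None:
            row[col] = 1
        rows.append(row)
    return rows


m = high_order_one_hot("ACGUA", k=2)  # k-mers: AC, CG, GU, UA
print(len(m), len(m[0]))  # 4 16
print([row.index(1) for row in m])  # [1, 6, 11, 12]
```

Raising k captures longer-range nucleotide dependencies but grows the width as 4**k, which is one reason such encodings are usually paired with convolutional layers that learn compact features from them.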
Affiliation(s)
- Jinmiao Song
- School of Information Science and Engineering, Xinjiang University, Urumqi, 830008, China
- Dalian Key Lab of Digital Technology for National Culture, Dalian Minzu University, Dalian, 116600, China
- Shengwei Tian
- School of Software, Xinjiang University, Urumqi, 830046, China
- Long Yu
- Network Center, Xinjiang University, Urumqi, 830046, China
- Yan Xing
- Imaging Center, Xinjiang Medical University Affiliated First Hospital, Urumqi, 830011, China
- Qimeng Yang
- School of Information Science and Engineering, Xinjiang University, Urumqi, 830008, China
- Xiaodong Duan
- Dalian Key Lab of Digital Technology for National Culture, Dalian Minzu University, Dalian, 116600, China
- Qiguo Dai
- Dalian Key Lab of Digital Technology for National Culture, Dalian Minzu University, Dalian, 116600, China
24
Qiu J, Bernhofer M, Heinzinger M, Kemper S, Norambuena T, Melo F, Rost B. ProNA2020 predicts protein-DNA, protein-RNA, and protein-protein binding proteins and residues from sequence. J Mol Biol 2020; 432:2428-2443. [PMID: 32142788 DOI: 10.1016/j.jmb.2020.02.026] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 11/28/2019] [Revised: 02/17/2020] [Accepted: 02/23/2020] [Indexed: 11/29/2022]
Abstract
The intricate details of how proteins bind to proteins, DNA, and RNA are crucial for understanding almost all biological processes. Disease-causing sequence variants often affect binding residues. Here, we describe a new, comprehensive system of in silico methods that take only protein sequence as input to predict binding of proteins to DNA, RNA, and other proteins. First, we needed to develop several new methods to predict whether or not proteins bind (per-protein prediction). Second, we developed independent methods that predict which residues bind (per-residue prediction). Without requiring three-dimensional information, the system can predict the actual binding residues. For protein-level predictions, the system combined homology-based inference with machine learning, and motif-based profile-kernel approaches with word-based (ProtVec) solutions. This achieved an overall non-exclusive three-state accuracy of 77% ± 1% (± one standard error), corresponding to a 1.8-fold improvement over random (best classification for protein-protein binding, with F1 = 91 ± 0.8%). Standard neural networks for per-residue binding predictions performed best for DNA-binding (Q2 = 81 ± 0.9%), followed by RNA-binding (Q2 = 80 ± 1%), and worst for protein-protein binding (Q2 = 69 ± 0.8%). The new method, dubbed ProNA2020, is available as code through GitHub (https://github.com/Rostlab/ProNA2020.git) and through PredictProtein (www.predictprotein.org).
Affiliation(s)
- Jiajun Qiu
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Garching, 85748, Germany
- Michael Bernhofer
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Garching, 85748, Germany
- Michael Heinzinger
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Garching, 85748, Germany
- Sofie Kemper
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany
- Tomas Norambuena
- Molecular Bioinformatics Laboratory, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
- Francisco Melo
- Molecular Bioinformatics Laboratory, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile; Institute of Biological and Medical Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile
- Burkhard Rost
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; Columbia University, Department of Biochemistry and Molecular Biophysics, 701 West 168th Street, New York, NY, 10032, USA; Institute of Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748, Garching/Munich, Germany; Institute for Food and Plant Sciences (WZW) Weihenstephan, Alte Akademie 8, 85354 Freising, Germany
25
Protein-assisted RNA fragment docking (RnaX) for modeling RNA-protein interactions using ModelX. Proc Natl Acad Sci U S A 2019; 116:24568-24573. [PMID: 31732673 PMCID: PMC6900601 DOI: 10.1073/pnas.1910999116] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/12/2022]
Abstract
Protein–RNA interactions, key in biological processes, have remained refractory to prediction algorithms. Here we present a new extension of the ModelX tool suite designed for this purpose. RNA–protein complexes in the Protein Data Bank were decomposed into small peptide–oligonucleotide interacting fragment pairs, which were used as building blocks to assemble larger scaffolds representing complex RNA–protein interactions. This approach has already been used successfully to design DNA–protein and protein–protein interfaces. Areas under the curve (AUC) of up to 0.86 were achieved for binding-site prediction on established and in-house benchmark sets, demonstrating the accuracy and coverage of the approach. Together with the FoldX protein design tool suite, we were able to engineer backbone- and side chain-compatible interfaces using naked protein structures as input. RNA–protein interactions are crucial for key biological processes such as the regulation of transcription, splicing, translation, and gene silencing, among many others. Knowing where an RNA molecule interacts with a target protein and/or engineering an RNA molecule to bind specifically to a protein could allow rational interference with these cellular processes and the design of novel therapies. Here we present a robust RNA–protein fragment pair-based method, termed RnaX, to predict RNA-binding sites. This methodology, which is integrated into the ModelX tool suite (http://modelx.crg.es), takes advantage of the structural information present in all released RNA–protein complexes. This information is used to create an exhaustive database for docking and a statistical force field for fast discrimination of true backbone-compatible interactions.
RnaX, together with the protein design force field FoldX, enables us to predict RNA–protein interfaces and, when sufficient crystallographic information is available, to re-engineer the interface at the sequence-specificity level by mimicking the conformational changes that occur upon protein and RNA mutagenesis. These results, obtained at a fraction of the computational cost of methods that simulate conformational dynamics, open up perspectives for the engineering of RNA–protein interfaces.
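The areas under the curve quoted above summarize how well per-residue scores rank true RNA-binding residues above non-binding ones. As a minimal sketch (not RnaX code; the scores and labels below are hypothetical), the ROC AUC can be computed as the fraction of (binding, non-binding) residue pairs that are ordered correctly, counting ties as one half:

```python
def roc_auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one binding and one non-binding residue")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # binding residue correctly scored higher
            elif p == n:
                wins += 0.5  # tie: half credit
    return wins / (len(pos) * len(neg))


# Hypothetical per-residue prediction scores and labels (1 = RNA-binding residue).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(roc_auc(scores, labels))
```

For the toy data, 8 of the 9 pairs are ordered correctly, so the AUC is 8/9 ≈ 0.89. The pairwise loop is quadratic; for large benchmark sets a rank-based (Mann-Whitney) formulation or a library routine is the usual choice.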
26
Sagar A, Xue B. Recent Advances in Machine Learning Based Prediction of RNA-protein Interactions. Protein Pept Lett 2019; 26:601-619. [PMID: 31215361 DOI: 10.2174/0929866526666190619103853] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/16/2018] [Revised: 04/04/2019] [Accepted: 06/01/2019] [Indexed: 12/18/2022]
Abstract
The interactions between RNAs and proteins play critical roles in many biological processes, so characterizing them is essential for mechanistic, biomedical, and clinical studies. Many experimental methods can probe different aspects of RNA-protein interactions. However, because these interactions are tissue-specific and condition-specific, and are often weak and compete with one another, experimental techniques alone cannot reveal the complete spectrum of RNA-protein interactions. To mitigate these issues, continuous efforts have been devoted to developing high-quality computational techniques for studying RNA-protein interactions. Considerable progress has been made through the application of novel techniques and strategies, such as machine learning. In particular, the development and application of CLIP (cross-linking and immunoprecipitation) techniques have made more and more experimental data on RNA-protein interactions under specific biological conditions available. Together, these CLIP data provide a rich source for developing advanced machine learning predictors. In this review, recent progress in computational predictors of RNA-protein interaction is summarized in three aspects: datasets, prediction strategies, and input features. Possible future developments are also discussed at the end of the review.
Affiliation(s)
- Amit Sagar
- Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, Florida 33620, United States
- Bin Xue
- Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, Florida 33620, United States
27
Vanmeert M, Razzokov J, Mirza MU, Weeks SD, Schepers G, Bogaerts A, Rozenski J, Froeyen M, Herdewijn P, Pinheiro VB, Lescrinier E. Rational design of an XNA ligase through docking of unbound nucleic acids to toroidal proteins. Nucleic Acids Res 2019; 47:7130-7142. [PMID: 31334814 PMCID: PMC6649754 DOI: 10.1093/nar/gkz551] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 03/13/2019] [Revised: 05/24/2019] [Accepted: 06/12/2019] [Indexed: 02/06/2023]
Abstract
Xenobiotic nucleic acids (XNA) are nucleic acid analogues not present in nature that can be used to store genetic information. In vivo XNA applications could be developed into novel biocontainment strategies, but they are currently limited by the challenge of developing XNA-processing enzymes such as polymerases, ligases and nucleases. Here, we present a structure-guided, modelling-based strategy for the rational design of the enzymes essential for the development of XNA molecular biology. Docking of protein domains to unbound double-stranded nucleic acids is used to generate a first approximation of the extensive interactions between nucleic acid-processing enzymes and their substrate. Molecular dynamics is then used to refine that prediction, allowing, for the first time, accurate prediction of how proteins that form toroidal complexes with nucleic acids interact with their substrate. Using the Chlorella virus DNA ligase as a proof of principle, we recapitulate the ligase's substrate specificity and successfully predict how to convert it into an XNA-templated XNA ligase.
Affiliation(s)
- Michiel Vanmeert
- Medicinal Chemistry, Rega Institute for Medical Research, KU Leuven, Herestraat 49, box 1041, 3000 Leuven, Belgium
- Jamoliddin Razzokov
- Research group PLASMANT, Department of Chemistry, University of Antwerp, Universiteitsplein 1, B-2610 Antwerp, Belgium
- Muhammad Usman Mirza
- Medicinal Chemistry, Rega Institute for Medical Research, KU Leuven, Herestraat 49, box 1041, 3000 Leuven, Belgium
- Centre for Research in Molecular Medicine (CRiMM), University of Lahore, Pakistan
- Stephen D Weeks
- Biocrystallography, KU Leuven, Herestraat 49, box 822, 3000 Leuven, Belgium
- Guy Schepers
- Medicinal Chemistry, Rega Institute for Medical Research, KU Leuven, Herestraat 49, box 1041, 3000 Leuven, Belgium
- Annemie Bogaerts
- Research group PLASMANT, Department of Chemistry, University of Antwerp, Universiteitsplein 1, B-2610 Antwerp, Belgium
- Jef Rozenski
- Medicinal Chemistry, Rega Institute for Medical Research, KU Leuven, Herestraat 49, box 1041, 3000 Leuven, Belgium
- Mathy Froeyen
- Medicinal Chemistry, Rega Institute for Medical Research, KU Leuven, Herestraat 49, box 1041, 3000 Leuven, Belgium
- Piet Herdewijn
- Medicinal Chemistry, Rega Institute for Medical Research, KU Leuven, Herestraat 49, box 1041, 3000 Leuven, Belgium
- Vitor B Pinheiro
- Medicinal Chemistry, Rega Institute for Medical Research, KU Leuven, Herestraat 49, box 1041, 3000 Leuven, Belgium
- University College London, Department of Structural and Molecular Biology, Gower Street, London, WC1E 6BT, UK
- Eveline Lescrinier
- Medicinal Chemistry, Rega Institute for Medical Research, KU Leuven, Herestraat 49, box 1041, 3000 Leuven, Belgium
28
Katuwawala A, Ghadermarzi S, Kurgan L. Computational prediction of functions of intrinsically disordered regions. Prog Mol Biol Transl Sci 2019; 166:341-369. [PMID: 31521235 DOI: 10.1016/bs.pmbts.2019.04.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 12/20/2022]
Abstract
Intrinsically disordered regions (IDRs) are abundant in nature, particularly among eukaryotes. Although they facilitate a wide spectrum of cellular functions, including signaling, molecular assembly and recognition, translation, transcription and regulation, only a few hundred IDRs have been functionally annotated. This annotation gap motivates the development of fast and accurate computational methods that predict IDR functions directly from protein sequences. We introduce and describe a comprehensive collection of 25 methods that accurately predict IDRs that interact with proteins and nucleic acids, that function as flexible linkers, and that carry out multiple (moonlighting) functions. Virtually all of these predictors can be accessed online, and many were developed in the last few years. They use a wide range of predictive architectures and take advantage of modern machine learning algorithms. Our empirical analysis shows that predictors available as webservers are highly cited, attesting to their practical value and popularity. The most cited methods include DISOPRED3, ANCHOR, alpha-MoRFpred, MoRFpred, fMoRFpred and MoRFCHiBi. We present two case studies demonstrating that the predictions produced by these tools are relatively easy to interpret and deliver valuable functional clues. However, current computational tools cover a relatively narrow range of disorder functions, and further development efforts to cover a broader range should be pursued. We demonstrate that enough functionally annotated IDRs associated with several other disorder functions are already available to design and validate novel predictors.
Affiliation(s)
- Akila Katuwawala
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Sina Ghadermarzi
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
29
Insights into Telomerase/hTERT Alternative Splicing Regulation Using Bioinformatics and Network Analysis in Cancer. Cancers (Basel) 2019; 11:cancers11050666. [PMID: 31091669 PMCID: PMC6562651 DOI: 10.3390/cancers11050666] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 04/27/2019] [Revised: 05/10/2019] [Accepted: 05/13/2019] [Indexed: 01/08/2023]
Abstract
The reactivation of telomerase in cancer cells remains incompletely understood. The catalytic component of telomerase, hTERT, is thought to be the limiting component for the formation of active enzyme in cancer cells. hTERT gene expression is regulated at several levels, including chromatin, DNA methylation, transcription factors, and RNA processing events. Of these regulatory events, RNA processing had received little attention until recently. RNA processing and alternative splicing regulation have been explored to understand how hTERT is regulated in cancer cells, and the cis- and trans-acting factors that regulate the alternative splicing choice of hTERT in the reverse transcriptase domain have been investigated. Further, the splicing factors that promote the production of full-length hTERT were found to be involved in cancer cell growth and survival as well. The goals of this review are to summarize telomerase regulation via alternative splicing and the function of hTERT splicing variants, and to point out how bioinformatics approaches are leading the way in elucidating the networks that regulate hTERT splicing choice and, ultimately, cancer growth.
30
Poursheikhali Asghari M, Abdolmaleki P. Prediction of RNA- and DNA-Binding Proteins Using Various Machine Learning Classifiers. Avicenna J Med Biotechnol 2019; 11:104-111. [PMID: 30800250 PMCID: PMC6359699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
BACKGROUND Nucleic acid-binding proteins play major roles in different biological processes, such as transcription, splicing and translation. Therefore, predicting the nucleic acid-binding function of proteins is a step toward their full functional annotation. The aim of our research was to improve nucleic acid-binding function prediction. METHODS In the current study, nine machine-learning algorithms were used to predict RNA- and DNA-binding proteins and to discriminate between RNA-binding and DNA-binding proteins. Electrostatic features were used to predict each function on correspondingly adapted protein datasets. Leave-one-out cross-validation was used to measure the performance of the classifiers. RESULTS The radial basis function classifier gave the best results in predicting RNA- and DNA-binding proteins compared with the other classifiers applied. In discriminating between RNA- and DNA-binding proteins, the multilayer perceptron classifier performed best. CONCLUSION Our findings show that prediction of nucleic acid-binding function based on these simple electrostatic features can be improved by the applied classifiers. Moreover, reasonable progress has been made in distinguishing between RNA- and DNA-binding proteins.
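Leave-one-out cross-validation trains on all samples but one, predicts the held-out sample, and repeats this once per sample. The sketch below illustrates only that loop: the nine classifiers of the study are not reproduced, a simple nearest-centroid classifier is swapped in for brevity, and the two electrostatic descriptors and their values are hypothetical.

```python
import math


def nearest_centroid_fit(X, y):
    """Return the mean feature vector (centroid) of each class label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


def leave_one_out_accuracy(X, y):
    """Hold out each sample once, fit on the remaining ones, and score the held-out prediction."""
    correct = 0
    for i in range(len(X)):
        centroids = nearest_centroid_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        if nearest_centroid_predict(centroids, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical 2-D electrostatic descriptors per protein (e.g., net charge, positive-patch size).
X = [[8.0, 5.1], [7.2, 4.8], [9.1, 6.0], [-2.0, 0.5], [-1.1, 1.0], [0.3, 0.2]]
y = ["binding", "binding", "binding", "non-binding", "non-binding", "non-binding"]
print(leave_one_out_accuracy(X, y))
```

Because the classifier is refit once per sample, the cost grows with dataset size. On this separable toy set every held-out sample is classified correctly, giving an accuracy of 1.0.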
Affiliation(s)
- Parviz Abdolmaleki
- Corresponding author: Parviz Abdolmaleki, Ph.D., Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran, Tel: +98 21 82883404, Fax: +98 21 82884457
31
Song J, Liu G, Wang R, Sun L, Zhang P. A novel method for predicting RNA-interacting residues in proteins using a combination of feature-based and sequence template-based methods. Biotechnol Biotechnol Equip 2019. [DOI: 10.1080/13102818.2019.1612275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Affiliation(s)
- Jiazhi Song
- Department of Computational Intelligence, College of Computer Science and Technology, Jilin University, Changchun, PR China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, PR China
- Guixia Liu
- Department of Computational Intelligence, College of Computer Science and Technology, Jilin University, Changchun, PR China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, PR China
- Rongquan Wang
- Department of Computational Intelligence, College of Computer Science and Technology, Jilin University, Changchun, PR China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, PR China
- Liyan Sun
- Department of Computational Intelligence, College of Computer Science and Technology, Jilin University, Changchun, PR China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, PR China
- Ping Zhang
- Department of Computational Intelligence, College of Computer Science and Technology, Jilin University, Changchun, PR China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, PR China
32
Jung Y, El-Manzalawy Y, Dobbs D, Honavar VG. Partner-specific prediction of RNA-binding residues in proteins: A critical assessment. Proteins 2018; 87:198-211. [PMID: 30536635 PMCID: PMC6389706 DOI: 10.1002/prot.25639] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/01/2018] [Revised: 10/10/2018] [Accepted: 11/29/2018] [Indexed: 01/06/2023]
Abstract
RNA-protein interactions play essential roles in regulating gene expression. While some RNA-protein interactions are "specific", that is, the RNA-binding proteins preferentially bind to particular RNA sequences or structural motifs, others are "non-RNA specific." Deciphering the protein-RNA recognition code is essential for understanding the functional implications of these interactions and for developing new therapies for many diseases. Because experimentally determining protein-RNA interfaces is costly, computational methods are needed to identify RNA-binding residues in proteins. Although most existing computational methods for predicting RNA-binding residues in RNA-binding proteins ignore the characteristics of the partner RNA, there is growing interest in methods for partner-specific prediction of RNA-binding sites in proteins. In this work, we assess the performance of two recently published partner-specific protein-RNA interface prediction tools, PS-PRIP and PRIdictor, along with our own new tools. Specifically, we introduce a novel metric, the RNA-specificity metric (RSM), to quantify the RNA-specificity of the RNA-binding residues predicted by such tools. Our results show that the RNA-binding residues predicted by previously published methods are oblivious to the characteristics of the putative RNA-binding partner. Moreover, when evaluated using partner-agnostic metrics, RNA partner-specific methods are outperformed by state-of-the-art partner-agnostic methods. We conjecture that either (a) the protein-RNA complexes in the PDB are not representative of protein-RNA interactions in nature, or (b) current methods for partner-specific prediction of RNA-binding residues in proteins fail to account for the differences between RNA partner-specific and partner-agnostic protein-RNA interactions, or both.
Affiliation(s)
- Yong Jung
- Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, Pennsylvania; Artificial Intelligence Research Laboratory, Pennsylvania State University, University Park, Pennsylvania; The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania
- Yasser El-Manzalawy
- Artificial Intelligence Research Laboratory, Pennsylvania State University, University Park, Pennsylvania; Clinical and Translational Sciences Institute, Pennsylvania State University, University Park, Pennsylvania; College of Information Sciences and Technology, Pennsylvania State University, Pennsylvania
- Drena Dobbs
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, Iowa; Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, Iowa
- Vasant G Honavar
- Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, Pennsylvania; Artificial Intelligence Research Laboratory, Pennsylvania State University, University Park, Pennsylvania; Institute for Cyberscience, Pennsylvania State University, University Park, Pennsylvania; Clinical and Translational Sciences Institute, Pennsylvania State University, University Park, Pennsylvania; The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania; College of Information Sciences and Technology, Pennsylvania State University, Pennsylvania
33
Hu W, Qin L, Li M, Pu X, Guo Y. Individually double minimum-distance definition of protein-RNA binding residues and application to structure-based prediction. J Comput Aided Mol Des 2018; 32:1363-1373. [PMID: 30478757 DOI: 10.1007/s10822-018-0177-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 05/29/2018] [Accepted: 11/14/2018] [Indexed: 01/01/2023]
Abstract
Identifying protein-RNA binding residues is essential for understanding the mechanism of protein-RNA interactions. To date, fixed (rigid) distance thresholds have commonly been used to define protein-RNA binding residues. However, after investigating 182 non-redundant protein-RNA complexes, we find that a single threshold is unsuitable for a considerable number of complexes, because protein-RNA distances vary widely. In this work, we propose a novel definition method based on a flexible distance cutoff. The method accounts for individual differences among complexes by setting a variable tolerance limit for protein-RNA interactions, i.e. the double minimum-distance, which yields a different distance threshold for each complex. To validate the method, we comprehensively compared it with traditional rigid methods in terms of interface structure, amino acid composition, interface area, and interaction forces. The results indicate that the method is more reasonable because it captures the specificity of different complexes, recovering important residues missed by rigid distance methods and discarding some redundant ones. Finally, to further test the double minimum-distance definition strategy, we developed a random forest classifier that uses structural features to predict the binding sites defined by the new method. The model achieved satisfactory prediction performance, with an accuracy of 85.0% on independent data sets. To the best of our knowledge, it is the first prediction model to define positive and negative samples using a flexible cutoff. Taken together, the comparative analysis and modeling results indicate that our method is a promising strategy for defining protein-RNA binding sites more precisely.
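The abstract does not spell out how the per-complex threshold is computed. The sketch below therefore assumes one possible reading, which is an assumption and not the authors' published definition: a complex's cutoff is twice the smallest protein-RNA atom distance observed in that complex, and a residue counts as binding if any of its atoms lies within that cutoff of any RNA atom. The 3.5 Å rigid cutoff, the residue names and the coordinates are likewise hypothetical, chosen to show a loosely packed complex where the rigid rule finds nothing.

```python
import math

# Hypothetical toy complex (coordinates in Å): RNA atoms, and protein residues mapped to their atoms.
rna_atoms = [(0.0, 0.0, 0.0), (0.0, 0.0, -1.5)]
residues = {
    "ARG10": [(3.9, 0.0, 0.0), (5.0, 1.0, 0.0)],
    "LYS11": [(0.0, 5.0, 0.0)],
    "GLY13": [(0.0, 0.0, 9.0)],
}


def residue_min_distances(residues, rna_atoms):
    """Smallest distance from any atom of each residue to any RNA atom."""
    return {
        res_id: min(math.dist(a, b) for a in atoms for b in rna_atoms)
        for res_id, atoms in residues.items()
    }


def binding_residues_rigid(residues, rna_atoms, cutoff=3.5):
    """Fixed-threshold definition: the same (hypothetical 3.5 Å) cutoff for every complex."""
    dists = residue_min_distances(residues, rna_atoms)
    return sorted(r for r, d in dists.items() if d <= cutoff)


def binding_residues_flexible(residues, rna_atoms, factor=2.0):
    """ASSUMED flexible definition: cutoff = factor x this complex's closest protein-RNA contact."""
    dists = residue_min_distances(residues, rna_atoms)
    cutoff = factor * min(dists.values())
    return sorted(r for r, d in dists.items() if d <= cutoff)


print(binding_residues_rigid(residues, rna_atoms))     # []
print(binding_residues_flexible(residues, rna_atoms))  # ['ARG10', 'LYS11']
```

Here the closest contact is 3.9 Å, so the rigid 3.5 Å rule returns no binding residues, whereas the assumed flexible cutoff of 7.8 Å keeps ARG10 and LYS11 and still excludes the distant GLY13.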
Affiliation(s)
- Wen Hu
- College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, People's Republic of China
- Liu Qin
- College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, People's Republic of China
- Menglong Li
- College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, People's Republic of China
- Xuemei Pu
- College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, People's Republic of China
- Yanzhi Guo
- College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, People's Republic of China
34
Moore KS, 't Hoen PAC. Computational approaches for the analysis of RNA-protein interactions: A primer for biologists. J Biol Chem 2018; 294:1-9. [PMID: 30455357 DOI: 10.1074/jbc.rev118.004842] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/31/2022]
Abstract
RNA-binding proteins (RBPs) play important roles in the control of gene expression and the coordination of different layers of post-transcriptional regulation. Interactions between certain RBPs and mRNA transcripts are notoriously difficult to predict, as any given protein-RNA interaction may depend not only on RNA sequence but also on three-dimensional RNA structure, competitive inhibition from other RBPs, and input from cellular signaling pathways. Advanced, high-throughput technologies for identifying RNA-protein interactions have come to the rescue, but identifying binding sites and downstream functional effects of RBPs from the resulting data can be challenging. In this review, we discuss statistical inference and machine-learning approaches and tools relevant to the study of RBPs and the analysis of large-scale RNA-protein interaction datasets. This primer is intended for life scientists who are interested in incorporating these tools into their own research. We begin by demystifying regression models, as used in the analysis of next-generation sequencing data, and progress to hidden Markov models, which are of particular value in analyzing cross-linking followed by immunoprecipitation (CLIP) data. We then continue with examples of machine learning techniques, such as support vector machines and gradient tree boosting. We close with a brief discussion of current trends in the field, including deep learning architectures.
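As an illustration of the hidden Markov model idea mentioned above (a toy, not any published CLIP analysis tool), the sketch below decodes a sequence of per-position read-count levels into hypothetical "background" and "bound" states with the Viterbi algorithm. All states, probabilities and observations are invented for the example, and read counts are crudely discretized into "low" and "high".

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for a discrete HMM, computed in log space."""
    log = math.log
    # best[t][s]: log-probability of the best path that ends in state s at position t.
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]  # back[t][s]: predecessor state on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + log(trans_p[r][s])) for r in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the highest-scoring final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical two-state model over discretized read-count levels.
states = ("background", "bound")
start_p = {"background": 0.9, "bound": 0.1}
trans_p = {
    "background": {"background": 0.9, "bound": 0.1},
    "bound": {"background": 0.2, "bound": 0.8},
}
emit_p = {
    "background": {"low": 0.8, "high": 0.2},
    "bound": {"low": 0.1, "high": 0.9},
}
obs = ["low", "low", "high", "high", "high", "low", "low"]
print(viterbi(obs, states, start_p, trans_p, emit_p))
```

With these parameters the decoded path is background, background, bound, bound, bound, background, background: the run of high counts is called as a single bound segment. Working in log space avoids numerical underflow on long transcripts.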
Affiliation(s)
- Kat S Moore
- Department of Hematopoiesis, Sanquin, and Landsteiner Laboratory AMC/UvA, 1066 CX Amsterdam
- Peter A C 't Hoen
- Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, 6500 HB Nijmegen, The Netherlands
35
Abbasi WA, Asif A, Ben-Hur A, Minhas FUAA. Learning protein binding affinity using privileged information. BMC Bioinformatics 2018; 19:425. [PMID: 30442086 PMCID: PMC6238365 DOI: 10.1186/s12859-018-2448-z] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 07/17/2018] [Accepted: 10/25/2018] [Indexed: 01/04/2023]
Abstract
BACKGROUND Determining protein-protein interactions and their binding affinity is important for understanding cellular biological processes, for the discovery and design of novel therapeutics, and for protein engineering and mutagenesis studies. Because wet-lab experiments demand considerable time and effort, computational prediction of binding affinity from sequence or structure is an important area of research. Structure-based methods, though more accurate than sequence-based techniques, have limited applicability owing to the limited availability of protein structure data. RESULTS In this study, we propose a novel machine learning method for predicting binding affinity that uses protein 3D structure as privileged information at training time while requiring only protein sequence information during testing. With this method, which is based on the framework of learning using privileged information (LUPI), we achieved improved performance over corresponding sequence-based binding affinity prediction methods that have no access to privileged information during training. Our experiments show that the proposed framework, which uses structure only during training, can achieve classification performance comparable to that obtained with structure-based features. Evaluation on an independent test set also shows improved performance over the PPA-Pred2 method. CONCLUSIONS The proposed method outperforms several baseline learners and a state-of-the-art binding affinity predictor not only in cross-validation but also on an additional validation dataset, demonstrating the utility of the LUPI framework for problems that would benefit from classification using structure-based features. The implementation of LUPI developed for this work is expected to be useful in other areas of bioinformatics as well.
Affiliation(s)
- Wajid Arshad Abbasi
- Biomedical Informatics Research Laboratory (BIRL), Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Nilore, ISL, 45650, Pakistan
- Information Technology Center (ITC), University of Azad Jammu & Kashmir, Muzaffarabad, Azad Kashmir, 13100, Pakistan
- Department of Computer Science, Colorado State University (CSU), Fort Collins, CO, 80523, USA
- Amina Asif
- Biomedical Informatics Research Laboratory (BIRL), Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Nilore, ISL, 45650, Pakistan
- Asa Ben-Hur
- Department of Computer Science, Colorado State University (CSU), Fort Collins, CO, 80523, USA
- Fayyaz Ul Amir Afsar Minhas
- Biomedical Informatics Research Laboratory (BIRL), Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Nilore, ISL, 45650, Pakistan
36
Zagrovic B, Bartonek L, Polyansky AA. RNA-protein interactions in an unstructured context. FEBS Lett 2018; 592:2901-2916. [PMID: 29851074 PMCID: PMC6175095 DOI: 10.1002/1873-3468.13116] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 04/23/2018] [Revised: 05/12/2018] [Accepted: 05/13/2018] [Indexed: 02/02/2023]
Abstract
Despite their importance, our understanding of noncovalent RNA-protein interactions is incomplete. This especially concerns the binding between RNA and unstructured protein regions, a widespread class of such interactions. Here, we review the recent experimental and computational work on RNA-protein interactions in an unstructured context with a particular focus on how such interactions may be shaped by the intrinsic interaction affinities between individual nucleobases and protein side chains. Specifically, we articulate the claim that the universal genetic code reflects the binding specificity between nucleobases and protein side chains and that, in turn, the code may be seen as the Rosetta stone for understanding RNA-protein interactions in general.
Affiliation(s)
- Bojan Zagrovic
- Department of Structural and Computational Biology, Max F. Perutz Laboratories, University of Vienna, Austria
- Lukas Bartonek
- Department of Structural and Computational Biology, Max F. Perutz Laboratories, University of Vienna, Austria
- Anton A. Polyansky
- Department of Structural and Computational Biology, Max F. Perutz Laboratories, University of Vienna, Austria; MM Shemyakin and Yu A Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia
37
Chen F, Sun H, Wang J, Zhu F, Liu H, Wang Z, Lei T, Li Y, Hou T. Assessing the performance of MM/PBSA and MM/GBSA methods. 8. Predicting binding free energies and poses of protein-RNA complexes. RNA 2018; 24:1183-1194. [PMID: 29930024 PMCID: PMC6097651 DOI: 10.1261/rna.065896.118] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Received: 01/29/2018] [Accepted: 06/13/2018] [Indexed: 05/10/2023]
Abstract
Molecular docking provides a computationally efficient way to predict the atomic structural details of protein-RNA interactions (PRI), but accurate prediction of the three-dimensional structures and binding affinities of PRI is still notoriously difficult, partly because existing scoring functions for PRI are unreliable. MM/PBSA and MM/GBSA are more theoretically rigorous than most scoring functions for protein-RNA docking, but their prediction performance for protein-RNA systems remains unclear. Here, we systematically evaluated the capability of MM/PBSA and MM/GBSA to predict binding affinities and recognize near-native binding structures for protein-RNA systems with different solvent models and interior dielectric constants (εin). For predicting binding affinities, MM/GBSA based on structures minimized in explicit solvent, using the GBGBn1 model with εin = 2, yielded the highest correlation with the experimental data. Moreover, MM/GBSA calculations based on structures minimized in implicit solvent with the GBGBn1 model identified the near-native binding structures within the top 10 decoys for 117 of the 148 protein-RNA systems (79.1%), outperforming all docking scoring functions studied here. MM/GBSA rescoring is therefore an efficient way to improve the prediction capability of scoring functions for protein-RNA systems.
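For orientation, the textbook MM/GBSA decomposition that such rescoring builds on is sketched below; it is the general form, not the specific protocol or parameters of this study. The interior dielectric constant εin enters through the electrostatic and generalized Born terms, and the entropy term is often omitted in practice because it is costly to estimate. The reported success rate follows directly from the counts.

```latex
\begin{aligned}
\Delta G_{\text{bind}} &= G_{\text{complex}} - G_{\text{protein}} - G_{\text{RNA}},\\
G &= E_{\text{MM}} + G_{\text{GB}} + G_{\text{SA}} - TS,\\
E_{\text{MM}} &= E_{\text{bonded}} + E_{\text{elec}} + E_{\text{vdW}},\\
\text{success rate} &= \tfrac{117}{148} \approx 0.791 = 79.1\%.
\end{aligned}
```

Here G_GB is the polar solvation free energy from a generalized Born model and G_SA is the nonpolar solvation term, usually estimated from the solvent-accessible surface area.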
Affiliation(s)
- Fu Chen
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
  - State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang 310058, China
  - College of Life and Environmental Sciences, Shanghai Normal University, Shanghai 200234, China
- Huiyong Sun
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Junmei Wang
  - Department of Pharmaceutical Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, USA
- Feng Zhu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Hui Liu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Zhe Wang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
  - State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Tailong Lei
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Youyong Li
  - Institute of Functional Nano and Soft Materials (FUNSOM), Soochow University, Suzhou, Jiangsu 215123, China
- Tingjun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
  - State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang 310058, China

38
Krüger A, Zimbres FM, Kronenberger T, Wrenger C. Molecular Modeling Applied to Nucleic Acid-Based Molecule Development. Biomolecules 2018; 8:E83. [PMID: 30150587 PMCID: PMC6163985 DOI: 10.3390/biom8030083] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 07/31/2018] [Revised: 08/12/2018] [Accepted: 08/16/2018] [Indexed: 12/15/2022]
Abstract
Molecular modeling by means of docking and molecular dynamics (MD) has become an integral part of early drug discovery projects, enabling the screening and enrichment of large libraries of small molecules. In recent decades, special emphasis has been placed on nucleic acid (NA)-based molecules in therapy, diagnosis, and drug delivery. Research in this area increased dramatically with the advent of the SELEX (systematic evolution of ligands by exponential enrichment) technique, which yields single-stranded DNA or RNA sequences that bind their targets with high affinity and specificity. Herein, we discuss the role and contribution of docking and MD to the development and optimization of new nucleic acid-based molecules. This review focuses on the approaches currently available for modeling NA interactions with proteins. We discuss topics ranging from structure prediction to docking and MD, highlighting their main advantages and limitations and the influence of flexibility on their calculations.
Affiliation(s)
- Arne Krüger
  - Unit for Drug Discovery, Department of Parasitology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, SP 05508-000, Brazil
- Flávia M Zimbres
  - Department of Biochemistry and Molecular Biology and Center for Tropical and Emerging Global Diseases, University of Georgia, Athens, GA 30602, USA
- Thales Kronenberger
  - Department of Internal Medicine VIII, University Hospital of Tübingen, 72076 Tübingen, Germany
- Carsten Wrenger
  - Unit for Drug Discovery, Department of Parasitology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, SP 05508-000, Brazil

39
Chowdhury S, Zhang J, Kurgan L. In Silico Prediction and Validation of Novel RNA Binding Proteins and Residues in the Human Proteome. Proteomics 2018; 18:e1800064. [PMID: 29806170 DOI: 10.1002/pmic.201800064] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 02/27/2018] [Revised: 05/05/2018] [Indexed: 12/22/2022]
Abstract
Deciphering the complete landscape of protein-RNA interactions in the human proteome remains an elusive challenge. We computationally elucidate RNA-binding proteins (RBPs) using an approach that complements previous efforts. We employ two modern, complementary sequence-based methods that provide accurate predictions for both structured and intrinsically disordered sequences, even in the absence of sequence similarity to known RBPs. We generate and analyze putative RNA-binding residues on a proteome-wide scale. Using a conservative setting that ensures a low (5%) false positive rate, we identify 1511 putative RBPs, including 281 known RBPs and 166 RBPs that were previously predicted. We empirically demonstrate that these overlaps are statistically significant. We also validate the putative RBPs using two major hallmarks of RNA-binding residues: high evolutionary conservation and enrichment in charged amino acids. Moreover, we show that the novel RBPs are significantly under-annotated functionally, which is consistent with the fact that they had not yet been found to interact with RNAs. We provide two examples of novel putative RBPs for which there is recent evidence of interaction with RNAs. The dataset of novel putative RBPs and RNA-binding residues, intended for future hypothesis generation, is provided in the Supporting Information.
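The conservative 5% false positive rate described above amounts to choosing a score cutoff against proteins believed not to bind RNA. The abstract does not say how the cutoff was calibrated, so the Python below is only a generic sketch: it assumes a predictor that outputs one score per protein and a hypothetical list of scores for known non-RBPs, and it calls a protein an RBP when its score is strictly above the returned threshold. The helper names are illustrative, not taken from the study.

```python
def threshold_for_fpr(negative_scores: list[float], target_fpr: float = 0.05) -> float:
    """Return a cutoff t such that at most target_fpr of the negatives score strictly above t.

    Hypothetical helper: negatives are proteins assumed not to bind RNA.
    """
    if not negative_scores:
        raise ValueError("need at least one negative score")
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    ranked = sorted(negative_scores, reverse=True)
    # Largest number of negatives we are willing to accept as false positives.
    allowed = int(target_fpr * len(ranked))
    # Only ranked[:allowed] can lie strictly above ranked[allowed]; ties only lower that count.
    return ranked[allowed]


def false_positive_rate(negative_scores: list[float], threshold: float) -> float:
    """Fraction of negatives that would be wrongly predicted as RBPs at this cutoff."""
    false_pos = sum(1 for s in negative_scores if s > threshold)
    return false_pos / len(negative_scores)
```

With 100 hypothetical negative scores 0, 1, ..., 99, `threshold_for_fpr` returns 94, and exactly five negatives (95 to 99) lie above it, giving a 5% false positive rate. Predictions on the full proteome would then keep only proteins scoring above that cutoff.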
Affiliation(s)
- Shomeek Chowdhury
  - Dr. Vikram Sarabhai Institute of Cell and Molecular Biology, Maharaja Sayajirao University of Baroda, Gujarat, 390005, India
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
- Jian Zhang
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang, 464000, P. R. China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA

40
Transcriptome-wide discovery of coding and noncoding RNA-binding proteins. Proc Natl Acad Sci U S A 2018; 115:E3879-E3887. [PMID: 29636419 DOI: 10.1073/pnas.1718406115] [Citation(s) in RCA: 110] [Impact Index Per Article: 18.3] [Indexed: 01/06/2023]
Abstract
Transcriptome-wide identification of RNA-binding proteins (RBPs) is a prerequisite for understanding posttranscriptional gene regulation networks. However, proteomic profiling of RBPs has mostly been limited to polyadenylated mRNA-binding proteins, leaving RBPs on non-poly(A) RNAs, including most noncoding RNAs (ncRNAs) and pre-mRNAs, largely undiscovered. Here we present a click chemistry-assisted RNA interactome capture (CARIC) strategy, which enables unbiased identification of RBPs, independent of the polyadenylation state of RNAs. CARIC combines metabolic labeling of RNAs with an alkynyl uridine analog and in vivo RNA-protein photocross-linking, followed by click reaction with azide-biotin, affinity enrichment, and proteomic analysis. Applying CARIC, we identified 597 RBPs in HeLa cells, including 130 previously unknown RBPs. These newly discovered RBPs likely bind ncRNAs, thus uncovering potential involvement of ncRNAs in processes not previously known to be ncRNA-related, such as proteasome function and intermediary metabolism. The CARIC strategy should be broadly applicable across various organisms to complete the census of RBPs.
41
Gomes CP, Salgado-Somoza A, Creemers EE, Dieterich C, Lustrek M, Devaux Y. Circular RNAs in the cardiovascular system. Noncoding RNA Res 2018; 3:1-11. [PMID: 30159434 PMCID: PMC6084836 DOI: 10.1016/j.ncrna.2018.02.002] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 10/31/2017] [Revised: 01/16/2018] [Accepted: 02/22/2018] [Indexed: 02/06/2023]
Abstract
Until recently considered rare, circular RNAs (circRNAs) are emerging as important regulators of gene expression. They are ubiquitously expressed and represent a novel branch of the non-coding RNA family. Recent investigations showed that circRNAs are regulated in the cardiovascular system and participate in its physiological and pathological development. In this review article, we provide an overview of the role of circRNAs in cardiovascular health and disease. After describing the biogenesis of circRNAs, we summarize what is known about the expression, regulation and function of circRNAs in the cardiovascular system. We then address some technical aspects of circRNA research, discussing how artificial intelligence may aid it. Finally, we address the potential of circRNAs as biomarkers of cardiovascular disease and propose directions for future research.
Key Words
- Artificial intelligence
- Biomarker
- CRISPR, clustered regularly interspaced short palindromic repeats
- CV, cardiovascular
- Cardiovascular disease
- Cardiovascular system
- Circular RNAs
- DCM, dilated cardiomyopathy
- EMT, epithelial-mesenchymal transition
- Non-coding RNAs
- RNA-seq, RNA sequencing
- RPAD, RNase R treatment followed by polyadenylation and poly(A)+ RNA depletion
- RT-qPCR, reverse transcription quantitative polymerase chain reaction
- circRNAs, circular RNAs
- lncRNAs, long non-coding RNAs
- miRNAs, microRNAs
- ncRNAs, non-coding RNAs
Affiliation(s)
- Clarissa P.C. Gomes
  - Cardiovascular Research Unit, Luxembourg Institute of Health, Luxembourg, Luxembourg
- Esther E. Creemers
  - Experimental Cardiology, Academic Medical Center, Amsterdam, The Netherlands
- Christoph Dieterich
  - German Center for Cardiovascular Research, University Hospital Heidelberg, Heidelberg, Germany
- Mitja Lustrek
  - Department of Intelligent Systems, Jozef Stefan Institute, Ljubljana, Slovenia
- Yvan Devaux
  - Cardiovascular Research Unit, Luxembourg Institute of Health, Luxembourg, Luxembourg

42
Hu W, Qin L, Li M, Pu X, Guo Y. A structural dissection of protein–RNA interactions based on different RNA base areas of interfaces. RSC Adv 2018; 8:10582-10592. [PMID: 35540439 PMCID: PMC9078961 DOI: 10.1039/c8ra00598b] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 01/20/2018] [Accepted: 03/05/2018] [Indexed: 11/21/2022]
Abstract
Protein–RNA interactions are very common in cellular processes, but their mechanisms are not fully understood, mainly because RNA structures are complicated. Through a detailed investigation of the RNA structures in protein–RNA complexes, this paper found for the first time that the RNAs in these complexes can be clearly classified into three classes (high, medium and low) according to the level of Pbase, the percentage of base area buried in the RNA interface. For these three RNA classes, protein–RNA interactions were analyzed comprehensively from several aspects, including interface area, structure, composition and interaction forces, to gain a deeper understanding of the recognition specificity of each class. Under our classification strategy, the three complex classes differ significantly in almost all properties. Complexes in the high class have short, extended RNA structures and behave like protein–ssDNA interactions; their hydrogen bonds and hydrophobic interactions are strong. In low-class complexes, the RNA structures are mainly double-stranded, as in protein–dsDNA interactions, and electrostatic interactions occur frequently. Medium-class complexes have the longest RNA chains and the largest average interface area, and they show no preference for any interaction force. On average, significant feature preferences in composition, secondary structure and intermolecular physicochemical properties can be observed for high and low complexes, but no highly specific features are found for medium complexes. We found that our proposed Pbase is an important parameter that can serve as a new determinant for distinguishing protein–RNA complexes. For high and low complexes, the specificity of the recognition process can be understood from interface features more easily than for medium complexes. Future work should therefore focus on medium complexes, analyzing their structures from additional feature perspectives. Overall, this study may contribute to a more detailed understanding of the mechanism of protein–RNA interactions.
Qualitative and quantitative measurements of the influence of structure and composition of RNA interfaces on protein–RNA interactions.
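Pbase is defined in the abstract only as the percentage of base area buried in the RNA interface. A minimal Python sketch, assuming that per-nucleotide buried surface areas (for example, the loss of solvent-accessible surface area on complex formation) have already been computed and split into base and sugar-phosphate backbone contributions. The class cutoffs are hypothetical parameters, because the abstract does not give the values used to separate the high, medium and low classes.

```python
from dataclasses import dataclass


@dataclass
class BuriedArea:
    """Buried surface area (in square angstroms) of one RNA nucleotide at the interface."""
    base: float      # area contributed by base atoms
    backbone: float  # area contributed by sugar and phosphate atoms


def p_base(nucleotides: list[BuriedArea]) -> float:
    """Percentage of the RNA's buried interface area that belongs to bases (assumed definition)."""
    base_total = sum(n.base for n in nucleotides)
    total = base_total + sum(n.backbone for n in nucleotides)
    if total == 0:
        raise ValueError("no buried RNA surface at the interface")
    return 100.0 * base_total / total


def classify(pbase: float, low_cut: float, high_cut: float) -> str:
    """Bin a complex as low/medium/high; the cutoffs are hypothetical inputs, not the paper's values."""
    if not low_cut < high_cut:
        raise ValueError("low_cut must be below high_cut")
    if pbase >= high_cut:
        return "high"
    if pbase < low_cut:
        return "low"
    return "medium"
```

For two nucleotides burying 60 + 40 and 20 + 80 square angstroms (base + backbone), `p_base` returns 40.0, which `classify(40.0, low_cut=30.0, high_cut=50.0)` places in the medium class.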
Affiliation(s)
- Wen Hu
  - College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
- Liu Qin
  - College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
- Menglong Li
  - College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
- Xuemei Pu
  - College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
- Yanzhi Guo
  - College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China

43
Sharan M, Förstner KU, Eulalio A, Vogel J. APRICOT: an integrated computational pipeline for the sequence-based identification and characterization of RNA-binding proteins. Nucleic Acids Res 2017; 45:e96. [PMID: 28334975 PMCID: PMC5499795 DOI: 10.1093/nar/gkx137] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 05/24/2016] [Accepted: 02/27/2017] [Indexed: 11/14/2022]
Abstract
RNA-binding proteins (RBPs) have been established as core components of several post-transcriptional gene regulation mechanisms. Experimental techniques such as cross-linking and co-immunoprecipitation have enabled the large-scale identification of RBPs, RNA-binding domains (RBDs) and their regulatory roles in eukaryotic species such as human and yeast. In contrast, our knowledge of the number and potential diversity of RBPs in bacteria is more limited, owing to the technical challenges associated with existing global screening approaches. We introduce APRICOT, a computational pipeline for the sequence-based identification and characterization of proteins using RBDs known from experimental studies. The pipeline identifies functional motifs in protein sequences using position-specific scoring matrices and Hidden Markov Models of the functional domains, and scores them statistically based on a series of sequence-based features. APRICOT then identifies putative RBPs and characterizes them by several biological properties. Here we demonstrate the application and adaptability of the pipeline on large-scale protein sets, including the bacterial proteome of Escherichia coli. APRICOT performed better on various datasets than other existing tools for sequence-based RBP prediction, achieving an average sensitivity and specificity of 0.90 and 0.91, respectively. The command-line tool and its documentation are available at https://pypi.python.org/pypi/bio-apricot.
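The sensitivity and specificity reported for APRICOT follow the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The short Python sketch below computes both from binary labels (1 = RBP, 0 = non-RBP). It is a generic illustration of the metrics, not code from the APRICOT package, and the function name is hypothetical.

```python
def sensitivity_specificity(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels where 1 marks an RNA-binding protein."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1:
            if pred == 1:
                tp += 1  # RBP correctly recovered
            else:
                fn += 1  # RBP missed
        else:
            if pred == 1:
                fp += 1  # non-RBP wrongly called an RBP
            else:
                tn += 1  # non-RBP correctly rejected
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return tp / (tp + fn), tn / (tn + fp)
```

For true labels `[1, 1, 1, 1, 0, 0, 0, 0]` and predictions `[1, 1, 1, 0, 0, 0, 0, 1]`, there are 3 true positives, 1 false negative, 3 true negatives and 1 false positive, so both values are 0.75. Reporting the pair rather than plain accuracy keeps the result interpretable when RBPs are a small fraction of a proteome.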
Affiliation(s)
- Malvika Sharan
  - Institute of Molecular Infection Biology, University of Würzburg, 97080 Würzburg, Germany
- Konrad U Förstner
  - Core Unit Systems Medicine, University of Würzburg, 97080 Würzburg, Germany
- Ana Eulalio
  - Institute of Molecular Infection Biology, University of Würzburg, 97080 Würzburg, Germany
- Jörg Vogel
  - Institute of Molecular Infection Biology, University of Würzburg, 97080 Würzburg, Germany

44
Tawk C, Sharan M, Eulalio A, Vogel J. A systematic analysis of the RNA-targeting potential of secreted bacterial effector proteins. Sci Rep 2017; 7:9328. [PMID: 28839189 PMCID: PMC5570926 DOI: 10.1038/s41598-017-09527-0] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 02/14/2017] [Accepted: 06/27/2017] [Indexed: 12/15/2022]
Abstract
Many pathogenic bacteria use specialized secretion systems to deliver proteins called effectors into eukaryotic cells to manipulate host pathways. The vast majority of known effector targets are host proteins, whereas the potential targeting of host nucleic acids remains little explored. Only one family of effectors is known to target DNA directly, and no effectors that bind host RNA are known. Here, we take a two-pronged approach to search for RNA-binding effectors, combining biocomputational prediction of RNA-binding domains (RBDs) in a newly assembled, comprehensive dataset of bacterial secreted proteins with experimental screening for RNA binding in mammalian cells. Only a small subset of effectors was predicted to carry an RBD, indicating that if RNA targeting were common, it would likely involve new types of RBDs. Our experimental evaluation of effectors with predicted RBDs further argues for a general paucity of RNA-binding activities among bacterial effectors. We obtained evidence that PipB2 and Lpg2844, effector proteins of Salmonella and Legionella species, respectively, may harbor novel biochemical activities. By presenting the first systematic evaluation of the RNA-targeting potential of bacterial effectors, our study offers a basis for discussing whether host RNA is a prominent target of secreted bacterial proteins.
Affiliation(s)
- Caroline Tawk
  - Institute for Molecular Infection Biology, University of Würzburg, Würzburg, Germany
- Malvika Sharan
  - Institute for Molecular Infection Biology, University of Würzburg, Würzburg, Germany
  - European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- Ana Eulalio
  - Institute for Molecular Infection Biology, University of Würzburg, Würzburg, Germany
- Jörg Vogel
  - Institute for Molecular Infection Biology, University of Würzburg, Würzburg, Germany
  - Helmholtz Institute for RNA-based Infection Research (HIRI), Würzburg, Germany

45
Review: Regulation of the cancer epigenome by long non-coding RNAs. Cancer Lett 2017; 407:106-112. [PMID: 28400335 DOI: 10.1016/j.canlet.2017.03.040] [Citation(s) in RCA: 69] [Impact Index Per Article: 9.9] [Received: 01/25/2017] [Revised: 03/22/2017] [Accepted: 03/29/2017] [Indexed: 12/31/2022]
Abstract
Long non-coding RNAs (lncRNAs) have emerged as highly versatile players in the regulation of gene expression in development and human disease, particularly cancer. Hundreds of lncRNAs become dysregulated across tumor types, and multiple lncRNAs have demonstrated functions as tumor suppressors or oncogenes. Furthermore, studies have demonstrated that dysregulation of lncRNAs alters the epigenome of cancer cells, potentially providing a novel mechanism for the massive epigenomic alterations observed in many tumors. Here, we highlight some illustrative examples of lncRNAs in various epigenetic regulatory processes, including coordination of chromatin dynamics, regulation of DNA methylation, modulation of other non-coding RNAs and of mRNA stability, and control of epigenetic substrate availability through altered tumor metabolism. In light of these known and emerging functions in the epigenetic regulation of tumorigenesis and cancer progression, lncRNAs represent attractive targets for future therapeutic strategies in cancer.
46
Sequence-based discrimination of protein-RNA interacting residues using a probabilistic approach. J Theor Biol 2017; 418:77-83. [DOI: 10.1016/j.jtbi.2017.01.040] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 11/09/2016] [Revised: 01/06/2017] [Accepted: 01/27/2017] [Indexed: 11/22/2022]
47
Mihailovic MK, Chen A, Gonzalez-Rivera JC, Contreras LM. Defective Ribonucleoproteins, Mistakes in RNA Processing, and Diseases. Biochemistry 2017; 56:1367-1382. [PMID: 28206738 DOI: 10.1021/acs.biochem.6b01134] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 12/30/2022]
Abstract
Ribonucleoproteins (RNPs) are vital to many cellular events. Accordingly, many neurodegenerative diseases and cancers have been linked to RNP malfunction, particularly as it relates to defective processing of cellular RNA. The connection between RNPs and disease has also shifted focus from traditional protein-targeting treatments toward RNA targeting. However, therapeutic development in this area has been limited by incomplete molecular insight into the specific contributions of RNPs to disease. This review outlines the role of several RNPs in disease, focusing on molecular defects in processes that affect proper RNA handling in the cell. It also evaluates the contributions of recently developed methods to understanding RNP association and function. We review progress in this area by focusing on molecular malfunctions of RNPs associated with the onset and progression of several neurodegenerative diseases and cancer, and conclude with a brief discussion of RNA-based therapeutic efforts.
Affiliation(s)
- Mia K Mihailovic
  - McKetta Department of Chemical Engineering, University of Texas at Austin, 200 East Dean Keeton Street, Stop C0400, Austin, Texas 78712, United States
- Angela Chen
  - McKetta Department of Chemical Engineering, University of Texas at Austin, 200 East Dean Keeton Street, Stop C0400, Austin, Texas 78712, United States
- Juan C Gonzalez-Rivera
  - McKetta Department of Chemical Engineering, University of Texas at Austin, 200 East Dean Keeton Street, Stop C0400, Austin, Texas 78712, United States
- Lydia M Contreras
  - McKetta Department of Chemical Engineering, University of Texas at Austin, 200 East Dean Keeton Street, Stop C0400, Austin, Texas 78712, United States

48
Walia RR, El-Manzalawy Y, Honavar VG, Dobbs D. Sequence-Based Prediction of RNA-Binding Residues in Proteins. Methods Mol Biol 2017; 1484:205-235. [PMID: 27787829 PMCID: PMC5796408 DOI: 10.1007/978-1-4939-6406-2_15] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 05/28/2023]
Abstract
Identifying individual residues in the interfaces of protein-RNA complexes is important for understanding the molecular determinants of protein-RNA recognition and has many potential applications. Recent technical advances have led to several high-throughput experimental methods for identifying partners in protein-RNA complexes, but determining RNA-binding residues in proteins is still expensive and time-consuming. This chapter focuses on available computational methods for identifying which amino acids in an RNA-binding protein participate directly in contacting RNA. Step-by-step protocols for using three different web-based servers to predict RNA-binding residues are described. In addition, currently available web servers and software tools for predicting RNA-binding sites, as well as databases that contain valuable information about known protein-RNA complexes, RNA-binding motifs in proteins, and protein-binding recognition sites in RNA are provided. We emphasize sequence-based methods that can reliably identify interfacial residues without the requirement for structural information regarding either the RNA-binding protein or its RNA partner.
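Many sequence-based predictors of RNA-binding residues describe each residue by the amino acids in a fixed-length window centred on it and pass that vector to a classifier. The Python sketch below shows one such encoding, one-hot with all-zero padding beyond the chain ends. The window size, alphabet handling and function name are illustrative assumptions; they do not describe any particular server covered in the chapter.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one column each


def window_features(sequence: str, center: int, half_width: int = 3) -> list[int]:
    """One-hot encode the residues in a window of 2*half_width + 1 positions around `center`.

    Positions beyond either chain end, and non-standard letters such as X,
    contribute an all-zero block so every residue gets a vector of equal length.
    """
    features: list[int] = []
    for pos in range(center - half_width, center + half_width + 1):
        block = [0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence):
            residue = sequence[pos]
            if residue in AMINO_ACIDS:
                block[AMINO_ACIDS.index(residue)] = 1
        features.extend(block)
    return features
```

With the default half-width of 3, each residue becomes a 7 x 20 = 140-element vector; a per-residue classifier trained on known interfaces would then label each vector as RNA-binding or not, without any structural information.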
Affiliation(s)
- Yasser El-Manzalawy
  - College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, 16802, USA
- Vasant G Honavar
  - College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, 16802, USA
- Drena Dobbs
  - Genetics, Development and Cell Biology Department, Iowa State University, 3112 Molecular Biology Building, Ames, IA, 50011-3650, USA

49
De-novo protein function prediction using DNA binding and RNA binding proteins as a test case. Nat Commun 2016; 7:13424. [PMID: 27869118 PMCID: PMC5121330 DOI: 10.1038/ncomms13424] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 03/14/2016] [Accepted: 10/03/2016] [Indexed: 12/14/2022]
Abstract
Of the currently identified protein sequences, 99.6% have never been observed in the laboratory as proteins, and their molecular function has not been established experimentally. Predicting the function of such proteins relies mostly on annotated homologs. However, this has resulted in some erroneous annotations, and many proteins have no annotated homologs. Here we propose a de-novo function prediction approach based on identifying the biophysical features that underlie function. Using this approach, we discover DNA- and RNA-binding proteins that cannot be identified based on homology and validate these predictions experimentally. For example, FGF14, which belongs to a family of secreted growth factors, was predicted to bind DNA. We verify this experimentally and also show that FGF14 is localized to the nucleus. Mutating the predicted binding site on FGF14 abrogated DNA binding. These results demonstrate the feasibility of automated de-novo function prediction based on identifying function-related biophysical features.
Identification of the function of proteins is difficult when there are no structurally or biochemically characterized homologs. Here, the authors present an approach that allows the prediction of nucleic-acid binding proteins based on sequence alone, and they are able to experimentally validate their method.
50
Abstract
RNA-binding proteins play a variety of roles in cellular physiology. Some regulate mRNA processing, mRNA abundance, and translation efficiency. Some fight off invader RNA through small RNA-driven silencing pathways. Others sense foreign sequences in the form of double-stranded RNA and activate the innate immune response. Yet others, for example cytoplasmic aconitase, act as bi-functional proteins, processing metabolites in one conformation and regulating metabolic gene expression in another. Not all are involved in gene regulation. Some play structural roles, for example, connecting the translational machinery to the endoplasmic reticulum outer membrane. Despite their pervasive role and relative importance, it has remained difficult to identify new RNA-binding proteins in a systematic, unbiased way. A recent body of literature from several independent labs has defined robust, easily adaptable protocols for mRNA interactome discovery. In this review, I summarize the methods and review some of the intriguing findings from their application to a wide variety of biological systems.
Affiliation(s)
- Sean P Ryder
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA, 01605, USA