1
|
Han Y, Zhang SW. ncRPI-LGAT: Prediction of ncRNA-Protein Interactions with Line Graph Attention Network Framework. Comput Struct Biotechnol J 2023; 21:2286-2295. [PMID: 37035546 PMCID: PMC10073990 DOI: 10.1016/j.csbj.2023.03.027] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/21/2022] [Revised: 03/11/2023] [Accepted: 03/16/2023] [Indexed: 03/19/2023] Open
Abstract
Identification of ncRNA-protein interactions (ncRPIs) through wet experiments remains time-consuming and costly. Although several computational approaches have been developed to predict ncRPIs using the structure and sequence information of ncRNAs and proteins, the prediction accuracy needs to be improved, and the results lack interpretability. In this work, we propose a novel computational method (called ncRPI-LGAT) to predict ncRNA-protein interactions by transforming the link prediction (i.e., subgraph classification) task into a node classification task in the line network, and by introducing a Line Graph ATtention network framework. ncRPI-LGAT first extracts the ncRNA/protein attributes using node2vec, and then generates the local enclosing subgraph of a target ncRNA-protein pair with SEAL. Because using pooling operations in local enclosing subgraphs to learn a fixed-size feature vector for representing ncRNAs/proteins causes information loss, ncRPI-LGAT converts the local enclosing subgraphs into their corresponding line graphs, in which each node corresponds to an edge (i.e., an ncRNA-protein pair) of the local enclosing subgraph. Then, the attention mechanism-based graph neural network GATv2 is used on these line graphs to efficiently learn the embedding features of the target nodes (i.e., ncRNA-protein pairs) by focusing on learning the significance of one ncRNA-protein pair to another ncRNA-protein pair. The embedding features of one ncRNA-protein pair obtained from multi-head attention are concatenated and then fed into a fully connected network to predict ncRPIs. Compared with other state-of-the-art methods in the five-fold cross-validation (5CV) test, ncRPI-LGAT shows superior performance on three benchmark datasets, demonstrating the effectiveness of our ncRPI-LGAT method in predicting ncRNA-protein interactions.
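The line-graph conversion described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `line_graph` and the toy edge list are hypothetical, and the sketch covers only the graph transformation, not node2vec, SEAL, or GATv2. Each edge of the enclosing subgraph (an ncRNA-protein pair) becomes a node, and two such nodes are linked when the pairs share an ncRNA or a protein.

```python
from itertools import combinations


def line_graph(edges):
    """Return the line graph of an undirected graph given as an edge list.

    Every input edge becomes a node of the output; two of these nodes are
    adjacent when the corresponding input edges share an endpoint.
    """
    nodes = [tuple(edge) for edge in edges]
    adjacency = {node: set() for node in nodes}
    for a, b in combinations(nodes, 2):
        if set(a) & set(b):  # the two pairs share an ncRNA or a protein
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Hypothetical enclosing subgraph around the target pair (r1, p1).
subgraph_edges = [("r1", "p1"), ("r1", "p2"), ("r2", "p1"), ("r2", "p3")]
lg = line_graph(subgraph_edges)
target = ("r1", "p1")
neighbours = sorted(lg[target])
print(neighbours)
```

In this form, predicting whether the target pair interacts becomes a classification of the single node `target`, whose neighbours are the other pairs that involve the same ncRNA or protein.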
|
2
|
Han S, Yang X, Sun H, Yang H, Zhang Q, Peng C, Fang W, Li Y. LION: an integrated R package for effective prediction of ncRNA-protein interaction. Brief Bioinform 2022; 23:6713512. [PMID: 36155620 DOI: 10.1093/bib/bbac420] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/07/2022] [Revised: 08/03/2022] [Accepted: 08/30/2022] [Indexed: 12/14/2022] Open
Abstract
Understanding ncRNA-protein interaction is of critical importance to unveil ncRNAs' functions. Here, we propose an integrated package LION which comprises a new method for predicting ncRNA/lncRNA-protein interaction as well as a comprehensive strategy to meet the requirement of customisable prediction. Experimental results demonstrate that our method outperforms its competitors on multiple benchmark datasets. LION can also improve the performance of some widely used tools and build adaptable models for species- and tissue-specific prediction. We expect that LION will be a powerful and efficient tool for the prediction and analysis of ncRNA/lncRNA-protein interaction. The R Package LION is available on GitHub at https://github.com/HAN-Siyu/LION/.
Affiliation(s)
- Siyu Han
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, China
| | - Xiao Yang
- College of Computer Science and Technology, Jilin University, Changchun, China
| | - Hang Sun
- College of Computer Science and Technology, Jilin University, Changchun, China
| | - Hu Yang
- 964 Hospital of Joint Logistic Support Force of the Chinese People's Liberation Army
| | - Qi Zhang
- College of Computer Science and Technology, Jilin University, Changchun, China
| | - Cheng Peng
- School of Software, Tsinghua University, Beijing, China
| | - Wensi Fang
- College of Computer Science and Technology, Jilin University, Changchun, China
| | - Ying Li
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
| |
|
3
|
Pepe G, Appierdo R, Carrino C, Ballesio F, Helmer-Citterich M, Gherardini PF. Artificial intelligence methods enhance the discovery of RNA interactions. Front Mol Biosci 2022; 9:1000205. [PMID: 36275611 PMCID: PMC9585310 DOI: 10.3389/fmolb.2022.1000205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Accepted: 09/20/2022] [Indexed: 11/13/2022] Open
Abstract
Understanding how RNAs interact with proteins, RNAs, or other molecules remains a challenge of main interest in biology, given the importance of these complexes in both normal and pathological cellular processes. Since experimental datasets are starting to be available for hundreds of functional interactions between RNAs and other biomolecules, several machine learning and deep learning algorithms have been proposed for predicting RNA-RNA or RNA-protein interactions. However, most of these approaches were evaluated on a single dataset, making performance comparisons difficult. With this review, we aim to summarize recent computational methods, developed in this broad research area, highlighting feature encoding and machine learning strategies adopted. Given the magnitude of the effect that dataset size and quality have on performance, we explored the characteristics of these datasets. Additionally, we discuss multiple approaches to generate datasets of negative examples for training. Finally, we describe the best-performing methods to predict interactions between proteins and specific classes of RNA molecules, such as circular RNAs (circRNAs) and long non-coding RNAs (lncRNAs), and methods to predict RNA-RNA or RNA-RBP interactions independently of the RNA type.
Affiliation(s)
- G Pepe
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- *Correspondence: G Pepe; M Helmer-Citterich
| | - R Appierdo
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
| | - C Carrino
- PhD Program in Cellular and Molecular Biology, Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
| | - F Ballesio
- PhD Program in Cellular and Molecular Biology, Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
| | - M Helmer-Citterich
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- *Correspondence: G Pepe; M Helmer-Citterich
| | - PF Gherardini
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
| |
|
4
|
Dzuvor CKO, Tettey EL, Danquah MK. Aptamers as promising nanotheranostic tools in the COVID-19 pandemic era. Wiley Interdiscip Rev Nanomed Nanobiotechnol 2022; 14:e1785. [PMID: 35238490 PMCID: PMC9111085 DOI: 10.1002/wnan.1785] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 11/08/2021] [Revised: 02/02/2022] [Accepted: 02/07/2022] [Indexed: 12/13/2022]
Abstract
The emergence of SARS-CoV-2, the causative agent of the new coronavirus disease (COVID-19), has become a pandemic threat. Early and precise detection of the virus is vital for effective diagnosis and treatment. Various testing kits and assays, including nucleic acid detection methods, antigen tests, serological tests, and enzyme-linked immunosorbent assay (ELISA), have been implemented or are being explored to detect the virus and/or characterize cellular and antibody responses to the infection. However, these approaches have inherent drawbacks: they can be nonspecific, costly, and labor-intensive, and they are characterized by long turnaround times for test results. Also, the circulating SARS-CoV-2 variants of concern, reduced antibody sensitivity and/or neutralization, and possible antibody-dependent enhancement (ADE) have warranted the search for alternative potent therapeutics. Aptamers, which are single-stranded oligonucleotides generated artificially by SELEX (systematic evolution of ligands by exponential enrichment), may offer the capacity to generate high-affinity neutralizers and/or bioprobes for monitoring relevant SARS-CoV-2 and COVID-19 biomarkers. This article reviews and discusses the prospects of implementing aptamers for rapid point-of-care detection and treatment of SARS-CoV-2. We highlight other SARS-CoV-2 targets (N protein, spike protein stem-helix), SELEX augmented with competition assays, and in silico technologies for rapid discovery and isolation of theranostic aptamers against COVID-19 and future pandemics. The article further provides an overview of site-specific bioconjugation approaches, customizable molecular scaffolding strategies, and nanotechnology platforms to engineer these aptamers into ultrapotent blockers, multivalent therapeutics, and vaccines to boost both humoral and cellular immunity against the virus.
This article is categorized under: Therapeutic Approaches and Drug Discovery > Emerging Technologies; Diagnostic Tools > Biosensing; Therapeutic Approaches and Drug Discovery > Nanomedicine for Infectious Disease; Therapeutic Approaches and Drug Discovery > Nanomedicine for Respiratory Disease.
Affiliation(s)
- Christian K. O. Dzuvor
- Bioengineering Laboratory, Department of Chemical and Biological Engineering, Monash University, Clayton, Victoria, Australia
| | | | - Michael K. Danquah
- Department of Chemical Engineering, University of Tennessee, Chattanooga, Tennessee, USA
| |
|
5
|
3D Modeling of Non-coding RNA Interactions. Adv Exp Med Biol 2022; 1385:281-317. [DOI: 10.1007/978-3-031-08356-3_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
|
6
|
Shirafkan F, Gharaghani S, Rahimian K, Sajedi RH, Zahiri J. Moonlighting protein prediction using physico-chemical and evolutional properties via machine learning methods. BMC Bioinformatics 2021; 22:261. [PMID: 34030624 PMCID: PMC8142502 DOI: 10.1186/s12859-021-04194-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/11/2020] [Accepted: 05/13/2021] [Indexed: 12/18/2022] Open
Abstract
Background Moonlighting proteins (MPs) are a subclass of multifunctional proteins in which more than one independent or usually distinct function occurs in a single polypeptide chain. Identification of unknown cellular processes, understanding novel protein mechanisms, improving the prediction of protein functions, and gaining information about protein evolution are the main reasons to study MPs. They also play an important role in disease pathways and drug-target discovery. Since detecting MPs experimentally is quite a challenge, most of them are detected by chance. Therefore, introducing an appropriate computational approach to predict MPs seems reasonable. Results In this study, we introduced a competent model for distinguishing MPs from non-MPs using features extracted from protein sequences. We attempted to set up a well-judged scheme for detecting outlier proteins. Consequently, 37 distinct feature vectors were utilized to study each protein’s impact on detecting MPs. Furthermore, 8 different classification methods were assessed to find the best performance. To detect outliers, each of the classifiers was executed 100 times with tenfold cross-validation on the feature vectors; proteins that were misclassified 90 times or more were grouped. This process was applied to every feature vector, and the intersection of these groups was taken as the set of outlier proteins. The results of tenfold cross-validation on a dataset of 351 samples (containing 215 moonlighting and 136 non-moonlighting proteins) reveal that the SVM method on all feature vectors has the highest performance among all methods in this study and other available methods. In addition, the study of outliers showed that 57 of the 351 proteins in the dataset could be appropriate outlier candidates. Among the outlier proteins, there were non-MPs (such as P69797) that were misclassified by 8 different classification methods with 16 different feature vectors. Because these proteins were obtained by computational methods, the results of this study could lower confidence that they are non-moonlighting at all. Conclusions MPs are difficult to identify through experimentation. Using distinct feature vectors, our method enabled identification of novel moonlighting proteins. The study also indicated that a number of non-MPs are likely to be moonlighting. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04194-5.
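The outlier-selection rule in this abstract (group the proteins misclassified in at least 90 of 100 repeated runs, then intersect the groups across feature vectors) can be sketched as follows. This is only an illustration of that rule under stated assumptions: the function name `outlier_candidates`, the feature-vector labels, the protein identifiers, and all counts are made up, and the classifiers themselves are not reproduced.

```python
def outlier_candidates(miscount_by_feature, threshold=90):
    """Intersect, across feature vectors, the proteins misclassified
    at least `threshold` times in the repeated cross-validation runs."""
    groups = [
        {protein for protein, count in counts.items() if count >= threshold}
        for counts in miscount_by_feature.values()
    ]
    return set.intersection(*groups) if groups else set()


# Made-up misclassification counts (out of 100 runs) for three
# illustrative feature vectors; none of these values come from the study.
miscounts = {
    "feature_A": {"P1": 97, "P2": 12, "P3": 91},
    "feature_B": {"P1": 93, "P2": 95, "P3": 40},
    "feature_C": {"P1": 90, "P2": 3, "P3": 99},
}
outliers = outlier_candidates(miscounts)
print(outliers)
```

Taking the intersection is deliberately strict: a protein is flagged only if it is consistently misclassified under every feature representation, which in this toy input leaves `P1` alone.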
Affiliation(s)
- Farshid Shirafkan
- Laboratory of Bioinformatics and Drug Design, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
| | - Sajjad Gharaghani
- Laboratory of Bioinformatics and Drug Design, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran.
| | - Karim Rahimian
- Bioinformatics and Computational Omics Lab (BioCOOL), Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
| | - Reza Hasan Sajedi
- Department of Biochemistry, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
| | - Javad Zahiri
- Department of Neuroscience, University of California San Diego, La Jolla, CA, USA; Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| |
|
7
|
Li Y, Sun H, Feng S, Zhang Q, Han S, Du W. Capsule-LPI: a LncRNA-protein interaction predicting tool based on a capsule network. BMC Bioinformatics 2021; 22:246. [PMID: 33985444 PMCID: PMC8120853 DOI: 10.1186/s12859-021-04171-y] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 09/22/2020] [Accepted: 05/05/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Long noncoding RNAs (lncRNAs) play important roles in multiple biological processes. Identifying LncRNA-protein interactions (LPIs) is key to understanding lncRNA functions. Although some LPIs computational methods have been developed, the LPIs prediction problem remains challenging. How to integrate multimodal features from more perspectives and build deep learning architectures with better recognition performance have always been the focus of research on LPIs. RESULTS We present a novel multichannel capsule network framework to integrate multimodal features for LPI prediction, Capsule-LPI. Capsule-LPI integrates four groups of multimodal features, including sequence features, motif information, physicochemical properties and secondary structure features. Capsule-LPI is composed of four feature-learning subnetworks and one capsule subnetwork. Through comprehensive experimental comparisons and evaluations, we demonstrate that both multimodal features and the architecture of the multichannel capsule network can significantly improve the performance of LPI prediction. The experimental results show that Capsule-LPI performs better than the existing state-of-the-art tools. The precision of Capsule-LPI is 87.3%, which represents a 1.7% improvement. The F-value of Capsule-LPI is 92.2%, which represents a 1.4% improvement. CONCLUSIONS This study provides a novel and feasible LPI prediction tool based on the integration of multimodal features and a capsule network. A webserver ( http://csbg-jlu.site/lpc/predict ) is developed to be convenient for users.
Affiliation(s)
- Ying Li
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
| | - Hang Sun
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
| | - Shiyao Feng
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
| | - Qi Zhang
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
| | - Siyu Han
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
- Department of Computer Science, Faculty of Engineering, University of Bristol, Bristol, BS8 1UB, UK
| | - Wei Du
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China.
| |
|
8
|
Wang J, Zhao Y, Gong W, Liu Y, Wang M, Huang X, Tan J. EDLMFC: an ensemble deep learning framework with multi-scale features combination for ncRNA-protein interaction prediction. BMC Bioinformatics 2021; 22:133. [PMID: 33740884 PMCID: PMC7980572 DOI: 10.1186/s12859-021-04069-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/23/2021] [Accepted: 03/05/2021] [Indexed: 11/29/2022] Open
Abstract
Background Non-coding RNA (ncRNA) and protein interactions play essential roles in various physiological and pathological processes. The experimental methods used for identifying ncRNA–protein interactions are time-consuming and labor-intensive. Therefore, there is an increasing demand for computational methods to accurately and efficiently predict ncRNA–protein interactions. Results In this work, we presented an ensemble deep learning-based method, EDLMFC, to predict ncRNA–protein interactions using the combination of multi-scale features, including primary sequence features, secondary structure sequence features, and tertiary structure features. Conjoint k-mer was used to extract protein/ncRNA sequence features, which were integrated with tertiary structure features and then fed into an ensemble deep learning model that combined a convolutional neural network (CNN), to learn dominating biological information, with a bi-directional long short-term memory network (BLSTM), to capture long-range dependencies among the features identified by the CNN. Compared with other state-of-the-art methods under five-fold cross-validation, EDLMFC shows the best performance with accuracy of 93.8%, 89.7%, and 86.1% on RPI1807, NPInter v2.0, and RPI488 datasets, respectively. The results of the independent test demonstrated that EDLMFC can effectively predict potential ncRNA–protein interactions from different organisms. Furthermore, EDLMFC is also shown to successfully predict hub ncRNAs and proteins present in ncRNA–protein networks of Mus musculus. Conclusions In general, our proposed method EDLMFC improved the accuracy of ncRNA–protein interaction predictions and is expected to provide some helpful guidance for research on ncRNA functions. The source code of EDLMFC and the datasets used in this work are available at https://github.com/JingjingWang-87/EDLMFC. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04069-9.
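To make the idea of k-mer sequence features concrete, the sketch below computes a plain k-mer frequency vector for an RNA sequence. It is a simplified stand-in, not EDLMFC's encoding: the conjoint k-mer scheme named in the abstract is not reproduced here, and the function name `kmer_frequencies`, the choice k = 3, and the example sequence are all assumptions made for illustration.

```python
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Return the normalised k-mer frequency vector of an RNA sequence.

    The vector has len(alphabet) ** k entries, ordered as produced by
    itertools.product; windows containing other symbols are skipped.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


# Hypothetical 10-nt sequence; it yields 8 overlapping 3-mers.
features = kmer_frequencies("AUGGCUAUGG", k=3)
print(len(features))
```

A fixed-length vector like this (64 entries for k = 3) is what lets sequences of different lengths be fed to the same downstream network.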
Affiliation(s)
- Jingjing Wang
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing University of Technology, Beijing, 100124, China
| | - Yanpeng Zhao
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing University of Technology, Beijing, 100124, China
| | - Weikang Gong
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing University of Technology, Beijing, 100124, China
| | - Yang Liu
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing University of Technology, Beijing, 100124, China
| | - Mei Wang
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing University of Technology, Beijing, 100124, China
| | - Xiaoqian Huang
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing University of Technology, Beijing, 100124, China
| | - Jianjun Tan
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing University of Technology, Beijing, 100124, China.
| |
|
9
|
Shaw D, Chen H, Xie M, Jiang T. DeepLPI: a multimodal deep learning method for predicting the interactions between lncRNAs and protein isoforms. BMC Bioinformatics 2021; 22:24. [PMID: 33461501 PMCID: PMC7814738 DOI: 10.1186/s12859-020-03914-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/15/2020] [Accepted: 11/30/2020] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Long non-coding RNAs (lncRNAs) regulate diverse biological processes via interactions with proteins. Since the experimental methods to identify these interactions are expensive and time-consuming, many computational methods have been proposed. Although these computational methods have achieved promising prediction performance, they neglect the fact that a gene may encode multiple protein isoforms and different isoforms of the same gene may interact differently with the same lncRNA. RESULTS In this study, we propose a novel method, DeepLPI, for predicting the interactions between lncRNAs and protein isoforms. Our method uses sequence and structure data to extract intrinsic features and expression data to extract topological features. To combine these different data, we adopt a hybrid framework by integrating a multimodal deep learning neural network and a conditional random field. To overcome the lack of known interactions between lncRNAs and protein isoforms, we apply a multiple instance learning (MIL) approach. In our experiment concerning the human lncRNA-protein interactions in the NPInter v3.0 database, DeepLPI improved the prediction performance by 4.7% in terms of AUC and 5.9% in terms of AUPRC over the state-of-the-art methods. Our further correlation analyses between interactive lncRNAs and protein isoforms also illustrated that their co-expression information helped predict the interactions. Finally, we give some examples where DeepLPI was able to outperform the other methods in predicting mouse lncRNA-protein interactions and novel human lncRNA-protein interactions. CONCLUSION Our results demonstrated that the use of isoforms and MIL contributed significantly to the improvement of performance in predicting lncRNA and protein interactions. We believe that such an approach would find more applications in predicting other functional roles of RNAs and proteins.
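The multiple instance learning setting mentioned in this abstract can be sketched in a few lines. The sketch assumes the standard MIL convention that a bag is positive when at least one of its instances is positive, so the bag score is the maximum instance score; whether DeepLPI aggregates isoform scores in exactly this way is an assumption. The function names, gene labels, scores, and the 0.5 threshold are all hypothetical.

```python
def bag_score(instance_scores):
    """Score a bag (all isoforms of one gene) by its best-scoring instance."""
    return max(instance_scores)


def bag_label(instance_scores, threshold=0.5):
    """Predict an lncRNA-gene interaction if any isoform clears the threshold."""
    return bag_score(instance_scores) >= threshold


# Made-up per-isoform interaction scores against a single lncRNA.
isoform_scores = {
    "geneX": [0.12, 0.81, 0.30],
    "geneY": [0.05, 0.22],
}
predictions = {gene: bag_label(scores) for gene, scores in isoform_scores.items()}
print(predictions)
```

This framing is what allows gene-level interaction labels to supervise isoform-level predictions: only the bag needs a label, while the individual isoform scores remain free to differ.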
Affiliation(s)
- Dipan Shaw
- Department of Computer Science and Engineering, University of California, Riverside, CA 92521 USA
| | - Hao Chen
- Department of Computer Science and Engineering, University of California, Riverside, CA 92521 USA
| | - Minzhu Xie
- College of Information Science and Engineering, Hunan Normal University, Changsha, China
| | - Tao Jiang
- Department of Computer Science and Engineering, University of California, Riverside, CA 92521 USA
- Bioinformatics Division, BNRIST/Department of Computer Science and Technology, Tsinghua University, Beijing, China
| |
|
10
|
Suravajhala R, Gupta S, Kumar N, Suravajhala P. Deciphering LncRNA-protein interactions using docking complexes. J Biomol Struct Dyn 2020; 40:3769-3776. [PMID: 33280525 DOI: 10.1080/07391102.2020.1850354] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 02/07/2023]
Abstract
Deciphering RNA-protein interactions is important for studying principal biological mechanisms, including transcription and translation regulation and gene silencing, among others. Predicting the interaction of an RNA molecule with its target protein could allow us to understand important cellular processes and design novel treatment therapies for various diseases. As non-coding RNAs do not have coding potential, our knowledge about their functions is still limited. Therefore, RNA-binding proteins that regulate the functions of non-coding RNAs, including cellular maturation, nuclear export and stability, may play a very important role. In view of the need for refined methods to understand protein-RNA interactions, we have attempted a docking model to infer binding sites between lncRNA NONHSAT02007 and protein KIF13A for a rare disease phenotype that we are studying in our lab. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Renuka Suravajhala
- Department of Chemistry, School of Basic Science, Manipal University, Manipal, India
| | - Sonal Gupta
- Department of Biotechnology and Bioinformatics, Birla Institute of Scientific Research (BISR), Jaipur, India; Department of Biotechnology, Amity University Rajasthan, Jaipur, India
| | - Narayan Kumar
- Department of Biotechnology and Bioinformatics, NIIT University, Neemrana, India
| | - Prashanth Suravajhala
- Department of Biotechnology and Bioinformatics, Birla Institute of Scientific Research (BISR), Jaipur, India; Bioclues.org, India
| |
|
11
|
Emami N, Pakchin PS, Ferdousi R. Computational predictive approaches for interaction and structure of aptamers. J Theor Biol 2020; 497:110268. [PMID: 32311376 DOI: 10.1016/j.jtbi.2020.110268] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 02/10/2020] [Revised: 03/27/2020] [Accepted: 04/02/2020] [Indexed: 02/07/2023]
Abstract
Aptamers are short single-stranded sequences that can bind to their specific targets with high affinity and specificity. Usually, aptamers are selected experimentally via systematic evolution of ligands by exponential enrichment (SELEX), an evolutionary process that consists of multiple cycles of selection and amplification. The SELEX process is expensive and time-consuming, and its success rates are relatively low. To overcome these difficulties, several computational techniques that bring together different disciplines and branches of technology have been developed in aptamer science in recent years. This paper presents a complementary review of computational predictive approaches for aptamers. Generally, these approaches fall into two main categories: interaction-based prediction and structure-based prediction. Furthermore, the available software packages and toolkits in this scope are reviewed. The aim of describing computational methods and tools in aptamer science is that aptamer scientists might take advantage of these computational techniques to develop more accurate and more sensitive aptamers.
Affiliation(s)
- Neda Emami
- Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
| | - Parvin Samadi Pakchin
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran
| | - Reza Ferdousi
- Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran; Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran.
| |
|
12
|
Park B, Han K. Discovering protein-binding RNA motifs with a generative model of RNA sequences. Comput Biol Chem 2020; 84:107171. [DOI: 10.1016/j.compbiolchem.2019.107171] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/17/2019] [Revised: 10/19/2019] [Accepted: 11/19/2019] [Indexed: 01/01/2023]
|
13
|
Abstract
BACKGROUND Interactions between protein and nucleic acid molecules are essential to a variety of cellular processes. A large amount of interaction data generated by high-throughput technologies have triggered the development of several computational methods either to predict binding sites in a sequence or to determine whether a pair of sequences interacts or not. Most of these methods treat the problem of the interaction of nucleic acids with proteins as a classification problem rather than a generation problem. RESULTS We developed a generative model for constructing single-stranded nucleic acids binding to a target protein using a long short-term memory (LSTM) neural network. Experimental results of the generative model are promising in the sense that DNA and RNA sequences generated by the model for several target proteins show high specificity and that motifs present in the generated sequences are similar to known protein-binding motifs. CONCLUSIONS Although these are preliminary results of our ongoing research, our approach can be used to generate nucleic acid sequences binding to a target protein. In particular, it will help design efficient in vitro experiments by constructing an initial pool of potential aptamers that bind to a target protein with high affinity and specificity.
Affiliation(s)
- Jinho Im
- Department of Computer Engineering, Inha University, Incheon, 22212, South Korea
| | - Byungkyu Park
- Department of Computer Engineering, Inha University, Incheon, 22212, South Korea
| | - Kyungsook Han
- Department of Computer Engineering, Inha University, Incheon, 22212, South Korea.
| |
|
14
|
LPI-BLS: Predicting lncRNA–protein interactions with a broad learning system-based stacked ensemble classifier. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.08.084] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Indexed: 01/20/2023]
|
15
|
Pan X, Yang Y, Xia C, Mirza AH, Shen H. Recent methodology progress of deep learning for RNA–protein interaction prediction. Wiley Interdisciplinary Reviews: RNA 2019; 10:e1544. [DOI: 10.1002/wrna.1544] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 02/28/2019] [Revised: 04/07/2019] [Accepted: 04/11/2019] [Indexed: 12/17/2022]
Affiliation(s)
- Xiaoyong Pan
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
  - IDLab, Department for Electronics and Information Systems, Ghent University, Ghent, Belgium
  - BASF Agriculture Solution, Ghent, Belgium
- Yang Yang
  - Department of Computer Science, Shanghai Jiao Tong University, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China
- Chun‐Qiu Xia
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
- Aashiq H. Mirza
  - Department of Pharmacology, Weill Cornell Medicine, New York, New York
- Hong‐Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
  - Department of Computer Science, Shanghai Jiao Tong University, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China
|
16
|
Long Noncoding RNA and Protein Interactions: From Experimental Results to Computational Models Based on Network Methods. Int J Mol Sci 2019; 20:1284. [PMID: 30875752 PMCID: PMC6471543 DOI: 10.3390/ijms20061284] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 01/27/2019] [Revised: 03/09/2019] [Accepted: 03/11/2019] [Indexed: 01/13/2023] Open
Abstract
Non-coding RNAs with a length of more than 200 nucleotides are long non-coding RNAs (lncRNAs), which have gained tremendous attention in recent decades. Many studies have confirmed that lncRNAs have an important influence on post-transcriptional gene regulation; for example, lncRNAs affect the stability and translation of splicing factor proteins. Mutations and malfunctions of lncRNAs are closely related to human disorders. As lncRNAs interact with a variety of proteins, predicting the interactions between lncRNAs and proteins is an important way to explore the functions of lncRNAs in depth and to enrich their annotations. Experimental approaches for identifying lncRNA–protein interactions are expensive and time-consuming. Computational approaches to predict lncRNA–protein interactions can be grouped into two broad categories. The first category is based on sequence, structural information and physicochemical properties. The second category is based on network methods that fuse heterogeneous data to construct an lncRNA-related heterogeneous network. The network-based methods can capture the implicit feature information in the topological structure of related biological heterogeneous networks containing lncRNAs, which is often ignored by sequence-based methods. In this paper, we summarize and discuss the materials, interaction score calculation algorithms, advantages and disadvantages of state-of-the-art algorithms for lncRNA–protein interaction prediction based on network methods, to assist researchers in selecting a suitable method for acquiring more dependable results. All the related network data are also collected and processed for the convenience of users, and are available at https://github.com/HAN-Siyu/APINet/.
|
17
|
Wang H, Wu P. Prediction of RNA-protein interactions using conjoint triad feature and chaos game representation. Bioengineered 2019; 9:242-251. [PMID: 30117758 PMCID: PMC6984769 DOI: 10.1080/21655979.2018.1470721] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 01/10/2023] Open
Abstract
RNA-protein interactions (RPIs) play a very important role in a wide range of post-transcriptional regulations, and identifying whether a given RNA-protein pair can form interactions or not is a vital prerequisite for dissecting the regulatory mechanisms of functional RNAs. Currently, expensive and time-consuming biological assays can only determine a very small portion of all RPIs, which calls for computational approaches to help biologists efficiently and correctly find candidate RPIs. Here, we integrated a successful computing algorithm, conjoint triad feature (CTF), and another method, chaos game representation (CGR), for representing RNA-protein pairs and by doing so developed a prediction model based on these representations and random forest (RF) classifiers. When testing two benchmark datasets, RPI369 and RPI2241, the combined method (CTF+CGR) showed some superiority compared with four existing tools. Especially on RPI2241, the CTF+CGR method improved prediction accuracy (ACC) from 0.91 (the best record of all published works) to 0.95. When independently testing a newly constructed dataset, RPI1449, which only contained experimentally validated RPIs released between 2014 and 2016, our method still showed some generalization capability with an ACC of 0.75. Accordingly, we believe that our hybrid CTF+CGR method will be an important tool for predicting RPIs in the future.
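The two representations named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 7-class amino-acid grouping, the corner layout of the chaos game square, the 4 x 4 grid resolution, and the function names `conjoint_triad`, `chaos_game` and `pair_features` are all assumptions made for illustration. The resulting vector is the kind of input that could be passed to a random forest classifier.

```python
from collections import Counter
from itertools import product

# Assumed 7-class amino-acid grouping (a grouping commonly used with the
# conjoint triad feature; the abstract does not specify it).
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def conjoint_triad(protein):
    """Return 343 normalized frequencies of consecutive class triads."""
    classes = [AA_TO_CLASS[aa] for aa in protein.upper() if aa in AA_TO_CLASS]
    counts = Counter(zip(classes, classes[1:], classes[2:]))
    total = sum(counts.values()) or 1  # avoid division by zero on short input
    return [counts[triad] / total for triad in product(range(7), repeat=3)]


# Assumed corner layout of the unit square for the chaos game.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "U": (1.0, 0.0)}


def chaos_game(rna, grid=4):
    """Return flattened grid x grid cell frequencies of the CGR trajectory."""
    x, y = 0.5, 0.5  # start at the centre of the square
    cells = [0] * (grid * grid)
    visited = 0
    for base in rna.upper().replace("T", "U"):
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway towards the corner
        col = min(int(x * grid), grid - 1)
        row = min(int(y * grid), grid - 1)
        cells[row * grid + col] += 1
        visited += 1
    return [c / visited for c in cells] if visited else cells


def pair_features(protein, rna):
    """Concatenate protein CTF and RNA CGR features for one pair."""
    return conjoint_triad(protein) + chaos_game(rna)
```

With the default grid, `pair_features` yields a vector of 343 + 16 entries per RNA-protein pair; a finer grid gives a longer CGR part at the cost of sparser counts.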
Affiliation(s)
- Hongchu Wang
  - Department of Mathematics, South China Normal University, Guangzhou, P.R. of China
- Pengfei Wu
  - College of Informatics, Huazhong Agricultural University, Wuhan, P.R. of China
|
18
|
Zhan ZH, You ZH, Li LP, Zhou Y, Yi HC. Accurate Prediction of ncRNA-Protein Interactions From the Integration of Sequence and Evolutionary Information. Front Genet 2018; 9:458. [PMID: 30349558 PMCID: PMC6186793 DOI: 10.3389/fgene.2018.00458] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 08/07/2018] [Accepted: 09/19/2018] [Indexed: 12/18/2022] Open
Abstract
Non-coding RNA (ncRNA) plays a crucial role in numerous biological processes including gene expression and post-transcriptional gene regulation. The biological function of ncRNA is mostly realized by binding with related proteins. Therefore, an accurate understanding of interactions between ncRNA and protein has a significant impact on current biological research. A major challenge at this stage is the large amount of time and resources that traditional interaction pattern prediction methods spend on classification. An efficient classifier named LightGBM can reduce this time cost. In this study, we employed LightGBM as the integrated classifier and proposed a novel computational model for predicting ncRNA and protein interactions. More specifically, the pseudo-Zernike moments and the singular value decomposition algorithm are employed to extract discriminative features from protein and ncRNA sequences. On four widely used datasets, RPI369, RPI488, RPI1807, and RPI2241, we evaluated the performance of LGBM and obtained AUCs of 0.799, 0.914, 0.989, and 0.762, respectively. The results of 10-fold cross-validation showed that the proposed method performs much better than existing methods in predicting ncRNA-protein interaction patterns and could be used as a useful tool in proteomics research.
Affiliation(s)
- Zhao-Hui Zhan
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
- Zhu-Hong You
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Li-Ping Li
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Yong Zhou
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
- Hai-Cheng Yi
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
|
19
|
Hoseini ASH, Mirzarezaee M. Prediction of Protein Sub-Mitochondria Locations Using Protein Interaction Networks. Iranian Journal of Biotechnology 2018; 16:e1933. [PMID: 31457027 PMCID: PMC6697825 DOI: 10.15171/ijb.1933] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/29/2017] [Revised: 01/11/2018] [Accepted: 01/13/2018] [Indexed: 01/09/2023]
Abstract
Background: Prediction of protein localization is among the most important issues in bioinformatics and is used to predict the location of proteins in cells and organelles such as mitochondria. In this study, several machine learning algorithms are applied to the prediction of intracellular protein locations. These algorithms use features extracted from protein sequences. In contrast, protein interactions have been less investigated. Objectives: As interacting proteins are usually located in the same or adjacent places, using this feature to find the location would be efficient. This study did not aim at increasing the total accuracy of previous research. Instead, it focused on protein interaction features and how their use leads to a higher accuracy. Materials and Methods: In this study, we examined the protein interaction network as one of the features for prediction of protein localization and its effect on the prediction results. In this regard, we gathered some of the most common features, including Amino Acid Composition, Dipeptide Composition, Pseudo Amino Acid Composition (PseAAC), Position Specific Scoring Matrix (PSSM), Functional Domain, Gene Ontology information, and pair-wise sequence alignment. The classification results are compared to the ones obtained using protein interactions. To this end, different machine learning algorithms were tested. Results: The best results with a single feature set were obtained using the SVM classifier with the PseAAC feature. The accuracy of combining all features with PPI data, using the Decision Tree and Random Forest classifiers, was 82.49% and 83.35%, respectively. In another experiment, using just protein interaction data with different cutting points resulted in an accuracy of 93.035% for protein location prediction. Conclusion: In total, it was shown that protein interaction has a significant impact on the prediction of the mitochondrial proteins’ location. This feature alone can distinguish the locations well. Using this feature raises the accuracy of the results by up to 5%.
Affiliation(s)
- Mitra Mirzarezaee
  - Department of Computer Engineering, Science and Research branch, Islamic Azad University, Tehran, Iran
  - School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
|
20
|
Yang C, Yang L, Zhou M, Xie H, Zhang C, Wang MD, Zhu H. LncADeep: an ab initio lncRNA identification and functional annotation tool based on deep learning. Bioinformatics 2018; 34:3825-3834. [DOI: 10.1093/bioinformatics/bty428] [Citation(s) in RCA: 78] [Impact Index Per Article: 11.1] [Received: 01/22/2018] [Accepted: 05/23/2018] [Indexed: 12/15/2022] Open
Affiliation(s)
- Cheng Yang
  - Department of Biomedical Engineering, College of Engineering, and Centre for Quantitative Biology, Peking University, Beijing, China
  - Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
- Longshu Yang
  - Department of Biomedical Engineering, College of Engineering, and Centre for Quantitative Biology, Peking University, Beijing, China
- Man Zhou
  - Department of Biomedical Engineering, College of Engineering, and Centre for Quantitative Biology, Peking University, Beijing, China
- Haoling Xie
  - Department of Biomedical Engineering, College of Engineering, and Centre for Quantitative Biology, Peking University, Beijing, China
  - Peking University-Tsinghua University-National Institute of Biological Sciences (PTN) Joint PhD Program and College of Life Sciences, Peking University, Beijing, China
- Chengjiu Zhang
  - Department of Biomedical Engineering, College of Engineering, and Centre for Quantitative Biology, Peking University, Beijing, China
- May D Wang
  - Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
- Huaiqiu Zhu
  - Department of Biomedical Engineering, College of Engineering, and Centre for Quantitative Biology, Peking University, Beijing, China
  - Peking University-Tsinghua University-National Institute of Biological Sciences (PTN) Joint PhD Program and College of Life Sciences, Peking University, Beijing, China
|
21
|
Shen WJ, Cui W, Chen D, Zhang J, Xu J. RPiRLS: Quantitative Predictions of RNA Interacting with Any Protein of Known Sequence. Molecules 2018; 23:540. [PMID: 29495575 PMCID: PMC6017498 DOI: 10.3390/molecules23030540] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 02/06/2018] [Revised: 02/24/2018] [Accepted: 02/25/2018] [Indexed: 02/05/2023] Open
Abstract
RNA-protein interactions (RPIs) have critical roles in numerous fundamental biological processes, such as post-transcriptional gene regulation, viral assembly, cellular defence and protein synthesis. As the amount of available experimental RNA-protein binding data has increased rapidly due to high-throughput sequencing methods, it is now possible to measure and understand RNA-protein interactions by computational methods. In this study, we integrate a sequence-based derived kernel with regularized least squares to perform prediction. The derived kernel exploits the contextual information around an amino acid or a nucleic acid as well as the repetitive conserved motif information. We propose a novel machine learning method, called RPiRLS, to predict the interaction between any RNA and protein of known sequences. For the RPiRLS classifier, each protein sequence comprises up to 20 diverse amino acids, but for the RPiRLS-7G classifier, each protein sequence is represented using a 7-letter reduced alphabet based on physicochemical properties. We evaluated both methods on a number of benchmark data sets and compared their performances with two newly developed and state-of-the-art methods, RPI-Pred and IPMiner. On the non-redundant benchmark test sets extracted from the PRIDB, the RPiRLS method outperformed RPI-Pred and IPMiner in terms of accuracy, specificity and sensitivity. Further, RPiRLS achieved an accuracy of 92% on the prediction of lncRNA-protein interactions. The proposed method can also be extended to construct RNA-protein interaction networks. The RPiRLS web server is freely available at http://bmc.med.stu.edu.cn/RPiRLS.
Affiliation(s)
- Wen-Jun Shen
  - Department of Bioinformatics, Shantou University Medical College, Shantou 515000, Guangdong, China
- Wenjuan Cui
  - Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
- Danze Chen
  - Department of Bioinformatics, Shantou University Medical College, Shantou 515000, Guangdong, China
- Jieming Zhang
  - Department of Bioinformatics, Shantou University Medical College, Shantou 515000, Guangdong, China
- Jianzhen Xu
  - Department of Bioinformatics, Shantou University Medical College, Shantou 515000, Guangdong, China
|
22
|
Tamaki S, Tomita M, Suzuki H, Kanai A. Systematic Analysis of the Binding Surfaces between tRNAs and Their Respective Aminoacyl tRNA Synthetase Based on Structural and Evolutionary Data. Front Genet 2018; 8:227. [PMID: 29358943 PMCID: PMC5766645 DOI: 10.3389/fgene.2017.00227] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 10/20/2017] [Accepted: 12/15/2017] [Indexed: 12/23/2022] Open
Abstract
To determine the mechanism underlying the flow of genetic information, it is important to understand the relationship between a tRNA and its binding enzyme, a member of the aminoacyl-tRNA synthetase (aaRS) family. We have developed a novel method to project the interacting regions of tRNA-aaRS complexes, obtained from their three-dimensional structures, onto two-dimensional space. The interacting surface between each tRNA and its aaRS was successfully identified by determining these interactions with an atomic distance threshold of 3.3 Å. We analyzed their interactions, using 60 mainly bacterial and eukaryotic tRNA-aaRS complexes, and showed that the tRNA sequence regions that interacted most strongly with each aaRS are the anticodon loop and the CCA terminal region, followed by the D-stem. A sequence conservation analysis of the canonical tRNAs was conducted in 83 bacterial, 182 archaeal, and 150 eukaryotic species. Our results show that the three tRNA regions that interact with the aaRS and two additional loop regions (D-loop and TΨC-loop) known to be important for formation of the tRNA L-shaped structure are broadly conserved. We also found sequence conservations near the tRNA discriminator in the Bacteria and Archaea, and an enormous number of noncanonical tRNAs in the Eukaryotes. This is the first global view of tRNA evolution based on its structure and an unprecedented number of sequence data.
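The distance criterion mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: atoms are given as hypothetical `(residue_id, (x, y, z))` tuples rather than parsed from a structure file, the comparison is taken as inclusive (`<=`), and the function name `interface_pairs` is invented here. Only the 3.3 Å threshold comes from the abstract.

```python
import math

CUTOFF_ANGSTROM = 3.3  # atomic distance threshold stated in the abstract


def interface_pairs(trna_atoms, aars_atoms, cutoff=CUTOFF_ANGSTROM):
    """Return the set of (tRNA residue, aaRS residue) pairs in contact.

    Each atom is a hypothetical (residue_id, (x, y, z)) tuple. Two residues
    are treated as interacting when any of their atoms lie within `cutoff`.
    """
    pairs = set()
    for rna_res, rna_xyz in trna_atoms:
        for prot_res, prot_xyz in aars_atoms:
            if math.dist(rna_xyz, prot_xyz) <= cutoff:
                pairs.add((rna_res, prot_res))
    return pairs


# Toy coordinates, invented purely for illustration.
trna = [(34, (0.0, 0.0, 0.0)), (35, (10.0, 0.0, 0.0))]
aars = [("ARG5", (3.0, 0.0, 0.0)), ("LYS9", (10.0, 3.4, 0.0))]
contacts = interface_pairs(trna, aars)
```

The double loop is quadratic in the number of atoms, which is acceptable for a single complex; a spatial index would be the usual choice for larger sets.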
Affiliation(s)
- Satoshi Tamaki
  - Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan
  - Systems Biology Program, Graduate School of Media and Governance, Keio University, Fujisawa, Japan
- Masaru Tomita
  - Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan
  - Systems Biology Program, Graduate School of Media and Governance, Keio University, Fujisawa, Japan
  - Faculty of Environment and Information Studies, Keio University, Fujisawa, Japan
- Haruo Suzuki
  - Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan
  - Faculty of Environment and Information Studies, Keio University, Fujisawa, Japan
- Akio Kanai
  - Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan
  - Systems Biology Program, Graduate School of Media and Governance, Keio University, Fujisawa, Japan
  - Faculty of Environment and Information Studies, Keio University, Fujisawa, Japan
|
23
|
Zhang X, Liu S. RBPPred: predicting RNA-binding proteins from sequence using SVM. Bioinformatics 2016; 33:854-862. [DOI: 10.1093/bioinformatics/btw730] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 06/25/2016] [Accepted: 11/16/2016] [Indexed: 11/13/2022] Open
|