1
Pandey U, Behara SM, Sharma S, Patil RS, Nambiar S, Koner D, Bhukya H. DeePNAP: A Deep Learning Method to Predict Protein-Nucleic Acid Binding Affinity from Their Sequences. J Chem Inf Model 2024; 64:1806-1815. [PMID: 38458968 DOI: 10.1021/acs.jcim.3c01151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/10/2024]
Abstract
Predicting the protein-nucleic acid (PNA) binding affinity solely from their sequences is of paramount importance for the experimental design and analysis of PNA interactions (PNAIs). A large number of currently developed models for binding affinity prediction are limited to specific PNAIs while also relying on the sequence and structural information of the PNA complexes for both training and testing, and also as inputs. As the PNA complex structures available are scarce, this significantly limits the diversity and generalizability due to the small training data set. Additionally, a majority of the tools predict a single parameter, such as binding affinity or free energy changes upon mutations, rendering a model less versatile for usage. Hence, we propose DeePNAP, a machine learning-based model built from a vast and heterogeneous data set with 14,401 entries (from both eukaryotes and prokaryotes) from the ProNAB database, consisting of wild-type and mutant PNA complex binding parameters. Our model precisely predicts the binding affinity and free energy changes due to the mutation(s) of PNAIs exclusively from their sequences. While other similar tools extract features from both sequence and structure information, DeePNAP employs sequence-based features to yield high correlation coefficients between the predicted and experimental values with low root mean squared errors for PNA complexes in predicting KD and ΔΔG, implying the generalizability of DeePNAP. Additionally, we have also developed a web interface hosting DeePNAP that can serve as a powerful tool to rapidly predict binding affinities for a myriad of PNAIs with high precision toward developing a deeper understanding of their implications in various biological systems. Web interface: http://14.139.174.41:8080/.
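The abstract above reports agreement between predicted and experimental values as a correlation coefficient and a root mean squared error. A minimal sketch of how these two metrics are commonly computed follows; the function names and the numbers are hypothetical illustrations, not data or code from DeePNAP.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, experimental):
    """Root mean squared error between predictions and measurements."""
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical binding free energies in kcal/mol (illustration only).
experimental = [-9.1, -10.4, -8.2, -11.0, -7.5]
predicted = [-8.8, -10.0, -8.9, -10.6, -7.9]

r = pearson_r(predicted, experimental)
err = rmse(predicted, experimental)
print(f"Pearson r = {r:.3f}, RMSE = {err:.3f} kcal/mol")
```

A high r with a low RMSE indicates that predictions both rank complexes correctly and land close to the measured values; either metric alone can be misleading.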
Affiliation(s)
- Uddeshya Pandey
  - Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
- Sasi M Behara
  - Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
- Siddhant Sharma
  - Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
- Rachit S Patil
  - Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
- Souparnika Nambiar
  - Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
- Debasish Koner
  - Department of Chemistry, Indian Institute of Technology Hyderabad, Kandi 502284, India
- Hussain Bhukya
  - Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
2
Harini K, Sekijima M, Gromiha MM. PRA-Pred: Structure-based prediction of protein-RNA binding affinity. Int J Biol Macromol 2024; 259:129490. [PMID: 38224813 DOI: 10.1016/j.ijbiomac.2024.129490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2023] [Revised: 01/10/2024] [Accepted: 01/12/2024] [Indexed: 01/17/2024]
Abstract
Understanding crucial factors that affect the binding affinity of protein-RNA complexes is vital for comprehending their recognition mechanisms. This study involved compiling experimentally measured binding affinity (ΔG) values of 217 protein-RNA complexes and extracting numerous structure-based features, considering RNA, protein, and interactions between protein and RNA. Our findings indicate the significance of RNA base-step parameters, interaction energies, number of atomic contacts in the complex, hydrogen bonds, and contact potentials in understanding the binding affinity. Further, we observed that these factors are influenced by the type of RNA strand and the function of the protein in a protein-RNA complex. Multiple regression equations were developed for different classes of complexes to predict the binding affinity between the protein and RNA. We evaluated the models using the jack-knife test and achieved an overall correlation of 0.77 between the experimental and predicted binding affinities, with a mean absolute error of 1.02 kcal/mol. Furthermore, we introduced a web server, PRA-Pred, intended for the prediction of protein-RNA binding affinity, and it is freely accessible through https://web.iitm.ac.in/bioinfo2/prapred/. We propose that our approach could function as a potential resource for investigating protein-RNA recognition and developing therapeutic strategies.
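The jack-knife test mentioned in the abstract above is a leave-one-out evaluation: each complex is held out in turn, the regression is refit on the rest, and the held-out value is predicted. The sketch below illustrates the procedure with a single-feature least-squares line standing in for the multiple regression equations; the feature, the data, and all function names are assumptions for illustration, not the PRA-Pred model.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def jackknife_predictions(xs, ys):
    """Predict each point from a model fitted on all the other points."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(slope * xs[i] + intercept)
    return predictions


def mean_absolute_error(predicted, experimental):
    """Average absolute difference between predictions and measurements."""
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / len(predicted)


# Hypothetical data: atomic contact counts and binding free energies (kcal/mol).
contacts = [120.0, 95.0, 150.0, 80.0, 130.0, 110.0]
delta_g = [-10.2, -9.0, -11.5, -8.1, -10.9, -9.6]

loo_pred = jackknife_predictions(contacts, delta_g)
print("jack-knife MAE:", round(mean_absolute_error(loo_pred, delta_g), 3))
```

Because every prediction comes from a model that never saw the held-out complex, the resulting error is a less optimistic estimate than the error on the training data.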
Affiliation(s)
- K Harini
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- M Sekijima
  - Department of Computer Science, Tokyo Institute of Technology, Yokohama, Japan
- M Michael Gromiha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India; International Research Frontiers Initiative, School of Computing, Tokyo Institute of Technology, Yokohama 226-8501, Japan; Department of Computer Science, National University of Singapore, Singapore
3
Zhang X, Mei LC, Gao YY, Hao GF, Song BA. Web tools support predicting protein-nucleic acid complexes stability with affinity changes. Wiley Interdisciplinary Reviews. RNA 2023; 14:e1781. [PMID: 36693636 DOI: 10.1002/wrna.1781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Revised: 11/10/2022] [Accepted: 11/28/2022] [Indexed: 01/26/2023]
Abstract
Numerous biological processes, such as transcription, replication, and translation, rely on protein-nucleic acid interactions (PNIs). Demonstrating the binding stability of protein-nucleic acid complexes is vital to deciphering the code for PNIs. Numerous web-based tools have been developed to attach importance to protein-nucleic acid stability, facilitating the prediction of PNIs characteristics rapidly. However, the data and tools are dispersed and lack comprehensive integration to understand the stability of PNIs better. In this review, we first summarize existing databases for evaluating the stability of protein-nucleic acid binding. Then, we compare and evaluate the pros and cons of web tools for forecasting the interaction energies of protein-nucleic acid complexes. Finally, we discuss the application of combining models and capabilities of PNIs. We may hope these web-based tools will facilitate the discovery of recognition mechanisms for protein-nucleic acid binding stability. This article is categorized under: RNA Interactions with Proteins and Other Molecules > Protein-RNA Recognition RNA Interactions with Proteins and Other Molecules > RNA-Protein Complexes RNA Interactions with Proteins and Other Molecules > Protein-RNA Interactions: Functional Implications.
Affiliation(s)
- Xiao Zhang
  - National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang, China
- Long-Can Mei
  - National Key Laboratory of Green Pesticide, Central China Normal University, Wuhan, China
- Yang-Yang Gao
  - National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang, China
- Ge-Fei Hao
  - National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang, China
  - National Key Laboratory of Green Pesticide, Central China Normal University, Wuhan, China
- Bao-An Song
  - National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang, China
4
Hong X, Tong X, Xie J, Liu P, Liu X, Song Q, Liu S, Liu S. An updated dataset and a structure-based prediction model for protein-RNA binding affinity. Proteins 2023; 91:1245-1253. [PMID: 37186412 DOI: 10.1002/prot.26503] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/30/2022] [Revised: 03/08/2023] [Accepted: 04/12/2023] [Indexed: 05/17/2023]
Abstract
Understanding the process of protein-RNA interaction is essential for structural biology. The thermodynamic process is an important part of uncovering the protein-RNA interaction mechanism. The regulatory networks between protein and RNA in organisms are dominated by binding and dissociation in cells. Therefore, determining the binding affinity of protein-RNA complexes can help us understand the regulation mechanism of protein-RNA interaction. Since determining the binding affinity of protein-RNA complexes experimentally is time-consuming and labor-intensive, computational methods for predicting it are needed. To develop a binding affinity prediction model, we first update the protein-RNA binding affinity benchmark dataset (PRBAB), which now includes 145 complexes. Second, we extract structural features from the complex structures, then analyze and select representative structural features to train the regression model. Third, we randomly select a subset of PRBAB2.0 to fit the experimentally determined protein-RNA binding affinities. Finally, we tested our model on the nonredundant PDBbind dataset and obtained a Pearson correlation coefficient r = 0.57 and RMSE = 2.51 kcal/mol. The Pearson correlation coefficient reaches 0.7 after removing 5 complex structures with modified residues/nucleotides and metal ions. When tested on ProNAB, 71.60% of the predictions achieve a Pearson correlation coefficient r = 0.61 and RMSE = 1.56 kcal/mol against experimental values.
Affiliation(s)
- Xu Hong
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Xiaoxue Tong
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Juan Xie
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Pinyu Liu
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Xudong Liu
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Qi Song
  - Key Laboratory of Fermentation Engineering (Ministry of Education), Hubei University of Technology, Wuhan, China
- Sen Liu
  - Key Laboratory of Fermentation Engineering (Ministry of Education), Hubei University of Technology, Wuhan, China
- Shiyong Liu
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, China
5
Harini K, Kihara D, Michael Gromiha M. PDA-Pred: Predicting the binding affinity of protein-DNA complexes using machine learning techniques and structural features. Methods 2023; 213:10-17. [PMID: 36924867 PMCID: PMC10563387 DOI: 10.1016/j.ymeth.2023.03.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 11/10/2022] [Revised: 02/17/2023] [Accepted: 03/11/2023] [Indexed: 03/17/2023] Open
Abstract
Protein-DNA interactions play an important role in various biological processes such as gene expression, replication, and transcription. Understanding the important features that dictate the binding affinity of protein-DNA complexes and predicting their affinities is important for elucidating their recognition mechanisms. In this work, we have collected the experimental binding free energy (ΔG) for a set of 391 Protein-DNA complexes and derived several structure-based features such as interaction energy, contact potentials, volume and surface area of binding site residues, base step parameters of the DNA and contacts between different types of atoms. Our analysis on relationship between binding affinity and structural features revealed that the important factors mainly depend on the number of DNA strands as well as functional and structural classes of proteins. Specifically, binding site properties such as number of atom contacts between the DNA and protein, volume of protein binding sites and interaction-based features such as interaction energies and contact potentials are important to understand the binding affinity. Further, we developed multiple regression equations for predicting the binding affinity of protein-DNA complexes belonging to different structural and functional classes. Our method showed an average correlation and mean absolute error of 0.78 and 0.98 kcal/mol, respectively, between the experimental and predicted binding affinities on a jack-knife test. We have developed a webserver, PDA-PreD (Protein-DNA Binding affinity predictor), for predicting the affinity of protein-DNA complexes and it is freely available at https://web.iitm.ac.in/bioinfo2/pdapred/.
Affiliation(s)
- K Harini
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- Daisuke Kihara
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, United States; Department of Computer Science, Purdue University, West Lafayette, IN, United States
- M Michael Gromiha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India; International Research Frontiers Initiative, School of Computing, Tokyo Institute of Technology, Yokohama 226-8501, Japan
6
Li H, Zhu D, Yang Y, Ma Y, Chen Y, Xue P, Chen J, Qin M, Xu D, Cai C, Cheng H. Determinants of DNMT2/TRDMT1 preference for substrates tRNA and DNA during the evolution. RNA Biol 2023; 20:875-892. [PMID: 37966982 PMCID: PMC10653749 DOI: 10.1080/15476286.2023.2272473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/12/2023] [Indexed: 11/17/2023] Open
Abstract
RNA methyltransferase DNMT2/TRDMT1 is the most conserved member of the DNMT family from bacteria to plants and mammals. In previous studies, we found some determinants for tRNA recognition of DNMT2/TRDMT1, but the preference mechanism of this enzyme for substrates tRNA and DNA remains to be explored. In the present study, CFT-containing target recognition domain (TRD) and target recognition extension domain (TRED) in DNMT2/TRDMT1 play a crucial role in the substrate DNA and RNA selection during the evolution. Moreover, the classical substrate tRNA for DNMT2/TRDMT1 had a characteristic sequence CUXXCAC in the anticodon loop. Position 35 was occupied by U, making cytosine-38 (C38) twist into the loop, whereas C, G or A was located at position 35, keeping the C38-flipping state. Hence, the substrate preference could be modulated by the easily flipped state of target cytosine in tRNA, as well as TRD and TRED. Additionally, DNMT2/TRDMT1 cancer mutant activity was collectively mediated by five enzymatic characteristics, which might impact gene expressions. Importantly, G155C, G155V and G155S mutations reduced enzymatic activities and showed significant associations with diseases using seven prediction methods. Altogether, these findings will assist in illustrating the substrate preference mechanism of DNMT2/TRDMT1 and provide a promising therapeutic strategy for cancer.
Affiliation(s)
- Huari Li
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, Hubei, China
- Daiyun Zhu
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, Hubei, China
- Yapeng Yang
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, Hubei, China
- Yunfei Ma
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, Hubei, China
- Yong Chen
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, Hubei, China
- Pingfang Xue
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, Hubei, China
- Juan Chen
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, Hubei, China
- Mian Qin
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, Hubei, China
- Dandan Xu
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, Hubei, China
- Chao Cai
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, Hubei, China
- Hongjing Cheng
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, Hubei, China
7
Wu Z, Basu S, Wu X, Kurgan L. qNABpredict: Quick, accurate, and taxonomy-aware sequence-based prediction of content of nucleic acid binding amino acids. Protein Sci 2023; 32:e4544. [PMID: 36519304 PMCID: PMC9798252 DOI: 10.1002/pro.4544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Revised: 12/07/2022] [Accepted: 12/08/2022] [Indexed: 12/23/2022]
Abstract
Protein sequence-based predictors of nucleic acid (NA)-binding include methods that predict NA-binding proteins and NA-binding residues. The residue-level tools produce more details but suffer high computational cost since they must predict every amino acid in the input sequence and rely on multiple sequence alignments. We propose an alternative approach that predicts content (fraction) of the NA-binding residues, offering more information than the protein-level prediction and much shorter runtime than the residue-level tools. Our first-of-its-kind content predictor, qNABpredict, relies on a small, rationally designed and fast-to-compute feature set that represents relevant characteristics extracted from the input sequence and a well-parametrized support vector regression model. We provide two versions of qNABpredict, a taxonomy-agnostic model that can be used for proteins of unknown taxonomic origin and more accurate taxonomy-aware models that are tailored to specific taxonomic kingdoms: archaea, bacteria, eukaryota, and viruses. Empirical tests on a low-similarity test dataset show that qNABpredict is 100 times faster and generates statistically more accurate content predictions when compared to the content extracted from results produced by the residue-level predictors. We also show that qNABpredict's content predictions can be used to improve results generated by the residue-level predictors. We release qNABpredict as a convenient webserver and source code at http://biomine.cs.vcu.edu/servers/qNABpredict/. This new tool should be particularly useful to predict details of protein-NA interactions for large protein families and proteomes.
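The "content" quantity described above is the fraction of residues in a protein that bind nucleic acids. The sketch below shows how that fraction can be derived from residue-level annotations or from thresholded residue-level scores, which is the comparison baseline the abstract refers to. The label encoding, the 0.5 threshold, and the function names are assumptions for illustration; this is not the qNABpredict feature set or its support vector regression model.

```python
def binding_content(labels):
    """Fraction of residues labeled '1' (binding) in a string of '0'/'1' labels."""
    if not labels:
        raise ValueError("empty annotation")
    return labels.count("1") / len(labels)


def content_from_scores(scores, threshold=0.5):
    """Content implied by residue-level propensity scores after thresholding."""
    if not scores:
        raise ValueError("empty score list")
    return sum(1 for s in scores if s >= threshold) / len(scores)


# Hypothetical 10-residue protein: three residues annotated as NA-binding.
annotation = "0011000100"
print(binding_content(annotation))

# Hypothetical residue-level predictor output for the same protein.
scores = [0.1, 0.2, 0.8, 0.7, 0.3, 0.1, 0.2, 0.9, 0.4, 0.1]
print(content_from_scores(scores))
```

A dedicated content predictor estimates this fraction directly from the sequence, without first scoring every residue.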
Affiliation(s)
- Zhonghua Wu
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Sushmita Basu
  - Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
- Xuantai Wu
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
8
Gaither J, Lin YH, Bundschuh R. RBPBind: Quantitative prediction of Protein-RNA interactions. J Mol Biol 2022; 434:167515. [DOI: 10.1016/j.jmb.2022.167515] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/29/2021] [Revised: 02/21/2022] [Accepted: 02/22/2022] [Indexed: 10/19/2022]
9
Stefanovic L, Gordon BH, Silvers R, Stefanovic B. Characterization of Sequence-Specific Binding of LARP6 to the 5' Stem-Loop of Type I Collagen mRNAs and Implications for Rational Design of Antifibrotic Drugs. J Mol Biol 2022; 434:167394. [PMID: 34896113 PMCID: PMC8752511 DOI: 10.1016/j.jmb.2021.167394] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/14/2021] [Revised: 11/29/2021] [Accepted: 12/01/2021] [Indexed: 02/01/2023]
Abstract
Excessive synthesis of type I collagen is a hallmark of fibrotic diseases. Binding of La-related protein 6 (LARP6) to the 5' stem-loop (5'SL) of collagen mRNAs regulates their translation leading to an unnaturally elevated rate of collagen biosynthesis in fibrosis. Previous work suggested that LARP6 needs two domains to form stable complex with 5'SL RNA, the La domain and the juxtaposed RNA recognition motif (RRM), jointly called the La-module. Here we describe that La domain of LARP6 is necessary and sufficient for recognition of 5'SL in RNA sequence specific manner. A three-amino-acid motif located in the flexible loop connecting the second α-helix to the β-sheet of the La domain, called the RNK-motif, is critical for binding. Mutation of any of these three amino acids abolishes the binding of the La domain to 5'SL. The major site of crosslinking of LARP6 to 5'SL RNA was mapped to this motif, as well. The RNK-motif is not found in other LARPs, which cannot bind 5'SL. Presence of RRM increases the stability of complex between La domain and 5'SL RNA and RRM domain does not make extensive contacts with 5'SL RNA. We propose a model in which the initial recognition of 5'SL by LARP6 is mediated by the RNK epitope and further stabilized by the RRM domain. This discovery suggests that the interaction between LARP6 and collagen mRNAs can be blocked by small molecules that target the RNK epitope and will help rational design of the LARP6 binding inhibitors as specific antifibrotic drugs.
Affiliation(s)
- Lela Stefanovic
  - Department of Biomedical Sciences, College of Medicine, Florida State University, Tallahassee, FL 32306, USA
- Blaine H Gordon
  - Department of Chemistry and Biochemistry, College of Arts and Sciences, Florida State University, Tallahassee, FL 32306, USA; Institute of Molecular Biophysics, College of Arts and Sciences, Florida State University, Tallahassee, FL 32306, USA
- Robert Silvers
  - Department of Chemistry and Biochemistry, College of Arts and Sciences, Florida State University, Tallahassee, FL 32306, USA; Institute of Molecular Biophysics, College of Arts and Sciences, Florida State University, Tallahassee, FL 32306, USA
- Branko Stefanovic
  - Department of Biomedical Sciences, College of Medicine, Florida State University, Tallahassee, FL 32306, USA
10
Zhou L, Wang Z, Tian X, Peng L. LPI-deepGBDT: a multiple-layer deep framework based on gradient boosting decision trees for lncRNA-protein interaction identification. BMC Bioinformatics 2021; 22:479. [PMID: 34607567 PMCID: PMC8489074 DOI: 10.1186/s12859-021-04399-8] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 04/29/2021] [Accepted: 07/14/2021] [Indexed: 12/31/2022] Open
Abstract
Background: Long noncoding RNAs (lncRNAs) play important roles in various biological and pathological processes. Discovery of lncRNA–protein interactions (LPIs) contributes to understanding the biological functions and mechanisms of lncRNAs. Although wet experiments find a few interactions between lncRNAs and proteins, experimental techniques are costly and time-consuming. Therefore, computational methods are increasingly exploited to uncover the possible associations. However, existing computational methods have several limitations. First, the majority of them were evaluated on one simple dataset, which may result in prediction bias. Second, few of them are applied to identify relevant data for new lncRNAs (or proteins). Finally, they failed to utilize diverse biological information of lncRNAs and proteins. Results: Using a feed-forward deep architecture based on gradient boosting decision trees (LPI-deepGBDT), this work focuses on classifying unobserved LPIs. First, three human LPI datasets and two plant LPI datasets are arranged. Second, the biological features of lncRNAs and proteins are extracted by Pyfeat and BioProt, respectively. Third, the features are dimensionally reduced and concatenated as a vector to represent an lncRNA–protein pair. Finally, a deep architecture composed of forward mappings and inverse mappings is developed to predict underlying linkages between lncRNAs and proteins. LPI-deepGBDT is compared with five classical LPI prediction models (LPI-BLS, LPI-CatBoost, PLIPCOM, LPI-SKF, and LPI-HNM) under three cross validations on lncRNAs, proteins, and lncRNA–protein pairs, respectively. It obtains the best average AUC and AUPR values under the majority of situations, significantly outperforming the other five LPI identification methods. That is, AUCs computed by LPI-deepGBDT are 0.8321, 0.6815, and 0.9073, respectively, and AUPRs are 0.8095, 0.6771, and 0.8849, respectively. The results demonstrate the powerful classification ability of LPI-deepGBDT. Case study analyses show that there may be interactions between GAS5 and Q15717, RAB30-AS1 and O00425, and LINC-01572 and P35637. Conclusions: Integrating ensemble learning and hierarchical distributed representations and building a multiple-layered deep architecture, this work improves LPI prediction performance as well as effectively probes interaction data for new lncRNAs/proteins.
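The AUC and AUPR values quoted above summarize how well predicted scores separate interacting from non-interacting pairs. The sketch below computes AUC through the pairwise ranking (Mann-Whitney) formulation and estimates AUPR as average precision, which is one common estimator and may differ from the interpolation used in the paper. Scores and labels are hypothetical, and tied scores are not given special treatment in the average-precision function.

```python
def auc(scores, labels):
    """Area under the ROC curve as the fraction of positive/negative pairs
    ranked correctly, with ties counted as half."""
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def average_precision(scores, labels):
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            precision_sum += true_positives / rank
    if true_positives == 0:
        raise ValueError("need at least one positive")
    return precision_sum / true_positives


# Hypothetical interaction scores; label 1 marks a known lncRNA-protein pair.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(auc(scores, labels), average_precision(scores, labels))
```

AUPR is usually the more informative of the two when known interactions are rare relative to unlabeled pairs, since it ignores true negatives.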
Affiliation(s)
- Liqian Zhou
  - School of Computer Science, Hunan University of Technology, No. 88, Taishan West Road, Tianyuan District, Zhuzhou, China
- Zhao Wang
  - School of Computer Science, Hunan University of Technology, No. 88, Taishan West Road, Tianyuan District, Zhuzhou, China
- Xiongfei Tian
  - School of Computer Science, Hunan University of Technology, No. 88, Taishan West Road, Tianyuan District, Zhuzhou, China
- Lihong Peng
  - School of Computer Science, Hunan University of Technology, No. 88, Taishan West Road, Tianyuan District, Zhuzhou, China; College of Life Sciences and Chemistry, Hunan University of Technology, No. 88, Taishan West Road, Tianyuan District, Zhuzhou, China
11
Harini K, Srivastava A, Kulandaisamy A, Gromiha MM. ProNAB: database for binding affinities of protein-nucleic acid complexes and their mutants. Nucleic Acids Res 2021; 50:D1528-D1534. [PMID: 34606614 PMCID: PMC8728258 DOI: 10.1093/nar/gkab848] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 08/12/2021] [Revised: 09/08/2021] [Accepted: 09/10/2021] [Indexed: 11/16/2022] Open
Abstract
Protein–nucleic acid interactions are involved in various biological processes such as gene expression, replication, transcription, translation and packaging. The binding affinities of protein–DNA and protein–RNA complexes are important for elucidating the mechanism of protein–nucleic acid recognition. Although experimental data on binding affinity are reported abundantly in the literature, no well-curated database is currently available for protein–nucleic acid binding affinity. We have developed a database, ProNAB, which contains more than 20 000 experimental data for the binding affinities of protein–DNA and protein–RNA complexes. Each entry provides comprehensive information on sequence and structural features of a protein, nucleic acid and its complex, experimental conditions, thermodynamic parameters such as dissociation constant (Kd), binding free energy (ΔG) and change in binding free energy upon mutation (ΔΔG), and literature information. ProNAB is cross-linked with GenBank, UniProt, PDB, ProThermDB, PROSITE, DisProt and Pubmed. It provides a user-friendly web interface with options for search, display, sorting, visualization, download and upload the data. ProNAB is freely available at https://web.iitm.ac.in/bioinfo2/pronab/ and it has potential applications such as understanding the factors influencing the affinity, development of prediction tools, binding affinity change upon mutation and design complexes with the desired affinity.
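The three thermodynamic parameters listed above are related by standard formulas: the binding free energy follows from the dissociation constant as ΔG = RT ln(Kd), with Kd expressed in molar units relative to a 1 M standard state, and ΔΔG is the difference between mutant and wild-type ΔG. The sketch below applies these relations; the temperature, the example Kd values, and the sign convention for ΔΔG (mutant minus wild type) are assumptions, and a given database entry may use a different convention.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g_from_kd(kd_molar, temperature_k=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


def delta_delta_g(kd_mutant, kd_wild_type, temperature_k=298.15):
    """Change in binding free energy upon mutation, mutant minus wild type.
    Positive values mean the mutant binds more weakly."""
    return (delta_g_from_kd(kd_mutant, temperature_k)
            - delta_g_from_kd(kd_wild_type, temperature_k))


# Hypothetical example: wild type binds at 1 nM, mutant at 10 nM.
dg_wt = delta_g_from_kd(1e-9)
ddg = delta_delta_g(1e-8, 1e-9)
print(f"wild-type dG = {dg_wt:.2f} kcal/mol, ddG = {ddg:.2f} kcal/mol")
```

At room temperature each tenfold change in Kd shifts ΔG by about 1.36 kcal/mol, which is a useful scale for reading the prediction errors quoted elsewhere in this list.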
Affiliation(s)
- Kannan Harini
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- Ambuj Srivastava
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- Arulsamy Kulandaisamy
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- M Michael Gromiha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
12
Tian X, Shen L, Wang Z, Zhou L, Peng L. A novel lncRNA-protein interaction prediction method based on deep forest with cascade forest structure. Sci Rep 2021; 11:18881. [PMID: 34556758 PMCID: PMC8460650 DOI: 10.1038/s41598-021-98277-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/05/2021] [Accepted: 08/18/2021] [Indexed: 02/08/2023] Open
Abstract
Long noncoding RNAs (lncRNAs) regulate many biological processes by interacting with corresponding RNA-binding proteins. The identification of lncRNA-protein interactions (LPIs) is important for characterizing the biological functions and mechanisms of lncRNAs. Existing computational methods have been effectively applied to LPI prediction. However, the majority of them were evaluated on only one LPI dataset, thereby resulting in prediction bias. More importantly, some models did not discover possible LPIs for new lncRNAs (or proteins). In addition, the prediction performance remains limited. To address these problems, in this study we develop a Deep Forest-based LPI prediction method (LPIDF). First, five LPI datasets are obtained and the corresponding sequence information of lncRNAs and proteins is collected. Second, features of lncRNAs and proteins are constructed based on four-nucleotide composition and BioSeq2vec with encoder-decoder structure, respectively. Finally, a deep forest model with cascade forest structure is developed to find new LPIs. We compare LPIDF with four classical association prediction models based on three fivefold cross validations on lncRNAs, proteins, and LPIs. LPIDF obtains better average AUCs of 0.9012, 0.6937 and 0.9457, and the best average AUPRs of 0.9022, 0.6860, and 0.9382, respectively, for the three CVs, significantly outperforming other methods. The results show that the lncRNA FTX may interact with the protein P35637, a prediction that needs further validation.
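The four-nucleotide composition feature named above can be read as the normalized frequency of each of the 4^4 = 256 possible 4-mers in an RNA sequence. A minimal sketch under that reading follows; the handling of T versus U and of ambiguous bases is an assumption here and may differ from the LPIDF implementation.

```python
from itertools import product

ALPHABET = "ACGU"
KMERS = ["".join(p) for p in product(ALPHABET, repeat=4)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def four_mer_composition(sequence):
    """Return a 256-dimensional vector of 4-mer frequencies for an RNA sequence.
    Windows containing characters outside A/C/G/U are skipped (assumption)."""
    seq = sequence.upper().replace("T", "U")
    counts = [0] * len(KMERS)
    total = 0
    for i in range(len(seq) - 3):
        index = KMER_INDEX.get(seq[i:i + 4])
        if index is not None:
            counts[index] += 1
            total += 1
    if total == 0:
        return [0.0] * len(KMERS)
    return [c / total for c in counts]


# Hypothetical short sequence; real lncRNAs are typically far longer.
vector = four_mer_composition("ACGUACGU")
print(len(vector), vector[KMER_INDEX["ACGU"]])
```

The fixed vector length makes sequences of different lengths directly comparable, at the cost of discarding positional information.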
Affiliation(s)
- Xiongfei Tian: School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, China
- Ling Shen: School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, China
- Zhenwu Wang: School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, China
- Liqian Zhou: School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, China
- Lihong Peng: School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, China
|
13
|
Feng Y, Wang Z, Yang N, Liu S, Yan J, Song J, Yang S, Zhang Y. Identification of Biomarkers for Cervical Cancer Radiotherapy Resistance Based on RNA Sequencing Data. Front Cell Dev Biol 2021; 9:724172. [PMID: 34414195 PMCID: PMC8369412 DOI: 10.3389/fcell.2021.724172] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/12/2021] [Accepted: 07/14/2021] [Indexed: 11/28/2022] Open
Abstract
Cervical cancer, a common gynecological malignancy, threatens the health and lives of women. Resistance to radiotherapy is the primary cause of treatment failure and is mainly related to differences in the inherent vulnerability of tumors after radiotherapy. Here, we investigated signature genes associated with poor response to radiotherapy by analyzing an independent cervical cancer dataset from the Gene Expression Omnibus, including pre-irradiation and mid-irradiation information. A total of 316 significantly differentially expressed genes were identified. The correlations between these genes were investigated through Pearson correlation analysis. Subsequently, a random forest model was used to determine cancer-related genes, and all genes were ranked by random forest scoring. The top 30 candidate genes were selected for uncovering their biological functions. Functional enrichment analysis revealed that the biological functions were chiefly enriched in tumor immune responses, such as cellular defense response, negative regulation of immune system process, T cell activation, neutrophil activation involved in immune response, regulation of antigen processing and presentation, and peptidyl-tyrosine autophosphorylation. Finally, the top 30 genes were screened and analyzed through literature verification. After validation, 10 genes (KLRK1, LCK, KIF20A, CD247, FASLG, CD163, ZAP70, CD8B, ZNF683, and F10) met our objective. Overall, the present research confirmed that integrated bioinformatics methods can contribute to the understanding of the molecular mechanisms and potential therapeutic targets underlying radiotherapy resistance in cervical cancer.
Affiliation(s)
- Yue Feng: Department of Gynecological Radiotherapy, Harbin Medical University Cancer Hospital, Harbin, China
- Zhao Wang: Department of Gynecological Radiotherapy, Harbin Medical University Cancer Hospital, Harbin, China
- Nan Yang: Department of Gynecological Radiotherapy, Harbin Medical University Cancer Hospital, Harbin, China
- Sijia Liu: Department of Gynecological Radiotherapy, Harbin Medical University Cancer Hospital, Harbin, China
- Jiazhuo Yan: Department of Gynecological Radiotherapy, Harbin Medical University Cancer Hospital, Harbin, China
- Jiayu Song: Department of Gynecological Radiotherapy, Harbin Medical University Cancer Hospital, Harbin, China
- Shanshan Yang: Department of Gynecological Radiotherapy, Harbin Medical University Cancer Hospital, Harbin, China
- Yunyan Zhang: Department of Gynecological Radiotherapy, Harbin Medical University Cancer Hospital, Harbin, China
|
14
|
Tang ZQ, Zhao L, Chen GX, Chen CYC. Novel and versatile artificial intelligence algorithms for investigating possible GHSR1α and DRD1 agonists for Alzheimer's disease. RSC Adv 2021; 11:6423-6446. [PMID: 35423219 PMCID: PMC8694922 DOI: 10.1039/d0ra10077c] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/29/2020] [Accepted: 01/18/2021] [Indexed: 11/21/2022] Open
Abstract
Hippocampal lesions are recognized as the earliest pathological changes in Alzheimer's disease (AD). Recent research has shown that the co-activation of growth hormone secretagogue receptor 1α (GHSR1α) and dopamine receptor D1 (DRD1) could restore hippocampal synaptic function and cognition. We combined traditional virtual screening technology with artificial intelligence models to screen multi-target agonists for the target proteins from the TCM database. A novel boost Generalized Regression Neural Network (GRNN) model is proposed in this article to improve the poor adjustability of GRNN. R-square was chosen to evaluate the accuracy of these artificial intelligence models. For the GHSR1α agonist dataset, Adaptive Boosting (AdaBoost), Linear Ridge Regression (LRR), Support Vector Machine (SVM), and boost GRNN achieved good results; the R-square of the test set of these models reached 0.900, 0.813, 0.708, and 0.802, respectively. For the DRD1 agonist dataset, Gradient Boosting (GB), Random Forest (RF), SVM, and boost GRNN achieved good results; the R-square of the test set of these models reached 0.839, 0.781, 0.763, and 0.815, respectively. These R-square values indicate that boost GRNN and SVM adapt better to different data sets and that boost GRNN is more accurate than SVM. To evaluate the reliability of the screening results, molecular dynamics (MD) simulation experiments were performed to make sure that candidates were docked well in the protein binding site. By analyzing the results of these artificial intelligence models and MD experiments, we suggest that 2007_17103 and 2007_13380 are possible dual-target drugs for AD.
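For readers unfamiliar with the R-square metric used in this abstract to compare models on a test set, the sketch below computes the standard coefficient of determination. The abstract does not say which R-square variant the authors used, so this definition, the function name `r_squared`, and the sample numbers are assumptions for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Hypothetical helper; the cited paper may use a different variant.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    if ss_tot == 0:
        raise ValueError("R-square is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Made-up test-set values, not data from the paper.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(observed, predicted), 3))
```

A value of 1 means the predictions match the observations exactly, and a value of 0 means the model does no better than always predicting the mean of the observed values.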
Affiliation(s)
- Zi-Qiang Tang: Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University Shenzhen Guangzhou 510275 China
- Lu Zhao: Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University Shenzhen Guangzhou 510275 China; Department of Clinical Laboratory, The Sixth Affiliated Hospital, Sun Yat-sen University Guangzhou 510655 China
- Guan-Xing Chen: Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University Shenzhen Guangzhou 510275 China
- Calvin Yu-Chian Chen: Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University Shenzhen Guangzhou 510275 China; Department of Medical Research, China Medical University Hospital Taichung 40447 Taiwan; Department of Bioinformatics and Medical Engineering, Asia University Taichung 41354 Taiwan
|
15
|
Corley M, Burns MC, Yeo GW. How RNA-Binding Proteins Interact with RNA: Molecules and Mechanisms. Mol Cell 2020; 78:9-29. [PMID: 32243832 PMCID: PMC7202378 DOI: 10.1016/j.molcel.2020.03.011] [Citation(s) in RCA: 354] [Impact Index Per Article: 88.5] [Received: 09/24/2019] [Revised: 01/13/2020] [Accepted: 03/09/2020] [Indexed: 12/17/2022]
Abstract
RNA-binding proteins (RBPs) comprise a large class of over 2,000 proteins that interact with transcripts in all manner of RNA-driven processes. The structures and mechanisms that RBPs use to bind and regulate RNA are incredibly diverse. In this review, we take a look at the components of protein-RNA interaction, from the molecular level to multi-component interaction. We first summarize what is known about protein-RNA molecular interactions based on analyses of solved structures. We additionally describe software currently available for predicting protein-RNA interaction and other resources useful for the study of RBPs. We then review the structure and function of seventeen known RNA-binding domains and analyze the hydrogen bonds adopted by protein-RNA structures on a domain-by-domain basis. We conclude with a summary of the higher-level mechanisms that regulate protein-RNA interactions.
Affiliation(s)
- Meredith Corley: Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
- Margaret C Burns: Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA; Biomedical Sciences Graduate Program, University of California, San Diego, La Jolla, CA, USA
- Gene W Yeo: Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA; Biomedical Sciences Graduate Program, University of California, San Diego, La Jolla, CA, USA; Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA, USA
|
16
|
PreDBA: A heterogeneous ensemble approach for predicting protein-DNA binding affinity. Sci Rep 2020; 10:1278. [PMID: 31992738 PMCID: PMC6987227 DOI: 10.1038/s41598-020-57778-1] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 01/11/2019] [Accepted: 01/06/2020] [Indexed: 11/17/2022] Open
Abstract
The interaction between protein and DNA plays an essential role in various critical biological processes, such as DNA replication, transcription, splicing, and repair. Studying the binding affinity of proteins to DNA helps to understand the recognition mechanism of protein-DNA complexes. Since experimentally measured protein-DNA binding affinity data are still limited, accurate and reliable computational methods are needed. In this paper we put forward a computational approach, called PreDBA, that predicts protein-DNA binding affinity effectively by using heterogeneous ensemble models. One hundred protein-DNA complexes are manually collected from the related literature as a data set for protein-DNA binding affinity. Then, 52 sequence and structural features are obtained, and the correlation between these 52 features and protein-DNA binding affinity is calculated. Furthermore, we found that the protein-DNA binding affinity is affected by the DNA structure in the complex. We therefore divide all protein-DNA complexes into five classes based on the structure of the DNA bound to the protein. In each class, a stacked heterogeneous ensemble model is constructed based on the obtained features. Finally, based on the binding affinity data set, we used leave-one-out cross-validation to evaluate the proposed method comprehensively. In the five classes, the Pearson correlation coefficient values of the proposed method range from 0.735 to 0.926. We have demonstrated the advantages of the proposed method compared to other machine learning methods and existing protein-DNA binding affinity prediction approaches.
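The evaluation protocol described in this abstract (leave-one-out cross-validation scored by the Pearson correlation between predicted and measured affinities) can be sketched as follows. This is not PreDBA: a single-feature least-squares line stands in for the stacked heterogeneous ensemble, and the function names and numbers are hypothetical.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on a single feature (stand-in model)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def loocv_predictions(xs, ys):
    """Predict each sample with a model trained on all the other samples."""
    preds = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(slope * xs[i] + intercept)
    return preds


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)


# Made-up feature values and affinities, not data from the paper.
feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
affinity = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
held_out = loocv_predictions(feature, affinity)
print(pearson(held_out, affinity))
```

Because every prediction comes from a model that never saw the held-out sample, the resulting correlation is a less optimistic estimate than one computed on the training fit, which matters for a data set as small as the one hundred complexes described here.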
|