Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Pan Y, Wang Z, Zhan W, Deng L. Computational identification of binding energy hot spots in protein-RNA complexes using an ensemble approach. Bioinformatics 2019;34:1473-1480. [PMID: 29281004 DOI: 10.1093/bioinformatics/btx822] [Citation(s) in RCA: 72] [Impact Index Per Article: 14.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2017] [Accepted: 12/19/2017] [Indexed: 11/12/2022] Open



Cited by Other Article(s)

Xiao SR, Zhang YK, Liu KY, Huang YX, Liu R. PNBACE: an ensemble algorithm to predict the effects of mutations on protein-nucleic acid binding affinity. BMC Biol 2024;22:203. [PMID: 39256728 PMCID: PMC11389284 DOI: 10.1186/s12915-024-02006-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/24/2023] [Accepted: 09/03/2024] [Indexed: 09/12/2024] Open

Li X, Wang GA, Wei Z, Wang H, Zhu X. Protein-DNA interface hotspots prediction based on fusion features of embeddings of protein language model and handcrafted features. Comput Biol Chem 2023;107:107970. [PMID: 37866116 DOI: 10.1016/j.compbiolchem.2023.107970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2023] [Revised: 10/06/2023] [Accepted: 10/07/2023] [Indexed: 10/24/2023]

Chandra A, Sharma A, Dehzangi I, Tsunoda T, Sattar A. PepCNN deep learning tool for predicting peptide binding residues in proteins using sequence, structural, and language model features. Sci Rep 2023;13:20882. [PMID: 38016996 PMCID: PMC10684570 DOI: 10.1038/s41598-023-47624-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2023] [Accepted: 11/16/2023] [Indexed: 11/30/2023] Open

Pan Y, Li R, Li W, Lv L, Guan J, Zhou S. HPC-Atlas: Computationally Constructing A Comprehensive Atlas of Human Protein Complexes. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023;21:976-990. [PMID: 37730114 PMCID: PMC10928439 DOI: 10.1016/j.gpb.2023.05.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/19/2022] [Revised: 04/23/2023] [Accepted: 05/08/2023] [Indexed: 09/22/2023]

Durairaj J, de Ridder D, van Dijk AD. Beyond sequence: Structure-based machine learning. Comput Struct Biotechnol J 2022;21:630-643. [PMID: 36659927 PMCID: PMC9826903 DOI: 10.1016/j.csbj.2022.12.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2022] [Revised: 12/21/2022] [Accepted: 12/21/2022] [Indexed: 12/31/2022] Open

Leander M, Liu Z, Cui Q, Raman S. Deep mutational scanning and machine learning reveal structural and molecular rules governing allosteric hotspots in homologous proteins. eLife 2022;11:e79932. [PMID: 36226916 PMCID: PMC9662819 DOI: 10.7554/elife.79932] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2022] [Accepted: 10/13/2022] [Indexed: 01/29/2023] Open

Williams ME, Cloete R. Molecular Modeling of Subtype-Specific Tat Protein Signatures to Predict Tat-TAR Interactions That May Be Involved in HIV-Associated Neurocognitive Disorders. Front Microbiol 2022;13:866611. [PMID: 35464972 PMCID: PMC9021916 DOI: 10.3389/fmicb.2022.866611] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2022] [Accepted: 03/16/2022] [Indexed: 12/30/2022] Open

Zhou T, Rong J, Liu Y, Gong W, Li C. An ensemble approach to predict binding hotspots in protein-RNA interactions based on SMOTE data balancing and random grouping feature selection strategies. Bioinformatics 2022;38:2452-2458. [PMID: 35253843 DOI: 10.1093/bioinformatics/btac138] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2021] [Revised: 01/15/2022] [Accepted: 03/02/2022] [Indexed: 11/12/2022] Open

Nguyen TB, Myung Y, de Sá AGC, Pires DEV, Ascher DB. mmCSM-NA: accurately predicting effects of single and multiple mutations on protein-nucleic acid binding affinity. NAR Genom Bioinform 2021;3:lqab109. [PMID: 34805992 PMCID: PMC8600011 DOI: 10.1093/nargab/lqab109] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2021] [Revised: 09/20/2021] [Accepted: 10/27/2021] [Indexed: 02/02/2023] Open

Zhang S, Zhao L, Zheng CH, Xia J. A feature-based approach to predict hot spots in protein-DNA binding interfaces. Brief Bioinform 2021;21:1038-1046. [PMID: 30957840 DOI: 10.1093/bib/bbz037] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2019] [Revised: 02/20/2019] [Accepted: 03/07/2019] [Indexed: 12/21/2022] Open

Mei LC, Wang YL, Wu FX, Wang F, Hao GF, Yang GF. HISNAPI: a bioinformatic tool for dynamic hot spot analysis in nucleic acid-protein interface with a case study. Brief Bioinform 2021;22:bbaa373. [PMID: 33406224 PMCID: PMC7929440 DOI: 10.1093/bib/bbaa373] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2020] [Revised: 11/19/2020] [Accepted: 11/23/2020] [Indexed: 01/18/2023] Open

Liu J, Liu S, Liu C, Zhang Y, Pan Y, Wang Z, Wang J, Wen T, Deng L. Nabe: an energetic database of amino acid mutations in protein-nucleic acid binding interfaces. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2021;2021:6352208. [PMID: 34389843 PMCID: PMC8363842 DOI: 10.1093/database/baab050] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/01/2021] [Revised: 07/23/2021] [Accepted: 07/29/2021] [Indexed: 12/17/2022]

Deng L, Li W, Zhang J. LDAH2V: Exploring Meta-Paths Across Multiple Networks for lncRNA-Disease Association Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021;18:1572-1581. [PMID: 31725386 DOI: 10.1109/tcbb.2019.2946257] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]

Mei LC, Hao GF, Yang GF. Computational methods for predicting hotspots at protein-RNA interfaces. WILEY INTERDISCIPLINARY REVIEWS-RNA 2021;13:e1675. [PMID: 34080311 DOI: 10.1002/wrna.1675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/11/2021] [Revised: 05/13/2021] [Accepted: 05/14/2021] [Indexed: 11/10/2022]

Zhang S, Wang L, Zhao L, Li M, Liu M, Li K, Bin Y, Xia J. An improved DNA-binding hot spot residues prediction method by exploring interfacial neighbor properties. BMC Bioinformatics 2021;22:253. [PMID: 34000983 PMCID: PMC8130120 DOI: 10.1186/s12859-020-03871-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2020] [Accepted: 11/09/2020] [Indexed: 11/29/2022] Open

Abstract

Background

DNA-binding hot spots are dominant and fundamental residues that contribute most of the binding free energy yet account for only a small portion of protein–DNA interfaces. As experimental methods for identifying hot spots are time-consuming and costly, high-efficiency computational approaches are emerging as alternatives.

Results

Herein, we present a new computational method, termed inpPDH, for hot spot prediction. To improve the prediction performance, we extract hybrid features that combine traditional features with new interfacial neighbor properties. To remove redundant and irrelevant features, a two-step feature selection strategy is employed. Finally, a subset of 7 optimal features is chosen to construct the predictor using a support vector machine. The results on the benchmark dataset show that the proposed method yields significantly better prediction accuracy than previously published methods. Moreover, a user-friendly web server for inpPDH has been established and is freely available at http://bioinfo.ahu.edu.cn/inpPDH.

Conclusions

We have developed an improved and accurate prediction model, inpPDH, for hot spot residues in protein–DNA binding interfaces, given the structure of a protein–DNA complex. Moreover, we identify a comprehensive and useful feature subset, including the proposed interfacial neighbor features, that is highly effective for identifying hot spot residues. Our results indicate that these features are more effective than the conventional features considered previously, and that the combination of interfacial neighbor features and traditional features may support the creation of a discriminative feature set for efficient prediction of hot spot residues in protein–DNA complexes.

Supplementary information

Supplementary information accompanies this paper at 10.1186/s12859-020-03871-1.


Affiliation(s)

Sijia Zhang Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China; Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
Lihua Wang Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
Le Zhao Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
Menglu Li Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
Mengya Liu Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
Ke Li Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
Yannan Bin Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China; Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
Junfeng Xia Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China; Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China


Yin J, Li X, Li F, Lu Y, Zeng S, Zhu F. Identification of the key target profiles underlying the drugs of narrow therapeutic index for treating cancer and cardiovascular disease. Comput Struct Biotechnol J 2021;19:2318-2328. [PMID: 33995923 PMCID: PMC8105181 DOI: 10.1016/j.csbj.2021.04.035] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2020] [Revised: 04/09/2021] [Accepted: 04/15/2021] [Indexed: 12/14/2022] Open

Jiang Y, Liu HF, Liu R. Systematic comparison and prediction of the effects of missense mutations on protein-DNA and protein-RNA interactions. PLoS Comput Biol 2021;17:e1008951. [PMID: 33872313 PMCID: PMC8084330 DOI: 10.1371/journal.pcbi.1008951] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2021] [Revised: 04/29/2021] [Accepted: 04/08/2021] [Indexed: 12/30/2022] Open

Abstract

The binding affinities of protein-nucleic acid interactions could be altered due to missense mutations occurring in DNA- or RNA-binding proteins, therefore resulting in various diseases. Unfortunately, a systematic comparison and prediction of the effects of mutations on protein-DNA and protein-RNA interactions (these two mutation classes are termed MPDs and MPRs, respectively) is still lacking. Here, we demonstrated that these two classes of mutations could generate similar or different tendencies for binding free energy changes in terms of the properties of mutated residues. We then developed regression algorithms separately for MPDs and MPRs by introducing novel geometric partition-based energy features and interface-based structural features. Through feature selection and ensemble learning, similar computational frameworks that integrated energy- and nonenergy-based models were established to estimate the binding affinity changes resulting from MPDs and MPRs, but the selected features for the final models were different and therefore reflected the specificity of these two mutation classes. Furthermore, the proposed methodology was extended to the identification of mutations that significantly decreased the binding affinities. Extensive validations indicated that our algorithm generally performed better than the state-of-the-art methods on both the regression and classification tasks. The webserver and software are freely available at http://liulab.hzau.edu.cn/PEMPNI and https://github.com/hzau-liulab/PEMPNI.

Protein-nucleic acid interactions play important roles in various cellular processes. Missense mutations occurring in DNA- or RNA-binding proteins (termed MPDs and MPRs, respectively) could change the binding affinities of these interactions. Previous studies have compared protein-DNA and protein-RNA interactions from multifaceted viewpoints, but less attention has been given to the similarities and specific differences between the effects of MPDs and MPRs and between the methodologies for predicting the affinity changes induced by the two mutation classes. Therefore, we systematically compared their impacts and demonstrated that MPDs and MPRs could have specific preferences for binding affinity changes. These observations motivated us to construct regression models separately for MPDs and MPRs by introducing novel energy and nonenergy descriptors. Although similar frameworks were developed to estimate these two categories of mutation effects, different descriptors were selected in the regression models and further revealed the specificity of mutation classes. The interplay between the energy and nonenergy modules effectively improved prediction performance. Our algorithm can also be adopted to disentangle mutations significantly decreasing binding affinities from other mutations.


Peng J, Lu G, Shang X. A Survey of Network Representation Learning Methods for Link Prediction in Biological Network. Curr Pharm Des 2021;26:3076-3084. [PMID: 31951161 DOI: 10.2174/1381612826666200116145057] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2019] [Accepted: 01/09/2020] [Indexed: 11/22/2022]

Guo Z, Wang P, Liu Z, Zhao Y. Discrimination of Thermophilic Proteins and Non-thermophilic Proteins Using Feature Dimension Reduction. Front Bioeng Biotechnol 2020;8:584807. [PMID: 33195148 PMCID: PMC7642589 DOI: 10.3389/fbioe.2020.584807] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2020] [Accepted: 09/11/2020] [Indexed: 01/19/2023] Open

Predicting Hot Spot Residues at Protein-DNA Binding Interfaces Based on Sequence Information. Interdiscip Sci 2020;13:1-11. [PMID: 33068261 DOI: 10.1007/s12539-020-00399-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2020] [Revised: 09/27/2020] [Accepted: 10/01/2020] [Indexed: 10/23/2022]

Pan Y, Zhou S, Guan J. Computationally identifying hot spots in protein-DNA binding interfaces using an ensemble approach. BMC Bioinformatics 2020;21:384. [PMID: 32938375 PMCID: PMC7495898 DOI: 10.1186/s12859-020-03675-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open

Abstract

BACKGROUND

Protein-DNA interaction governs a large number of cellular processes, and it can be altered by a small fraction of interface residues, i.e., the so-called hot spots, which account for most of the interface binding free energy. Accurate prediction of hot spots is critical for understanding the principles of protein-DNA interactions. Some computational methods can already predict a large number of hot spot residues accurately and efficiently. However, the shortage of experimentally validated hot-spot residues in protein-DNA complexes and the low diversity of the employed features limit the performance of existing methods.

RESULTS

Here, we report a new computational method for effectively predicting hot spots in protein-DNA binding interfaces. This method, called PreHots (short for Predicting Hotspots), adopts an ensemble stacking classifier that integrates different machine learning classifiers to generate a robust model with 19 features selected by a sequential backward feature selection algorithm. To this end, we constructed two new and reliable datasets (one benchmark for model training and one independent dataset for validation), which together comprise 123 hot spots and 137 non-hot spots from 89 protein-DNA complexes. The data were manually collected from the literature and existing databases with a strict process of redundancy removal. Our method achieves a sensitivity of 0.813 and an AUC score of 0.868 in 10-fold cross-validation on the benchmark dataset, and a sensitivity of 0.818 and an AUC score of 0.820 on the independent test dataset. The results show that our approach outperforms the existing ones.
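The abstract names sequential backward feature selection but gives no details of the procedure or the scoring function. As a rough illustration only, a generic version of the technique might look like the sketch below. The feature names, the `toy_score` function and its weights are all hypothetical and are not taken from PreHots; in practice the score would typically be a cross-validated metric of a classifier trained on the candidate subset.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def sequential_backward_selection(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    n_keep: int,
) -> Tuple[List[str], float]:
    """Greedily drop one feature at a time until n_keep features remain.

    At each step, every subset obtained by removing exactly one feature is
    scored, and the highest-scoring subset is kept for the next round.
    """
    if not 1 <= n_keep <= len(features):
        raise ValueError("n_keep must be between 1 and len(features)")
    current = frozenset(features)
    while len(current) > n_keep:
        candidates = [current - {f} for f in sorted(current)]
        current = max(candidates, key=score)
    # Report the surviving features in their original order.
    ordered = [f for f in features if f in current]
    return ordered, score(current)


# Hypothetical toy scorer (an assumption for illustration): each feature
# adds a fixed amount to the score, and the "noise" features lower it.
TOY_WEIGHTS = {
    "sasa": 0.30,
    "conservation": 0.25,
    "hbonds": 0.20,
    "noise_a": -0.05,
    "noise_b": -0.10,
}


def toy_score(subset: FrozenSet[str]) -> float:
    return sum(TOY_WEIGHTS[f] for f in subset)


selected, best = sequential_backward_selection(list(TOY_WEIGHTS), toy_score, n_keep=3)
print(selected, round(best, 3))
```

With this toy scorer the two "noise" features are dropped first, leaving the three informative ones. A real pipeline would replace `toy_score` with model evaluation, which is far more expensive because each step refits the model once per remaining feature.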

CONCLUSIONS

PreHots, which is based on a stacking ensemble of boosting algorithms, can reliably predict hot spots at the protein-DNA binding interface on a large scale. Compared with the existing methods, PreHots achieves better prediction performance. Both the webserver of PreHots and the datasets are freely available at: http://dmb.tongji.edu.cn/tools/PreHots/.


Li K, Zhang S, Yan D, Bin Y, Xia J. Prediction of hot spots in protein-DNA binding interfaces based on supervised isometric feature mapping and extreme gradient boosting. BMC Bioinformatics 2020;21:381. [PMID: 32938395 PMCID: PMC7495874 DOI: 10.1186/s12859-020-03683-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Zhuang H, Zhang Y, Yang S, Cheng L, Liu SL. A Mendelian Randomization Study on Infant Length and Type 2 Diabetes Mellitus Risk. Curr Gene Ther 2020;19:224-231. [PMID: 31553296 DOI: 10.2174/1566523219666190925115535] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2019] [Revised: 06/15/2019] [Accepted: 06/16/2019] [Indexed: 12/12/2022]

Zhang N, Lu H, Chen Y, Zhu Z, Yang Q, Wang S, Li M. PremPRI: Predicting the Effects of Missense Mutations on Protein-RNA Interactions. Int J Mol Sci 2020;21:ijms21155560. [PMID: 32756481 PMCID: PMC7432928 DOI: 10.3390/ijms21155560] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2020] [Revised: 07/28/2020] [Accepted: 07/30/2020] [Indexed: 12/23/2022] Open

Wahab A, Mahmoudi O, Kim J, Chong KT. DNC4mC-Deep: Identification and Analysis of DNA N4-Methylcytosine Sites Based on Different Encoding Schemes By Using Deep Learning. Cells 2020;9:E1756. [PMID: 32707969 PMCID: PMC7465362 DOI: 10.3390/cells9081756] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2020] [Revised: 07/17/2020] [Accepted: 07/17/2020] [Indexed: 11/24/2022] Open

Zhu X, Liu L, He J, Fang T, Xiong Y, Mitchell JC. iPNHOT: a knowledge-based approach for identifying protein-nucleic acid interaction hot spots. BMC Bioinformatics 2020;21:289. [PMID: 32631222 PMCID: PMC7336410 DOI: 10.1186/s12859-020-03636-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2019] [Accepted: 06/25/2020] [Indexed: 12/20/2022] Open

Abstract

Background

The interaction between proteins and nucleic acids plays pivotal roles in various biological processes such as transcription, translation, and gene regulation. Hot spots are a small set of residues that contribute most to the binding affinity of a protein-nucleic acid interaction. Compared to the extensive studies of hot spots on protein-protein interfaces, the hot spot residues within protein-nucleic acid interfaces remain less well studied, in part because mutagenesis data for protein-nucleic acid interactions are not as abundant as those for protein-protein interactions.

Results

In this study, we built a new computational model, iPNHOT, to effectively predict hot spot residues on protein-nucleic acid interfaces. One training data set and an independent test set were collected from dbAMEPNI and some recent literature, respectively. To build our model, we generated 97 different sequential and structural features and used a two-step strategy to select the relevant features. The final model was built on only 7 features using a support vector machine (SVM). These include two features, ∆SASsa^1/2 and esp3, which are newly proposed in this study. Based on the cross-validation results, our model achieved an F1 score and AUROC of 0.725 and 0.807, respectively, on the subset collected from ProNIT, compared to 0.407 and 0.670 for mCSM-NA, a state-of-the-art model for predicting the thermodynamic effects of protein-nucleic acid interaction. The iPNHOT model was further tested on the independent test set, which showed that our model outperformed other methods.
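For readers unfamiliar with the two metrics reported above, the following minimal sketch shows how an F1 score and an AUROC can be computed from binary labels and classifier scores. The labels, scores and the 0.5 decision threshold are made-up toy values chosen for illustration; they are unrelated to the iPNHOT or mCSM-NA results.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is equivalent to the
    area under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (assumed): 1 = hot spot, 0 = non-hot spot.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

f1 = f1_score(labels, preds)
auc = auroc(labels, scores)
print(round(f1, 3), round(auc, 3))
```

F1 depends on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why abstracts such as this one often report both.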

Conclusion

In this study, by collecting data from a recently published database, dbAMEPNI, we proposed a new model, iPNHOT, to predict hot spots on both protein-DNA and protein-RNA interfaces. The results show that our model outperforms the existing state-of-the-art models. Our model is available to users through a webserver: http://zhulab.ahu.edu.cn/iPNHOT/.


Deng S, Sun Y, Zhao T, Hu Y, Zang T. A Review of Drug Side Effect Identification Methods. Curr Pharm Des 2020;26:3096-3104. [PMID: 32532187 DOI: 10.2174/1381612826666200612163819] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2020] [Accepted: 05/18/2020] [Indexed: 11/22/2022]

Li F, Fan C, Marquez-Lago TT, Leier A, Revote J, Jia C, Zhu Y, Smith AI, Webb GI, Liu Q, Wei L, Li J, Song J. PRISMOID: a comprehensive 3D structure database for post-translational modifications and mutations with functional impact. Brief Bioinform 2020;21:1069-1079. [PMID: 31161204 PMCID: PMC7299293 DOI: 10.1093/bib/bbz050] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2019] [Revised: 03/26/2019] [Accepted: 03/29/2019] [Indexed: 12/26/2022] Open

Abstract

Post-translational modifications (PTMs) play very important roles in various cell signaling pathways and biological processes. Due to PTMs' extremely important roles, many major PTMs have been studied, and the functional and mechanical characterization of major PTMs is well documented in several databases. However, most currently available databases mainly focus on protein sequences, while the real 3D structures of PTMs have been largely ignored. Therefore, studies of the 3D structural signatures of PTMs have been severely limited by the deficiency of the data. Here, we develop PRISMOID, a novel publicly available and free 3D structure database for a wide range of PTMs. PRISMOID represents an up-to-date and interactive online knowledge base with specific focus on 3D structural contexts of PTM sites and mutations that occur on PTMs and in the close proximity of PTM sites with functional impact. The first version of PRISMOID encompasses 17 145 non-redundant modification sites on 3919 related protein 3D structure entries pertaining to 37 different types of PTMs. Our entry web page is organized in a comprehensive manner, including detailed PTM annotation on the 3D structure and biological information in terms of mutations affecting PTMs, secondary structure features and per-residue solvent accessibility features of PTM sites, domain context, predicted natively disordered regions and sequence alignments. In addition, high-definition JavaScript packages are employed to enhance information visualization in PRISMOID. PRISMOID offers a variety of interactive and customizable search options and data browsing functions; these capabilities allow users to access data via keyword, ID and advanced options combination search in an efficient and user-friendly way. A download page is also provided to enable users to download the SQL file, computational structural features and PTM sites' data.
We anticipate PRISMOID will swiftly become an invaluable online resource, assisting both biologists and bioinformaticians to conduct experiments and develop applications supporting discovery efforts in the sequence-structural-functional relationship of PTMs and providing important insight into mutations and PTM sites interaction mechanisms. The PRISMOID database is freely accessible at http://prismoid.erc.monash.edu/. The database and web interface are implemented in MySQL, JSP, JavaScript and HTML with all major browsers supported.


Affiliation(s)

Fuyi Li Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, Australia Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
Cunshuo Fan College of Information Engineering, Northwest A&F University, Yangling, China
Tatiana T Marquez-Lago Department of Genetics and Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
André Leier Department of Genetics and Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
Jerico Revote Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, Australia
Cangzhi Jia College of Science, Dalian Maritime University, Dalian, China School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
Yan Zhu Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, Victoria, Australia
A Ian Smith Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, Australia
Geoffrey I Webb Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
Quanzhong Liu College of Information Engineering, Northwest A&F University, Yangling, China
Leyi Wei School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
Jian Li Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, Australia
Jiangning Song Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, Australia Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia


Zeng F, Fang G, Yao L. A Deep Neural Network for Identifying DNA N4-Methylcytosine Sites. Front Genet 2020;11:209. [PMID: 32211035 PMCID: PMC7067889 DOI: 10.3389/fgene.2020.00209] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2019] [Accepted: 02/21/2020] [Indexed: 11/25/2022] Open

Huang Q, Zhang J, Wei L, Guo F, Zou Q. 6mA-RicePred: A Method for Identifying DNA N⁶-Methyladenine Sites in the Rice Genome Based on Feature Fusion. FRONTIERS IN PLANT SCIENCE 2020;11:4. [PMID: 32076430 PMCID: PMC7006724 DOI: 10.3389/fpls.2020.00004] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/04/2019] [Accepted: 01/06/2020] [Indexed: 06/01/2023]

Wang C, Zhao N, Yuan L, Liu X. Computational Detection of Breast Cancer Invasiveness with DNA Methylation Biomarkers. Cells 2020;9:E326. [PMID: 32019269 PMCID: PMC7072524 DOI: 10.3390/cells9020326] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2020] [Revised: 01/28/2020] [Accepted: 01/28/2020] [Indexed: 12/14/2022] Open

PreDBA: A heterogeneous ensemble approach for predicting protein-DNA binding affinity. Sci Rep 2020;10:1278. [PMID: 31992738 PMCID: PMC6987227 DOI: 10.1038/s41598-020-57778-1] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2019] [Accepted: 01/06/2020] [Indexed: 11/17/2022] Open

Tan C, Wang T, Yang W, Deng L. PredPSD: A Gradient Tree Boosting Approach for Single-Stranded and Double-Stranded DNA Binding Protein Prediction. Molecules 2019;25:molecules25010098. [PMID: 31888057 PMCID: PMC6982935 DOI: 10.3390/molecules25010098] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2019] [Revised: 12/20/2019] [Accepted: 12/21/2019] [Indexed: 11/16/2022] Open

Deng L, Zhong G, Liu C, Luo J, Liu H. MADOKA: an ultra-fast approach for large-scale protein structure similarity searching. BMC Bioinformatics 2019;20:662. [PMID: 31870277 PMCID: PMC6929402 DOI: 10.1186/s12859-019-3235-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2019] [Accepted: 11/14/2019] [Indexed: 01/22/2023] Open

Nithin C, Mukherjee S, Bahadur RP. A structure-based model for the prediction of protein-RNA binding affinity. RNA 2019;25:1628-1645. [PMID: 31395671 PMCID: PMC6859855 DOI: 10.1261/rna.071779.119] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/04/2019] [Accepted: 08/05/2019] [Indexed: 05/28/2023]

Zheng N, Wang K, Zhan W, Deng L. Targeting Virus-host Protein Interactions: Feature Extraction and Machine Learning Approaches. Curr Drug Metab 2019;20:177-184. [PMID: 30156155 DOI: 10.2174/1389200219666180829121038] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2018] [Revised: 05/21/2018] [Accepted: 08/02/2018] [Indexed: 01/15/2023]

Wang X, Wang Y, Xu Z, Xiong Y, Wei DQ. ATC-NLSP: Prediction of the Classes of Anatomical Therapeutic Chemicals Using a Network-Based Label Space Partition Method. Front Pharmacol 2019;10:971. [PMID: 31543820 PMCID: PMC6739564 DOI: 10.3389/fphar.2019.00971] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2019] [Accepted: 07/29/2019] [Indexed: 01/12/2023] Open

Abstract

The Anatomical Therapeutic Chemical (ATC) classification system proposed by the World Health Organization is a widely accepted drug classification scheme in both academic and industrial realms. It is a multilabel system that categorizes drugs into multiple classes according to their therapeutic, pharmacological, and chemical attributes. In this study, we adopted a data-driven network-based label space partition (NLSP) method for prediction of the ATC classes of a given compound within the multilabel learning framework. The proposed method, ATC-NLSP, is trained on similarity-based features such as chemical–chemical interaction and the structural and fingerprint similarities of a compound to other compounds belonging to the different ATC categories. The NLSP method trains predictors for each label cluster (possibly intersecting) detected by community detection algorithms and takes the ensemble labels for a compound as the final prediction. Experimental evaluation based on the jackknife test on the benchmark dataset demonstrated that our method has boosted the absolute true rate, which is the most stringent evaluation metric in this study, from 0.6330 to 0.7497, in comparison to the state-of-the-art approaches. Moreover, the community structures of the label relation graph were detected through the label propagation method. The advantage of multilabel learning over the single-label models was shown by label-wise analysis. Our study indicated that the proposed method ATC-NLSP, which adopts ideas from the network research community and captures the correlation of labels in a data-driven manner, is the top-performing model in the ATC prediction task. We believe that the power of NLSP remains to be unleashed for multilabel learning tasks in drug discovery. The source codes are freely available at https://github.com/dqwei-lab/ATC.
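The abstract above reports the absolute true rate as its most stringent metric. As a minimal sketch, assuming the usual multilabel definition (a compound counts as correct only when its predicted label set equals the true label set exactly), the metric can be computed as follows. The function name and the toy labels are hypothetical and are not taken from the ATC-NLSP source code.

```python
def absolute_true_rate(true_labels, predicted_labels):
    """Fraction of samples whose predicted label set matches the true set exactly."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true_labels and predicted_labels must have the same length")
    if not true_labels:
        return 0.0
    # A sample scores 1 only if every label is right and no extra label is predicted.
    exact_matches = sum(
        1 for truth, pred in zip(true_labels, predicted_labels)
        if set(truth) == set(pred)
    )
    return exact_matches / len(true_labels)


# Toy example with made-up class labels (not real ATC assignments).
truth = [{"A", "B"}, {"C"}, {"A"}, {"B", "C"}]
preds = [{"A", "B"}, {"C", "D"}, {"A"}, {"B"}]
rate = absolute_true_rate(truth, preds)
```

In this toy example two of the four compounds are matched exactly, so one extra or one missing label is enough to make a prediction count as wrong.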


Lv Z, Jin S, Ding H, Zou Q. A Random Forest Sub-Golgi Protein Classifier Optimized via Dipeptide and Amino Acid Composition Features. Front Bioeng Biotechnol 2019;7:215. [PMID: 31552241 PMCID: PMC6737778 DOI: 10.3389/fbioe.2019.00215] [Citation(s) in RCA: 80] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2019] [Accepted: 08/22/2019] [Indexed: 02/01/2023] Open

Abstract

To gain insight into the malfunction of the Golgi apparatus and its relationship to various genetic and neurodegenerative diseases, the identification of sub-Golgi proteins, both cis-Golgi and trans-Golgi proteins, is of great significance. In this study, a state-of-the-art random forests sub-Golgi protein classifier, rfGPT, was developed. The rfGPT used 2-gap dipeptide and split amino acid composition for the feature vectors and was combined with the synthetic minority over-sampling technique (SMOTE) and an analysis of variance (ANOVA) feature selection method. The rfGPT was trained on a sub-Golgi protein sequence data set (137 sequences), with sequence identity less than 25%. For the optimal rfGPT classifier with 93 features, the accuracy (ACC) was 90.5%; the Matthews correlation coefficient (MCC) was 0.811; the sensitivity (Sn) was 92.6%; and the specificity (Sp) was 88.4%. The independent testing scores for the rfGPT were ACC = 90.6%; MCC = 0.696; Sn = 96.1%; and Sp = 69.2%. Although the independent testing accuracy was 4.4% lower than that for the best reported sub-Golgi classifier trained on a data set with 40% sequence identity (304 sequences), the rfGPT is currently the top sub-Golgi protein predictor utilizing feature vectors without any position-specific scoring matrix and its derivative features. Therefore, the rfGPT is a more practical tool, because no sequence alignment is required with tens of millions of protein sequences. To date, the rfGPT is the Golgi classifier with the best independent testing scores, optimized by training on smaller benchmark data sets. Feature importance analysis proves that the non-polar and aliphatic residues composition, the (aromatic residues) + (non-polar, aliphatic residues) dipeptide and aromatic residues composition between NH2-terminal and COOH-terminal of protein sequences are the three top biological features for distinguishing the sub-Golgi proteins.
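The four scores quoted in the abstract above (ACC, MCC, Sn, Sp) all derive from the same binary confusion matrix. The sketch below uses the standard textbook definitions; the function name and the counts are hypothetical and are not the rfGPT results.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return ACC, MCC, Sn and Sp from confusion-matrix counts."""
    total = tp + fn + tn + fp
    acc = (tp + tn) / total
    sn = tp / (tp + fn)  # sensitivity: recall on the positive class
    sp = tn / (tn + fp)  # specificity: recall on the negative class
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "MCC": mcc, "Sn": sn, "Sp": sp}


# Made-up counts for illustration only.
scores = binary_metrics(tp=8, fn=2, tn=6, fp=4)
```

Because MCC uses all four cells of the matrix, it can drop sharply when one class is predicted poorly even if ACC stays high, which is consistent with the gap between ACC and MCC in the independent test scores quoted above.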


Deng L, Yang W, Liu H. PredPRBA: Prediction of Protein-RNA Binding Affinity Using Gradient Boosted Regression Trees. Front Genet 2019;10:637. [PMID: 31428122 PMCID: PMC6688581 DOI: 10.3389/fgene.2019.00637] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2019] [Accepted: 06/18/2019] [Indexed: 01/24/2023] Open

Mabonga L, Kappo AP. Protein-protein interaction modulators: advances, successes and remaining challenges. Biophys Rev 2019;11:559-581. [PMID: 31301019 PMCID: PMC6682198 DOI: 10.1007/s12551-019-00570-x] [Citation(s) in RCA: 114] [Impact Index Per Article: 22.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2019] [Accepted: 06/24/2019] [Indexed: 12/12/2022] Open

Cheng N, Li M, Zhao L, Zhang B, Yang Y, Zheng CH, Xia J. Comparison and integration of computational methods for deleterious synonymous mutation prediction. Brief Bioinform 2019;21:970-981. [DOI: 10.1093/bib/bbz047] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2019] [Revised: 03/28/2019] [Accepted: 03/29/2019] [Indexed: 01/03/2023] Open

Abstract

Synonymous mutations do not change the encoded amino acids but may alter the structure or function of an mRNA in ways that impact gene function. Advances in next generation sequencing technologies have detected numerous synonymous mutations in the human genome. Several computational models have been proposed to predict deleterious synonymous mutations, which have greatly facilitated the development of this important field. Consequently, there is an urgent need to assess the state-of-the-art computational methods for deleterious synonymous mutation prediction to further advance the existing methodologies and to improve performance. In this regard, we systematically compared a total of 10 computational methods (including specific method for deleterious synonymous mutation and general method for single nucleotide mutation) in terms of the algorithms used, calculated features, performance evaluation and software usability. In addition, we constructed two carefully curated independent test datasets and accordingly assessed the robustness and scalability of these different computational methods for the identification of deleterious synonymous mutations. In an effort to improve predictive performance, we established an ensemble model, named Prediction of Deleterious Synonymous Mutation (PrDSM), which averages the ratings generated by the three most accurate predictors. Our benchmark tests demonstrated that the ensemble model PrDSM outperformed the reviewed tools for the prediction of deleterious synonymous mutations. Using the ensemble model, we developed an accessible online predictor, PrDSM, available at http://bioinfo.ahu.edu.cn:8080/PrDSM/. We hope that this comprehensive survey and the proposed strategy for building more accurate models can serve as a useful guide for inspiring future developments of computational methods for deleterious synonymous mutation prediction.
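The abstract above describes an ensemble that averages the ratings of the three most accurate predictors. A minimal sketch of score averaging is given below. It assumes each predictor returns one score per variant on a comparable scale; the function name and the scores are hypothetical, and the actual PrDSM implementation may differ (for example, in how scores are normalized).

```python
def average_ensemble(score_lists):
    """Average per-variant scores across predictors (one list of scores per predictor)."""
    if not score_lists:
        raise ValueError("at least one predictor is required")
    n_variants = len(score_lists[0])
    if any(len(scores) != n_variants for scores in score_lists):
        raise ValueError("every predictor must score the same variants")
    # zip(*score_lists) groups the scores that belong to the same variant.
    return [sum(variant_scores) / len(score_lists)
            for variant_scores in zip(*score_lists)]


# Made-up scores from three hypothetical predictors for two variants.
predictor_scores = [
    [0.9, 0.2],
    [0.7, 0.4],
    [0.8, 0.0],
]
ensemble_scores = average_ensemble(predictor_scores)
```

Averaging like this only makes sense when the individual scores share a scale, so raw outputs with different ranges would need rescaling first.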

Deng L, Sui Y, Zhang J. XGBPRH: Prediction of Binding Hot Spots at Protein-RNA Interfaces Utilizing Extreme Gradient Boosting. Genes (Basel) 2019;10:genes10030242. [PMID: 30901953 PMCID: PMC6471955 DOI: 10.3390/genes10030242] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2019] [Revised: 03/14/2019] [Accepted: 03/15/2019] [Indexed: 01/24/2023] Open

Li Y, Niu M, Zou Q. ELM-MHC: An Improved MHC Identification Method with Extreme Learning Machine Algorithm. J Proteome Res 2019;18:1392-1401. [DOI: 10.1021/acs.jproteome.9b00012] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]

Deng L, Wang J, Zhang J. Predicting Gene Ontology Function of Human MicroRNAs by Integrating Multiple Networks. Front Genet 2019;10:3. [PMID: 30761178 PMCID: PMC6361788 DOI: 10.3389/fgene.2019.00003] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2018] [Accepted: 01/07/2019] [Indexed: 12/15/2022] Open

Qu K, Wei L, Yu J, Wang C. Identifying Plant Pentatricopeptide Repeat Coding Gene/Protein Using Mixed Feature Extraction Methods. Front Plant Sci 2019;9:1961. [PMID: 30687359 PMCID: PMC6335366 DOI: 10.3389/fpls.2018.01961] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/20/2018] [Accepted: 12/17/2018] [Indexed: 05/04/2023]

Liu Q, Chen P, Wang B, Zhang J, Li J. Hot spot prediction in protein-protein interactions by an ensemble system. BMC Syst Biol 2018;12:132. [PMID: 30598091 PMCID: PMC6311905 DOI: 10.1186/s12918-018-0665-8] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]

Deng L, Pan J, Xu X, Yang W, Liu C, Liu H. PDRLGB: precise DNA-binding residue prediction using a light gradient boosting machine. BMC Bioinformatics 2018;19:522. [PMID: 30598073 PMCID: PMC6311926 DOI: 10.1186/s12859-018-2527-1] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open

Gan Y, Li N, Zou G, Xin Y, Guan J. Identification of cancer subtypes from single-cell RNA-seq data using a consensus clustering method. BMC Med Genomics 2018;11:117. [PMID: 30598115 PMCID: PMC6311928 DOI: 10.1186/s12920-018-0433-z] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023] Open

Integrating Multiple Interaction Networks for Gene Function Inference. Molecules 2018;24:molecules24010030. [PMID: 30577643 PMCID: PMC6337127 DOI: 10.3390/molecules24010030] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2018] [Revised: 12/19/2018] [Accepted: 12/20/2018] [Indexed: 01/17/2023] Open

Zeng C, Zhan W, Deng L. SDADB: a functional annotation database of protein structural domains. Database (Oxford) 2018;2018:5046758. [PMID: 29961821 PMCID: PMC6025185 DOI: 10.1093/database/bay064] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/11/2017] [Accepted: 06/04/2018] [Indexed: 12/27/2022]