1
|
Zhang Y, Yao S, Chen P. Prediction of hot spots towards drug discovery by protein sequence embedding with 1D convolutional neural network. PLoS One 2023; 18:e0290899. [PMID: 37721924 PMCID: PMC10506709 DOI: 10.1371/journal.pone.0290899] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2023] [Accepted: 08/18/2023] [Indexed: 09/20/2023] Open
Abstract
Protein hotspot residues are key sites that mediate protein-protein interactions. Accurate identification of these residues is essential for understanding the mechanisms linking proteins to their functions and for designing drug targets. Current research has mostly focused on using machine learning methods to predict hot spots from known interface residues; these methods rely on manually extracted features of amino acid residues, drawn from sequence, structure, evolution, energy, and other information, to train and test the models. The process is cumbersome, time-consuming, and laborious. This paper proposes a method, called Embed-1dCNN, that combines a pre-trained protein sequence embedding model with a one-dimensional convolutional neural network to predict protein hotspot residues. To obtain a large number of samples, this work integrates and extracts data from the ASEdb, BID, SKEMPI, and dbMPIKT datasets to generate a new dataset, and adopts the SMOTE algorithm to expand the positive samples when forming the training set. The experimental results show that the method achieves an F1 score of 0.82 on the test set. Compared with other hot spot prediction methods, the model achieved better prediction performance.
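The SMOTE step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `smote_like`, the neighbour count `k`, and the toy two-dimensional feature vectors are assumptions chosen for illustration. The idea shown is only the core of SMOTE, namely creating synthetic positive samples by interpolating between a minority sample and one of its nearest minority neighbours.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)

    def dist2(a, b):
        # squared Euclidean distance
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: dist2(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic

# Hypothetical 2-D feature vectors standing in for hot spot (positive) samples.
hot_spots = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like(hot_spots, n_new=6)
```

Because every synthetic point is a convex combination of two real positives, it stays inside the region already occupied by the minority class.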
Affiliation(s)
- Youzhi Zhang
  - School of Computer and Information, Anqing Normal University, Anqing, China
  - University Key Laboratory of Intelligent Perception and Computing of Anhui Province, Anqing Normal University, Anqing, China
  - National Engineering Research Center for Agro-Ecological Big Data Analysis & Application, Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology & School of Internet, Anhui University, Anhui, China
- Sijie Yao
  - National Engineering Research Center for Agro-Ecological Big Data Analysis & Application, Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology & School of Internet, Anhui University, Anhui, China
- Peng Chen
  - School of Computer and Information, Anqing Normal University, Anqing, China
  - National Engineering Research Center for Agro-Ecological Big Data Analysis & Application, Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology & School of Internet, Anhui University, Anhui, China
|
2
|
Li M, Wu Z, Wang W, Lu K, Zhang J, Zhou Y, Chen Z, Li D, Zheng S, Chen P, Wang B. Protein-Protein Interaction Sites Prediction Based on an Under-Sampling Strategy and Random Forest Algorithm. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3646-3654. [PMID: 34705656 DOI: 10.1109/tcbb.2021.3123269] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/13/2023]
Abstract
Computational methods for predicting protein-protein interaction sites can effectively avoid the high cost and long duration of traditional experimental approaches. However, the serious class imbalance between interface and non-interface residues in protein sequences limits the prediction performance of these methods. This work therefore proposed a new strategy, NearMiss-based under-sampling for imbalanced datasets combined with Random Forest classification (NM-RF), to predict protein interaction sites. Herein, the residues in protein sequences were represented by PSSM-derived features, hydropathy index (HI), and relative solvent accessibility (RSA). To resolve the class imbalance problem, an under-sampling method based on the NearMiss algorithm is adopted to remove some non-interface residues, and the random forest algorithm is then used to perform binary classification on the balanced feature datasets. Experiments show that the accuracy of the NM-RF model reaches 87.6% and 84.3% on Dtestset72 and PDBtestset164, respectively, which demonstrates the effectiveness of the proposed NM-RF method in differentiating interface from non-interface residues.
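The under-sampling idea can be sketched as follows. The abstract does not say which NearMiss variant was used, so this sketch assumes NearMiss-1 (keep the majority samples whose mean distance to their k nearest minority samples is smallest). The function name `nearmiss1` and the toy feature vectors are hypothetical.

```python
def nearmiss1(majority, minority, n_keep, k=3):
    """Keep the n_keep majority samples closest, on average, to their
    k nearest minority samples (NearMiss-1 selection rule)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def mean_dist_to_nearest_minority(p):
        nearest = sorted(dist(p, q) for q in minority)[:k]
        return sum(nearest) / len(nearest)

    ranked = sorted(majority, key=mean_dist_to_nearest_minority)
    return ranked[:n_keep]

# Hypothetical 2-D feature vectors: few interface residues, many non-interface.
interface = [[0.0, 0.0], [0.0, 1.0]]
non_interface = [[0.1, 0.5], [5.0, 5.0], [0.2, 0.4], [9.0, 9.0]]
kept = nearmiss1(non_interface, interface, n_keep=2, k=2)
```

The retained non-interface samples are the ones nearest the class boundary, which balances the classes before a classifier such as a random forest is trained on them.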
|
3
|
Kitsiranuwat S, Suratanee A, Plaimas K. Integration of various protein similarities using random forest technique to infer augmented drug-protein matrix for enhancing drug-disease association prediction. Sci Prog 2022; 105:368504221109215. [PMID: 35801312 PMCID: PMC10358641 DOI: 10.1177/00368504221109215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Abstract
Identifying new therapeutic indications for existing drugs is a major challenge in drug repositioning. Most computational drug repositioning methods focus on known targets. Analyzing multiple aspects of various protein associations provides an opportunity to discover underlying drug-associated proteins that can be used to improve the performance of drug repositioning approaches. In this study, machine learning models were developed based on the similarities of diversified biological features, including protein interaction, topological network, sequence alignment, and biological function, to predict protein pairs associated with the same drugs. The crucial set of features was identified, and high performance in protein pair prediction was achieved, with an area under the curve (AUC) value of more than 93%. Based on drug chemical structures, the drug similarity levels of the promising protein pairs were used to quantify the inferred drug-associated proteins. Furthermore, these proteins were employed to establish an augmented drug-protein matrix to enhance the efficiency of three existing drug repositioning techniques: a similarity constrained matrix factorization for the drug-disease associations (SCMFDD), an ensemble meta-paths and singular value decomposition (EMP-SVD) model, and a topology similarity and singular value decomposition (TS-SVD) technique. The results showed that the augmented matrix improved performance by up to 4% over the original matrix for SCMFDD and EMP-SVD, and by about 1% for TS-SVD. In summary, inferring new protein pairs related to the same drugs increases the opportunity to reveal missing drug-associated proteins that are important for drug development via the drug repositioning technique.
Affiliation(s)
- Satanat Kitsiranuwat
  - Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok, Thailand
  - Advanced Virtual and Intelligent Computing (AVIC) center, Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- Apichat Suratanee
  - Department of Mathematics, Faculty of Applied Science, King Mongkut's University of Technology North Bangkok, Bangkok, Thailand
  - Intelligent and Nonlinear Dynamic Innovations Research Center, Science and Technology Research Institute, King Mongkut's University of Technology North Bangkok, Bangkok, Thailand
- Kitiporn Plaimas
  - Advanced Virtual and Intelligent Computing (AVIC) center, Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
  - Omics Sciences and Bioinformatics Center, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
|
4
|
Chen YC, Chen YH, Wright JD, Lim C. PPI-Hotspot DB: Database of Protein-Protein Interaction Hot Spots. J Chem Inf Model 2022; 62:1052-1060. [PMID: 35147037 DOI: 10.1021/acs.jcim.2c00025] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/30/2022]
Abstract
Single-point mutations of certain residues (so-called hot spots) impair/disrupt protein-protein interactions (PPIs), leading to pathogenesis and drug resistance. Conventionally, a PPI-hot spot is identified when its replacement decreased the binding free energy significantly, generally by ≥2 kcal/mol. The relatively few mutations with such a significant binding free energy drop limited the number of distinct PPI-hot spots. By defining PPI-hot spots based on mutations that have been manually curated in UniProtKB to significantly impair/disrupt PPIs in addition to binding free energy changes, we have greatly expanded the number of distinct PPI-hot spots by an order of magnitude. These experimentally determined PPI-hot spots along with available structures have been collected in a database called PPI-HotspotDB. We have applied the PPI-HotspotDB to create a nonredundant benchmark, PPI-Hotspot+PDBBM, for assessing methods to predict PPI-hot spots using the free structure as input. PPI-HotspotDB will benefit the design of mutagenesis experiments and development of PPI-hot spot prediction methods. The database and benchmark are freely available at https://ppihotspot.limlab.dnsalias.org.
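The conventional energy-based definition quoted in this abstract (a change in binding free energy of at least 2 kcal/mol upon mutation) amounts to a simple threshold rule. The sketch below illustrates only that rule; the mutation labels and values are invented, and the function name `is_hot_spot` is an assumption rather than anything provided by PPI-HotspotDB.

```python
HOT_SPOT_THRESHOLD = 2.0  # kcal/mol, the conventional cutoff cited in the abstract

def is_hot_spot(ddg_kcal_per_mol, threshold=HOT_SPOT_THRESHOLD):
    """Return True when the mutation changes binding free energy by >= threshold."""
    return ddg_kcal_per_mol >= threshold

# Hypothetical alanine-scanning results: mutation label -> ddG in kcal/mol.
mutations = {"R43A": 2.7, "K15A": 0.4, "W104A": 2.0}
hot = sorted(name for name, ddg in mutations.items() if is_hot_spot(ddg))
```

A rule this strict labels few residues as hot spots, which is the limitation the abstract says the UniProtKB-curated definition is meant to address.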
Affiliation(s)
- Yao Chi Chen
  - Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- Yu-Hsien Chen
  - Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- Jon D Wright
  - Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- Carmay Lim
  - Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
  - Department of Chemistry, National Tsing Hua University, Hsinchu 300, Taiwan
|
5
|
A two-step ensemble learning for predicting protein hot spot residues from whole protein sequence. Amino Acids 2022; 54:765-776. [DOI: 10.1007/s00726-022-03129-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2020] [Accepted: 01/17/2022] [Indexed: 11/26/2022]
|
6
|
Hu J, Zhou L, Li B, Zhang X, Chen N. Improve hot region prediction by analyzing different machine learning algorithms. BMC Bioinformatics 2021; 22:522. [PMID: 34696728 PMCID: PMC8543831 DOI: 10.1186/s12859-021-04420-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/02/2021] [Accepted: 09/08/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In the process of designing drugs and proteins, it is crucial to recognize hot regions in protein-protein interactions. Each hot region of a protein-protein interaction is composed of at least three hot spots, which play an important role in binding. However, identifying hot spots through biological experiments is time-consuming and labor-intensive. If predictive models based on machine learning methods can be trained, the drug design process can be effectively accelerated. RESULTS The results show that different machine learning algorithms perform similarly when evaluated using the F-measure. The main differences between these methods are in recall and precision. Since the key attribute of hot regions is that they are tightly packed, we used a clustering algorithm to predict hot regions. By combining Gaussian Naïve Bayes and DBSCAN, the F-measure of hot region prediction can reach 0.809. CONCLUSIONS In this paper, different machine learning models such as Gaussian Naïve Bayes, SVM, XGBoost, Random Forest, and Artificial Neural Network are used to predict hot spots. The experimental results show that combining a hot spot classification algorithm with a higher recall rate and a clustering algorithm with higher precision can effectively improve the accuracy of hot region prediction.
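The clustering step can be sketched with a small, self-contained DBSCAN. This is an illustration under stated assumptions, not the paper's pipeline: the coordinates, the `eps` radius, and the choice of `min_pts=3` (echoing the "at least three hot spots" definition of a hot region) are all hypothetical.

```python
def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id >= 0, or -1 for noise."""
    def neighbours(i):
        # indices of all points within eps of point i (including i itself)
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:  # j is a core point, keep expanding
                queue.extend(nb)
    return labels

# Hypothetical 3-D coordinates of residues predicted to be hot spots.
predicted_hot_spots = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (10, 10, 10)]
labels = dbscan(predicted_hot_spots, eps=1.5, min_pts=3)
```

Here the three tightly packed residues form one candidate hot region, while the isolated residue is treated as noise. This is how a density-based step can raise precision after a high-recall classifier.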
Affiliation(s)
- Jing Hu
  - School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China
  - Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan, 430065, Hubei, China
- Longwei Zhou
  - School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China
  - Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan, 430065, Hubei, China
- Bo Li
  - School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China
  - Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan, 430065, Hubei, China
- Xiaolong Zhang
  - School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China
  - Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan, 430065, Hubei, China
- Nansheng Chen
  - Molecular Biology and Biochemistry, Simon Fraser University, Vancouver, BC, Canada
|
7
|
Zhang S, Zhao L, Zheng CH, Xia J. A feature-based approach to predict hot spots in protein-DNA binding interfaces. Brief Bioinform 2021; 21:1038-1046. [PMID: 30957840 DOI: 10.1093/bib/bbz037] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/14/2019] [Revised: 02/20/2019] [Accepted: 03/07/2019] [Indexed: 12/21/2022] Open
Abstract
DNA-binding hot spot residues of proteins are dominant and fundamental interface residues that contribute most of the binding free energy of protein-DNA interfaces. As experimental methods for identifying hot spots are expensive and time consuming, computational approaches are urgently required in predicting hot spots on a large scale. In this work, we systematically assessed a wide variety of 114 features from a combination of the protein sequence, structure, network and solvent accessible information and their combinations along with various feature selection strategies for hot spot prediction. We then trained and compared four commonly used machine learning models, namely, support vector machine (SVM), random forest, Naïve Bayes and k-nearest neighbor, for the identification of hot spots using 10-fold cross-validation and the independent test set. Our results show that (1) features based on the solvent accessible surface area have significant effect on hot spot prediction; (2) different but complementary features generally enhance the prediction performance; and (3) SVM outperforms other machine learning methods on both training and independent test sets. In an effort to improve predictive performance, we developed a feature-based method, namely, PrPDH (Prediction of Protein-DNA binding Hot spots), for the prediction of hot spots in protein-DNA binding interfaces using SVM based on the selected 10 optimal features. Comparative results on benchmark data sets indicate that our predictor is able to achieve generally better performance in predicting hot spots compared to the state-of-the-art predictors. A user-friendly web server for PrPDH is well established and is freely available at http://bioinfo.ahu.edu.cn:8080/PrPDH.
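The 10-fold cross-validation protocol mentioned in this abstract can be sketched as an index-splitting routine. This is a generic illustration, not the PrPDH code: the function name `k_fold_indices` is an assumption, and the split is deterministic (no shuffling or stratification) to keep the example short.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) pairs so that every sample is
    held out exactly once across the k folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n_samples) if i not in held]
        yield train, held_out

# 25 hypothetical residues split into 10 folds.
splits = list(k_fold_indices(25, k=10))
```

Each model is trained on the `train` indices and scored on the held-out fold. Averaging the ten scores gives the cross-validated estimate, which is then checked against an independent test set as the abstract describes.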
Affiliation(s)
- Sijia Zhang
  - Institutes of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
- Le Zhao
  - Institutes of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
- Chun-Hou Zheng
  - Institutes of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
- Junfeng Xia
  - Institutes of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
|
8
|
Deng R, Tao M, Xing H, Yang X, Liu C, Liao K, Qi L. Automatic Diagnosis of Rice Diseases Using Deep Learning. Frontiers in Plant Science 2021; 12:701038. [PMID: 34490004 PMCID: PMC8416767 DOI: 10.3389/fpls.2021.701038] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/27/2021] [Accepted: 07/20/2021] [Indexed: 06/01/2023]
Abstract
Rice disease has serious negative effects on crop yield, and the correct diagnosis of rice diseases is the key to avoid these effects. However, the existing disease diagnosis methods for rice are neither accurate nor efficient, and special equipment is often required. In this study, an automatic diagnosis method was developed and implemented in a smartphone app. The method was developed using deep learning based on a large dataset that contained 33,026 images of six types of rice diseases: leaf blast, false smut, neck blast, sheath blight, bacterial stripe disease, and brown spot. The core of the method was the Ensemble Model in which submodels were integrated. Finally, the Ensemble Model was validated using a separate set of images. Results showed that the three best submodels were DenseNet-121, SE-ResNet-50, and ResNeSt-50, in terms of several attributes, such as, learning rate, precision, recall, and disease recognition accuracy. Therefore, these three submodels were selected and integrated in the Ensemble Model. The Ensemble Model minimized confusion among the different types of disease, reducing misdiagnosis of the disease. Using the Ensemble Model to diagnose six types of rice diseases, an overall accuracy of 91% was achieved, which is considered to be reasonably good, considering the appearance similarities in some types of rice disease. The smartphone app allowed the client to use the Ensemble Model on the web server through a network, which was convenient and efficient for the field diagnosis of rice leaf blast, false smut, neck blast, sheath blight, bacterial stripe disease, and brown spot.
Affiliation(s)
- Ruoling Deng
  - College of Engineering, South China Agricultural University, Guangzhou, China
- Ming Tao
  - College of Engineering, South China Agricultural University, Guangzhou, China
- Hang Xing
  - College of Engineering, South China Agricultural University, Guangzhou, China
- Xiuli Yang
  - College of Engineering, South China Agricultural University, Guangzhou, China
- Chuang Liu
  - College of Engineering, South China Agricultural University, Guangzhou, China
- Kaifeng Liao
  - College of Engineering, South China Agricultural University, Guangzhou, China
- Long Qi
  - College of Engineering, South China Agricultural University, Guangzhou, China
  - Lingnan Guangdong Laboratory of Modern Agriculture, Guangzhou, China
|
9
|
Mahapatra S, Sahu SS. Integrating Resonant Recognition Model and Stockwell Transform for Localization of Hotspots in Tubulin. IEEE Trans Nanobioscience 2021; 20:345-353. [PMID: 33950844 DOI: 10.1109/tnb.2021.3077710] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/07/2022]
Abstract
Tubulin is a promising target for designing anti-cancer drugs. Identification of hotspots in multifunctional Tubulin protein provides insights for new drug discovery. Although machine learning techniques have shown significant results in prediction, they fail to identify the hotspots corresponding to a particular biological function. This paper presents a signal processing technique combining resonant recognition model (RRM) and Stockwell Transform (ST) for the identification of hotspots corresponding to a particular functionality. The characteristic frequency (CF) representing a specific biological function is determined using the RRM. Then the spectrum of the protein sequence is computed using ST. The CF is filtered from the ST spectrum using a time-frequency mask. The energy peaks in the filtered sequence represent the hotspots. The hotspots predicted by the proposed method are compared with the experimentally detected binding residues of Tubulin stabilizing drug Taxol and destabilizing drug Colchicine present in the Tubulin protein. Out of the 53 experimentally identified hotspots, 60% are predicted by the proposed method whereas around 20% are predicted by existing machine learning based methods. Additionally, the proposed method predicts some new hot spots, which may be investigated.
|
10
|
Mei LC, Hao GF, Yang GF. Computational methods for predicting hotspots at protein-RNA interfaces. Wiley Interdisciplinary Reviews-RNA 2021; 13:e1675. [PMID: 34080311 DOI: 10.1002/wrna.1675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2021] [Revised: 05/13/2021] [Accepted: 05/14/2021] [Indexed: 11/10/2022]
Abstract
Protein-RNA interactions play essential roles in many critical biological events. A comprehensive understanding of the mechanisms underlying these interactions is helpful when studying cellular activities and therapeutic applications. Hotspots are a small portion of residues contributing much toward protein-RNA binding affinity. In pharmaceutical research, the hotspot residues are seen as the best option for designing small molecules to target proteins of therapeutic interest. With the accumulation of experimental data about protein-RNA interactions, computational methods have been produced for hotspot prediction on a large scale. In this review, we first present an overview of the existing databases for protein-RNA binding data. Furthermore, we outline the most adopted computational methods for hotspots prediction in protein-RNA interactions. Finally, we discuss the applications of hotspot prediction. This article is categorized under: RNA Interactions with Proteins and Other Molecules > Protein-RNA Recognition RNA Interactions with Proteins and Other Molecules > Protein-RNA Interactions: Functional Implications RNA Methods > RNA Analyses In Vitro and In Silico.
Affiliation(s)
- Long-Can Mei
  - Key Laboratory of Pesticide and Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan, China
  - International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, China
- Ge-Fei Hao
  - Key Laboratory of Pesticide and Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan, China
  - International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, China
  - State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Research and Development Center for Fine Chemicals, Guizhou University, Guiyang, China
- Guang-Fu Yang
  - Key Laboratory of Pesticide and Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan, China
  - International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, China
  - Collaborative Innovation Center of Chemical Science and Engineering, Tianjin, China
|
11
|
Shirafkan F, Gharaghani S, Rahimian K, Sajedi RH, Zahiri J. Moonlighting protein prediction using physico-chemical and evolutional properties via machine learning methods. BMC Bioinformatics 2021; 22:261. [PMID: 34030624 PMCID: PMC8142502 DOI: 10.1186/s12859-021-04194-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/11/2020] [Accepted: 05/13/2021] [Indexed: 12/18/2022] Open
Abstract
Background: Moonlighting proteins (MPs) are a subclass of multifunctional proteins in which more than one independent or usually distinct function occurs in a single polypeptide chain. Identification of unknown cellular processes, understanding novel protein mechanisms, improving the prediction of protein functions, and gaining information about protein evolution are the main reasons to study MPs. They also play an important role in disease pathways and drug-target discovery. Since detecting MPs experimentally is quite a challenge, most of them are detected randomly. Therefore, introducing an appropriate computational approach to predict MPs seems reasonable. Results: In this study, we introduced a competent model for detecting moonlighting and non-MPs through features extracted from protein sequences. We attempted to set up a well-judged scheme for detecting outlier proteins. Consequently, 37 distinct feature vectors were utilized to study each protein's impact on detecting MPs. Furthermore, 8 different classification methods were assessed to find the best performance. To detect outliers, each of the classifications was executed 100 times by tenfold cross-validation on feature vectors; proteins that were misclassified 90 times or more were grouped. This process was applied to every single feature vector, and eventually the intersection of these groups was determined as the outlier proteins. The results of tenfold cross-validation on a dataset of 351 samples (containing 215 moonlighting and 136 non-moonlighting proteins) reveal that the SVM method on all feature vectors has the highest performance among all methods in this study and other available methods. In addition, the study of outliers showed that 57 of the 351 proteins in the dataset could be appropriate outlier candidates. Among the outlier proteins, there were non-MPs (such as P69797) that were misclassified by 8 different classification methods with 16 different feature vectors. Because these proteins have been obtained by computational methods, the results of this study could reduce the likelihood of hypothesizing whether these proteins are non-moonlighting at all. Conclusions: MPs are difficult to identify through experimentation. Using distinct feature vectors, our method enabled identification of novel moonlighting proteins. The study also pinpointed that a number of non-MPs are likely to be moonlighting. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04194-5.
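The outlier rule described in this abstract (group the proteins misclassified in at least 90 of 100 runs for each feature vector, then intersect the groups) can be sketched as below. The function name `outlier_proteins`, the feature-vector names, and the protein identifiers and counts are hypothetical and serve only to show the set logic.

```python
def outlier_proteins(miscounts_by_feature, min_miscount=90):
    """miscounts_by_feature maps a feature-vector name to a dict of
    {protein_id: number of runs (out of 100) in which it was misclassified}.
    Returns the proteins flagged under every feature vector."""
    groups = [
        {pid for pid, n in counts.items() if n >= min_miscount}
        for counts in miscounts_by_feature.values()
    ]
    return set.intersection(*groups) if groups else set()

# Hypothetical misclassification counts for two feature vectors.
miscounts = {
    "feature_vector_A": {"P1": 95, "P2": 91, "P3": 10},
    "feature_vector_B": {"P1": 100, "P2": 40, "P3": 92},
}
flagged = outlier_proteins(miscounts)
```

Taking the intersection rather than the union keeps only proteins that are consistently misclassified regardless of representation, which is what makes them candidates for mislabeled entries.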
Affiliation(s)
- Farshid Shirafkan
  - Laboratory of Bioinformatics and Drug Design, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Sajjad Gharaghani
  - Laboratory of Bioinformatics and Drug Design, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Karim Rahimian
  - Bioinformatics and Computational Omics Lab (BioCOOL), Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Reza Hasan Sajedi
  - Department of Biochemistry, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Javad Zahiri
  - Department of Neuroscience, University of California San Diego, La Jolla, CA, USA
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
|
12
|
Wang B, Mei C, Wang Y, Zhou Y, Cheng MT, Zheng CH, Wang L, Zhang J, Chen P, Xiong Y. Imbalance Data Processing Strategy for Protein Interaction Sites Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:985-994. [PMID: 31751283 DOI: 10.1109/tcbb.2019.2953908] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 06/10/2023]
Abstract
Protein-protein interactions play essential roles in various biological processes. Identifying protein interaction sites helps researchers understand life activities and is therefore helpful for drug design. However, the number of experimentally determined protein interaction sites is far smaller than the number of protein sites in protein-protein interactions or protein complexes. The negative and positive samples are therefore usually imbalanced, which is common but biases the prediction of protein interaction sites by computational approaches. In this work, we presented three imbalanced-data processing strategies to reconstruct the original dataset, and then extracted protein features from the evolutionary conservation of amino acids to build a predictor for identifying protein interaction sites. On a dataset with 10,430 surface residues but only 2,299 interface residues, the imbalanced-data processing strategies clearly reduce the prediction bias and therefore improve the prediction performance for protein interaction sites. The experimental results show that our prediction models achieve better prediction performance, such as a prediction accuracy of 0.758 or an F-measure of 0.737, which demonstrates the effectiveness of our method.
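Because the abstract reports both accuracy and F-measure on imbalanced data, a short sketch of how the two differ may help. The confusion-matrix counts below are invented for illustration and are not taken from the paper; the function names are assumptions.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)

def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical imbalanced test set: 20 interface and 80 non-interface residues.
# A classifier that predicts "non-interface" for everything:
trivial_acc = accuracy(tp=0, fp=0, fn=20, tn=80)
trivial_f = f_measure(tp=0, fp=0, fn=20)
# A classifier that actually finds most interface residues:
useful_f = f_measure(tp=16, fp=4, fn=4)
```

The trivial classifier scores high accuracy but an F-measure of zero, which is why the F-measure is the more informative figure when positives are rare.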
|
13
|
Chen P, Shen T, Zhang Y, Wang B. A Sequence-segment Neighbor Encoding Schema for Protein Hotspot Residue Prediction. Curr Bioinform 2020. [DOI: 10.2174/1574893615666200106115421] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/22/2022]
Abstract
Background: Hotspots are those residues that contribute major free energy of binding in protein-protein interactions. Protein functions are frequently dependent on hotspot residues. At present, hotspot residues are always identified by Alanine scanning mutagenesis technology, which is costly, time-consuming and laborious.
Objective: Therefore, more accurate and efficient methods have to be developed to identify protein hotspot residues.
Methods: This paper proposed a novel encoding schema of sequence-segment neighbors and constructed a random forest-based model to identify hotspots in protein interaction interfaces. Firstly, 10 amino acid physicochemical properties, 16 features related to the PI and DI, and 25 features related to ASA were extracted. Different from the previous residue encoding schemas, such as auto correlation descriptor or triplet combination information, this paper employed the influence of amino acids neighbors to hotspot residues and amino acids with a certain distance in sequence to the hotspot.
Results: Moreover, the proposed model was compared with other hotspot prediction methods, including APIS, Robetta, FOLDEF, KFC, MINERVA models, etc.
Conclusion: The experimental results showed that the proposed model can improve the prediction ability of protein hotspot residues on the same test set.
Affiliation(s)
- Peng Chen
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology & School of Internet, Anhui University, 230601 Hefei, Anhui, China
- Tong Shen
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology & School of Internet, Anhui University, 230601 Hefei, Anhui, China
- Youzhi Zhang
  - School of Computer and Information, Anqing Normal University, 246133 Anqing, Anhui, China
- Bing Wang
  - School of Electrical and Information Engineering, Anhui University of Technology, 243032 Ma'anshan, Anhui, China
|
14
|
Wu R, Prabhu R, Ozkan A, Sitharam M. Rapid prediction of crucial hotspot interactions for icosahedral viral capsid self-assembly by energy landscape atlasing validated by mutagenesis. PLoS Comput Biol 2020; 16:e1008357. [PMID: 33079933 PMCID: PMC7598928 DOI: 10.1371/journal.pcbi.1008357] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/27/2020] [Revised: 10/30/2020] [Accepted: 09/22/2020] [Indexed: 02/07/2023] Open
Abstract
Icosahedral viruses are under a micrometer in diameter; their infectious genome is encapsulated by a shell assembled by a multiscale process, starting from an integer multiple of 60 viral capsid or coat protein (VP) monomers. We predict and validate inter-atomic hotspot interactions between VP monomers that are important for the assembly of 3 types of icosahedral viral capsids: Adeno-Associated Virus serotype 2 (AAV2) and Minute Virus of Mice (MVM), both T = 1 single-stranded DNA viruses, and Brome Mosaic Virus (BMV), a T = 3 single-stranded RNA virus. Experimental validation uses in-vitro, site-directed mutagenesis data found in the literature. We combine ab-initio predictions at two scales: at the interface-scale, we predict the importance (cruciality) of an interaction for successful subassembly across each interface between symmetry-related VP monomers; and at the capsid-scale, we predict the cruciality of an interface for successful capsid assembly. At the interface-scale, we measure cruciality by changes in the capsid free-energy landscape partition function when an interaction is removed. The partition function computation uses atlases of interface subassembly landscapes, rapidly generated by a novel geometric method and the curated open-source software EASAL (efficient atlasing and search of assembly landscapes). At the capsid-scale, cruciality of an interface for successful assembly of the capsid is based on combinatorial entropy. Our study goes all the way from resource-light, multiscale computational predictions of crucial hotspot inter-atomic interactions to validation using data on the effect of site-directed mutagenesis on capsid assembly. By reliably and rapidly narrowing down target interactions (no more than 1.5 hours per interface on a laptop with an Intel Core i5-2500K @ 3.2 GHz CPU and 8 GB of RAM), our predictions can inform and reduce time-consuming in-vitro and in-vivo experiments, or more computationally intensive in-silico analyses.
Affiliation(s)
- Ruijin Wu
  - Department of Computer and Information Science and Engineering, University of Florida, Gainesville, Florida, United States of America
- Rahul Prabhu
  - Department of Computer and Information Science and Engineering, University of Florida, Gainesville, Florida, United States of America
- Aysegul Ozkan
  - Department of Computer and Information Science and Engineering, University of Florida, Gainesville, Florida, United States of America
- Meera Sitharam
  - Department of Computer and Information Science and Engineering, University of Florida, Gainesville, Florida, United States of America
|
15
|
Preto AJ, Moreira IS. SPOTONE: Hot Spots on Protein Complexes with Extremely Randomized Trees via Sequence-Only Features. Int J Mol Sci 2020; 21:ijms21197281. [PMID: 33019775 PMCID: PMC7582262 DOI: 10.3390/ijms21197281] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/10/2020] [Revised: 09/26/2020] [Accepted: 09/30/2020] [Indexed: 01/02/2023] Open
Abstract
Protein Hot-Spots (HS) are experimentally determined amino acids that are key to small-ligand binding and tend to be structural landmarks in protein–protein interactions. As such, they have been extensively studied with structure-based Machine Learning (ML) prediction methods. However, far more protein sequences are available than determined three-dimensional structures, which indicates that a sequence-based HS predictor has the potential to be more useful for the scientific community. Herein, we present SPOTONE, a new ML predictor able to accurately classify protein HS via sequence-only features. This algorithm shows accuracy, AUROC, precision, recall and F1-score of 0.82, 0.83, 0.91, 0.82 and 0.85, respectively, on an independent testing set. The algorithm is deployed within a free-to-use webserver, which only requires the user to submit a FASTA file with one or more protein sequences.
Affiliation(s)
- A. J. Preto
  - CNC - Center for Neuroscience and Cell Biology, University of Coimbra, 3004-504 Coimbra, Portugal
- Irina S. Moreira
  - Department of Life Sciences, Center for Neuroscience and Cell Biology, Coimbra University, 3000-456 Coimbra, Portugal
|
16
|
Lin X, Zhang X, Xu X. Efficient Classification of Hot Spots and Hub Protein Interfaces by Recursive Feature Elimination and Gradient Boosting. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1525-1534. [PMID: 31380766 DOI: 10.1109/tcbb.2019.2931717] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 06/10/2023]
Abstract
Proteins are not isolated biological molecules; they have specific three-dimensional structures and interact with other proteins to perform their functions. A small number of residues (hot spots) in protein-protein interactions (PPIs) play a vital role in influencing and controlling biological processes. This paper uses boosting and gradient boosting algorithms, combined with two feature selection strategies, to classify hot spots on three common datasets and two hub protein datasets. First, correlation-based feature selection is used to remove highly correlated features and improve prediction accuracy. Then, recursive feature elimination based on a support vector machine (SVM-RFE) is adopted to select the optimal feature subset and improve training performance. Finally, boosting and gradient boosting (G-boosting) methods are invoked to generate classification results. Gradient boosting is capable of obtaining an excellent model by reducing the loss function in the gradient direction to avoid overfitting. Five datasets from different protein databases are used to verify our models in the experiments. Experimental results show that our proposed classification models achieve competitive performance compared with existing classification methods.
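The first step described above, removing highly correlated features, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the selection rule, so the greedy pairwise Pearson filter, the 0.9 threshold, the feature names and the helper names `pearson` and `drop_correlated` are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def drop_correlated(columns, threshold=0.9):
    """Greedily keep a feature only if |r| with every already-kept feature is <= threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical feature table: one list of values per feature, one entry per residue.
features = {
    "asa": [1.0, 2.0, 3.0, 4.0],
    "asa_scaled": [2.0, 4.0, 6.0, 8.0],        # perfectly correlated with "asa"
    "hydrophobicity": [1.0, -1.0, 1.0, -1.0],  # weakly correlated with "asa"
}
print(drop_correlated(features))
```

A filter like this only removes redundancy between features; a wrapper method such as SVM-RFE would then be applied to the surviving columns.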
|
17
|
Deng A, Zhang H, Wang W, Zhang J, Fan D, Chen P, Wang B. Developing Computational Model to Predict Protein-Protein Interaction Sites Based on the XGBoost Algorithm. Int J Mol Sci 2020; 21:E2274. [PMID: 32218345 PMCID: PMC7178137 DOI: 10.3390/ijms21072274] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 02/03/2020] [Revised: 03/10/2020] [Accepted: 03/23/2020] [Indexed: 12/27/2022] Open
Abstract
The study of protein-protein interactions is of great biological significance, and the prediction of protein-protein interaction sites can promote the understanding of cell biological activity and aid drug development. However, an uneven distribution between interaction and non-interaction sites is common because only a small number of protein interactions have been confirmed by experimental techniques, which greatly affects the predictive capability of computational methods. In this work, two imbalanced-data processing strategies based on the XGBoost algorithm were proposed to re-balance the original dataset using the inherent relationship between positive and negative samples for the prediction of protein-protein interaction sites. A feature extraction method based on the evolutionary conservation of proteins was applied to represent the protein interaction sites, and the influence of overlapping regions of positive and negative samples on prediction performance was considered. Our method showed good prediction performance, with a prediction accuracy of 0.807 and an MCC of 0.614, on an original dataset with 10,455 surface residues but only 2297 interface residues. Experimental results demonstrated the effectiveness of our XGBoost-based method.
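Both figures quoted above, accuracy and the Matthews correlation coefficient (MCC), are functions of the confusion-matrix counts. The sketch below is a minimal illustration rather than the authors' code; the function names and the example counts are hypothetical and were chosen only to show why MCC is the more informative number when interface residues are a small minority.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all residues classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical counts (not from the paper): a classifier that labels every
# residue "non-interface" on a 100-residue set with 20 true interface residues.
print(accuracy(tp=0, tn=80, fp=0, fn=20))  # 0.8, despite finding no interface residue
print(mcc(tp=0, tn=80, fp=0, fn=20))       # 0.0, which exposes the failure
```

Because accuracy rewards predicting the majority class, imbalanced problems like this one are usually reported with MCC or F1 alongside it.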
Affiliation(s)
- Aijun Deng
  - Key Laboratory of Metallurgical Emission Reduction & Resources Recycling (Anhui University of Technology), Ministry of Education, Ma'anshan 243002, China
  - School of Metallurgical Engineering, Anhui University of Technology, Ma'anshan 243032, China
  - Department of Engineering, University of Leicester, Leicester LE1 7RH, UK
- Huan Zhang
  - School of Electrical and Information Engineering, Anhui University of Technology, Ma'anshan 243032, China
- Wenyan Wang
  - School of Electrical and Information Engineering, Anhui University of Technology, Ma'anshan 243032, China
- Jun Zhang
  - Co-Innovation Center for Information Supply & Assurance Technology, Anhui University, Hefei 230032, China
- Dingdong Fan
  - School of Metallurgical Engineering, Anhui University of Technology, Ma'anshan 243032, China
- Peng Chen
  - Co-Innovation Center for Information Supply & Assurance Technology, Anhui University, Hefei 230032, China
- Bing Wang
  - Key Laboratory of Metallurgical Emission Reduction & Resources Recycling (Anhui University of Technology), Ministry of Education, Ma'anshan 243002, China
  - School of Electrical and Information Engineering, Anhui University of Technology, Ma'anshan 243032, China
  - Co-Innovation Center for Information Supply & Assurance Technology, Anhui University, Hefei 230032, China
|
18
|
Hu S, Zhang C, Chen P, Gu P, Zhang J, Wang B. Predicting drug-target interactions from drug structure and protein sequence using novel convolutional neural networks. BMC Bioinformatics 2019; 20:689. [PMID: 31874614 PMCID: PMC6929541 DOI: 10.1186/s12859-019-3263-x] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 11/17/2022] Open
Abstract
Background: Accurate identification of potential interactions between drugs and protein targets is a critical step in accelerating drug discovery. Although many related experimental studies have been carried out over the past decades, detecting drug-target interactions (DTIs) remains extremely resource-intensive and time-consuming. Therefore, many computational approaches have been developed for predicting drug-target associations on a large scale.
Results: In this paper, we proposed a deep learning-based method to predict DTIs using only the information of drug structures and protein sequences. The final results showed that our method can achieve good performance, with accuracies of up to 92.0%, 90.0%, 92.0% and 90.7% for the target families of enzymes, ion channels, GPCRs and nuclear receptors of our created dataset, respectively. Another dataset derived from DrugBank was used to further assess the generalization of the model, which yielded an accuracy of 0.9015 and an AUC value of 0.9557.
Conclusion: Our model shows improved performance in comparison with other state-of-the-art computational methods on the common benchmark datasets. Experimental results demonstrated that our model successfully extracted more nuanced yet useful features, and therefore can be used as a practical tool to discover new drugs.
Availability: http://deeplearner.ahu.edu.cn/web/CnnDTI.htm
Affiliation(s)
- ShanShan Hu
  - School of Computer Science and Technology, Anhui University, Jiulong Road, Hefei, 230601, China
- Chenglin Zhang
  - Institutes of Physical Science and Information Technology, Anhui University, Jiulong Road, Hefei, 230601, China
- Peng Chen
  - School of Computer Science and Technology, Anhui University, Jiulong Road, Hefei, 230601, China
  - Institutes of Physical Science and Information Technology, Anhui University, Jiulong Road, Hefei, 230601, China
  - Cadre's Ward (South District), The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230001, China
- Pengying Gu
  - Cadre's Ward (South District), The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230001, China
- Jun Zhang
  - School of Electrical and Information Engineering, Anhui University, Hefei, 230601, China
- Bing Wang
  - School of Electrical and Information Engineering, Anhui University of Technology, Ma'anshan, 243032, China
|
19
|
Wang Y, Mei C, Zhou Y, Wang Y, Zheng C, Zhen X, Xiong Y, Chen P, Zhang J, Wang B. Semi-supervised prediction of protein interaction sites from unlabeled sample information. BMC Bioinformatics 2019; 20:699. [PMID: 31874616 PMCID: PMC6929468 DOI: 10.1186/s12859-019-3274-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/30/2022] Open
Abstract
Background: The recognition of protein interaction sites is of great significance in many biological processes, signaling pathways and drug designs. However, most sites on protein sequences cannot be defined as interface or non-interface sites because only a small part of protein interactions has been identified, which limits the prediction accuracy and generalization ability of predictors of protein interaction sites. It is therefore necessary to improve the prediction of protein interaction sites by using large amounts of unlabeled data together with small amounts of labeled data and background knowledge.
Results: In this work, three semi-supervised support vector machine–based methods are proposed to improve the prediction of protein interaction sites by incorporating the information of unlabeled protein sites. Five features related to the evolutionary conservation of amino acids are extracted from the HSSP database and the ConSurf Server, i.e., residue spatial sequence spectrum, residue sequence information entropy and relative entropy, residue sequence conserved weight and residue evolution rate, to represent the residues within the protein sequence. Three predictors are then built to identify the interface residues on the protein surface using three types of semi-supervised support vector machine algorithms.
Conclusion: The experimental results demonstrated that the semi-supervised approaches can effectively improve the prediction of protein interaction sites when unlabeled information is incorporated into the predictors. The best of them achieves an accuracy of 70.7%, a sensitivity of 62.67% and a specificity of 78.72%. Compared with existing studies, the semi-supervised models show improved prediction performance.
Affiliation(s)
- Ye Wang
  - School of Electrical and Information Engineering, Anhui University of Technology, Maanshan, 243002, Anhui, China
- Changqing Mei
  - School of Electrical and Information Engineering, Anhui University of Technology, Maanshan, 243002, Anhui, China
- Yuming Zhou
  - School of Electrical and Information Engineering, Anhui University of Technology, Maanshan, 243002, Anhui, China
- Yan Wang
  - School of Electrical and Information Engineering, Anhui University of Technology, Maanshan, 243002, Anhui, China
- Chunhou Zheng
  - Co-Innovation Center for Information Supply & Assurance Technology, Anhui University, Hefei, 230601, Anhui, China
- Xiao Zhen
  - School of Computer Science and Technology, Anhui University of Technology, Maanshan, 243002, Anhui, China
- Yan Xiong
  - School of Computer Science and Technology, University of Science & Technology, Hefei, 230026, Anhui, China
- Peng Chen
  - Institute of Health Sciences, Anhui University, Hefei, 230601, Anhui, China
- Jun Zhang
  - College of Electrical Engineering and Automation, Anhui University, Hefei, 230601, Anhui, China
- Bing Wang
  - School of Electrical and Information Engineering, Anhui University of Technology, Maanshan, 243002, Anhui, China
  - Co-Innovation Center for Information Supply & Assurance Technology, Anhui University, Hefei, 230601, Anhui, China
|
20
|
Wang Y, Xiao Q, Chen P, Wang B. In Silico Prediction of Drug-Induced Liver Injury Based on Ensemble Classifier Method. Int J Mol Sci 2019; 20:E4106. [PMID: 31443562 PMCID: PMC6747689 DOI: 10.3390/ijms20174106] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 05/25/2019] [Revised: 08/20/2019] [Accepted: 08/20/2019] [Indexed: 11/17/2022] Open
Abstract
Drug-induced liver injury (DILI) is a major factor in the development and safety of drugs. If DILI cannot be effectively predicted during drug development, the drug may later be withdrawn from the market. Predicting DILI is therefore crucial at the early stages of drug research. This work presents a 2-class ensemble classifier model for predicting DILI, with 2D molecular descriptors and fingerprints on a dataset of 450 compounds. The purpose of our study is to investigate which key molecular fingerprints may cause DILI risk, and then to obtain a reliable ensemble model to predict DILI risk with these key factors. Experimental results suggested that 8 molecular fingerprints are critical for predicting DILI, and also identified the best ratio of molecular fingerprints to molecular descriptors. Five-fold cross-validation of the ensemble vote classifier yielded an accuracy of 77.25%, and the accuracy on the test set was 81.67%. This model could be used for drug-induced liver injury prediction.
Affiliation(s)
- Yangyang Wang
  - Institutes of Physical Science and Information Technology, Anhui University, Hefei 230601, China
- Qingxin Xiao
  - Institutes of Physical Science and Information Technology, Anhui University, Hefei 230601, China
- Peng Chen
  - Institutes of Physical Science and Information Technology, Anhui University, Hefei 230601, China
  - School of Computer Science and Technology, Anhui University, Hefei 230601, China
  - School of Electrical and Information Engineering, Anhui University of Technology, Ma'anshan 243032, China
- Bing Wang
  - School of Electrical and Information Engineering, Anhui University of Technology, Ma'anshan 243032, China
|
21
|
Liu Q, Chen P, Wang B, Zhang J, Li J. Hot spot prediction in protein-protein interactions by an ensemble system. BMC Systems Biology 2018; 12:132. [PMID: 30598091 PMCID: PMC6311905 DOI: 10.1186/s12918-018-0665-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Indexed: 12/20/2022]
Abstract
Background: Hot spot residues are functional sites in protein interaction interfaces. Identifying hot spot residues with experimental methods is time-consuming and laborious. To address this issue, many computational methods have been developed to predict hot spot residues. Most of these prediction methods are based on structural features, sequence characteristics, and/or other protein features.
Results: This paper proposed an ensemble learning method to predict hot spot residues that uses only sequence features and the relative accessible surface area of amino acid sequences. In this work, a novel feature selection technique was developed; an auto-correlation function combined with a sliding window technique was applied to obtain the characteristics of amino acid residues in the protein sequence; and an ensemble classifier with SVM and KNN base classifiers was built to achieve the best classification performance.
Conclusion: The experimental results showed that our model yields the highest F1 score of 0.92 and an MCC value of 0.87 on the ASEdb dataset. Compared with other machine learning methods, our model achieves a substantial improvement in hot spot prediction.
Availability: http://deeplearner.ahu.edu.cn/web/HotspotEL.htm
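The combination of a sliding window with an auto-correlation function can be sketched as follows. This is an illustrative sketch only: the abstract does not give the exact descriptor, so the autocovariance formula, the use of the Kyte-Doolittle hydropathy scale as the residue property, the clipping of windows at the sequence termini, the example sequence and the names `window` and `autocorr_features` are all assumptions made here.

```python
# Kyte-Doolittle hydropathy index, used here as a stand-in physicochemical property.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def window(seq, center, half_width):
    """Residues within half_width of position center, clipped at the termini."""
    start = max(0, center - half_width)
    end = min(len(seq), center + half_width + 1)
    return seq[start:end]

def autocorr_features(segment, max_lag):
    """Autocovariance of the property profile at lags 1..max_lag."""
    values = [KD[aa] for aa in segment]
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    feats = []
    for lag in range(1, max_lag + 1):
        pairs = len(centered) - lag
        if pairs <= 0:
            feats.append(0.0)  # window too short for this lag
            continue
        total = sum(centered[i] * centered[i + lag] for i in range(pairs))
        feats.append(total / pairs)
    return feats

seq = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical sequence, not from the paper
segment = window(seq, center=10, half_width=3)
print(segment, autocorr_features(segment, max_lag=3))
```

Each residue of interest then gets a fixed-length feature vector (one value per lag and per property), which is what a downstream classifier such as an SVM or KNN would consume.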
Affiliation(s)
- Quanya Liu
  - Institute of Physical Science and Information Technology, Anhui University, Hefei, Anhui, 230601, China
- Peng Chen
  - Institute of Physical Science and Information Technology, Anhui University, Hefei, Anhui, 230601, China
- Bing Wang
  - School of Electrical and Information Engineering, Anhui University of Technology, Ma'anshan, Anhui, 243032, China
- Jun Zhang
  - School of Electrical Engineering and Automation, Anhui University, Hefei, Anhui, 230601, China
- Jinyan Li
  - Advanced Analytics Institute and Centre for Health Technologies, University of Technology, Sydney, Sydney, Broadway, NSW, 2007, Australia
|
22
|
Liu Q, Chen P, Wang B, Zhang J, Li J. dbMPIKT: a database of kinetic and thermodynamic mutant protein interactions. BMC Bioinformatics 2018; 19:455. [PMID: 30482172 PMCID: PMC6260753 DOI: 10.1186/s12859-018-2493-7] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 03/18/2018] [Accepted: 11/13/2018] [Indexed: 02/06/2023] Open
Abstract
Background: Protein-protein interactions (PPIs) play important roles in biological functions. Studies of the effects of mutants on protein interactions can provide further understanding of PPIs. Currently, many databases collect experimental mutants to assess protein interactions, but most of these databases are old and have not been updated for several years.
Results: To address this issue, we manually curated a kinetic and thermodynamic database of mutant protein interactions (dbMPIKT) that is freely accessible at our website. This database contains 5291 mutants in protein interactions collected from previous databases and the literature published within the last three years. Furthermore, some data analysis, such as mutation number, mutation type, protein pair source and network map construction, can be performed online.
Conclusion: Our work can promote the study of PPIs, and novel information can be mined from the new database. Our database is available at http://DeepLearner.ahu.edu.cn/web/dbMPIKT/ for use by all, including both academics and non-academics.
Electronic supplementary material: The online version of this article (10.1186/s12859-018-2493-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Quanya Liu
  - Institute of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
- Peng Chen
  - Institute of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
- Bing Wang
  - School of Electrical and Information Engineering, Anhui University of Technology, Ma'anshan, 243032, Anhui, China
- Jun Zhang
  - School of Electronic Engineering & Automation, Anhui University, Hefei, 230601, Anhui, China
- Jinyan Li
  - Advanced Analytics Institute and Centre for Health Technologies, University of Technology, Broadway, Sydney, NSW, 2007, Australia
|
23
|
Machine Learning Approaches for Protein–Protein Interaction Hot Spot Prediction: Progress and Comparative Assessment. Molecules 2018; 23:molecules23102535. [PMID: 30287797 PMCID: PMC6222875 DOI: 10.3390/molecules23102535] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 08/30/2018] [Revised: 09/27/2018] [Accepted: 10/02/2018] [Indexed: 12/27/2022] Open
Abstract
Hot spots are the subset of interface residues that account for most of the binding free energy, and they play essential roles in the stability of protein binding. Effectively identifying which specific interface residues of protein–protein complexes form the hot spots is critical for understanding the principles of protein interactions, and it has broad application prospects in protein design and drug development. Experimental methods like alanine scanning mutagenesis are labor-intensive and time-consuming. At present, the experimentally measured hot spots are very limited. Hence, the use of computational approaches to predicting hot spots is becoming increasingly important. Here, we describe the basic concepts and recent advances of machine learning applications in inferring the protein–protein interaction hot spots, and assess the performance of widely used features, machine learning algorithms, and existing state-of-the-art approaches. We also discuss the challenges and future directions in the prediction of hot spots.
|
25
|
Zou Q, He W. Special Protein Molecules Computational Identification. Int J Mol Sci 2018; 19:ijms19020536. [PMID: 29439426 PMCID: PMC5855758 DOI: 10.3390/ijms19020536] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/16/2018] [Revised: 02/02/2018] [Accepted: 02/10/2018] [Indexed: 01/29/2023] Open
Abstract
Computational identification of special protein molecules is a key issue in understanding protein function. It can guide molecular experiments and help to save costs. I assessed 18 papers published in the special issue of Int. J. Mol. Sci., and also discussed the related works. The computational methods employed in this special issue focused on machine learning, network analysis, and molecular docking. New methods and new topics were also proposed. In addition, there were several wet-lab experiments with promising results. I hope our special issue will help research on the identification of protein molecules.
Affiliation(s)
- Quan Zou
  - School of Computer Science and Technology, Tianjin University, Tianjin 300354, China
- Wenying He
  - School of Computer Science and Technology, Tianjin University, Tianjin 300354, China
|