1
Ghafoor H, Asim MN, Ibrahim MA, Dengel A. ProSol-multi: Protein solubility prediction via amino acids multi-level correlation and discriminative distribution. Heliyon 2024; 10:e36041. [PMID: 39281576 PMCID: PMC11401092 DOI: 10.1016/j.heliyon.2024.e36041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Revised: 08/01/2024] [Accepted: 08/08/2024] [Indexed: 09/18/2024] Open
Abstract
Protein solubility prediction is useful for the careful selection of highly effective candidate proteins for drug development. In recombinant protein synthesis, solubility prediction is valuable for optimizing key protein characteristics, including stability, functionality, and ease of purification. It provides valuable information about potential biomarkers or therapeutic targets and helps in early forecasting of neurodegenerative diseases, cancer, and cardiovascular disorders. Traditional wet-lab approaches for determining protein solubility are error-prone, time-consuming, and costly. Researchers have harnessed Artificial Intelligence approaches to replace experimental approaches with computational predictors. These predictors infer the solubility of proteins by analyzing amino acid distributions in raw protein sequences. There is still considerable room for the development of robust computational predictors because existing predictors fail to extract a comprehensive discriminative distribution of amino acids. To discriminate soluble proteins from insoluble proteins more precisely, this paper presents the ProSol-Multi predictor, which makes use of a novel MLCDE encoder and a Random Forest classifier. The MLCDE encoder transforms protein sequences into informative statistical vectors by capturing amino acid multi-level correlation and discriminative distribution within raw protein sequences. The performance of the proposed encoder is evaluated against 56 existing protein sequence encoding methods on a widely used protein solubility prediction benchmark dataset under two experimental settings, namely intrinsic and extrinsic. Intrinsic evaluation reveals that, among all sequence encoders, the proposed MLCDE encoder manages to generate non-overlapping clusters of soluble and insoluble classes. In extrinsic evaluation, 10 machine learning classifiers achieve better performance with the proposed MLCDE encoder than with the 56 existing protein sequence encoders. Moreover, across 4 public benchmark datasets, the proposed ProSol-Multi predictor outperforms 20 existing predictors by an average of 3% in accuracy and 2% in MCC and AU-ROC. The ProSol-Multi interactive web application is available at https://sds_genetic_analysis.opendfki.de/ProSol-Multi.
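To make the idea of turning a raw protein sequence into a statistical vector concrete, the following is a minimal sketch of plain amino acid composition, one of the simplest sequence encodings. It is not the MLCDE encoder described in the abstract, whose details are not given here; the function name `amino_acid_composition` and the example sequence are illustrative assumptions.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the relative frequency of each standard residue (20 values).

    Baseline encoding only; this is NOT the MLCDE encoder from the abstract.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, used only to exercise the function.
vector = amino_acid_composition("MKVLAAGIVK")
```

A vector like this could be passed to any standard classifier; richer encoders add positional and correlation information that composition alone discards.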
Affiliation(s)
- Hina Ghafoor
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
- Muhammad Nabeel Asim
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
- Muhammad Ali Ibrahim
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
- Andreas Dengel
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
2
Chen B, Zhang Y, Niu Y, Wang Y, Liu Y, Ji H, Han R, Tian Y, Liu X, Kang X, Li Z. RRM2 promotes the proliferation of chicken myoblasts, inhibits their differentiation and muscle regeneration. Poult Sci 2024; 103:103407. [PMID: 38198913 PMCID: PMC10825555 DOI: 10.1016/j.psj.2023.103407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 12/10/2023] [Accepted: 12/22/2023] [Indexed: 01/12/2024] Open
Abstract
During myogenesis and regeneration, the proliferation and differentiation of myoblasts play key regulatory roles and may be regulated by many genes. We analyzed transcriptomic data from chicken primary myoblasts at different stages of proliferation and differentiation using a protein-protein interaction network, and the results indicated an interaction between cyclin-dependent kinase 1 (CDK1) and ribonucleotide reductase regulatory subunit M2 (RRM2). Previous studies in mammals have shown a role for RRM2 in skeletal muscle development as well as cell growth, but the role of RRM2 in chickens is unclear. In this study, we investigated the effects of RRM2 on skeletal muscle development and regeneration in chickens in vitro and in vivo. The interaction between RRM2 and CDK1 was initially identified by co-immunoprecipitation and mass spectrometry. Through a dual luciferase reporter assay and quantitative real-time PCR, we identified the core promoter region of RRM2, which is regulated by the SP1 transcription factor. Through cell counting kit-8 assays, 5-ethynyl-2'-deoxyuridine incorporation assays, flow cytometry, immunofluorescence staining, and Western blot analysis, we demonstrated that RRM2 promoted the proliferation and inhibited the differentiation of myoblasts. In vivo studies showed that RRM2 reduced the diameter of muscle fibers and slowed skeletal muscle regeneration. In conclusion, these data provide preliminary insights into the biological functions of RRM2 in chicken muscle development and skeletal muscle regeneration.
Affiliation(s)
- Bingjie Chen
  - College of Animal Science and Technology, Henan Agricultural University, Zhengzhou 450046, China
- Yushi Zhang
  - College of Animal Science and Technology, Henan Agricultural University, Zhengzhou 450046, China
- Yufang Niu
  - College of Animal Science and Technology, Henan Agricultural University, Zhengzhou 450046, China
- Yanxing Wang
  - College of Animal Science and Technology, Henan Agricultural University, Zhengzhou 450046, China
- Yang Liu
  - College of Animal Science and Technology, Henan Agricultural University, Zhengzhou 450046, China
- Haigang Ji
  - College of Animal Science and Technology, Henan Agricultural University, Zhengzhou 450046, China
- Ruili Han
  - College of Animal Science and Technology, Henan Agricultural University, Zhengzhou 450046, China; Henan Key Laboratory for Innovation and Utilization of Chicken Germplasm Resources, Zhengzhou 450046, China
- Yadong Tian
  - College of Animal Science and Technology, Henan Agricultural University, Zhengzhou 450046, China; Henan Key Laboratory for Innovation and Utilization of Chicken Germplasm Resources, Zhengzhou 450046, China
- Xiaojun Liu
  - College of Animal Science and Technology, Henan Agricultural University, Zhengzhou 450046, China; Henan Key Laboratory for Innovation and Utilization of Chicken Germplasm Resources, Zhengzhou 450046, China
- Xiangtao Kang
  - College of Animal Science and Technology, Henan Agricultural University, Zhengzhou 450046, China; Henan Key Laboratory for Innovation and Utilization of Chicken Germplasm Resources, Zhengzhou 450046, China
- Zhuanjian Li
  - College of Animal Science and Technology, Henan Agricultural University, Zhengzhou 450046, China; Henan Key Laboratory for Innovation and Utilization of Chicken Germplasm Resources, Zhengzhou 450046, China
3
Liyaqat T, Ahmad T, Saxena C. TeM-DTBA: time-efficient drug target binding affinity prediction using multiple modalities with Lasso feature selection. J Comput Aided Mol Des 2023; 37:573-584. [PMID: 37777631 DOI: 10.1007/s10822-023-00533-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/08/2023] [Accepted: 09/07/2023] [Indexed: 10/02/2023]
Abstract
Drug discovery, especially virtual screening and drug repositioning, can be accelerated through deeper understanding and prediction of Drug Target Interactions (DTIs). The advancement of deep learning, together with the time and financial costs of conventional wet-lab experiments, has made computational methods for DTI prediction more popular. However, the majority of these computational methods treat the DTI problem as a binary classification task, ignoring the quantitative binding affinity that determines a drug's efficacy against its target protein. Moreover, the memory footprint and execution time of a model are often neglected in favor of accuracy. To address these challenges, we introduce a novel method, called Time-efficient Multimodal Drug Target Binding Affinity (TeM-DTBA), which predicts the binding affinity between drugs and targets by fusing different modalities based on compound structures and target sequences. We employ the Lasso feature selection method, which lowers the dimensionality of the feature vectors and speeds up training of the proposed model by more than 50%. The results from two benchmark datasets demonstrate that our method outperforms state-of-the-art methods. The mean squared errors of 18.8% and 23.19%, achieved on the KIBA and Davis datasets, respectively, suggest that our method is more accurate in predicting drug-target binding affinity.
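Because the abstract frames binding affinity prediction as regression and reports mean squared error, a short sketch of that metric may help. The function name and the toy affinity values below are hypothetical and are not taken from the KIBA or Davis datasets.

```python
def mean_squared_error(y_true: list[float], y_pred: list[float]) -> float:
    """Average of squared differences between measured and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical affinities, for illustration only.
measured = [5.0, 6.0, 7.0]
predicted = [5.5, 6.0, 6.0]
mse = mean_squared_error(measured, predicted)  # (0.25 + 0 + 1) / 3
```

Lower values indicate predictions closer to the measured affinities; the metric penalizes large individual errors more heavily than small ones.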
Affiliation(s)
- Tanya Liyaqat
  - Department of Computer Engineering, Jamia Millia Islamia, New Delhi, India
- Tanvir Ahmad
  - Department of Computer Engineering, Jamia Millia Islamia, New Delhi, India
- Chandni Saxena
  - The Chinese University of Hong Kong, Sha Tin, SAR, China
4
Ahmad U, Abdullah S, Chau DM, Chia SL, Yusoff K, Chan SC, Ong TA, Razack AH, Veerakumarasivam A. Analysis of PPI networks of transcriptomic expression identifies hub genes associated with Newcastle disease virus persistent infection in bladder cancer. Sci Rep 2023; 13:7323. [PMID: 37147328 PMCID: PMC10162992 DOI: 10.1038/s41598-022-20521-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2022] [Accepted: 09/14/2022] [Indexed: 05/07/2023] Open
Abstract
Bladder cancer cells can acquire persistent infection of oncolytic Newcastle disease virus (NDV) but the molecular mechanism(s) remain unelucidated. This poses a major barrier to the effective clinical translation of oncolytic NDV virotherapy of cancers. To improve our understanding of the molecular mechanism(s) associated with the development of NDV persistent infection in bladder cancer, we used mRNA expression profiles of persistently infected bladder cancer cells to construct PPI networks. Based on paths and modules in the PPI network, the bridges were found mainly in the upregulated mRNA-pathways of p53 signalling, ECM-receptor interaction, and TGF-beta signalling and the downregulated mRNA-pathways of antigen processing and presentation, protein processing in endoplasmic reticulum, and complement and coagulation cascades in persistent TCCSUPPi cells. In persistent EJ28Pi cells, connections were identified mainly through upregulated mRNA-pathways of renal carcinoma, viral carcinogenesis, Ras signalling and cell cycle and the downregulated mRNA-pathways of Wnt signalling, HTLV-I infection and pathways in cancers. These connections were mainly dependent on RPL8-HSPA1A/HSPA4 in TCCSUPPi cells and EP300, PTPN11, RAC1-TP53, SP1, CCND1 and XPO1 in EJ28Pi cells. Oncomine validation showed that the top hub genes identified in the networks, which include RPL8, THBS1 and F2 from TCCSUPPi and TP53 and RAC1 from EJ28Pi, are involved in the development and progression of bladder cancer. Protein-drug interaction networks identified several putative drug targets that could be used to disrupt the linkages between the modules and prevent bladder cancer cells from acquiring NDV persistent infection. This novel PPI network analysis of differentially expressed mRNAs of NDV persistently infected bladder cancer cell lines provides insight into the molecular mechanisms of NDV persistency of infection in bladder cancers and into the future screening of drugs that can be used together with NDV to enhance its oncolytic efficacy.
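One common way to nominate hub genes in a PPI network is to rank nodes by degree, that is, by the number of distinct interaction partners. The sketch below assumes that criterion; the abstract does not state which centrality measure the authors used, and the node labels `P1` to `P5` are placeholders rather than genes from the study.

```python
from collections import defaultdict


def hub_nodes(edges: list[tuple[str, str]], top_n: int = 2) -> list[tuple[str, int]]:
    """Rank nodes of an undirected network by degree (assumed hub criterion)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Sort by descending degree, breaking ties alphabetically for determinism.
    ranked = sorted(neighbors, key=lambda node: (-len(neighbors[node]), node))
    return [(node, len(neighbors[node])) for node in ranked[:top_n]]


# Placeholder interaction list, for illustration only.
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]
hubs = hub_nodes(toy_edges)
```

Other measures, such as betweenness centrality, emphasize nodes that bridge modules and may be a closer match to the "bridges" mentioned in the abstract.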
Affiliation(s)
- Umar Ahmad
  - Medical Genetics Laboratory, Genetics and Regenerative Medicine Research Centre, Faculty of Medicine and Health Sciences, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
  - Medical Genetics Unit, Faculty of Basic Medical Sciences, Bauchi State University, Gadau, PMB 65, Itas/Gadau, Nigeria
- Syahril Abdullah
  - Medical Genetics Laboratory, Genetics and Regenerative Medicine Research Centre, Faculty of Medicine and Health Sciences, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
  - MAKNA Cancer Research Laboratory, Institute of Bioscience, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- De Ming Chau
  - Medical Genetics Laboratory, Genetics and Regenerative Medicine Research Centre, Faculty of Medicine and Health Sciences, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Suet Lin Chia
  - MAKNA Cancer Research Laboratory, Institute of Bioscience, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
  - Department of Microbiology, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor Darul Ehsan, Malaysia
- Khatijah Yusoff
  - Department of Microbiology, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor Darul Ehsan, Malaysia
  - Malaysia Genome Institute, Ministry of Science, Technology and Innovation, Jalan Bangi, 43000, Kajang, Selangor Darul Ehsan, Malaysia
- Soon Choy Chan
  - School of Liberal Arts, Science and Technology (PUScLST), Perdana University, Perdana University, 50490, Kuala Lumpur, Malaysia
- Teng Aik Ong
  - Department of Surgery, Faculty of Medicine, University of Malaya, Wilayah Persekutuan, Kuala Lumpur, Malaysia
- Azad Hassan Razack
  - Department of Surgery, Faculty of Medicine, University of Malaya, Wilayah Persekutuan, Kuala Lumpur, Malaysia
- Abhi Veerakumarasivam
  - Medical Genetics Laboratory, Genetics and Regenerative Medicine Research Centre, Faculty of Medicine and Health Sciences, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
  - Department of Biological Sciences, School of Medical and Life Sciences, Sunway University, 47500, Bandar Sunway, Selangor Darul Ehsan, Malaysia
5
Predicting Protein–Protein Interactions Based on Ensemble Learning-Based Model from Protein Sequence. BIOLOGY 2022; 11:biology11070995. [PMID: 36101379 PMCID: PMC9311754 DOI: 10.3390/biology11070995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2022] [Revised: 05/27/2022] [Accepted: 06/29/2022] [Indexed: 11/17/2022]
Abstract
Simple Summary: Most traditional high-throughput experiments for identifying potential protein–protein interactions are tedious and laborious. To improve the accuracy of protein–protein interaction prediction, we propose a novel computational method that can identify unknown protein–protein interactions efficiently, and we hope this method can provide a helpful idea and tool for proteomics research.
Abstract: Protein–protein interactions (PPIs) play an essential role in many biological cellular functions. However, it is still tedious and time-consuming to identify protein–protein interactions through traditional experimental methods. For this reason, it is necessary to develop a computational method for predicting PPIs efficiently. This paper explores a novel computational method for detecting PPIs from protein sequences; the approach combines a feature extraction method, Locality Preserving Projections (LPP), with a classifier, Rotation Forest (RF). Specifically, we first employ the Position Specific Scoring Matrix (PSSM), which retains biological evolutionary information, to represent protein sequences efficiently. Then, the LPP descriptor is applied to extract feature vectors from the PSSM. The feature vectors are fed into the RF to obtain the final results. The proposed method was applied to two datasets, Yeast and H. pylori, and obtained average accuracies of 92.81% and 92.56%, respectively. We also compare it with K nearest neighbors (KNN) and support vector machine (SVM) to better evaluate the performance of the proposed method. In summary, all experimental results indicate that the proposed approach is stable and robust for predicting PPIs and is a promising tool for proteomics research.
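For readers unfamiliar with the PSSM representation, the following toy sketch builds a log-odds scoring matrix from a handful of pre-aligned sequences. In practice PSSMs are usually produced by a profile search tool such as PSI-BLAST; the uniform background frequency, the pseudocount, and the function name `toy_pssm` are simplifying assumptions made here, not details from the paper.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_pssm(aligned: list[str], pseudocount: float = 1.0) -> list[list[float]]:
    """Build an L x 20 log-odds matrix from equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    matrix = []
    for pos in range(length):
        column = [seq[pos] for seq in aligned]
        denom = len(column) + pseudocount * len(AMINO_ACIDS)
        row = [
            math.log2(((column.count(aa) + pseudocount) / denom) / background)
            for aa in AMINO_ACIDS
        ]
        matrix.append(row)
    return matrix


# Hypothetical three-sequence alignment, for illustration only.
pssm = toy_pssm(["MKV", "MRV", "MKI"])
```

Each row scores how strongly a position favors each residue, which is the evolutionary signal that downstream descriptors such as LPP then compress into a fixed-length vector.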
6
Wang Y, Wang LL, Wong L, Li Y, Wang L, You ZH. SIPGCN: A Novel Deep Learning Model for Predicting Self-Interacting Proteins from Sequence Information Using Graph Convolutional Networks. Biomedicines 2022; 10:biomedicines10071543. [PMID: 35884848 PMCID: PMC9313220 DOI: 10.3390/biomedicines10071543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Revised: 06/24/2022] [Accepted: 06/24/2022] [Indexed: 11/16/2022] Open
Abstract
Proteins are the basic organic substances that constitute the cell; they are the material basis of life activities and underpin biological function. Elucidating the interactions and functions of proteins is a central task in exploring the mysteries of life. Self-interacting proteins (SIPs) are an important class of protein interaction and play a critical role. The fast growth of high-throughput experimental techniques for biomolecules has led to a massive influx of available SIP data. How to conduct scientific research using this massive amount of SIP data has become a new challenge in related research fields such as biology and medicine. In this work, we design an SIP prediction method, SIPGCN, using a deep learning graph convolutional network (GCN) based on protein sequences. First, protein sequences are characterized using a position-specific scoring matrix, which describes biological evolutionary information; then their hidden features are extracted by the deep learning method GCN; and, finally, a random forest is used to predict whether there are interrelationships between proteins. In the cross-validation experiment, SIPGCN achieved 93.65% accuracy and 99.64% specificity on the human data set, and 90.69% accuracy and 99.08% specificity on the yeast data set. Compared with other feature models and previous methods, SIPGCN showed excellent results. These outcomes suggest that SIPGCN may be a suitable instrument for predicting SIPs and may provide reliable candidates for future wet-lab experiments.
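The two indicators reported above follow directly from confusion-matrix counts. The sketch below shows the standard definitions; the function names and the toy counts are hypothetical and unrelated to the SIPGCN experiments.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of true negatives among all actual negatives."""
    return tn / (tn + fp)


# Hypothetical counts, for illustration only.
acc = accuracy(tp=40, tn=50, fp=5, fn=5)
spec = specificity(tn=50, fp=5)
```

Specificity close to 100% means very few non-interacting proteins are wrongly flagged, but it says nothing about missed positives, which is why it is usually read alongside accuracy or sensitivity.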
Affiliation(s)
- Ying Wang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
- Lin-Lin Wang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
  - Correspondence: (L.-L.W.); (L.W.)
- Leon Wong
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- Yang Li
  - School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
- Lei Wang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
  - Correspondence: (L.-L.W.); (L.W.)
- Zhu-Hong You
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
  - School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China
7
Wang L, Wong L, Chen ZH, Hu J, Sun XF, Li Y, You ZH. MSPEDTI: Prediction of Drug-Target Interactions via Molecular Structure with Protein Evolutionary Information. BIOLOGY 2022; 11:740. [PMID: 35625468 PMCID: PMC9138588 DOI: 10.3390/biology11050740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2022] [Revised: 05/03/2022] [Accepted: 05/04/2022] [Indexed: 11/25/2022]
Abstract
The key to new drug discovery and development is first and foremost the search for molecular targets of drugs, which advances drug discovery and drug repositioning. However, traditional identification of drug-target interactions (DTIs) is a costly, lengthy, high-risk, and low-success-rate undertaking. Therefore, more and more pharmaceutical companies are trying to use computational technologies to screen existing drug molecules and mine new drugs, thereby accelerating new drug development. In the current study, we designed a deep learning computational model, MSPEDTI, based on molecular structure and protein evolutionary information to predict potential DTIs. The model first fuses protein evolutionary information and drug structure information, then uses a deep learning convolutional neural network (CNN) to mine hidden features, and finally predicts the associated DTIs with an extreme learning machine (ELM). In cross-validation experiments, MSPEDTI achieved 94.19%, 90.95%, 87.95%, and 86.11% prediction accuracy on the gold-standard datasets of enzymes, ion channels, G-protein-coupled receptors (GPCRs), and nuclear receptors, respectively. MSPEDTI was competitive in ablation experiments and in comparison with previous methods. Additionally, 7 of 10 potential DTIs predicted by MSPEDTI were substantiated by the classical database. These outcomes demonstrate the ability of MSPEDTI to provide reliable drug candidate targets and to facilitate drug repositioning and drug development.
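The extreme learning machine used as the final predictor can be sketched in a few lines: hidden-layer weights are drawn at random and left fixed, and only the output weights are solved in closed form with a pseudo-inverse. The sketch below assumes NumPy is available and a `tanh` activation; the function names, hidden-layer size, and toy XOR data are illustrative assumptions, and the inputs here are raw toy features rather than CNN-extracted ones.

```python
import numpy as np


def train_elm(features, targets, hidden_units=20, seed=0):
    """Fit a single-hidden-layer ELM: random hidden layer, least-squares output."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(features.shape[1], hidden_units))  # fixed, not trained
    bias = rng.normal(size=hidden_units)
    hidden = np.tanh(features @ weights + bias)
    output_weights = np.linalg.pinv(hidden) @ targets  # closed-form solution
    return weights, bias, output_weights


def predict_elm(model, features):
    weights, bias, output_weights = model
    return np.tanh(features @ weights + bias) @ output_weights


# Toy XOR problem, for illustration only.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 0.0])
model = train_elm(X, y)
scores = predict_elm(model, X)
```

Because no iterative optimization is involved, training cost is dominated by one pseudo-inverse, which is the usual explanation for the speed of ELM classifiers.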
Affiliation(s)
- Lei Wang
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
- Leon Wong
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- Zhan-Heng Chen
  - Computer Science and Technology, Tongji University, Shanghai 200092, China
- Jing Hu
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
- Xiao-Fei Sun
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
- Yang Li
  - School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
- Zhu-Hong You
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
  - School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China
8
Wang L, You ZH, Li LP, Yan X, Zhang W, Song KJ, Song CD. Identification of potential drug-targets by combining evolutionary information extracted from frequency profiles and molecular topological structures. Chem Biol Drug Des 2020; 96:758-767. [PMID: 31393672 DOI: 10.1111/cbdd.13599] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/14/2019] [Revised: 07/29/2019] [Accepted: 08/03/2019] [Indexed: 01/09/2023]
Abstract
Identifying interactions between drug compounds and target proteins is the basis of drug research and plays a crucial role in drug discovery. However, determining drug-target interactions (DTIs) and potential protein-compound interactions by experimental methods alone is a very complicated, expensive, and time-consuming process. Hence, there is strong motivation to design in silico prediction methods to overcome these obstacles. In this work, we designed a novel in silico strategy to predict proteome-scale DTIs based on the assumption that DTI pairs can be expressed through the evolutionary information derived from frequency profiles and drugs' structural properties. To achieve this, drug molecules are encoded as substructure fingerprints that represent certain fragments; target proteins are first converted into a position-specific scoring matrix (PSSM) and then encoded as 2-dimensional principal component analysis (2DPCA) descriptors. In the prediction phase, the feature weighted rotation forest (RF) classifier is used to estimate whether a drug and a target interact with each other on four benchmark datasets: Enzymes, Ion Channels, GPCRs, and Nuclear Receptors. The cross-validation prediction accuracy on the four datasets is 95.40%, 88.82%, 85.67%, and 82.22%, respectively. To assess the proposed approach more clearly, we compared it with the discrete cosine transform (DCT) descriptor model, the support vector machine (SVM) classifier model, and existing approaches, including DBSI, NetCBP, KBMF2K, SIMCOMP, and RFDT. The experimental results indicate that the proposed approach can effectively improve DTI prediction accuracy and can be used as a practical tool for the research and design of new drugs.
Affiliation(s)
- Lei Wang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, China
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Science, Urumqi, China
- Zhu-Hong You
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Science, Urumqi, China
- Li-Ping Li
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Science, Urumqi, China
- Xin Yan
  - School of Foreign Languages, Zaozhuang University, Zaozhuang, China
- Wei Zhang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, China
- Ke-Jian Song
  - School of information engineering, JiangXi University of Science and Technology, Ganzhou, China
- Chuan-Dong Song
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, China
9
Wang L, You ZH, Huang DS, Zhou F. Combining High Speed ELM Learning with a Deep Convolutional Neural Network Feature Encoding for Predicting Protein-RNA Interactions. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:972-980. [PMID: 30296240 DOI: 10.1109/tcbb.2018.2874267] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 05/25/2023]
Abstract
Emerging evidence has shown that RNAs play a crucial role in many cellular processes, and their biological functions are primarily achieved by binding with a variety of proteins. High-throughput biological experiments provide a lot of valuable information for the initial identification of RNA-protein interactions (RPIs), but with the increasing complexity of RPI networks, this approach is becoming increasingly expensive and time-consuming. Therefore, there is an urgent need for fast and reliable methods to predict RNA-protein interactions. In this study, we propose a computational method for predicting RNA-protein interactions using sequence information. The deep learning convolutional neural network (CNN) algorithm is used to mine hidden high-level discriminative features from the RNA and protein sequences and feed them into the extreme learning machine (ELM) classifier. The experimental results with 5-fold cross-validation indicate that the proposed method achieves superior performance on benchmark datasets (RPI1807, RPI2241, and RPI369) with accuracies of 98.83, 90.83, and 85.63 percent, respectively. We further evaluate the performance of the proposed model by comparing it with the state-of-the-art SVM classifier and other existing methods on the same benchmark data sets. In addition, we predicted the independent NPInter v2.0 data set using the model trained on RPI369. The experimental results show that our model can serve as a useful tool for predicting RNA-protein interactions.
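The 5-fold cross-validation protocol mentioned above partitions the samples into five folds, each used once as the test set while the remaining four are used for training. A minimal sketch of the index bookkeeping follows; the function name is hypothetical, and the sketch assumes contiguous folds without shuffling or stratification, which the abstract does not specify.

```python
def k_fold_indices(n_samples: int, k: int = 5) -> list[tuple[list[int], list[int]]]:
    """Return (train_indices, test_indices) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


folds = k_fold_indices(12, k=5)
```

The reported accuracy is then typically the mean of the per-fold accuracies; in practice the sample order is usually shuffled first so that folds are not biased by how the data set was assembled.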
10
Gui YM, Wang RJ, Wang X, Wei YY. Using Deep Neural Networks to Improve the Performance of Protein–Protein Interactions Prediction. INT J PATTERN RECOGN 2020. [DOI: 10.1142/s0218001420520126] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
Protein–protein interactions (PPIs) help to elucidate the molecular mechanisms of life activities and play a role in promoting disease treatment and new drug development. With the advent of the proteomics era, several PPI prediction methods have emerged. However, the performance of these methods still needs to be improved. To that end, we used the dropout method to reduce over-fitting in deep neural networks (DNNs) and combined it with three types of feature extraction methods, conjoint triad (CT), auto covariance (AC) and local descriptor (LD), to build DNN models based on amino acid sequences. The results showed that the accuracy of the CT, AC and LD models increased from 97.11% to 98.12%, 96.84% to 98.17%, and 95.30% to 95.60%, respectively. The loss values of the CT, AC and LD models decreased from 27.47% to 14.96%, 65.91% to 17.82% and 36.23% to 15.34%, respectively. Experimental results show that dropout can improve the performance of the DNN models. The results can provide a resource for scholars in future studies involving the prediction of PPIs. The experimental code is available at https://github.com/smalltalkman/hppi-tensorflow .
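Of the three feature extraction methods named above, conjoint triad is the easiest to illustrate: residues are mapped to seven physicochemical classes and every run of three consecutive classes is counted, giving a 7 x 7 x 7 = 343-dimensional vector. The sketch below uses the seven-class grouping as it is commonly described for this encoding; the max-scaling step and the function name are simplifying assumptions, and this is not the code from the linked repository.

```python
# Seven residue classes as commonly described for the conjoint triad encoding.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CLASS_OF = {aa: idx for idx, group in enumerate(CLASSES) for aa in group}


def conjoint_triad(sequence: str) -> list[float]:
    """Return a 343-dimensional vector of scaled triad counts."""
    counts = [0] * 343
    classes = [CLASS_OF.get(aa) for aa in sequence.upper()]
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        if a is None or b is None or c is None:
            continue  # skip triads containing non-standard residues
        counts[a * 49 + b * 7 + c] += 1
    peak = max(counts)
    # Assumption: scale by the largest count so values fall in [0, 1].
    return [count / peak for count in counts] if peak else [0.0] * 343


features = conjoint_triad("AGVAGV")
```

For a protein pair, the vectors of the two sequences are typically concatenated before being passed to the network.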
Affiliation(s)
- Yuan-Miao Gui
  - Institute of Intelligent Machines, Hefei Institute of Physics, Chinese Academy of Sciences, Hefei City, Anhui Province, P. R. China
  - University of Science and Technology of China, Hefei City, Anhui Province, P. R. China
- Ru-Jing Wang
  - Institute of Intelligent Machines, Hefei Institute of Physics, Chinese Academy of Sciences, Hefei City, Anhui Province, P. R. China
- Xue Wang
  - Institute of Intelligent Machines, Hefei Institute of Physics, Chinese Academy of Sciences, Hefei City, Anhui Province, P. R. China
- Yuan-Yuan Wei
  - Institute of Intelligent Machines, Hefei Institute of Physics, Chinese Academy of Sciences, Hefei City, Anhui Province, P. R. China
11
Incorporating chemical sub-structures and protein evolutionary information for inferring drug-target interactions. Sci Rep 2020; 10:6641. [PMID: 32313024 PMCID: PMC7171114 DOI: 10.1038/s41598-020-62891-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/14/2020] [Accepted: 03/12/2020] [Indexed: 01/29/2023] Open
Abstract
Accumulating evidence has shown that drug-target interactions (DTIs) play a crucial role in the process of genomic drug discovery. Although biological experimental technology has made great progress, the identification of DTIs is still very time-consuming and expensive. Hence there is an urgent need to develop in silico models that supplement biological experiments in predicting potential DTIs. In this work, a new model is designed to predict DTIs by incorporating chemical sub-structures and protein evolutionary information. Specifically, we first use the Position-Specific Scoring Matrix (PSSM) to convert the protein sequence into a numerical descriptor containing biological evolutionary information, then use the Discrete Cosine Transform (DCT) algorithm to extract the hidden features and integrate them with the chemical sub-structure descriptor, and finally use the Rotation Forest (RF) classifier to predict whether there is an interaction between the drug and the target protein. In the 5-fold cross-validation (CV) experiment, the average accuracy of the proposed model on the benchmark datasets of Enzymes, Ion Channels, GPCRs and Nuclear Receptors reached 0.9140, 0.8919, 0.8724 and 0.8111, respectively. To fully evaluate the performance of the proposed model, we compare it with different feature extraction models, classifier models, and other state-of-the-art models. Furthermore, we also carried out case studies, in which 8 of the top 10 drug-target pairs with the highest prediction scores were confirmed by related databases. These results indicate that the proposed model has outstanding ability in predicting DTIs and can provide reliable candidates for biological experiments.
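The DCT step can be pictured as applying a two-dimensional discrete cosine transform to the PSSM and keeping the low-frequency coefficients in the top-left corner as a fixed-length feature vector. The sketch below uses an orthonormal DCT-II written in plain Python; the block size, the function names, and the all-ones toy matrix are assumptions for illustration, and the abstract does not say how many coefficients the authors retain.

```python
import math


def dct_1d(values: list[float]) -> list[float]:
    """Orthonormal DCT-II of a one-dimensional signal."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, v in enumerate(values)
        )
        out.append(scale * total)
    return out


def dct_2d(matrix: list[list[float]]) -> list[list[float]]:
    """Apply the 1-D transform to every row, then to every column."""
    rows = [dct_1d(row) for row in matrix]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def low_frequency_block(matrix: list[list[float]], size: int) -> list[float]:
    """Flatten the top-left size x size block of DCT coefficients."""
    coeffs = dct_2d(matrix)
    return [value for row in coeffs[:size] for value in row[:size]]


# Toy constant matrix standing in for a PSSM, for illustration only.
toy = [[1.0] * 4 for _ in range(4)]
block = low_frequency_block(toy, 2)
```

For a constant input all the energy lands in the first coefficient, which illustrates why a small low-frequency block can summarize a much larger matrix.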
|
12
|
Li Y, Li LP, Wang L, Yu CQ, Wang Z, You ZH. An Ensemble Classifier to Predict Protein-Protein Interactions by Combining PSSM-based Evolutionary Information with Local Binary Pattern Model. Int J Mol Sci 2019; 20:E3511. [PMID: 31319578 PMCID: PMC6679202 DOI: 10.3390/ijms20143511] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 06/02/2019] [Revised: 07/04/2019] [Accepted: 07/15/2019] [Indexed: 01/03/2023] Open
Abstract
Proteins play a critical role in the regulation of biological cell functions. Whether proteins interact with each other has become a fundamental problem, because proteins usually perform their functions by interacting with other proteins. Although a large amount of protein-protein interaction (PPI) data has been produced by high-throughput biotechnology, biological experimental techniques are time-consuming and costly. Thus, computational methods for predicting protein interactions have become a research hot spot. In this research, we propose an efficient computational method that combines a Rotation Forest (RF) classifier with the Local Binary Pattern (LBP) feature extraction method to predict PPIs from the Position-Specific Scoring Matrix (PSSM). The proposed method achieved superior performance on the Yeast, Human, and H. pylori datasets, with average accuracies of 92.12%, 96.21%, and 86.59%, respectively. In addition, we evaluated the performance of the proposed method on four independent datasets: C. elegans, H. pylori, H. sapiens, and M. musculus. These experimental results show that our model has good feasibility and robustness in predicting PPIs.
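As a rough illustration of how a Local Binary Pattern descriptor could be taken from a PSSM, the sketch below treats the matrix like a grayscale image and builds a normalized histogram of 8-neighbour codes. The basic 3x3 LBP variant, the function name, and the toy matrices are assumptions; the abstract does not say which LBP variant the authors used.

```python
def lbp_histogram(matrix):
    """Normalized 256-bin histogram of basic 3x3 LBP codes.

    Each interior cell is compared with its 8 neighbours (clockwise from
    the top-left); a neighbour >= centre sets the corresponding bit.
    """
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    n_rows, n_cols = len(matrix), len(matrix[0])
    for r in range(1, n_rows - 1):
        for c in range(1, n_cols - 1):
            centre = matrix[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                if matrix[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * 256

# Toy 3x3 stand-ins for a PSSM: a flat patch and a patch with a central peak.
flat_patch = [[2, 2, 2], [2, 2, 2], [2, 2, 2]]
peak_patch = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]
flat_hist = lbp_histogram(flat_patch)
peak_hist = lbp_histogram(peak_patch)
```

The histogram has a fixed length regardless of protein length, which is what makes it usable as classifier input.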
Affiliation(s)
- Yang Li
  - School of Information Engineering, Xijing University, Xi'an 710123, China
- Li-Ping Li
  - School of Information Engineering, Xijing University, Xi'an 710123, China
- Lei Wang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277100, China
- Chang-Qing Yu
  - School of Information Engineering, Xijing University, Xi'an 710123, China
- Zheng Wang
  - School of Information Engineering, Xijing University, Xi'an 710123, China
- Zhu-Hong You
  - School of Information Engineering, Xijing University, Xi'an 710123, China
|
13
|
Wang X, Wu Y, Wang R, Wei Y, Gui Y. A novel matrix of sequence descriptors for predicting protein-protein interactions from amino acid sequences. PLoS One 2019; 14:e0217312. [PMID: 31173605 PMCID: PMC6555512 DOI: 10.1371/journal.pone.0217312] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/28/2019] [Accepted: 05/08/2019] [Indexed: 12/20/2022] Open
Abstract
Protein-protein interactions (PPIs) play an important role in the life activities of organisms. With the availability of large amounts of protein sequence data, PPIs prediction methods have attracted increasing attention. A variety of protein sequence coding methods have emerged, but the training of these methods is particularly time consuming. To solve this issue, we have proposed a novel matrix sequence coding method. Based on deep neural network (DNN) and a novel matrix protein sequence descriptor, we constructed a protein interaction prediction model for predicting PPIs. When performed on human PPIs data, the method achieved an accuracy of 94.34%, a recall of 98.28%, an area under the curve (AUC) of 97.79% and a loss of 23.25%. A non-redundant dataset was used to evaluate this prediction model, and the prediction accuracy is 88.29%. These results indicate that the matrix of sequence (MOS) descriptor can enhance the predictive power of PPIs and reduce training time, which can be a useful complement for future proteomics research. The experimental code and experimental results can be found at https://github.com/smalltalkman/hppi-tensorflow.
Affiliation(s)
- Xue Wang
  - Institute of Technical Biology & Agriculture Engineering, Hefei Institutes of Physical Science, Chinese Academy of Sciences, HeFei City, AnHui Province, China
  - University of Science and Technology of China, HeFei City, AnHui Province, China
  - Institute of Intelligent Machine, Hefei Institutes of Physical Science, Chinese Academy of Sciences, HeFei City, AnHui Province, China
- Yuejin Wu
  - Institute of Technical Biology & Agriculture Engineering, Hefei Institutes of Physical Science, Chinese Academy of Sciences, HeFei City, AnHui Province, China
  - University of Science and Technology of China, HeFei City, AnHui Province, China
- Rujing Wang
  - University of Science and Technology of China, HeFei City, AnHui Province, China
  - Institute of Intelligent Machine, Hefei Institutes of Physical Science, Chinese Academy of Sciences, HeFei City, AnHui Province, China
- Yuanyuan Wei
  - Institute of Technical Biology & Agriculture Engineering, Hefei Institutes of Physical Science, Chinese Academy of Sciences, HeFei City, AnHui Province, China
- Yuanmiao Gui
  - Institute of Technical Biology & Agriculture Engineering, Hefei Institutes of Physical Science, Chinese Academy of Sciences, HeFei City, AnHui Province, China
  - University of Science and Technology of China, HeFei City, AnHui Province, China
|
14
|
Wang L, You ZH, Chen X, Li YM, Dong YN, Li LP, Zheng K. LMTRDA: Using logistic model tree to predict MiRNA-disease associations by fusing multi-source information of sequences and similarities. PLoS Comput Biol 2019; 15:e1006865. [PMID: 30917115 PMCID: PMC6464243 DOI: 10.1371/journal.pcbi.1006865] [Citation(s) in RCA: 77] [Impact Index Per Article: 15.4] [Received: 10/06/2018] [Revised: 04/15/2019] [Accepted: 02/13/2019] [Indexed: 11/18/2022] Open
Abstract
Emerging evidence has shown that microRNAs (miRNAs) play an important role in human disease research. Identifying potential associations between miRNAs and diseases is significant for the development of pathology, diagnosis and therapy. However, only a tiny portion of all miRNA-disease pairs in the current datasets are experimentally validated. This prompts the development of high-precision computational methods to predict real interaction pairs. In this paper, we propose a new model of Logistic Model Tree for predicting miRNA-Disease Association (LMTRDA) by fusing multi-source information including miRNA sequences, miRNA functional similarity, disease semantic similarity, and known miRNA-disease associations. In particular, we introduce miRNA sequence information and extract its features using a natural language processing technique for the first time in a miRNA-disease prediction model. In the cross-validation experiment, LMTRDA obtained 90.51% prediction accuracy with 92.55% sensitivity and an AUC of 90.54% on the HMDD V3.0 dataset. To further evaluate the performance of LMTRDA, we compared it with different classifier and feature descriptor models. In addition, we also validated the predictive ability of LMTRDA in human diseases including Breast Neoplasms, Breast Neoplasms and Lymphoma. As a result, 28, 27 and 26 out of the top 30 miRNAs associated with these diseases were verified by experiments in different kinds of case studies. These experimental results demonstrate that LMTRDA is a reliable model for predicting associations between miRNAs and diseases.
Affiliation(s)
- Lei Wang
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi, China
- Zhu-Hong You
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi, China
- Xing Chen
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Yang-Ming Li
  - Department of Electrical Computer and Telecommunications Engineering Technology, Rochester Institute of Technology, Rochester, United States of America
- Ya-Nan Dong
  - Xiangya School of Public Health, Central South University, Changsha, China
- Li-Ping Li
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi, China
- Kai Zheng
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi, China
|
15
|
Wang L, Yan X, Liu ML, Song KJ, Sun XF, Pan WW. Prediction of RNA-protein interactions by combining deep convolutional neural network with feature selection ensemble method. J Theor Biol 2018; 461:230-238. [PMID: 30321541 DOI: 10.1016/j.jtbi.2018.10.029] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 12/01/2017] [Revised: 03/22/2018] [Accepted: 10/11/2018] [Indexed: 01/01/2023]
Abstract
RNA-protein interaction (RPI) plays an important role in the basic cellular processes of organisms. Unfortunately, due to time and cost constraints, it is difficult for biological experiments to determine the relationship between RNA and protein on a large scale. There is therefore an urgent need for reliable computational methods to quickly and accurately predict RNA-protein interactions. In this study, we propose a novel computational method, RPIFSE (predicting RPI with Feature Selection Ensemble method), which predicts RPI from RNA and protein sequence information. First, RPIFSE perturbs the features extracted by the convolutional neural network (CNN) and generates multiple data sets according to the weight of each feature, and then uses an extreme learning machine (ELM) classifier to classify these data sets. Finally, the results of each classifier are combined, and the highest score is chosen as the final prediction result by a weighted voting method. In 5-fold cross-validation experiments, RPIFSE achieved 91.87%, 89.74%, 97.76% and 98.98% accuracy on the RPI369, RPI2241, RPI488 and RPI1807 data sets, respectively. To further evaluate the performance of RPIFSE, we compare it with the state-of-the-art support vector machine (SVM) classifier and other existing methods on those data sets. Furthermore, we also predicted the RPI on the independent data set NPInter2.0 and drew the network graph based on the prediction results. These promising comparison results demonstrate the effectiveness of RPIFSE and indicate that it could be a useful tool for predicting RPI.
Affiliation(s)
- Lei Wang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong 277100, China
- Xin Yan
  - School of Foreign Languages, Zaozhuang University, Zaozhuang, Shandong 277100, China
- Meng-Lin Liu
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong 277100, China
- Ke-Jian Song
  - School of Information Engineering, JiangXi University of Science and Technology, Ganzhou, Jiangxi 341000, China
- Xiao-Fei Sun
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong 277100, China
- Wen-Wen Pan
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong 277100, China
|
16
|
In silico-prediction of protein-protein interactions network about MAPKs and PP2Cs reveals a novel docking site variants in Brachypodium distachyon. Sci Rep 2018; 8:15083. [PMID: 30305661 PMCID: PMC6180098 DOI: 10.1038/s41598-018-33428-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 07/27/2018] [Accepted: 09/13/2018] [Indexed: 12/26/2022] Open
Abstract
Protein-protein interactions (PPIs) underlie the molecular mechanisms of most biological processes. Mitogen-activated protein kinases (MAPKs) can be dephosphorylated by MAPK-specific phosphatases such as PP2C, which are critical to transduce extracellular signals into adaptive and programmed responses. However, the experimental approaches for identifying PPIs are expensive, time-consuming, laborious and challenging. In response, many computational methods have been developed to predict PPIs. Yet, these methods have inherent disadvantages such as high false positive and negative results. Thus, it is crucial to develop in silico approaches for predicting PPIs efficiently and accurately. In this study, we identified PPIs among 16 BdMAPKs and 86 BdPP2Cs in B. distachyon using a novel docking approach. Further, we systematically investigated the docking site (D-site) of BdPP2C which plays a vital role for recognition and docking of BdMAPKs. D-site analysis revealed that there were 96 pairs of PPIs including all BdMAPKs and most BdPP2Cs, which indicated that BdPP2C may play roles in other signaling networks. Moreover, most BdPP2Cs have a D-site for BdMAPKs in our prediction results, which suggested that our method can effectively predict PPIs, as confirmed by their 3D structure. In addition, we validated this methodology with known Arabidopsis and yeast phosphatase-MAPK interactions from the STRING database. The results obtained provide a vital research resource for exploring an accurate network of PPIs between BdMAPKs and BdPP2Cs.
|
17
|
Using a Classifier Fusion Strategy to Identify Anti-angiogenic Peptides. Sci Rep 2018; 8:14062. [PMID: 30218091 PMCID: PMC6138733 DOI: 10.1038/s41598-018-32443-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 06/25/2018] [Accepted: 09/07/2018] [Indexed: 12/27/2022] Open
Abstract
Anti-angiogenic peptides perform distinct physiological functions and are potential therapies for angiogenesis-related diseases. Accurate identification of anti-angiogenic peptides may provide significant clues for understanding the essential angiogenic homeostasis within tissues and for developing antineoplastic therapies. In this study, an ensemble predictor is proposed for anti-angiogenic peptide prediction by fusing the individual classifier with the best sensitivity and the individual classifier with the best specificity. We investigate the predictive capabilities of various feature spaces with respect to the corresponding optimal individual classifiers and ensemble classifiers. The accuracy and Matthews correlation coefficient (MCC) of the ensemble classifier trained on Bi-profile Bayes (BpB) features are 0.822 and 0.649, respectively, the highest among the investigated prediction models. Discriminative features are obtained from BpB using the Relief algorithm followed by the Incremental Feature Selection (IFS) method. The sensitivity, specificity, accuracy, and MCC of the ensemble classifier trained on the discriminative features reach 0.776, 0.888, 0.832, and 0.668, respectively. Experimental results indicate that the proposed method is far superior to the previous study for anti-angiogenic peptide prediction.
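The four figures reported for the discriminative-feature classifier are linked through the confusion matrix, and the short sketch below makes that link explicit. It assumes equally sized positive and negative classes, which the abstract does not state; the function name and default arguments are illustrative.

```python
import math

def metrics_from_rates(sensitivity, specificity, n_pos=1.0, n_neg=1.0):
    """Rebuild accuracy and MCC from sensitivity and specificity.

    n_pos and n_neg are the class sizes; the defaults assume a balanced
    dataset (an assumption, not a fact taken from the abstract).
    """
    tp = sensitivity * n_pos
    fn = n_pos - tp
    tn = specificity * n_neg
    fp = n_neg - tn
    accuracy = (tp + tn) / (n_pos + n_neg)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc

accuracy, mcc = metrics_from_rates(0.776, 0.888)
print(round(accuracy, 3), round(mcc, 3))  # prints: 0.832 0.668
```

Under the balanced-class assumption the reported sensitivity and specificity reproduce the reported accuracy (0.832) and MCC (0.668).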
|
18
|
Using Two-dimensional Principal Component Analysis and Rotation Forest for Prediction of Protein-Protein Interactions. Sci Rep 2018; 8:12874. [PMID: 30150728 PMCID: PMC6110764 DOI: 10.1038/s41598-018-30694-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 03/15/2018] [Accepted: 07/17/2018] [Indexed: 11/09/2022] Open
Abstract
The interaction among proteins is essential in all life activities, and it is the basis of all the metabolic activities of the cells. By studying protein-protein interactions (PPIs), researchers can better interpret protein function and decode the phenomena of life, which is of great practical value, especially in the design of new drugs. Although many high-throughput techniques have been devised for large-scale detection of PPIs, these methods are still expensive and time-consuming. For this reason, there is a pressing need to develop computational methods for predicting PPIs at the entire proteome scale. In this article, we propose a new approach to predict PPIs using a Rotation Forest (RF) classifier combined with a matrix-based protein sequence representation. We apply the Position-Specific Scoring Matrix (PSSM), which contains biological evolution information, to represent protein sequences and extract the features through the two-dimensional Principal Component Analysis (2DPCA) algorithm. The descriptors are then sent to the rotation forest classifier for classification. We obtained 97.43% prediction accuracy with 94.92% sensitivity at a precision of 99.93% when the proposed method was applied to the PPI data of yeast. To evaluate the performance of the proposed method, we compared it with other methods on the same dataset and validated it on independent datasets. The results obtained show that the proposed method is an appropriate and promising method for predicting PPIs.
|
19
|
PPInS: a repository of protein-protein interaction sitesbase. Sci Rep 2018; 8:12453. [PMID: 30127348 PMCID: PMC6102274 DOI: 10.1038/s41598-018-30999-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/25/2018] [Accepted: 08/03/2018] [Indexed: 01/14/2023] Open
Abstract
Protein-Protein Interaction Sitesbase (PPInS), a high-performance database of protein-protein interacting interfaces, is presented. The atomic-level information of the molecular interactions among the protein chains in protein-protein complexes (as reported in the Protein Data Bank [PDB]), together with their evolutionary information in the Structural Classification of Proteins (SCOPe release 2.06), is made available in PPInS. A total of 32468 PDB files representing X-ray crystallized multimeric protein-protein complexes with structural resolution better than 2.5 Å were shortlisted to demarcate the protein-protein interaction interfaces (PPIIs). A total of 111857 PPIIs with ~32.24 million atomic contact pairs (ACPs) were generated and made available on a web server for on-site analysis and download. All these PPIIs, and the protein-protein interacting patches (PPIPs) involved in them, were also analyzed in terms of the number of residues contributing to patch formation, their hydrophobic nature, the amount of surface area they contributed to binding, and their homo- and heterodimeric nature, to describe the diversity of information covered in PPInS. It was observed that 42.37% of total PPIPs were made up of 6–20 interacting residues, 53.08% of PPIPs had an interface area ≤1000 Å² in PPII formation, 82.64% of PPIPs were reported with a hydrophobicity score of ≤10, and 73.26% of PPIPs were homologous to each other with sequence similarity scores ranging from 75–100%. A subset of PPInS, the “Non-Redundant Database (NRDB)”, containing 2265 PPIIs with over 1.8 million ACPs corresponding to 1931 protein-protein complexes (PDBs), was also designed by removing structural redundancies at the level of SCOP superfamily (SCOP release 1.75). The web interface of PPInS (http://www.cup.edu.in:99/ppins/home.php) offers an easy-to-navigate, intuitive and user-friendly environment, and can be accessed by providing a PDB ID, SCOP superfamily ID, or protein sequence.
|
20
|
Reciprocal Perspective for Improved Protein-Protein Interaction Prediction. Sci Rep 2018; 8:11694. [PMID: 30076341 PMCID: PMC6076239 DOI: 10.1038/s41598-018-30044-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 03/15/2018] [Accepted: 07/20/2018] [Indexed: 02/06/2023] Open
Abstract
All protein-protein interaction (PPI) predictors require the determination of an operational decision threshold when differentiating positive PPIs from negatives. Historically, a single global threshold, typically optimized via cross-validation testing, is applied to all protein pairs. However, we here use data visualization techniques to show that no single decision threshold is suitable for all protein pairs, given the inherent diversity of protein interaction profiles. The recent development of high throughput PPI predictors has enabled the comprehensive scoring of all possible protein-protein pairs. This, in turn, has given rise to context, enabling us now to evaluate a PPI within the context of all possible predictions. Leveraging this context, we introduce a novel modeling framework called Reciprocal Perspective (RP), which estimates a localized threshold on a per-protein basis using several rank order metrics. By considering a putative PPI from the perspective of each of the proteins within the pair, RP rescores the predicted PPI and applies a cascaded Random Forest classifier leading to improvements in recall and precision. We here validate RP using two state-of-the-art PPI predictors, the Protein-protein Interaction Prediction Engine and the Scoring PRotein INTeractions methods, over five organisms: Homo sapiens, Saccharomyces cerevisiae, Arabidopsis thaliana, Caenorhabditis elegans, and Mus musculus. Results demonstrate the application of a post hoc RP rescoring layer significantly improves classification (p < 0.001) in all cases over all organisms and this new rescoring approach can apply to any PPI prediction method.
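The idea of judging a putative pair from each partner's point of view can be sketched with a single rank-order feature. The example below only computes where each protein ranks in the other's sorted score list; the data layout, function names, and protein labels are hypothetical, and the published framework uses several rank-order metrics plus a cascaded Random Forest that are not reproduced here.

```python
def rank_from_perspective(scores, protein, partner):
    """1-based rank of `partner` among all candidates scored for `protein`.

    `scores[protein]` is assumed to map every candidate partner to its
    predicted interaction score (higher means more likely).
    """
    candidates = scores[protein]
    ordered = sorted(candidates, key=candidates.get, reverse=True)
    return ordered.index(partner) + 1

def reciprocal_ranks(scores, a, b):
    """Rank of b from a's perspective and rank of a from b's perspective."""
    return (rank_from_perspective(scores, a, b),
            rank_from_perspective(scores, b, a))

# Hypothetical all-vs-all scores for three proteins.
toy_scores = {
    "A": {"B": 0.90, "C": 0.20},
    "B": {"A": 0.90, "C": 0.95},
    "C": {"A": 0.20, "B": 0.95},
}
ranks_ab = reciprocal_ranks(toy_scores, "A", "B")
```

Here the same score of 0.90 is the top hit for A but only the second-best hit for B, which is the kind of asymmetry a single global threshold cannot express.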
|
21
|
The PPI network analysis of mRNA expression profile of uterus from primary dysmenorrheal rats. Sci Rep 2018; 8:351. [PMID: 29321498 PMCID: PMC5762641 DOI: 10.1038/s41598-017-18748-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 11/08/2017] [Accepted: 12/15/2017] [Indexed: 11/08/2022] Open
Abstract
To elucidate the mechanisms of molecular regulations underlying primary dysmenorrhea (PD), we used our previously published mRNA expression profile of uterus from PD syndrome rats to construct protein-protein interactions (PPI) network via STRING Interactome. Consequently, 34 subnetworks, including a "continent" (Subnetwork 1) and 33 "islands" (Subnetwork 2-34) were generated. The nodes, with relative expression ratios, were visualized in the PPI networks and their connections were identified. Through path and module exploring in the network, the bridges were found from pathways of cellular response to calcium ion, SMAD protein signal transduction, regulation of transcription from RNA polymerase II promoter in response to stress and muscle stretch that were significantly enriched by the up-regulated mRNAs, to the cascades of cAMP metabolic processes and positive regulation of cyclase activities by the down-regulated ones. This link is mainly dependent on Fos/Jun - Vip connection. Our data, for the first time, report the PPI network analysis of differentially expressed mRNAs in the uterus of PD syndrome rats, to give insight into screening drugs and find new therapeutic strategies to relieve PD.
|
22
|
Prediction of cassava protein interactome based on interolog method. Sci Rep 2017; 7:17206. [PMID: 29222529 PMCID: PMC5722940 DOI: 10.1038/s41598-017-17633-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 09/08/2017] [Accepted: 11/28/2017] [Indexed: 12/20/2022] Open
Abstract
Cassava is a starchy root crop whose role in food security is becoming increasingly significant. With its versatile industrial uses, demand for cassava starch is growing continuously. However, in-depth study of cellular regulation, especially the interaction between proteins, is lacking. To reduce the knowledge gap in protein-protein interaction (PPI), a genome-scale PPI network of cassava was constructed using an interolog-based method (MePPI-In, available at http://bml.sbi.kmutt.ac.th/ppi). The network was constructed from the information of seven template plants. MePPI-In included 90,173 interactions from 7,209 proteins. At least 39 percent of the total predictions were supported by gene/protein expression data, while further co-expression analysis yielded 16 highly promising PPIs. In addition, domain-domain interaction information was employed to increase the reliability of the network and guide the search for more groups of promising PPIs. Moreover, the topology and functional content of MePPI-In were similar to those of the Arabidopsis and rice networks. The potential contribution of MePPI-In to various applications, such as protein-complex formation and prediction of protein function, was discussed and exemplified. The insights provided by MePPI-In will hopefully enable precise trait improvement in cassava.
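The interolog principle behind this kind of network can be sketched in a few lines: an interaction known in a template species is transferred to the target species whenever both partners have orthologs there. The function name, data layout, and protein identifiers below are hypothetical, and the sketch omits the ortholog detection and filtering steps that a real pipeline would need.

```python
from itertools import product

def predict_interologs(template_ppis, orthologs):
    """Transfer template interactions to a target species.

    template_ppis: iterable of (protein_a, protein_b) pairs known in a
        template organism.
    orthologs: dict mapping a template protein to the set of its
        orthologs in the target organism.
    Returns a set of unordered target pairs, stored as sorted tuples.
    """
    predicted = set()
    for a, b in template_ppis:
        # Every ortholog of a is paired with every ortholog of b.
        for x, y in product(orthologs.get(a, ()), orthologs.get(b, ())):
            predicted.add(tuple(sorted((x, y))))
    return predicted

# Hypothetical template interactions (T*) and target orthologs (C*).
template_ppis = [("T1", "T2"), ("T2", "T3")]
orthologs = {"T1": {"C1"}, "T2": {"C2", "C3"}}
predicted_pairs = predict_interologs(template_ppis, orthologs)
```

The second template interaction yields no prediction because T3 has no ortholog in the target, which is why coverage of such networks depends heavily on the choice of template species.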
|
23
|
Ur Rehman H, Bari I, Ali A, Mahmood H. A Bayesian approach for estimating protein-protein interactions by integrating structural and non-structural biological data. Mol Biosyst 2017; 13:2592-2602. [PMID: 29028065 DOI: 10.1039/c7mb00484b] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/21/2022]
Abstract
Accurate elucidation of genome wide protein-protein interactions is crucial for understanding the regulatory processes of the cell. High-throughput techniques, such as the yeast-2-hybrid (Y2H) assay, co-immunoprecipitation (co-IP), mass spectrometric (MS) protein complex identification, affinity purification (AP) etc., are generally relied upon to determine protein interactions. Unfortunately, each type of method is inherently subject to different types of noise and results in false positive interactions. On the other hand, precise understanding of proteins, especially knowledge of their functional associations is necessary for understanding how complex molecular machines function. To solve this problem, computational techniques are generally relied upon to precisely predict protein interactions. In this work, we present a novel method that combines structural and non-structural biological data to precisely predict protein interactions. The conceptual novelty of our approach lies in identifying and precisely associating biological information that provides substantial interaction clues. Our model combines structural and non-structural information using Bayesian statistics to calculate the likelihood of each interaction. The proposed model is tested on Saccharomyces cerevisiae's interactions extracted from the DIP and IntAct databases and provides substantial improvements in terms of accuracy, precision, recall and F1 score, as compared with the most widely used related state-of-the-art techniques.
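One common way to combine heterogeneous evidence with Bayesian statistics is to multiply prior odds by one likelihood ratio per evidence source. The sketch below shows that calculation under a naive conditional-independence assumption; the abstract does not give the authors' actual model, and the prior and likelihood ratios used here are made-up illustrative numbers.

```python
def posterior_probability(prior, likelihood_ratios):
    """Posterior probability of interaction from independent evidence.

    prior: prior probability that a random protein pair interacts
        (must lie strictly between 0 and 1).
    likelihood_ratios: one value per evidence source, each equal to
        P(evidence | interacting) / P(evidence | not interacting).
    Assumes the evidence sources are conditionally independent.
    """
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)

# Illustrative numbers only: a 1% prior, one structural clue (LR = 10)
# and one non-structural clue (LR = 5).
posterior = posterior_probability(0.01, [10.0, 5.0])
```

With these numbers the posterior is 50/149, roughly 0.34, which shows how two moderately informative clues can lift a rare event well above its prior without making it certain.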
Affiliation(s)
- Hafeez Ur Rehman
  - Department of Computer Science, FAST National University of Computer & Emerging Sciences, Peshawar, Pakistan
|
24
|
Wang L, You ZH, Chen X, Xia SX, Liu F, Yan X, Zhou Y, Song KJ. A Computational-Based Method for Predicting Drug-Target Interactions by Using Stacked Autoencoder Deep Neural Network. J Comput Biol 2017; 25:361-373. [PMID: 28891684 DOI: 10.1089/cmb.2017.0135] [Citation(s) in RCA: 103] [Impact Index Per Article: 14.7] [Indexed: 12/18/2022] Open
Abstract
Identifying the interaction between drugs and target proteins is an important area of drug research, which provides a broad prospect for low-risk and faster drug development. However, due to the limitations of traditional experiments in revealing drug-target interactions (DTIs), the screening of targets not only takes a lot of time and money but also has high false-positive and false-negative rates. Therefore, it is imperative to develop effective automatic computational methods to accurately predict DTIs in the postgenome era. In this article, we propose a new computational method for predicting DTIs from drug molecular structure and protein sequence by using a stacked autoencoder, a deep learning technique that can adequately extract information from the raw data. The proposed method has the advantage that it can automatically mine the hidden information from protein sequences and generate highly representative features through iterations of multiple layers. The feature descriptors are then constructed by combining these features with the molecular substructure fingerprint information, and fed into the rotation forest for prediction. The experimental results of fivefold cross-validation indicate that the proposed method achieves superior performance on gold standard data sets (enzymes, ion channels, GPCRs [G-protein-coupled receptors], and nuclear receptors) with accuracy of 0.9414, 0.9116, 0.8669, and 0.8056, respectively. We further explore the performance of the proposed method by comparing it with other feature extraction algorithms, state-of-the-art classifiers, and other methods on the same data set. The comparison results demonstrate that the proposed method is highly competitive when predicting drug-target interactions.
Affiliation(s)
- Lei Wang
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, China
- Zhu-Hong You
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Science, Urumqi, China
- Xing Chen
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Shi-Xiong Xia
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
- Feng Liu
  - China National Coal Association, Beijing, China
- Xin Yan
  - School of Foreign Languages, Zaozhuang University, Zaozhuang, China
- Yong Zhou
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
- Ke-Jian Song
  - School of Information Engineering, JiangXi University of Science and Technology, Ganzhou, China
|
25
|
Wang Y, You Z, Li X, Chen X, Jiang T, Zhang J. PCVMZM: Using the Probabilistic Classification Vector Machines Model Combined with a Zernike Moments Descriptor to Predict Protein-Protein Interactions from Protein Sequences. Int J Mol Sci 2017; 18:ijms18051029. [PMID: 28492483 PMCID: PMC5454941 DOI: 10.3390/ijms18051029] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Received: 03/24/2017] [Revised: 04/24/2017] [Accepted: 04/29/2017] [Indexed: 01/08/2023] Open
Abstract
Protein–protein interactions (PPIs) are essential to the processes of most living organisms. Thus, detecting PPIs is extremely important for understanding the molecular mechanisms of biological systems. Although a large amount of PPI data has been generated by high-throughput technologies for a variety of organisms, the whole interactome is still far from complete. In addition, the high-throughput technologies for detecting PPIs have some unavoidable defects, including time consumption, high cost, and high error rate. In recent years, with the development of machine learning, computational methods have been broadly used to predict PPIs and can achieve a good prediction rate. In this paper, we present PCVMZM, a computational method based on a Probabilistic Classification Vector Machines (PCVM) model and a Zernike moments (ZM) descriptor for predicting PPIs from protein amino acid sequences. Specifically, the ZM descriptor is used to extract protein evolutionary information from the Position-Specific Scoring Matrix (PSSM) generated by the Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST). Then, the PCVM classifier is used to infer the interactions among proteins. When applied to the PPI datasets of Yeast and H. pylori, the proposed method achieves average prediction accuracies of 94.48% and 91.25%, respectively. To further evaluate the performance of the proposed method, the state-of-the-art support vector machine (SVM) classifier is compared with the PCVM model. Experimental results on the Yeast dataset show that the PCVM classifier performs better than the SVM classifier. The experimental results indicate that our proposed method is robust, powerful and feasible, and can be used as a helpful tool for proteomics research.
Affiliation(s)
- Yanbin Wang
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
- Zhuhong You
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
- Xiao Li
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
- Xing Chen
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Tonghai Jiang
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
- Jingting Zhang
  - Department of Mathematics and Statistics, Henan University, Kaifeng 100190, China
|