1
Ahmad U, Abdullah S, Chau DM, Chia SL, Yusoff K, Chan SC, Ong TA, Razack AH, Veerakumarasivam A. Analysis of PPI networks of transcriptomic expression identifies hub genes associated with Newcastle disease virus persistent infection in bladder cancer. Sci Rep 2023; 13:7323. [PMID: 37147328 PMCID: PMC10162992 DOI: 10.1038/s41598-022-20521-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2022] [Accepted: 09/14/2022] [Indexed: 05/07/2023] Open
Abstract
Bladder cancer cells can acquire persistent infection of oncolytic Newcastle disease virus (NDV), but the molecular mechanism(s) remain unelucidated. This poses a major barrier to the effective clinical translation of oncolytic NDV virotherapy of cancers. To improve our understanding of the molecular mechanism(s) associated with the development of NDV persistent infection in bladder cancer, we used mRNA expression profiles of persistently infected bladder cancer cells to construct PPI networks. Based on paths and modules in the PPI network, the bridges were found mainly in the upregulated mRNA-pathways of p53 signalling, ECM-receptor interaction, and TGF-beta signalling and the downregulated mRNA-pathways of antigen processing and presentation, protein processing in endoplasmic reticulum, and complement and coagulation cascades in persistent TCCSUPPi cells. In persistent EJ28Pi cells, connections were identified mainly through the upregulated mRNA-pathways of renal carcinoma, viral carcinogenesis, Ras signalling and cell cycle and the downregulated mRNA-pathways of Wnt signalling, HTLV-I infection and pathways in cancers. These connections were mainly dependent on RPL8-HSPA1A/HSPA4 in TCCSUPPi cells and EP300, PTPN11, RAC1-TP53, SP1, CCND1 and XPO1 in EJ28Pi cells. Oncomine validation showed that the top hub genes identified in the networks, which include RPL8, THBS1 and F2 from TCCSUPPi and TP53 and RAC1 from EJ28Pi, are involved in the development and progression of bladder cancer. Protein-drug interaction networks identified several putative drug targets that could be used to disrupt the linkages between the modules and prevent bladder cancer cells from acquiring NDV persistent infection. This novel PPI network analysis of differentially expressed mRNAs of NDV persistently infected bladder cancer cell lines provides insight into the molecular mechanisms of NDV persistency of infection in bladder cancers and into the future screening of drugs that can be used together with NDV to enhance its oncolytic efficacy.
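As a rough illustration of what identifying hub genes in a PPI network can involve, the sketch below ranks nodes by degree (number of distinct interaction partners). The abstract does not state which centrality measure the authors used, so degree is an assumption here, and the function name `degree_hubs` and the edge list are hypothetical placeholders rather than data from the study.

```python
from collections import defaultdict


def degree_hubs(edges, top_n=3):
    """Rank nodes of an undirected PPI network by degree (distinct partners)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Sort by descending degree, breaking ties alphabetically for determinism.
    ranked = sorted(neighbours.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(gene, len(partners)) for gene, partners in ranked[:top_n]]


# Toy edge list with placeholder gene names; NOT interactions from the paper.
toy_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"), ("G5", "G1")]
hubs = degree_hubs(toy_edges, top_n=2)
print(hubs)
```

Other centrality measures (betweenness, closeness) are also commonly used for hub detection and may rank nodes differently.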
Affiliation(s)
- Umar Ahmad
  - Medical Genetics Laboratory, Genetics and Regenerative Medicine Research Centre, Faculty of Medicine and Health Sciences, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
  - Medical Genetics Unit, Faculty of Basic Medical Sciences, Bauchi State University, Gadau, PMB 65, Itas/Gadau, Nigeria
- Syahril Abdullah
  - Medical Genetics Laboratory, Genetics and Regenerative Medicine Research Centre, Faculty of Medicine and Health Sciences, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
  - MAKNA Cancer Research Laboratory, Institute of Bioscience, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- De Ming Chau
  - Medical Genetics Laboratory, Genetics and Regenerative Medicine Research Centre, Faculty of Medicine and Health Sciences, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Suet Lin Chia
  - MAKNA Cancer Research Laboratory, Institute of Bioscience, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
  - Department of Microbiology, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor Darul Ehsan, Malaysia
- Khatijah Yusoff
  - Department of Microbiology, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor Darul Ehsan, Malaysia
  - Malaysia Genome Institute, Ministry of Science, Technology and Innovation, Jalan Bangi, 43000, Kajang, Selangor Darul Ehsan, Malaysia
- Soon Choy Chan
  - School of Liberal Arts, Science and Technology (PUScLST), Perdana University, 50490, Kuala Lumpur, Malaysia
- Teng Aik Ong
  - Department of Surgery, Faculty of Medicine, University of Malaya, Wilayah Persekutuan, Kuala Lumpur, Malaysia
- Azad Hassan Razack
  - Department of Surgery, Faculty of Medicine, University of Malaya, Wilayah Persekutuan, Kuala Lumpur, Malaysia
- Abhi Veerakumarasivam
  - Medical Genetics Laboratory, Genetics and Regenerative Medicine Research Centre, Faculty of Medicine and Health Sciences, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
  - Department of Biological Sciences, School of Medical and Life Sciences, Sunway University, 47500, Bandar Sunway, Selangor Darul Ehsan, Malaysia
2
Han Y, Zhang SW. ncRPI-LGAT: Prediction of ncRNA-Protein Interactions with Line Graph Attention Network Framework. Comput Struct Biotechnol J 2023; 21:2286-2295. [PMID: 37035546 PMCID: PMC10073990 DOI: 10.1016/j.csbj.2023.03.027] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/21/2022] [Revised: 03/11/2023] [Accepted: 03/16/2023] [Indexed: 03/19/2023] Open
Abstract
Identification of ncRNA-protein interactions (ncRPIs) through wet-lab experiments is still time-consuming and costly. Although several computational approaches have been developed to predict ncRPIs using the structure and sequence information of ncRNAs and proteins, their prediction accuracy needs to be improved, and their results lack interpretability. In this work, we propose a novel computational method (called ncRPI-LGAT) that predicts ncRNA-protein interactions by transforming the link prediction (i.e., subgraph classification) task into a node classification task in the line network and introducing a Line Graph ATtention network framework. ncRPI-LGAT first extracts the ncRNA/protein attributes using node2vec, and then generates the local enclosing subgraph of a target ncRNA-protein pair with SEAL. Because pooling over local enclosing subgraphs to learn a fixed-size feature vector for ncRNAs/proteins causes information loss, ncRPI-LGAT converts the local enclosing subgraphs into their corresponding line graphs, in which each node corresponds to an edge (i.e., an ncRNA-protein pair) of the local enclosing subgraph. Then, the attention-based graph neural network GATv2 is applied to these line graphs to efficiently learn the embedding features of the target nodes (i.e., ncRNA-protein pairs) by learning the significance of one ncRNA-protein pair to another. The embedding features of an ncRNA-protein pair obtained from multi-head attention are concatenated and then fed into a fully connected network to predict ncRPIs. Compared with other state-of-the-art methods in 5-fold cross-validation (5CV) tests, ncRPI-LGAT shows superior performance on three benchmark datasets, demonstrating the effectiveness of the method in predicting ncRNA-protein interactions.
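The line-graph conversion described in this abstract can be sketched in a few lines: every edge of the enclosing subgraph becomes a node, and two such nodes are linked when the original edges share an endpoint. This is a minimal illustration only, not the authors' implementation; the function name `line_graph` and the toy node labels (`r1`, `p1`, and so on) are hypothetical.

```python
from itertools import combinations


def line_graph(edges):
    """Return (nodes, links) of the line graph of an undirected graph.

    Each input edge becomes a node; two nodes are linked when the
    corresponding original edges share at least one endpoint.
    """
    nodes = [tuple(e) for e in edges]
    links = []
    for e1, e2 in combinations(nodes, 2):
        if set(e1) & set(e2):
            links.append((e1, e2))
    return nodes, links


# Toy enclosing subgraph: ncRNAs r1, r2 and proteins p1, p2 (placeholder labels).
toy_subgraph = [("r1", "p1"), ("r1", "p2"), ("r2", "p2")]
lg_nodes, lg_links = line_graph(toy_subgraph)
print(lg_links)
```

In the line graph, classifying the node `("r1", "p2")` corresponds to predicting whether that ncRNA-protein pair interacts, which is how link prediction becomes node classification.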
3
Vora DS, Kalakoti Y, Sundar D. Computational Methods and Deep Learning for Elucidating Protein Interaction Networks. Methods Mol Biol 2023; 2553:285-323. [PMID: 36227550 DOI: 10.1007/978-1-0716-2617-7_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Protein interactions play a critical role in all biological processes, but experimental identification of protein interactions is a time- and resource-intensive process. The advances in next-generation sequencing and multi-omics technologies have greatly benefited large-scale predictions of protein interactions using machine learning methods. A wide range of tools have been developed to predict protein-protein, protein-nucleic acid, and protein-drug interactions. Here, we discuss the applications, methods, and challenges faced when employing the various prediction methods. We also briefly describe ways to overcome the challenges and prospective future developments in the field of protein interaction biology.
Affiliation(s)
- Dhvani Sandip Vora
  - Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India
- Yogesh Kalakoti
  - Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India
- Durai Sundar
  - Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India
  - School of Artificial Intelligence, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India
4
Jha K, Saha S, Singh H. Prediction of protein-protein interaction using graph neural networks. Sci Rep 2022; 12:8360. [PMID: 35589837 PMCID: PMC9120162 DOI: 10.1038/s41598-022-12201-9] [Citation(s) in RCA: 40] [Impact Index Per Article: 20.0] [Received: 02/23/2022] [Accepted: 04/18/2022] [Indexed: 01/09/2023] Open
Abstract
Proteins are the essential biological macromolecules required to perform nearly all biological processes, and cellular functions. Proteins rarely carry out their tasks in isolation but interact with other proteins (known as protein-protein interaction) present in their surroundings to complete biological activities. The knowledge of protein-protein interactions (PPIs) unravels the cellular behavior and its functionality. The computational methods automate the prediction of PPI and are less expensive than experimental methods in terms of resources and time. So far, most of the works on PPI have mainly focused on sequence information. Here, we use graph convolutional network (GCN) and graph attention network (GAT) to predict the interaction between proteins by utilizing protein's structural information and sequence features. We build the graphs of proteins from their PDB files, which contain 3D coordinates of atoms. The protein graph represents the amino acid network, also known as residue contact network, where each node is a residue. Two nodes are connected if they have a pair of atoms (one from each node) within the threshold distance. To extract the node/residue features, we use the protein language model. The input to the language model is the protein sequence, and the output is the feature vector for each amino acid of the underlying sequence. We validate the predictive capability of the proposed graph-based approach on two PPI datasets: Human and S. cerevisiae. Obtained results demonstrate the effectiveness of the proposed approach as it outperforms the previous leading methods. The source code for training and data to train the model are available at https://github.com/JhaKanchan15/PPI_GNN.git .
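The residue contact rule stated in this abstract (two residues are connected if any pair of their atoms lies within a threshold distance) can be sketched as follows. This is an illustrative sketch, not the code from the linked repository: the function name `contact_edges`, the toy coordinates, and the 6.0 cutoff are all assumptions, since the abstract does not give the threshold value, and "within" is read here as less than or equal to.

```python
import math


def contact_edges(residues, threshold):
    """Build residue contact edges.

    residues: list where each item is a list of (x, y, z) atom coordinates
    belonging to one residue. Residues i and j are connected when at least
    one atom of i lies within `threshold` of an atom of j.
    """
    edges = []
    for i in range(len(residues)):
        for j in range(i + 1, len(residues)):
            if any(
                math.dist(a, b) <= threshold
                for a in residues[i]
                for b in residues[j]
            ):
                edges.append((i, j))
    return edges


# Toy coordinates (hypothetical), three residues along the x axis.
toy_residues = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(4.0, 0.0, 0.0)],
    [(20.0, 0.0, 0.0)],
]
edges = contact_edges(toy_residues, threshold=6.0)  # assumed cutoff
print(edges)
```

The all-pairs loop is quadratic in the number of atoms; for full-size structures a spatial index would normally replace it.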
Affiliation(s)
- Kanchan Jha
  - Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, Bihar, 801103, India
- Sriparna Saha
  - Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, Bihar, 801103, India
- Hiteshi Singh
  - Department of Electrical Engineering, Indian Institute of Technology Jodhpur, Jodhpur, Rajasthan, 342030, India
5
Sahni G, Mewara B, Lalwani S, Kumar R. CF-PPI: Centroid based new feature extraction approach for Protein-Protein Interaction Prediction. J EXP THEOR ARTIF IN 2022. [DOI: 10.1080/0952813x.2022.2052189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Gunjan Sahni
  - Department of Computer Science and Engineering, Career Point University, Kota, India
- Bhawna Mewara
  - Department of Computer Science and Engineering, Career Point University, Kota, India
- Soniya Lalwani
  - Department of Mathematics, Career Point University, Kota, India
- Rajesh Kumar
  - Department of Electrical Engineering, Malaviya National Institute of Technology, Jaipur, India
6
Tsagris M, Papadovasilakis Z, Lakiotaki K, Tsamardinos I. The γ-OMP Algorithm for Feature Selection With Application to Gene Expression Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1214-1224. [PMID: 33035156 DOI: 10.1109/tcbb.2020.3029952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Feature selection for predictive analytics is the problem of identifying a minimal-size subset of features that is maximally predictive of an outcome of interest. To apply to molecular data, feature selection algorithms need to be scalable to tens of thousands of features. In this paper, we propose γ-OMP, a generalisation of the highly scalable Orthogonal Matching Pursuit feature selection algorithm. γ-OMP can handle (a) various types of outcomes, such as continuous, binary, nominal, and time-to-event, (b) discrete (categorical) features, (c) different statistical-based stopping criteria, (d) several predictive models (e.g., linear or logistic regression), (e) various types of residuals, and (f) different types of association. We compare γ-OMP against LASSO, a prototypical, widely used algorithm for high-dimensional data. On both simulated data and several real gene expression datasets, γ-OMP is on par with, or outperforms, LASSO in binary classification (case-control data), regression (quantified outcomes), and time-to-event data (censored survival times). γ-OMP is based on simple statistical ideas, is easy to implement and to extend, and our extensive evaluation shows that it is also effective in bioinformatics analysis settings.
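For orientation, the sketch below shows plain Orthogonal Matching Pursuit for a linear model, the algorithm that γ-OMP generalises: pick the feature most correlated with the current residual, refit on all selected features, update the residual, and repeat. It is not γ-OMP itself. It stops after a fixed number of features instead of using a statistical stopping criterion, and it assumes centred data, non-zero feature vectors, and no exactly collinear selected features. All names (`omp_select`, `solve`, `dot`) and the toy data are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def omp_select(columns, y, max_features):
    """Greedy OMP; `columns` is a list of feature vectors, `y` the outcome."""
    selected, residual = [], y[:]
    while len(selected) < min(max_features, len(columns)):
        # Score each unselected feature by |correlation| with the residual.
        scores = {
            j: abs(dot(col, residual)) / (dot(col, col) ** 0.5)
            for j, col in enumerate(columns)
            if j not in selected
        }
        best = max(scores, key=scores.get)
        selected.append(best)
        # Refit least squares on all selected features via the normal equations.
        S = [columns[j] for j in selected]
        gram = [[dot(a, b) for b in S] for a in S]
        coef = solve(gram, [dot(a, y) for a in S])
        fitted = [sum(c * col[i] for c, col in zip(coef, S)) for i in range(len(y))]
        residual = [yi - fi for yi, fi in zip(y, fitted)]
    return selected


# Toy data: y = 3 * x0 + 0.5 * x2, so features 0 and 2 should be chosen.
cols = [[1, 0, -1, 0], [0, 1, 0, -1], [1, -1, 1, -1]]
y = [3.5, -0.5, -2.5, -0.5]
chosen = omp_select(cols, y, max_features=2)
print(chosen)
```

Swapping the least-squares refit and the residual definition for those of another model (for example logistic regression) is the kind of generalisation the abstract attributes to γ-OMP.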
7
Younis H, Anwar MW, Khan MUG, Sikandar A, Bajwa UI. A New Sequential Forward Feature Selection (SFFS) Algorithm for Mining Best Topological and Biological Features to Predict Protein Complexes from Protein-Protein Interaction Networks (PPINs). Interdiscip Sci 2021; 13:371-388. [PMID: 33959851 DOI: 10.1007/s12539-021-00433-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/20/2020] [Revised: 04/09/2021] [Accepted: 04/15/2021] [Indexed: 10/21/2022]
Abstract
Protein-protein interaction plays an important role in the understanding of biological processes in the body. A network of dynamic protein complexes within a cell that regulates most biological processes is known as a protein-protein interaction network (PPIN). Complex prediction from PPINs is a challenging task. Most previous computational approaches mine cliques, stars, linear and hybrid structures as complexes from PPINs by considering topological features, and fewer of them focus on the important biological information contained within the protein amino acid sequence. In this study, we computed a wide variety of topological features and integrated them with biological features computed from the protein amino acid sequence, such as bag of words, physicochemical and spectral domain features. We propose a new Sequential Forward Feature Selection (SFFS) algorithm, i.e., random forest-based Boruta feature selection, for selecting the best features from the large computed feature set. Decision tree, linear discriminant analysis and gradient boosting classifiers are used as learners. We conducted experiments on two reference protein complex datasets of yeast, i.e., CYC2008 and MIPS. Human and mouse complex information is taken from the CORUM 3.0 dataset. Protein interaction information is extracted from the Database of Interacting Proteins (DIP). Our proposed SFFS, i.e., random forest-based Boruta feature selection in combination with decision tree, linear discriminant analysis and gradient boosting classifiers, outperforms other state-of-the-art algorithms, achieving precision, recall and F-measure rates of 94.58%, 94.92% and 94.45% for MIPS, 96.31%, 93.55% and 96.02% for CYC2008, 98.84%, 98.00% and 98.87% for CORUM human, and 96.60%, 96.70% and 96.32% for CORUM mouse dataset complexes, respectively.
Affiliation(s)
- Haseeb Younis
  - School of Professional Advancement, University of Management and Technology, Lahore, Pakistan
  - Department of Computer Science, COMSATS University Islamabad, Lahore, Pakistan
- Muhammad Usman Ghani Khan
  - Department of Computer Science and Engineering, University of Engineering and Technology, Lahore, Pakistan
- Aisha Sikandar
  - Govt. Girls Post Graduate College No.1 Abbottabad, Abbottabad, Pakistan
- Usama Ijaz Bajwa
  - Department of Computer Science, COMSATS University Islamabad, Lahore, Pakistan
8
Long Noncoding RNA THAP9-AS1 and TSPOAP1-AS1 Provide Potential Diagnostic Signatures for Pediatric Septic Shock. BIOMED RESEARCH INTERNATIONAL 2020; 2020:7170464. [PMID: 33344646 PMCID: PMC7725549 DOI: 10.1155/2020/7170464] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/07/2020] [Revised: 11/07/2020] [Accepted: 11/24/2020] [Indexed: 12/21/2022]
Abstract
Background: Sepsis is a systemic inflammatory syndrome caused by infection, with high incidence and mortality. Although long noncoding RNAs (lncRNAs) have been identified as closely involved in many inflammatory diseases, little is known about the role of lncRNAs in pediatric septic shock. Methods: We downloaded the mRNA profiles GSE13904 and GSE4607; GSE13904 includes 106 blood samples from pediatric patients with septic shock and 18 healthy control samples, and GSE4607 includes 69 blood samples from pediatric patients with septic shock and 15 healthy control samples. The differentially expressed lncRNAs were identified with the limma R package; GO term and KEGG pathway enrichment analysis was performed with the clusterProfiler R package. The protein-protein interaction (PPI) network was constructed from the STRING database using the targets of the differentially expressed lncRNAs. The MCODE plug-in of Cytoscape was used to screen significant clustering modules composed of key genes. Finally, stepwise regression analysis was performed to screen the optimal lncRNAs and construct the logistic regression model, and the ROC curve was applied to evaluate the accuracy of the model. Results: A total of 13 lncRNAs that exhibited significant differences between the septic shock group and the control group in both datasets were identified. From the 18 targets of the differentially expressed lncRNAs, we identified some inflammatory and immune response-related pathways. In addition, several target mRNAs were predicted to be potentially involved in the occurrence of septic shock. The logistic regression model constructed from the two optimal lncRNAs, THAP9-AS1 and TSPOAP1-AS1, could efficiently separate samples with septic shock from normal controls. Conclusion: In summary, a predictive model based on the lncRNAs THAP9-AS1 and TSPOAP1-AS1 provides new insights for diagnostic research on septic shock.
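The ROC-based evaluation mentioned in this abstract is often summarised by the area under the curve (AUC). A minimal sketch follows, computing AUC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc`, the labels, and the scores are invented for illustration and are not results from the study.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 1 (case) or 0 (control); scores: model outputs,
    where a higher score means "more likely a case".
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities from a logistic regression model.
toy_labels = [1, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.1]
auc = roc_auc(toy_labels, toy_scores)
print(round(auc, 4))
```

An AUC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation of cases from controls.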
9
Zhong L, Zhen M, Sun J, Zhao Q. Recent advances on the machine learning methods in predicting ncRNA-protein interactions. Mol Genet Genomics 2020; 296:243-258. [PMID: 33006667 DOI: 10.1007/s00438-020-01727-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/01/2020] [Accepted: 09/17/2020] [Indexed: 12/22/2022]
Abstract
Recent transcriptomics and bioinformatics studies have shown that ncRNAs can affect chromosome structure and gene transcription, participate in epigenetic regulation, and take part in disease processes such as tumorigenesis. Biologists have found that most ncRNAs work by interacting with their corresponding RNA-binding proteins. Therefore, ncRNA-protein interaction is an active area of study in both the biological and medical fields. However, due to the limitations of manual laboratory experiments, machine learning methods for predicting ncRNA-protein interactions are increasingly favored by researchers. In this review, we summarize several machine learning predictive models of ncRNA-protein interactions from the past few years and briefly describe their characteristics. To help optimize the performance of machine learning models for predicting ncRNA-protein interactions, we close with some promising future computational directions.
Affiliation(s)
- Lin Zhong
  - School of Mathematics, Liaoning University, Shenyang, 110036, China
- Meiqin Zhen
  - Beijing Chest Hospital, Capital Medical University/Beijing Tuberculosis and Thoracic Tumor Research Institute, Beijing, 101149, China
- Jianqiang Sun
  - School of Automation and Electrical Engineering, Linyi University, Linyi, 276000, China
- Qi Zhao
  - School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
10
Khatun MS, Shoombuatong W, Hasan MM, Kurata H. Evolution of Sequence-based Bioinformatics Tools for Protein-protein Interaction Prediction. Curr Genomics 2020; 21:454-463. [PMID: 33093807 PMCID: PMC7536797 DOI: 10.2174/1389202921999200625103936] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 01/17/2020] [Revised: 03/19/2020] [Accepted: 05/27/2020] [Indexed: 12/22/2022] Open
Abstract
Protein-protein interactions (PPIs) are the physical connections between two or more proteins via electrostatic forces or hydrophobic effects. Identification of the PPIs is pivotal, which contributes to many biological processes including protein function, disease incidence, and therapy design. The experimental identification of PPIs via high-throughput technology is time-consuming and expensive. Bioinformatics approaches are expected to solve such restrictions. In this review, our main goal is to provide an inclusive view of the existing sequence-based computational prediction of PPIs. Initially, we briefly introduce the currently available PPI databases and then review the state-of-the-art bioinformatics approaches, working principles, and their performances. Finally, we discuss the caveats and future perspective of the next generation algorithms for the prediction of PPIs.
Affiliation(s)
- Md. Mehedi Hasan
  - Address correspondence to these authors at the Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan; Tel: +81-948-297-828; and Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Biomedical Informatics R&D Center, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Tel: +81-948-297-828
- Hiroyuki Kurata
  - Address correspondence to these authors at the Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan; Tel: +81-948-297-828; and Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Biomedical Informatics R&D Center, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Tel: +81-948-297-828
11
Gui YM, Wang RJ, Wang X, Wei YY. Using Deep Neural Networks to Improve the Performance of Protein–Protein Interactions Prediction. INT J PATTERN RECOGN 2020. [DOI: 10.1142/s0218001420520126] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
Protein–protein interactions (PPIs) help to elucidate the molecular mechanisms of life activities and have a certain role in promoting disease treatment and new drug development. With the advent of the proteomics era, some PPIs prediction methods have emerged. However, the performances of these PPIs prediction methods still need to be optimized and improved. In order to optimize the performance of the PPIs prediction methods, we used the dropout method to reduce over-fitting by deep neural networks (DNNs), and combined with three types of feature extraction methods, conjoint triad (CT), auto covariance (AC) and local descriptor (LD), to build DNN models based on amino acid sequences. The results showed that the accuracy of the CT, AC and LD increased from 97.11% to 98.12%, 96.84% to 98.17%, and 95.30% to 95.60%, respectively. The loss values of the CT, AC and LD decreased from 27.47% to 14.96%, 65.91% to 17.82% and 36.23% to 15.34%, respectively. Experimental results show that dropout can optimize the performances of the DNN models. The results can provide a resource for scholars in future studies involving the prediction of PPIs. The experimental code is available at https://github.com/smalltalkman/hppi-tensorflow .
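Of the three sequence encodings named in this abstract, the conjoint triad (CT) descriptor is the simplest to sketch: residues are mapped to a small number of classes, and the frequencies of all class triplets along the sequence form a fixed-length vector. The sketch below assumes the commonly cited seven-class grouping and min-max normalisation; the abstract does not spell out either, so treat both as assumptions. This is not the code from the linked repository, and `conjoint_triad` is a hypothetical name.

```python
from itertools import product

# Seven residue classes often used with the conjoint triad descriptor
# (assumed grouping; the abstract does not list the classes).
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
RESIDUE_CLASS = {aa: i for i, group in enumerate(CLASSES) for aa in group}
TRIADS = list(product(range(7), repeat=3))  # 7**3 = 343 possible triads


def conjoint_triad(sequence):
    """Return a 343-dimensional, min-max normalised triad frequency vector."""
    counts = dict.fromkeys(TRIADS, 0)
    # Non-standard residues are dropped, which joins their neighbours;
    # a stricter variant would skip any window containing them.
    classes = [RESIDUE_CLASS[aa] for aa in sequence if aa in RESIDUE_CLASS]
    for i in range(len(classes) - 2):
        counts[tuple(classes[i:i + 3])] += 1
    lo, hi = min(counts.values()), max(counts.values())
    if hi == lo:
        return [0.0] * len(TRIADS)
    return [(counts[t] - lo) / (hi - lo) for t in TRIADS]


vec = conjoint_triad("AGVAGV")  # toy sequence, all residues in class 0
print(len(vec), vec[0])
```

For a protein pair, the two 343-dimensional vectors are typically concatenated before being passed to the classifier.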
Affiliation(s)
- Yuan-Miao Gui
  - Institute of Intelligent Machines, Hefei Institute of Physics, Chinese Academy of Sciences, Hefei City, Anhui Province, P. R. China
  - University of Science and Technology of China, Hefei City, Anhui Province, P. R. China
- Ru-Jing Wang
  - Institute of Intelligent Machines, Hefei Institute of Physics, Chinese Academy of Sciences, Hefei City, Anhui Province, P. R. China
- Xue Wang
  - Institute of Intelligent Machines, Hefei Institute of Physics, Chinese Academy of Sciences, Hefei City, Anhui Province, P. R. China
- Yuan-Yuan Wei
  - Institute of Intelligent Machines, Hefei Institute of Physics, Chinese Academy of Sciences, Hefei City, Anhui Province, P. R. China
12
Li Z, Nie R, You Z, Cao C, Li J. Using discriminative vector machine model with 2DPCA to predict interactions among proteins. BMC Bioinformatics 2019; 20:694. [PMID: 31874626 PMCID: PMC6929273 DOI: 10.1186/s12859-019-3268-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/15/2022] Open
Abstract
Background: Interactions among proteins play crucial roles in most cellular processes. Despite the enormous effort put into identifying protein-protein interactions (PPIs) in a large number of organisms, existing experimental methods suffer from high cost, low efficiency, and high false-positive rates. In silico methods open new doors for predicting interactions among proteins and have attracted a great deal of attention in recent decades. Results: Here we present a novel computational model that combines our proposed Discriminative Vector Machine (DVM) model with a 2-Dimensional Principal Component Analysis (2DPCA) descriptor to identify candidate PPIs based only on protein sequences. More specifically, the 2DPCA descriptor is employed to capture discriminative feature information from the Position-Specific Scoring Matrix (PSSM) of amino acid sequences generated by PSI-BLAST. Then, a robust and powerful DVM classifier is employed to infer PPIs. When applied to the gold-standard benchmark datasets of Yeast and H. pylori, our model obtained mean prediction accuracies as high as 97.06% and 92.89%, respectively, a noticeable improvement over some state-of-the-art methods. Moreover, we constructed a Support Vector Machine (SVM)-based predictive model and compared it with our model on a Human benchmark dataset. In addition, to further demonstrate the predictive reliability of our proposed method, we carried out extensive experiments on identifying cross-species PPIs in five other species datasets. Conclusions: All the experimental results indicate that our method is very effective for identifying potential PPIs and could serve as a practical approach to aid bioexperiments in proteomics research.
Affiliation(s)
- Zhengwei Li
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
  - Mine Digitization Engineering Research Center of Ministry of Education, China University of Mining and Technology, Xuzhou, 221116, China
  - Institute of Machine Learning and Systems Biology, College of Electronics and Information Engineering, Tongji University, Shanghai, 201804, China
  - KUNPAND Communications (Kunshan) Co., Ltd., Suzhou, 215300, China
- Ru Nie
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
  - Mine Digitization Engineering Research Center of Ministry of Education, China University of Mining and Technology, Xuzhou, 221116, China
- Zhuhong You
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi, 830011, China
- Chen Cao
  - Departments of Biochemistry & Molecular Biology and Medical Genetics, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, T2N 4N1, Canada
- Jiashu Li
  - Mine Digitization Engineering Research Center of Ministry of Education, China University of Mining and Technology, Xuzhou, 221116, China
13
Zhong L, Ming Z, Xie G, Fan C, Piao X. Recent Advances on the Semi-Supervised Learning for Long Non-Coding RNA-Protein Interactions Prediction: A Review. Protein Pept Lett 2019; 27:385-391. [PMID: 31654509 DOI: 10.2174/0929866526666191025104043] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/08/2019] [Revised: 05/30/2019] [Accepted: 09/24/2019] [Indexed: 12/24/2022]
Abstract
In recent years, growing evidence has indicated that long non-coding RNA (lncRNA) plays a significant role in complex biological processes, especially RNA processing, chromatin modification, and cell differentiation, as well as many others. LncRNA is also closely linked to human diseases such as cancer. Therefore, only by knowing more about the function of lncRNA can we better address human diseases. However, lncRNAs need to bind to proteins to perform their biomedical functions, so lncRNA function can be revealed by studying the relationship between lncRNAs and proteins. Because of the limitations of traditional experiments, researchers often use computational models to predict lncRNA-protein interactions. In this review, we summarize several computational models for lncRNA-protein interaction prediction based on semi-supervised learning from the past two years, and briefly introduce their advantages and shortcomings. Finally, future research directions for lncRNA-protein interaction prediction are pointed out.
Affiliation(s)
- Lin Zhong
  - School of Mathematics, Liaoning University, Shenyang, 110036, China
- Zhong Ming
  - National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, 518060, China
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, China
- Guobo Xie
  - School of Computer Science, Guangdong University of Technology, Guangzhou, 510006, China
- Chunlong Fan
  - College of Computer Science, Shenyang Aerospace University, Shenyang, 110136, China
- Xue Piao
  - School of Medical Informatics, Xuzhou Medical University, Xuzhou, 221004, China
14
Wang X, Wu Y, Wang R, Wei Y, Gui Y. A novel matrix of sequence descriptors for predicting protein-protein interactions from amino acid sequences. PLoS One 2019; 14:e0217312. [PMID: 31173605 PMCID: PMC6555512 DOI: 10.1371/journal.pone.0217312] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/28/2019] [Accepted: 05/08/2019] [Indexed: 12/20/2022] Open
Abstract
Protein-protein interactions (PPIs) play an important role in the life activities of organisms. With the availability of large amounts of protein sequence data, PPIs prediction methods have attracted increasing attention. A variety of protein sequence coding methods have emerged, but the training of these methods is particularly time consuming. To solve this issue, we have proposed a novel matrix sequence coding method. Based on deep neural network (DNN) and a novel matrix protein sequence descriptor, we constructed a protein interaction prediction model for predicting PPIs. When performed on human PPIs data, the method achieved an accuracy of 94.34%, a recall of 98.28%, an area under the curve (AUC) of 97.79% and a loss of 23.25%. A non-redundant dataset was used to evaluate this prediction model, and the prediction accuracy is 88.29%. These results indicate that the matrix of sequence (MOS) descriptor can enhance the predictive power of PPIs and reduce training time, which can be a useful complement for future proteomics research. The experimental code and experimental results can be found at https://github.com/smalltalkman/hppi-tensorflow.
Affiliation(s)
- Xue Wang
- Institute of Technical Biology & Agriculture Engineering, Hefei Institutes of Physical Science, Chinese Academy of Sciences, HeFei City, AnHui Province, China
- University of Science and Technology of China, HeFei City, AnHui Province, China
- Institute of Intelligent Machine, Hefei Institutes of Physical Science, Chinese Academy of Sciences, HeFei City, AnHui Province, China
- Yuejin Wu
- Institute of Technical Biology & Agriculture Engineering, Hefei Institutes of Physical Science, Chinese Academy of Sciences, HeFei City, AnHui Province, China
- University of Science and Technology of China, HeFei City, AnHui Province, China
- Rujing Wang
- University of Science and Technology of China, HeFei City, AnHui Province, China
- Institute of Intelligent Machine, Hefei Institutes of Physical Science, Chinese Academy of Sciences, HeFei City, AnHui Province, China
- Yuanyuan Wei
- Institute of Technical Biology & Agriculture Engineering, Hefei Institutes of Physical Science, Chinese Academy of Sciences, HeFei City, AnHui Province, China
- Yuanmiao Gui
- Institute of Technical Biology & Agriculture Engineering, Hefei Institutes of Physical Science, Chinese Academy of Sciences, HeFei City, AnHui Province, China
- University of Science and Technology of China, HeFei City, AnHui Province, China
15
Chen ZH, Li LP, He Z, Zhou JR, Li Y, Wong L. An Improved Deep Forest Model for Predicting Self-Interacting Proteins From Protein Sequence Using Wavelet Transformation. Front Genet 2019; 10:90. [PMID: 30881376 PMCID: PMC6405691 DOI: 10.3389/fgene.2019.00090] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 10/11/2018] [Accepted: 01/29/2019] [Indexed: 12/23/2022] Open
Abstract
Self-interacting proteins (SIPs), in which multiple copies of the same protein can interact with each other, play significant roles in the understanding of cellular processes and cell functions. Although a number of experimental methods have been designed to detect SIPs, they remain extremely time-consuming, expensive, and challenging. Therefore, there is an urgent need to develop computational methods for predicting SIPs. In this study, we propose a deep forest based predictor for accurate prediction of SIPs using protein sequence information. More specifically, a novel feature representation method, which integrates the position-specific scoring matrix (PSSM) with wavelet transform, is introduced. To evaluate the performance of the proposed method, cross-validation tests are performed on two widely used benchmark datasets. The experimental results show that the proposed model achieved high accuracies of 95.43 and 93.65% on human and yeast datasets, respectively. The AUC values for the yeast and human datasets are 0.9203 and 0.9586, respectively. To further show the advantage of the proposed method, it is compared with several existing methods. The results demonstrate that the proposed model is better than other SIP prediction methods. This work can offer an effective architecture to biologists in detecting new SIPs.
Affiliation(s)
- Zhan-Heng Chen
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- University of Chinese Academy of Sciences, Beijing, China
- Li-Ping Li
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Zhou He
- College of Engineering and Applied Science, University of Colorado Boulder, Boulder, CO, United States
- Ji-Ren Zhou
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Yangming Li
- ECTET, Rochester Institute of Technology, Rochester, NY, United States
- Leon Wong
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- University of Chinese Academy of Sciences, Beijing, China
16
In silico-prediction of protein-protein interactions network about MAPKs and PP2Cs reveals a novel docking site variants in Brachypodium distachyon. Sci Rep 2018; 8:15083. [PMID: 30305661 PMCID: PMC6180098 DOI: 10.1038/s41598-018-33428-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 07/27/2018] [Accepted: 09/13/2018] [Indexed: 12/26/2022] Open
Abstract
Protein-protein interactions (PPIs) underlie the molecular mechanisms of most biological processes. Mitogen-activated protein kinases (MAPKs) can be dephosphorylated by MAPK-specific phosphatases such as PP2C, which are critical to transduce extracellular signals into adaptive and programmed responses. However, the experimental approaches for identifying PPIs are expensive, time-consuming, laborious and challenging. In response, many computational methods have been developed to predict PPIs. Yet, these methods have inherent disadvantages such as high false positive and negative results. Thus, it is crucial to develop in silico approaches for predicting PPIs efficiently and accurately. In this study, we identified PPIs among 16 BdMAPKs and 86 BdPP2Cs in B. distachyon using a novel docking approach. Further, we systematically investigated the docking site (D-site) of BdPP2C which plays a vital role for recognition and docking of BdMAPKs. D-site analysis revealed that there were 96 pairs of PPIs including all BdMAPKs and most BdPP2Cs, which indicated that BdPP2C may play roles in other signaling networks. Moreover, most BdPP2Cs have a D-site for BdMAPKs in our prediction results, which suggested that our method can effectively predict PPIs, as confirmed by their 3D structure. In addition, we validated this methodology with known Arabidopsis and yeast phosphatase-MAPK interactions from the STRING database. The results obtained provide a vital research resource for exploring an accurate network of PPIs between BdMAPKs and BdPP2Cs.
17
PPInS: a repository of protein-protein interaction sitesbase. Sci Rep 2018; 8:12453. [PMID: 30127348 PMCID: PMC6102274 DOI: 10.1038/s41598-018-30999-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/25/2018] [Accepted: 08/03/2018] [Indexed: 01/14/2023] Open
Abstract
Protein-Protein Interaction Sitesbase (PPInS), a high-performance database of protein-protein interacting interfaces, is presented. The atomic-level information on the molecular interactions among protein chains in protein-protein complexes (as reported in the Protein Data Bank [PDB]), together with their evolutionary information in the Structural Classification of Proteins (SCOPe release 2.06), is made available in PPInS. A total of 32468 PDB files representing X-ray crystallized multimeric protein-protein complexes with structural resolution better than 2.5 Å were shortlisted to demarcate the protein-protein interaction interfaces (PPIIs). A total of 111857 PPIIs with ~32.24 million atomic contact pairs (ACPs) were generated and made available on a web server for on-site analysis and download. All these PPIIs, and the protein-protein interacting patches (PPIPs) involved in them, were also analyzed in terms of the number of residues contributing to patch formation, their hydrophobic nature, the amount of surface area they contributed to binding, and their homo- and heterodimeric nature, to describe the diversity of information covered in PPInS. It was observed that 42.37% of total PPIPs were made up of 6–20 interacting residues, 53.08% of PPIPs had an interface area ≤1000 Å² in PPII formation, 82.64% of PPIPs were reported with a hydrophobicity score of ≤10, and 73.26% of PPIPs were homologous to each other with sequence similarity scores ranging from 75–100%. A subset of PPInS, the “Non-Redundant Database (NRDB)”, containing 2265 PPIIs with over 1.8 million ACPs corresponding to 1931 protein-protein complexes (PDBs), was also designed by removing structural redundancies at the level of SCOP superfamily (SCOP release 1.75). The web interface of PPInS (http://www.cup.edu.in:99/ppins/home.php) offers an easy-to-navigate, intuitive and user-friendly environment, and can be queried by PDB ID, SCOP superfamily ID, or protein sequence.
18
Li Z, Liao B, Li Y, Liu W, Chen M, Cai L. Gene function prediction based on combining gene ontology hierarchy with multi-instance multi-label learning. RSC Adv 2018; 8:28503-28509. [PMID: 35542493 PMCID: PMC9083914 DOI: 10.1039/c8ra05122d] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 06/14/2018] [Accepted: 07/12/2018] [Indexed: 12/04/2022] Open
Abstract
Gene function annotation, an important part of genome annotation, is the main challenge in the post-genome era. The sequencing carried out by the Human Genome Project produced whole-genome data, providing abundant biological information for the study of gene function annotation. To obtain useful knowledge from such a large amount of data, a potential strategy is to apply machine learning methods to mine these data and predict gene function. In this study, we improved multi-instance hierarchical clustering by using the gene ontology hierarchy to annotate gene function, combining the gene ontology hierarchy with a multi-instance multi-label learning framework. Then, we used the multi-label support vector machine (MLSVM) and multi-label k-nearest neighbor (MLKNN) algorithms to predict gene function. Finally, we verified our method on four yeast expression datasets. The performance in the simulated experiments showed that our method is efficient.
Affiliation(s)
- Zejun Li
- College of Information Science and Engineering, Hunan University Changsha Hunan 410082 China
- School of Computer and Information Science, Hunan Institute of Technology Hengyang 412002 China
- Bo Liao
- College of Information Science and Engineering, Hunan University Changsha Hunan 410082 China
- Yun Li
- College of Information Science and Engineering, Hunan University Changsha Hunan 410082 China
- Wenhua Liu
- School of Computer and Information Science, Hunan Institute of Technology Hengyang 412002 China
- Min Chen
- College of Information Science and Engineering, Hunan University Changsha Hunan 410082 China
- School of Computer and Information Science, Hunan Institute of Technology Hengyang 412002 China
- Lijun Cai
- College of Information Science and Engineering, Hunan University Changsha Hunan 410082 China
19
Reciprocal Perspective for Improved Protein-Protein Interaction Prediction. Sci Rep 2018; 8:11694. [PMID: 30076341 PMCID: PMC6076239 DOI: 10.1038/s41598-018-30044-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 03/15/2018] [Accepted: 07/20/2018] [Indexed: 02/06/2023] Open
Abstract
All protein-protein interaction (PPI) predictors require the determination of an operational decision threshold when differentiating positive PPIs from negatives. Historically, a single global threshold, typically optimized via cross-validation testing, is applied to all protein pairs. However, we here use data visualization techniques to show that no single decision threshold is suitable for all protein pairs, given the inherent diversity of protein interaction profiles. The recent development of high throughput PPI predictors has enabled the comprehensive scoring of all possible protein-protein pairs. This, in turn, has given rise to context, enabling us now to evaluate a PPI within the context of all possible predictions. Leveraging this context, we introduce a novel modeling framework called Reciprocal Perspective (RP), which estimates a localized threshold on a per-protein basis using several rank order metrics. By considering a putative PPI from the perspective of each of the proteins within the pair, RP rescores the predicted PPI and applies a cascaded Random Forest classifier leading to improvements in recall and precision. We here validate RP using two state-of-the-art PPI predictors, the Protein-protein Interaction Prediction Engine and the Scoring PRotein INTeractions methods, over five organisms: Homo sapiens, Saccharomyces cerevisiae, Arabidopsis thaliana, Caenorhabditis elegans, and Mus musculus. Results demonstrate the application of a post hoc RP rescoring layer significantly improves classification (p < 0.001) in all cases over all organisms and this new rescoring approach can apply to any PPI prediction method.
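The rank-order idea in this abstract can be illustrated with a small sketch: for a candidate pair, compute where its score ranks among all scores involving each of its two proteins. The score matrix, the function name `reciprocal_ranks`, and the rank definition are assumptions made for illustration; the abstract does not specify which rank metrics RP uses or how they feed the cascaded Random Forest.

```python
import numpy as np

def reciprocal_ranks(scores, i, j):
    """Rank of pair (i, j) from the perspective of protein i and of protein j.

    A rank of 1 means no other partner of that protein scores higher.
    `scores` is assumed to be a symmetric matrix of predicted PPI scores.
    """
    s = scores[i, j]
    rank_from_i = int(np.sum(scores[i] > s)) + 1
    rank_from_j = int(np.sum(scores[j] > s)) + 1
    return rank_from_i, rank_from_j

# Hypothetical scores for four proteins (symmetric, zero diagonal).
scores = np.array([
    [0.0, 0.9, 0.2, 0.8],
    [0.9, 0.0, 0.1, 0.3],
    [0.2, 0.1, 0.0, 0.25],
    [0.8, 0.3, 0.25, 0.0],
])

# A score of 0.2 is weak for protein 0 but comparatively strong for protein 2,
# which is the kind of asymmetry a single global threshold cannot capture.
print(reciprocal_ranks(scores, 0, 2))
```

Rank pairs like these could serve as per-protein features for a downstream classifier; the actual RP feature set is described in the paper, not here.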
20
Zhao Q, Zhang Y, Hu H, Ren G, Zhang W, Liu H. IRWNRLPI: Integrating Random Walk and Neighborhood Regularized Logistic Matrix Factorization for lncRNA-Protein Interaction Prediction. Front Genet 2018; 9:239. [PMID: 30023002 PMCID: PMC6040094 DOI: 10.3389/fgene.2018.00239] [Citation(s) in RCA: 69] [Impact Index Per Article: 11.5] [Received: 05/14/2018] [Accepted: 06/15/2018] [Indexed: 11/13/2022] Open
Abstract
Long non-coding RNA (lncRNA) plays an important role in many biological processes and has attracted widespread attention. Although the precise functions and mechanisms of most lncRNAs are still unknown, we are certain that lncRNAs usually perform their functions by interacting with the corresponding RNA-binding proteins. For example, lncRNA-protein interactions play an important role in post-transcriptional gene regulation, such as splicing, translation, and signaling, and in the progression of complex diseases. However, experimental verification of predicted lncRNA-protein interactions is time-consuming and laborious. In this work, we propose a computational method, named IRWNRLPI, to find potential associations between lncRNAs and proteins. IRWNRLPI integrates two algorithms, random walk and neighborhood regularized logistic matrix factorization, a combination that can achieve better optimization than either algorithm alone. Moreover, the method is semi-supervised and does not require negative samples. Based on leave-one-out cross validation, we obtain an AUC of 0.9150 and an AUPR of 0.7138, demonstrating its reliable performance. In addition, in a case study on “Mus musculus,” many lncRNA-protein interactions predicted by our method were successfully confirmed by experiments. This suggests that IRWNRLPI will be a useful bioinformatics resource in biomedical research.
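The abstract names random walk as one of the two components but gives no formulation. As a hedged illustration only, the sketch below implements random walk with restart, one common variant, on a small hypothetical network; the function name, the restart probability, and the toy adjacency matrix are all assumptions and should not be read as the IRWNRLPI algorithm.

```python
import numpy as np

def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities for a walker that restarts at `seed`.

    Iterates p <- (1 - restart) * W p + restart * p0, where W is the
    column-normalized adjacency matrix and p0 is a one-hot seed vector.
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0, keepdims=True)
    W = adj / np.where(col_sums == 0, 1.0, col_sums)  # avoid division by zero
    p0 = np.zeros(adj.shape[0])
    p0[seed] = 1.0
    p = p0.copy()
    for _ in range(max_iter):
        nxt = (1 - restart) * (W @ p) + restart * p0
        if np.abs(nxt - p).sum() < tol:
            return nxt
        p = nxt
    return p

# Hypothetical three-node path graph: 0 - 1 - 2.
path = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
])
p = random_walk_with_restart(path, seed=0)
print(p)  # nodes closer to the seed receive higher probability
```

In a prediction setting, scores of this kind are typically used to rank unobserved links by proximity to known ones; how IRWNRLPI combines them with matrix factorization is described in the paper itself.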
Affiliation(s)
- Qi Zhao
- School of Mathematics, Liaoning University, Shenyang, China; Research Center for Computer Simulating and Information Processing of Bio-Macromolecules of Liaoning Province, Shenyang, China
- Yue Zhang
- School of Mathematics, Liaoning University, Shenyang, China
- Huan Hu
- School of Life Science, Liaoning University, Shenyang, China
- Guofei Ren
- School of Information, Liaoning University, Shenyang, China
- Wen Zhang
- School of Computer, Wuhan University, Wuhan, China
- Hongsheng Liu
- Research Center for Computer Simulating and Information Processing of Bio-Macromolecules of Liaoning Province, Shenyang, China; School of Life Science, Liaoning University, Shenyang, China; Engineering Laboratory for Molecular Simulation and Designing of Drug Molecules of Liaoning, Shenyang, China
21
The PPI network analysis of mRNA expression profile of uterus from primary dysmenorrheal rats. Sci Rep 2018; 8:351. [PMID: 29321498 PMCID: PMC5762641 DOI: 10.1038/s41598-017-18748-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 11/08/2017] [Accepted: 12/15/2017] [Indexed: 11/08/2022] Open
Abstract
To elucidate the mechanisms of molecular regulations underlying primary dysmenorrhea (PD), we used our previously published mRNA expression profile of uterus from PD syndrome rats to construct protein-protein interactions (PPI) network via STRING Interactome. Consequently, 34 subnetworks, including a "continent" (Subnetwork 1) and 33 "islands" (Subnetwork 2-34) were generated. The nodes, with relative expression ratios, were visualized in the PPI networks and their connections were identified. Through path and module exploring in the network, the bridges were found from pathways of cellular response to calcium ion, SMAD protein signal transduction, regulation of transcription from RNA polymerase II promoter in response to stress and muscle stretch that were significantly enriched by the up-regulated mRNAs, to the cascades of cAMP metabolic processes and positive regulation of cyclase activities by the down-regulated ones. This link is mainly dependent on Fos/Jun - Vip connection. Our data, for the first time, report the PPI network analysis of differentially expressed mRNAs in the uterus of PD syndrome rats, to give insight into screening drugs and find new therapeutic strategies to relieve PD.
22
Prediction of cassava protein interactome based on interolog method. Sci Rep 2017; 7:17206. [PMID: 29222529 PMCID: PMC5722940 DOI: 10.1038/s41598-017-17633-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 09/08/2017] [Accepted: 11/28/2017] [Indexed: 12/20/2022] Open
Abstract
Cassava is a starchy root crop whose role in food security becomes more significant nowadays. Together with the industrial uses for versatile purposes, demand for cassava starch is continuously growing. However, in-depth study to uncover the mystery of cellular regulation, especially the interaction between proteins, is lacking. To reduce the knowledge gap in protein-protein interaction (PPI), genome-scale PPI network of cassava was constructed using interolog-based method (MePPI-In, available at http://bml.sbi.kmutt.ac.th/ppi). The network was constructed from the information of seven template plants. The MePPI-In included 90,173 interactions from 7,209 proteins. At least, 39 percent of the total predictions were found with supports from gene/protein expression data, while further co-expression analysis yielded 16 highly promising PPIs. In addition, domain-domain interaction information was employed to increase reliability of the network and guide the search for more groups of promising PPIs. Moreover, the topology and functional content of MePPI-In was similar to the networks of Arabidopsis and rice. The potential contribution of MePPI-In for various applications, such as protein-complex formation and prediction of protein function, was discussed and exemplified. The insights provided by our MePPI-In would hopefully enable us to pursue precise trait improvement in cassava.
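The general interolog idea behind this kind of network can be sketched as follows: an interaction known in a template species is transferred to the target species when both partners have orthologs there. The protein identifiers, the ortholog table, and the function name `predict_interologs` are hypothetical; the abstract does not describe how MePPI-In assigns orthologs or filters predictions.

```python
from itertools import product

def predict_interologs(template_ppis, orthologs):
    """Transfer template-species interactions to a target species.

    template_ppis: iterable of (protein_a, protein_b) pairs in the template.
    orthologs: dict mapping a template protein to its target-species orthologs.
    Returns a set of unordered target pairs, stored as sorted tuples.
    """
    predicted = set()
    for a, b in template_ppis:
        # Every combination of orthologs of a and b is a candidate interolog.
        for x, y in product(orthologs.get(a, ()), orthologs.get(b, ())):
            predicted.add(tuple(sorted((x, y))))
    return predicted

# Hypothetical template interactions and ortholog assignments.
template_ppis = [("AT1", "AT2"), ("AT2", "AT3")]
orthologs = {"AT1": ["Me1"], "AT2": ["Me2", "Me3"]}  # AT3 has no ortholog

predicted = predict_interologs(template_ppis, orthologs)
print(sorted(predicted))
```

The second template interaction yields nothing because one partner lacks an ortholog, which is why interolog coverage depends on the choice and number of template species.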
23
Li ZW, You ZH, Chen X, Li LP, Huang DS, Yan GY, Nie R, Huang YA. Accurate prediction of protein-protein interactions by integrating potential evolutionary information embedded in PSSM profile and discriminative vector machine classifier. Oncotarget 2017; 8:23638-23649. [PMID: 28423569 PMCID: PMC5410333 DOI: 10.18632/oncotarget.15564] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 11/30/2016] [Accepted: 01/11/2017] [Indexed: 11/25/2022] Open
Abstract
Identification of protein-protein interactions (PPIs) is of critical importance for deciphering the underlying mechanisms of almost all biological processes of the cell and for providing insight into the study of human disease. Although much effort has been devoted to identifying PPIs in various organisms, existing high-throughput biological techniques are time-consuming and expensive, and have high false positive and false negative rates. Thus there is an urgent need to develop in silico methods to predict PPIs efficiently and accurately in this post-genomic era. In this article, we report a novel computational model combining our newly developed discriminative vector machine classifier (DVM) and an improved Weber local descriptor (IWLD) for the prediction of PPIs. Two components, differential excitation and orientation, are exploited to build evolutionary features for each protein sequence. The main characteristic of the proposed method lies in introducing an effective feature descriptor, IWLD, which can capture highly discriminative evolutionary information from position-specific scoring matrices (PSSM) of protein data, and in employing the powerful and robust DVM classifier. When applying the proposed method to Yeast and H. pylori data sets, we obtained prediction accuracies as high as 96.52% and 91.80%, respectively, which are significantly better than those of previous methods. Extensive experiments were then performed for predicting cross-species PPIs, and the predictive results were also promising. To further validate the performance of the proposed method, we compared it with the state-of-the-art support vector machine (SVM) classifier on a Human data set. The experimental results obtained indicate that our method is highly effective for PPI prediction and can be taken as a supplementary tool for future proteomics research.
Affiliation(s)
- Zheng-Wei Li
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
- Zhu-Hong You
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
- Xing Chen
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Li-Ping Li
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
- De-Shuang Huang
- School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
- Gui-Ying Yan
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Ru Nie
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
- Yu-An Huang
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
24
An JY, Zhang L, Zhou Y, Zhao YJ, Wang DF. Computational methods using weighed-extreme learning machine to predict protein self-interactions with protein evolutionary information. J Cheminform 2017; 9:47. [PMID: 29086182 PMCID: PMC5561767 DOI: 10.1186/s13321-017-0233-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 06/01/2017] [Accepted: 08/05/2017] [Indexed: 02/07/2023] Open
Abstract
Self-interacting proteins (SIPs) are important for biological activity owing to the inherent interactions among their secondary structures or domains. However, due to the limitations of experimental self-interaction detection, one major challenge in SIP prediction is how to exploit computational approaches that detect SIPs from the evolutionary information contained in protein sequences. In this work, we present a novel computational approach named WELM-LAG, which combines the Weighed-Extreme Learning Machine (WELM) classifier with Local Average Group (LAG) to predict SIPs based on protein sequence. The major improvement of our method lies in an effective feature extraction method that represents candidate self-interacting proteins by exploring the evolutionary information embedded in the PSI-BLAST-constructed position specific scoring matrix (PSSM), followed by a reliable and robust WELM classifier to carry out classification. In addition, the Principal Component Analysis (PCA) approach is used to reduce the impact of noise. The WELM-LAG method gave very high average accuracies of 92.94 and 96.74% on yeast and human datasets, respectively. We also compared it with the state-of-the-art support vector machine (SVM) classifier and other existing methods on the human and yeast datasets. Comparative results indicated that our approach is very promising and may provide a cost-effective alternative for predicting SIPs. In addition, we developed a freely available web server called WELM-LAG-SIPs to predict SIPs. The web server is available at http://219.219.62.123:8888/WELMLAG/.
Affiliation(s)
- Ji-Yong An
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116 Jiangsu China
- Lei Zhang
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116 Jiangsu China
- Yong Zhou
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116 Jiangsu China
- Yu-Jun Zhao
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116 Jiangsu China
- Da-Fu Wang
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116 Jiangsu China
25
An JY, You ZH, Chen X, Huang DS, Li ZW, Liu G, Wang Y. Identification of self-interacting proteins by exploring evolutionary information embedded in PSI-BLAST-constructed position specific scoring matrix. Oncotarget 2016; 7:82440-82449. [PMID: 27732957 PMCID: PMC5347703 DOI: 10.18632/oncotarget.12517] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 06/29/2016] [Accepted: 09/28/2016] [Indexed: 01/31/2023] Open
Abstract
Self-interacting Proteins (SIPs) play an essential role in a wide range of biological processes, such as gene expression regulation, signal transduction, enzyme activation and immune response. Because of the limitations of experimental identification of self-interacting proteins, developing an effective computational method to detect SIPs from protein sequence is very important. In this study, we proposed a novel computational approach called RVMBIGP that combines the Relevance Vector Machine (RVM) model and Bi-gram probability (BIGP) to predict SIPs based on protein sequence. The proposed prediction model includes the following steps: (1) an effective feature extraction method named BIGP is used to represent protein sequences on the Position Specific Scoring Matrix (PSSM); (2) the Principal Component Analysis (PCA) method is employed to integrate the useful information and reduce the influence of noise; (3) the robust Relevance Vector Machine (RVM) classifier is used to carry out classification. When applied to yeast and human datasets, the proposed RVMBIGP model achieved very high accuracies of 95.48% and 98.80%, respectively. The experimental results show that our proposed method is very promising and may provide a cost-effective alternative for SIP identification. In addition, to facilitate extensive studies for future proteomics research, the RVMBIGP server is freely available for academic use at http://219.219.62.123:8888/RVMBIGP.
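Step (1) can be illustrated with one common formulation of bi-gram features on a PSSM, in which entry (m, n) sums the product of column m at position i and column n at position i + 1 over the sequence. The abstract does not give the formula, so this formulation, the function name, and the random toy matrix are assumptions; the PCA and RVM steps are omitted.

```python
import numpy as np

def bigram_pssm_features(pssm):
    """Flattened bi-gram feature vector from an L x K position-specific matrix.

    B[m, n] = sum over i of pssm[i, m] * pssm[i + 1, n], so a PSSM with
    K = 20 amino-acid columns yields a fixed 400-dimensional vector
    regardless of sequence length L.
    """
    pssm = np.asarray(pssm, dtype=float)
    bigram = pssm[:-1].T @ pssm[1:]  # K x K matrix of consecutive-position products
    return bigram.ravel()

# Toy stand-in for a real PSSM: 50 residues by 20 amino-acid columns.
rng = np.random.default_rng(0)
toy_pssm = rng.random((50, 20))
features = bigram_pssm_features(toy_pssm)
print(features.shape)
```

Because the output length does not depend on L, proteins of different lengths map to vectors of the same size, which is what allows a fixed-input classifier to be trained on them.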
Affiliation(s)
- Ji-Yong An
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 21116, China
- Zhu-Hong You
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
- Xing Chen
- School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou 221116, China
- De-Shuang Huang
- School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
- Zheng-Wei Li
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 21116, China
- Gang Liu
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
- Yin Wang
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 21116, China