1
|
Deng Q, Zhang J, Liu J, Liu Y, Dai Z, Zou X, Li Z. Identifying Protein Phosphorylation Site-Disease Associations Based on Multi-Similarity Fusion and Negative Sample Selection by Convolutional Neural Network. Interdiscip Sci 2024:10.1007/s12539-024-00615-0. [PMID: 38457108 DOI: 10.1007/s12539-024-00615-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2023] [Revised: 01/26/2024] [Accepted: 01/29/2024] [Indexed: 03/09/2024]
Abstract
As one of the most important post-translational modifications (PTMs), protein phosphorylation plays a key role in a variety of biological processes. Many studies have shown that protein phosphorylation is associated with various human diseases. Therefore, identifying protein phosphorylation site-disease associations can help to elucidate the pathogenesis of disease and discover new drug targets. Networks of sequence similarity and Gaussian interaction profile kernel similarity were constructed for phosphorylation sites, as well as networks of disease semantic similarity, disease symptom similarity and Gaussian interaction profile kernel similarity were constructed for diseases. To effectively combine different phosphorylation sites and disease similarity information, random walk with restart algorithm was used to obtain the topology information of the network. Then, the diffusion component analysis method was utilized to obtain the comprehensive phosphorylation site similarity and disease similarity. Meanwhile, the reliable negative samples were screened based on the Euclidean distance method. Finally, a convolutional neural network (CNN) model was constructed to identify potential associations between phosphorylation sites and diseases. Based on tenfold cross-validation, the evaluation indicators were obtained including accuracy of 93.48%, specificity of 96.82%, sensitivity of 90.15%, precision of 96.62%, Matthew's correlation coefficient of 0.8719, area under the receiver operating characteristic curve of 0.9786 and area under the precision-recall curve of 0.9836. Additionally, most of the top 20 predicted disease-related phosphorylation sites (19/20 for Alzheimer's disease; 20/16 for neuroblastoma) were verified by literatures and databases. These results show that the proposed method has an outstanding prediction performance and a high practical value.
Collapse
Affiliation(s)
- Qian Deng
- School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, China
| | - Jing Zhang
- School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, China
| | - Jie Liu
- School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, China
| | - Yuqi Liu
- School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, China
| | - Zong Dai
- School of Biomedical Engineering, Sun Yat-Sen University, Guangzhou, 510275, China
| | - Xiaoyong Zou
- School of Chemistry, Sun Yat-Sen University, Guangzhou, 510275, China.
| | - Zhanchao Li
- School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, China.
| |
Collapse
|
2
|
Watson J, Schwartz JM, Francavilla C. Using Multilayer Heterogeneous Networks to Infer Functions of Phosphorylated Sites. J Proteome Res 2021; 20:3532-3548. [PMID: 34164982 PMCID: PMC8256419 DOI: 10.1021/acs.jproteome.1c00150] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2021] [Indexed: 01/23/2023]
Abstract
Mass spectrometry-based quantitative phosphoproteomics has become an essential approach in the study of cellular processes such as signaling. Commonly used methods to analyze phosphoproteomics datasets depend on generic, gene-centric annotations such as Gene Ontology terms, which do not account for the function of a protein in a particular phosphorylation state. Analysis of phosphoproteomics data is hampered by a lack of phosphorylated site-specific annotations. We propose a method that combines shotgun phosphoproteomics data, protein-protein interactions, and functional annotations into a heterogeneous multilayer network. Phosphorylation sites are associated to potential functions using a random walk on the heterogeneous network (RWHN) algorithm. We validated our approach against a model of the MAPK/ERK pathway and functional annotations from PhosphoSitePlus and were able to associate differentially regulated sites on the same proteins to their previously described specific functions. We further tested the algorithm on three previously published datasets and were able to reproduce their experimentally validated conclusions and to associate phosphorylation sites with known functions based on their regulatory patterns. Our approach provides a refinement of commonly used analysis methods and accurately predicts context-specific functions for sites with similar phosphorylation profiles.
Collapse
Affiliation(s)
- Joanne Watson
- Division
of Evolution & Genomic Sciences, School of Biological Sciences,
Faculty of Biology, Medicine & Health, University of Manchester, Manchester M13 9PT, U.K.
- Division
of Molecular and Cellular Function, School of Biological Sciences,
Faculty of Biology, Medicine & Health, University of Manchester, Manchester M13 9PT, U.K.
| | - Jean-Marc Schwartz
- Division
of Evolution & Genomic Sciences, School of Biological Sciences,
Faculty of Biology, Medicine & Health, University of Manchester, Manchester M13 9PT, U.K.
| | - Chiara Francavilla
- Division
of Molecular and Cellular Function, School of Biological Sciences,
Faculty of Biology, Medicine & Health, University of Manchester, Manchester M13 9PT, U.K.
| |
Collapse
|
3
|
Lee B, Zhang S, Poleksic A, Xie L. Heterogeneous Multi-Layered Network Model for Omics Data Integration and Analysis. Front Genet 2020; 10:1381. [PMID: 32063919 PMCID: PMC6997577 DOI: 10.3389/fgene.2019.01381] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2019] [Accepted: 12/18/2019] [Indexed: 01/08/2023] Open
Abstract
Advances in next-generation sequencing and high-throughput techniques have enabled the generation of vast amounts of diverse omics data. These big data provide an unprecedented opportunity in biology, but impose great challenges in data integration, data mining, and knowledge discovery due to the complexity, heterogeneity, dynamics, uncertainty, and high-dimensionality inherited in the omics data. Network has been widely used to represent relations between entities in biological system, such as protein-protein interaction, gene regulation, and brain connectivity (i.e. network construction) as well as to infer novel relations given a reconstructed network (aka link prediction). Particularly, heterogeneous multi-layered network (HMLN) has proven successful in integrating diverse biological data for the representation of the hierarchy of biological system. The HMLN provides unparalleled opportunities but imposes new computational challenges on establishing causal genotype-phenotype associations and understanding environmental impact on organisms. In this review, we focus on the recent advances in developing novel computational methods for the inference of novel biological relations from the HMLN. We first discuss the properties of biological HMLN. Then we survey four categories of state-of-the-art methods (matrix factorization, random walk, knowledge graph, and deep learning). Thirdly, we demonstrate their applications to omics data integration and analysis. Finally, we outline strategies for future directions in the development of new HMLN models.
Collapse
Affiliation(s)
- Bohyun Lee
- Ph.D. Program in Computer Science, The City University of New York, New York, NY, United States
| | - Shuo Zhang
- Ph.D. Program in Computer Science, The City University of New York, New York, NY, United States
| | - Aleksandar Poleksic
- Department of Computer Science, The University of Northern Iowa, Cedar Falls, IA, United States
| | - Lei Xie
- Ph.D. Program in Computer Science, The City University of New York, New York, NY, United States
- Ph.D. Program in Biochemistry and Biology, The City University of New York, New York, NY, United States
- Department of Computer Science, Hunter College, The City University of New York, New York, NY, United States
- Helen and Robert Appel Alzheimer’s Disease Research Institute, Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, Cornell University, Ithaca, NY, United States
| |
Collapse
|
4
|
Tang Y, Chen K, Wu X, Wei Z, Zhang SY, Song B, Zhang SW, Huang Y, Meng J. DRUM: Inference of Disease-Associated m 6A RNA Methylation Sites From a Multi-Layer Heterogeneous Network. Front Genet 2019; 10:266. [PMID: 31001320 PMCID: PMC6456716 DOI: 10.3389/fgene.2019.00266] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2018] [Accepted: 03/11/2019] [Indexed: 01/27/2023] Open
Abstract
Recent studies have revealed that the RNA N 6-methyladenosine (m6A) modification plays a critical role in a variety of biological processes and associated with multiple diseases including cancers. Till this day, transcriptome-wide m6A RNA methylation sites have been identified by high-throughput sequencing technique combined with computational methods, and the information is publicly available in a few bioinformatics databases; however, the association between individual m6A sites and various diseases are still largely unknown. There are yet computational approaches developed for investigating potential association between individual m6A sites and diseases, which represents a major challenge in the epitranscriptome analysis. Thus, to infer the disease-related m6A sites, we implemented a novel multi-layer heterogeneous network-based approach, which incorporates the associations among diseases, genes and m6A RNA methylation sites from gene expression, RNA methylation and disease similarities data with the Random Walk with Restart (RWR) algorithm. To evaluate the performance of the proposed approach, a ten-fold cross validation is performed, in which our approach achieved a reasonable good performance (overall AUC: 0.827, average AUC 0.867), higher than a hypergeometric test-based approach (overall AUC: 0.7333 and average AUC: 0.723) and a random predictor (overall AUC: 0.550 and average AUC: 0.486). Additionally, we show that a number of predicted cancer-associated m6A sites are supported by existing literatures, suggesting that the proposed approach can effectively uncover the underlying epitranscriptome circuits of disease mechanisms. An online database DRUM, which stands for disease-associated ribonucleic acid methylation, was built to support the query of disease-associated RNA m6A methylation sites, and is freely available at: www.xjtlu.edu.cn/biologicalsciences/drum.
Collapse
Affiliation(s)
- Yujiao Tang
- Department of Biological Sciences, Research Center for Precision Medicine, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
| | - Kunqi Chen
- Department of Biological Sciences, Research Center for Precision Medicine, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Institute of & Chronic Disease, University of Liverpool, Liverpool, United Kingdom
| | - Xiangyu Wu
- Department of Biological Sciences, Research Center for Precision Medicine, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Institute of & Chronic Disease, University of Liverpool, Liverpool, United Kingdom
| | - Zhen Wei
- Department of Biological Sciences, Research Center for Precision Medicine, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Institute of & Chronic Disease, University of Liverpool, Liverpool, United Kingdom
| | - Song-Yao Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, China
| | - Bowen Song
- Department of Biological Sciences, Research Center for Precision Medicine, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Institute of & Chronic Disease, University of Liverpool, Liverpool, United Kingdom
| | - Shao-Wu Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, China
| | - Yufei Huang
- Department of Epidemiology and Biostatistics, University of Texas Health San Antonio, San Antonio, TX, United States
- Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, TX, United States
| | - Jia Meng
- Department of Biological Sciences, Research Center for Precision Medicine, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Institute of & Chronic Disease, University of Liverpool, Liverpool, United Kingdom
| |
Collapse
|
5
|
Liu Y, Wang M, Xi J, Luo F, Li A. PTM-ssMP: A Web Server for Predicting Different Types of Post-translational Modification Sites Using Novel Site-specific Modification Profile. Int J Biol Sci 2018; 14:946-956. [PMID: 29989096 PMCID: PMC6036757 DOI: 10.7150/ijbs.24121] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2017] [Accepted: 01/24/2018] [Indexed: 12/26/2022] Open
Abstract
Protein post-translational modifications (PTMs) are chemical modifications of a protein after its translation. Owing to its play an important role in deep understanding of various biological processes and the development of effective drugs, PTM site prediction have become a hot topic in bioinformatics. Recently, many online tools are developed to prediction various types of PTM sites, most of which are based on local sequence and some biological information. However, few of existing tools consider the relations between different PTMs for their prediction task. Here, we develop a web server called PTM-ssMP to predict PTM site, which adopts site-specific modification profile (ssMP) to efficiently extract and encode the information of both proximal PTMs and local sequence simultaneously. In PTM-ssMP we provide efficient prediction of multiple types of PTM site including phosphorylation, lysine acetylation, ubiquitination, sumoylation, methylation, O-GalNAc, O-GlcNAc, sulfation and proteolytic cleavage. To assess the performance of PTM-ssMP, a large number of experimentally verified PTM sites are collected from several sources and used to train and test the prediction models. Our results suggest that ssMP consistently contributes to remarkable improvement of prediction performance. In addition, results of independent tests demonstrate that PTM-ssMP compares favorably with other existing tools for different PTM types. PTM-ssMP is implemented as an online web server with user-friendly interface, which is freely available at http://bioinformatics.ustc.edu.cn/PTM-ssMP/index/.
Collapse
Affiliation(s)
- Yu Liu
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH230027, China
| | - Minghui Wang
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH230027, China.,Centers for Biomedical Engineering, University of Science and Technology of China, Hefei AH230027, China
| | - Jianing Xi
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH230027, China
| | - Fenglin Luo
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH230027, China
| | - Ao Li
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH230027, China.,Centers for Biomedical Engineering, University of Science and Technology of China, Hefei AH230027, China
| |
Collapse
|
6
|
A novel heterogeneous network-based method for drug response prediction in cancer cell lines. Sci Rep 2018; 8:3355. [PMID: 29463808 PMCID: PMC5820329 DOI: 10.1038/s41598-018-21622-4] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2017] [Accepted: 02/06/2018] [Indexed: 02/01/2023] Open
Abstract
An enduring challenge in personalized medicine lies in selecting a suitable drug for each individual patient. Here we concentrate on predicting drug responses based on a cohort of genomic, chemical structure, and target information. Therefore, a recently study such as GDSC has provided an unprecedented opportunity to infer the potential relationships between cell line and drug. While existing approach rely primarily on regression, classification or multiple kernel learning to predict drug responses. Synthetic approach indicates drug target and protein-protein interaction could have the potential to improve the prediction performance of drug response. In this study, we propose a novel heterogeneous network-based method, named as HNMDRP, to accurately predict cell line-drug associations through incorporating heterogeneity relationship among cell line, drug and target. Compared to previous study, HNMDRP can make good use of above heterogeneous information to predict drug responses. The validity of our method is verified not only by plotting the ROC curve, but also by predicting novel cell line-drug sensitive associations which have dependable literature evidences. This allows us possibly to suggest potential sensitive associations among cell lines and drugs. Matlab and R codes of HNMDRP can be found at following https://github.com/USTC-HIlab/HNMDRP.
Collapse
|
7
|
Wang M, Wang T, Li A. ksrMKL: a novel method for identification of kinase-substrate relationships using multiple kernel learning. PeerJ 2017; 5:e4182. [PMID: 29340231 PMCID: PMC5741978 DOI: 10.7717/peerj.4182] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2017] [Accepted: 12/01/2017] [Indexed: 01/24/2023] Open
Abstract
Phosphorylation exerts a crucial role in multiple biological cellular processes which is catalyzed by protein kinases and closely related to many diseases. Identification of kinase-substrate relationships is important for understanding phosphorylation and provides a fundamental basis for further disease-related research and drug design. In this study, we develop a novel computational method to identify kinase-substrate relationships based on multiple kernel learning. The comparative analysis is based on a 10-fold cross-validation process and the dataset collected from the Phospho.ELM database. The results show that ksrMKL is greatly improved in various measures when compared with the single kernel support vector machine. Furthermore, with an independent test dataset extracted from the PhosphoSitePlus database, we compare ksrMKL with two existing kinase-substrate relationship prediction tools, namely iGPS and PKIS. The experimental results show that ksrMKL has better prediction performance than these existing tools.
Collapse
Affiliation(s)
- Minghui Wang
- School of Information Science and Technology, University of Science and Technology of China, Hefei, China.,Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, China
| | - Tao Wang
- School of Information Science and Technology, University of Science and Technology of China, Hefei, China
| | - Ao Li
- School of Information Science and Technology, University of Science and Technology of China, Hefei, China.,Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, China
| |
Collapse
|
8
|
A Novel Phosphorylation Site-Kinase Network-Based Method for the Accurate Prediction of Kinase-Substrate Relationships. BIOMED RESEARCH INTERNATIONAL 2017; 2017:1826496. [PMID: 29312990 PMCID: PMC5660750 DOI: 10.1155/2017/1826496] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/28/2017] [Revised: 08/14/2017] [Accepted: 09/05/2017] [Indexed: 01/06/2023]
Abstract
Protein phosphorylation is catalyzed by kinases which regulate many aspects that control death, movement, and cell growth. Identification of the phosphorylation site-specific kinase-substrate relationships (ssKSRs) is important for understanding cellular dynamics and provides a fundamental basis for further disease-related research and drug design. Although several computational methods have been developed, most of these methods mainly use local sequence of phosphorylation sites and protein-protein interactions (PPIs) to construct the prediction model. While phosphorylation presents very complicated processes and is usually involved in various biological mechanisms, the aforementioned information is not sufficient for accurate prediction. In this study, we propose a new and powerful computational approach named KSRPred for ssKSRs prediction, by introducing a novel phosphorylation site-kinase network (pSKN) profiles that can efficiently incorporate the relationships between various protein kinases and phosphorylation sites. The experimental results show that the pSKN profiles can efficiently improve the prediction performance in collaboration with local sequence and PPI information. Furthermore, we compare our method with the existing ssKSRs prediction tools and the results demonstrate that KSRPred can significantly improve the prediction performance compared with existing tools.
Collapse
|
9
|
Sun D, Wang M, Li A. MPTM: A tool for mining protein post-translational modifications from literature. J Bioinform Comput Biol 2017; 15:1740005. [PMID: 28982288 DOI: 10.1142/s0219720017400054] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Due to the importance of post-translational modifications (PTMs) in human health and diseases, PTMs are regularly reported in the biomedical literature. However, the continuing and rapid pace of expansion of this literature brings a huge challenge for researchers and database curators. Therefore, there is a pressing need to aid them in identifying relevant PTM information more efficiently by using a text mining system. So far, only a few web servers are available for mining information of a very limited number of PTMs, which are based on simple pattern matching or pre-defined rules. In our work, in order to help researchers and database curators easily find and retrieve PTM information from available text, we have developed a text mining tool called MPTM, which extracts and organizes valuable knowledge about 11 common PTMs from abstracts in PubMed by using relations extracted from dependency parse trees and a heuristic algorithm. It is the first web server that provides literature mining service for hydroxylation, myristoylation and GPI-anchor. The tool is also used to find new publications on PTMs from PubMed and uncovers potential PTM information by large-scale text analysis. MPTM analyzes text sentences to identify protein names including substrates and protein-interacting enzymes, and automatically associates them with the UniProtKB protein entry. To facilitate further investigation, it also retrieves PTM-related information, such as human diseases, Gene Ontology terms and organisms from the input text and related databases. In addition, an online database (MPTMDB) with extracted PTM information and a local MPTM Lite package are provided on the MPTM website. MPTM is freely available online at http://bioinformatics.ustc.edu.cn/mptm/ and the source codes are hosted on GitHub: https://github.com/USTC-HILAB/MPTM .
Collapse
Affiliation(s)
- Dongdong Sun
- 1 School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, P. R. China
| | - Minghui Wang
- 1 School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, P. R. China
| | - Ao Li
- 1 School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, P. R. China
| |
Collapse
|
10
|
Wang B, Wang M, Li A. Prediction of post-translational modification sites using multiple kernel support vector machine. PeerJ 2017; 5:e3261. [PMID: 28462053 PMCID: PMC5410141 DOI: 10.7717/peerj.3261] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2017] [Accepted: 04/01/2017] [Indexed: 01/12/2023] Open
Abstract
Protein post-translational modification (PTM) is an important mechanism that is involved in the regulation of protein function. Considering the high-cost and labor-intensive of experimental identification, many computational prediction methods are currently available for the prediction of PTM sites by using protein local sequence information in the context of conserved motif. Here we proposed a novel computational method by using the combination of multiple kernel support vector machines (SVM) for predicting PTM sites including phosphorylation, O-linked glycosylation, acetylation, sulfation and nitration. To largely make use of local sequence information and site-modification relationships, we developed a local sequence kernel and Gaussian interaction profile kernel, respectively. Multiple kernels were further combined to train SVM for efficiently leveraging kernel information to boost predictive performance. We compared the proposed method with existing PTM prediction methods. The experimental results revealed that the proposed method performed comparable or better performance than the existing prediction methods, suggesting the feasibility of the developed kernels and the usefulness of the proposed method in PTM sites prediction.
Collapse
Affiliation(s)
- BingHua Wang
- University of Science and Technology of China, School of Information Science and Technology, Hefei, China
| | - Minghui Wang
- University of Science and Technology of China, School of Information Science and Technology, Hefei, China
- University of Science and Technology of China, Centers for Biomedical Engineering, Hefei, China
| | - Ao Li
- University of Science and Technology of China, School of Information Science and Technology, Hefei, China
- University of Science and Technology of China, Centers for Biomedical Engineering, Hefei, China
| |
Collapse
|