1
Zhang Y, Yao L, Chung CR, Huang Y, Li S, Zhang W, Pang Y, Lee TY. KinPred-RNA-kinase activity inference and cancer type classification using machine learning on RNA-seq data. iScience 2024; 27:109333. [PMID: 38523792 PMCID: PMC10959666 DOI: 10.1016/j.isci.2024.109333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2023] [Revised: 12/07/2023] [Accepted: 02/21/2024] [Indexed: 03/26/2024]
Abstract
Kinases are important enzymes that transfer phosphate groups from high-energy, phosphate-donating molecules to specific substrates and play essential roles in various cellular processes. Existing algorithms that infer kinase activity from phosphoproteomics data are often costly and require valuable samples. Moreover, methods to extract kinase activities from bulk RNA sequencing data remain undeveloped. In this study, we propose a computational framework, KinPred-RNA, to derive kinase activities from bulk RNA-sequencing data in cancer samples. The KinPred-RNA framework, which uses an extreme gradient boosting (XGBoost) regression model, outperforms random forest regression, multiple linear regression, and support vector machine regression models in predicting kinase activities from cancer-related RNA sequencing data. Efficient gene signatures from the LINCS-L1000 dataset were used as inputs for KinPred-RNA. The results highlight its potential to be related to biological function. In conclusion, KinPred-RNA constitutes a significant advance in cancer research by potentially facilitating the identification of cancer.
Affiliation(s)
- Yuntian Zhang
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Lantian Yao
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Chia-Ru Chung
  - Department of Computer Science and Information Engineering, National Central University, Taoyuan 320953, Taiwan
- Yixian Huang
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Shangfu Li
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
- Wenyang Zhang
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Yuxuan Pang
  - Division of Health Medical Intelligence, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Minato-ku, Tokyo, Japan
- Tzong-Yi Lee
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300093, Taiwan
  - Center for Intelligent Drug Systems and Smart Bio-devices (IDSB), National Yang Ming Chiao Tung University, Hsinchu 300093, Taiwan
2
Guan J, Yao L, Chung CR, Chiang YC, Lee TY. StackTHPred: Identifying Tumor-Homing Peptides through GBDT-Based Feature Selection with Stacking Ensemble Architecture. Int J Mol Sci 2023; 24:10348. [PMID: 37373494 DOI: 10.3390/ijms241210348] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/25/2023] [Revised: 05/31/2023] [Accepted: 06/02/2023] [Indexed: 06/29/2023]
Abstract
One of the major challenges in cancer therapy lies in the limited targeting specificity exhibited by existing anti-cancer drugs. Tumor-homing peptides (THPs) have emerged as a promising solution to this issue, due to their capability to specifically bind to and accumulate in tumor tissues while minimally impacting healthy tissues. THPs are short oligopeptides that offer a superior biological safety profile, with minimal antigenicity, and faster incorporation rates into target cells/tissues. However, identifying THPs experimentally, using methods such as phage display or in vivo screening, is a complex, time-consuming task, hence the need for computational methods. In this study, we proposed StackTHPred, a novel machine learning-based framework that predicts THPs using optimal features and a stacking architecture. With an effective feature selection algorithm and three tree-based machine learning algorithms, StackTHPred has demonstrated advanced performance, surpassing existing THP prediction methods. It achieved an accuracy of 0.915 and a 0.831 Matthews Correlation Coefficient (MCC) score on the main dataset, and an accuracy of 0.883 and a 0.767 MCC score on the small dataset. StackTHPred also offers favorable interpretability, enabling researchers to better understand the intrinsic characteristics of THPs. Overall, StackTHPred is beneficial for both the exploration and identification of THPs and facilitates the development of innovative cancer therapies.
Affiliation(s)
- Jiahui Guan
  - School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Lantian Yao
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
  - School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Chia-Ru Chung
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Ying-Chih Chiang
  - School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Tzong-Yi Lee
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
3
Comprehensive identification of protein orthologs in the family Ascoviridae facilitates an understanding of phylogenomics, protein conservation, and phosphorylation. Arch Virol 2022; 167:1075-1087. [PMID: 35246734 DOI: 10.1007/s00705-022-05402-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2021] [Accepted: 01/18/2022] [Indexed: 11/02/2022]
Abstract
Analysis of orthology is important for understanding protein conservation, function, and phylogenomics. In this study, we performed a comprehensive analysis of gene orthology in the family Ascoviridae based on identification of 366 protein homologue groups and phylogenetic analysis of 34 non-single-copy proteins. Our findings revealed 90 newly annotated proteins, five newly identified core proteins for the family Ascoviridae, and 14 core proteins for the genus Ascovirus. A phylogenomic tree of 11 Ascoviridae members was constructed based on a concatenation of 35 of the 45 ortholog groups. In combination with phosphoproteomic results and conservation estimations, 30 conserved phosphorylation sites on 17 phosphoproteins were identified from a total of 176 phosphosites on 57 phosphoproteins from Heliothis virescens ascovirus 3h (HvAV-3h), providing potential research targets for investigating the role of these proteins in the regulation of viral infection. This study will facilitate genome annotation and comparison of further Ascoviridae members as well as functional genomic investigations.
4
Hawkins LM, Naumov AV, Batra M, Wang C, Chaput D, Suvorova ES. Novel CRK-Cyclin Complex Controls Spindle Assembly Checkpoint in Toxoplasma Endodyogeny. mBio 2022; 13:e0356121. [PMID: 35130726 PMCID: PMC8822342 DOI: 10.1128/mbio.03561-21] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/24/2021] [Accepted: 01/18/2022] [Indexed: 12/21/2022]
Abstract
Opportunistic parasites of the Apicomplexa phylum use a variety of division modes built on two types of cell cycles that incorporate two distinctive mechanisms of mitosis: uncoupled from and coupled to parasite budding. Parasites have evolved novel factors to regulate such unique replication mechanisms that are poorly understood. Here, we have combined genetics, quantitative fluorescence microscopy, and global proteomics approaches to examine endodyogeny in Toxoplasma gondii dividing by mitosis coupled to cytokinesis. In the current study, we focus on the steps controlled by the recently described atypical Cdk-related kinase T. gondii Crk6 (TgCrk6). While inspecting protein complexes, we found that this previously orphaned TgCrk6 kinase interacts with a parasite-specific atypical cyclin, TgCyc1. We built conditional expression models and examined primary cell cycle defects caused by the lack of TgCrk6 or TgCyc1. Quantitative microscopy assays revealed that tachyzoites deficient in either TgCrk6 or the cyclin partner TgCyc1 exhibit identical mitotic defects, suggesting cooperative action of the complex components. Further examination of the mitotic structures indicated that the TgCrk6/TgCyc1 complex regulates metaphase. This novel finding confirms a functional spindle assembly checkpoint (SAC) in T. gondii. Measuring global changes in protein expression and phosphorylation, we found evidence that canonical activities of the Toxoplasma SAC are intertwined with parasite-specific tasks. Analysis of phosphorylation motifs suggests that Toxoplasma metaphase is regulated by CDK, mitogen-activated kinase (MAPK), and Aurora kinases, while the TgCrk6/TgCyc1 complex specifically controls the centromere-associated network.
IMPORTANCE: The rate of Toxoplasma tachyzoite division directly correlates with the severity of the disease, toxoplasmosis, which affects humans and animals. Thus, a better understanding of the tachyzoite cell cycle would offer much-needed efficient tools to control the acute stage of infection. Although tachyzoites divide by binary division, the cell cycle architecture and regulation differ significantly from the conventional binary fission of their host cells. Unlike the unidirectional conventional cell cycle, the Toxoplasma budding cycle is braided and is regulated by multiple essential Cdk-related kinases (Crks) that emerged in the place of missing conventional cell cycle regulators. How these novel Crks control apicomplexan cell cycles is largely unknown. Here, we have discovered a novel parasite-specific complex, TgCrk6/TgCyc1, that orchestrates a major mitotic event, the spindle assembly checkpoint. We demonstrated that tachyzoites incorporated parasite-specific tasks in the canonical checkpoint functions.
Affiliation(s)
- Lauren M. Hawkins
  - Division of Infectious Diseases, Department of Internal Medicine, Morsani College of Medicine, University of South Florida, Tampa, Florida, USA
- Anatoli V. Naumov
  - Division of Infectious Diseases, Department of Internal Medicine, Morsani College of Medicine, University of South Florida, Tampa, Florida, USA
- Mrinalini Batra
  - Division of Infectious Diseases, Department of Internal Medicine, Morsani College of Medicine, University of South Florida, Tampa, Florida, USA
- Changqi Wang
  - College of Public Health, University of South Florida, Tampa, Florida, USA
- Dale Chaput
  - Proteomics Core, College of Arts and Sciences, University of South Florida, Tampa, Florida, USA
- Elena S. Suvorova
  - Division of Infectious Diseases, Department of Internal Medicine, Morsani College of Medicine, University of South Florida, Tampa, Florida, USA
5
Dakal TC. SARS-CoV-2 attachment to host cells is possibly mediated via RGD-integrin interaction in a calcium-dependent manner and suggests pulmonary EDTA chelation therapy as a novel treatment for COVID 19. Immunobiology 2021; 226:152021. [PMID: 33232865 PMCID: PMC7642744 DOI: 10.1016/j.imbio.2020.152021] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 04/21/2020] [Accepted: 10/18/2020] [Indexed: 12/15/2022]
Abstract
SARS-CoV-2 is a highly contagious virus that has caused a serious worldwide health crisis, resulting in a pandemic. According to the literature, SARS-CoV-2 exploits human ACE2 receptors (as did the earlier SARS-CoV-1) to gain entry into the host cell for invasion, infection, multiplication and pathogenesis. However, considering the higher infectivity of SARS-CoV-2 along with the complex etiology and pathophysiological outcomes seen in COVID-19 patients, it seems that there may be an alternate receptor for SARS-CoV-2. I performed comparative protein sequence analysis, database-based gene expression profiling, and bioinformatics-based molecular docking using authentic tools and techniques to unveil the molecular basis of the high infectivity of SARS-CoV-2 compared with previously known coronaviruses. My study revealed that SARS-CoV-2 (previously known as 2019-nCoV) harbors an RGD motif in its receptor binding domain (RBD) and that the motif is absent in all other previously known SARS-CoVs. The RGD motif is well known for its role in cell attachment and cell adhesion. My hypothesis is that SARS-CoV-2 may be exploiting integrins (via RGD), which are highly expressed in the lungs and all other vital organs, to invade host cells. However, experimental verification is required. The expression of ACE2, which is a known receptor for SARS-CoV-2, was found to be negligible in lungs. I assume that the higher infectivity of SARS-CoV-2 could be due to this RGD-integrin-mediated acquired cell-adhesive property. Gene expression profiling revealed that expression of integrins is significantly high in lung cells, in particular αvβ6, α5β1, αvβ8 and an ECM protein, ICAM1. The molecular docking experiment showed that the RBD of the spike protein binds integrins precisely at the RGD motif, in a manner similar to that reported by other researchers for a synthetic RGD peptide. The SARS-CoV-2 spike protein has a number of phosphorylation sites that can induce cAMP, PKC, and Tyr signaling pathways. These pathways either activate calcium ion channels or are activated by calcium. In fact, integrins have calcium and metal binding sites that were predicted in the vicinity of the RGD-integrin docking site in my analysis, which suggests that the RGD-integrin interaction possibly occurs in a calcium-dependent manner. The higher expression of integrins in lungs, along with their previously known high binding affinity (~KD = 4.0 nM) for the virus RGD motif, could serve as a possible explanation for the high infectivity of SARS-CoV-2. In contrast, human ACE2 has lower expression in lungs, and its high binding affinity (~KD = 15 nM) for the spike RBD alone could not manifest significant virus-host attachment. This suggests that besides human ACE2, an additional or alternate receptor for SARS-CoV-2 is likely to exist. A highly relevant piece of evidence, never reported earlier, that corroborates RGD-integrin-mediated virus-host attachment is the cytokine storm caused by activation of TNF-α and IL-6; the role of integrins in their activation is also well established. Altogether, the current study highlights a possible role of calcium and other divalent ions in the RGD-integrin interaction for virus invasion into host cells and suggests that lowering divalent ion levels in the lungs could avert virus-host cell attachment.
Affiliation(s)
- Tikam Chand Dakal
  - Genome and Computational Biology Lab, Department of Biotechnology, Mohanlal Sukhadia University, Udaipur 313001, Rajasthan, India
6
Huang KY, Lee TY, Kao HJ, Ma CT, Lee CC, Lin TH, Chang WC, Huang HD. dbPTM in 2019: exploring disease association and cross-talk of post-translational modifications. Nucleic Acids Res 2019; 47:D298-D308. [PMID: 30418626 PMCID: PMC6323979 DOI: 10.1093/nar/gky1074] [Citation(s) in RCA: 138] [Impact Index Per Article: 34.5] [Received: 09/15/2018] [Accepted: 10/19/2018] [Indexed: 12/25/2022]
Abstract
The dbPTM (http://dbPTM.mbc.nctu.edu.tw/) has been maintained for over 10 years with the aim to provide functional and structural analyses for post-translational modifications (PTMs). In this update, dbPTM not only integrates more experimentally validated PTMs from available databases and through manual curation of literature but also provides PTM-disease associations based on non-synonymous single nucleotide polymorphisms (nsSNPs). The high-throughput deep sequencing technology has led to a surge in the data generated through analysis of association between SNPs and diseases, both in terms of growth amount and scope. This update thus integrated disease-associated nsSNPs from dbSNP based on genome-wide association studies. The PTM substrate sites located at a specified distance in terms of the amino acids encoded from nsSNPs were deemed to have an association with the involved diseases. In recent years, increasing evidence for crosstalk between PTMs has been reported. Although mass spectrometry-based proteomics has substantially improved our knowledge about substrate site specificity of single PTMs, the fact that the crosstalk of combinatorial PTMs may act in concert with the regulation of protein function and activity is neglected. Because of the relatively limited information about concurrent frequency and functional relevance of PTM crosstalk, in this update, the PTM sites neighboring other PTM sites in a specified window length were subjected to motif discovery and functional enrichment analysis. This update highlights the current challenges in PTM crosstalk investigation and breaks the bottleneck of how proteomics may contribute to understanding PTM codes, revealing the next level of data complexity and proteomic limitation in prospective PTM research.
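The abstract describes selecting "PTM sites neighboring other PTM sites in a specified window length" before motif discovery. A minimal Python sketch of that selection step follows. It is not the dbPTM implementation: the function name `neighbors_within_window`, the 5-residue window, the toy site list, and the choice to pair only sites carrying different PTM types are all assumptions made for illustration.

```python
from bisect import bisect_right


def neighbors_within_window(sites, window):
    """Pair up PTM sites on one protein that lie within `window` residues.

    `sites` is a list of (position, ptm_type) tuples. Only pairs with
    different PTM types are kept (an assumption about what counts as
    crosstalk here). Each pair is reported once, left site first.
    """
    ordered = sorted(sites)                    # sort by residue position
    positions = [pos for pos, _ in ordered]
    pairs = []
    for i, (pos, ptm) in enumerate(ordered):
        # Index just past the last site at or before pos + window.
        hi = bisect_right(positions, pos + window)
        for other_pos, other_ptm in ordered[i + 1:hi]:
            if other_ptm != ptm:
                pairs.append(((pos, ptm), (other_pos, other_ptm)))
    return pairs


# Hypothetical sites on a hypothetical protein (not taken from dbPTM).
toy_sites = [
    (15, "phosphorylation"),
    (18, "acetylation"),
    (40, "ubiquitination"),
    (44, "phosphorylation"),
    (90, "methylation"),
]
crosstalk = neighbors_within_window(toy_sites, window=5)
for left, right in crosstalk:
    print(left, "<->", right)
```

Sorting once and using a binary search keeps the scan cheap when a protein carries many annotated sites. The same routine could be reused for the PTM-to-nsSNP distance filter by passing nsSNP positions as a second "type".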
Affiliation(s)
- Kai-Yao Huang
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Life and Health Science, The Chinese University of Hong Kong, Shenzhen 518172, China
- Tzong-Yi Lee
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Life and Health Science, The Chinese University of Hong Kong, Shenzhen 518172, China
- Hui-Ju Kao
  - Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 32003, Taiwan
- Chen-Tse Ma
  - Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 32003, Taiwan
- Chao-Chun Lee
  - Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 32003, Taiwan
- Tsai-Hsuan Lin
  - Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 32003, Taiwan
- Wen-Chi Chang
  - Institute of Tropical Plant Sciences, College of Biosciences and Biotechnology, National Cheng Kung University, Tainan 70101, Taiwan
- Hsien-Da Huang
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Life and Health Science, The Chinese University of Hong Kong, Shenzhen 518172, China
7
Hervás M, Ciordia S, Navajas R, García JA, Martínez-Turiño S. Common and Strain-Specific Post-Translational Modifications of the Potyvirus Plum pox virus Coat Protein in Different Hosts. Viruses 2020; 12:E308. [PMID: 32178365 PMCID: PMC7150786 DOI: 10.3390/v12030308] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/24/2020] [Revised: 03/05/2020] [Accepted: 03/09/2020] [Indexed: 01/04/2023]
Abstract
Phosphorylation and O-GlcNAcylation are widespread post-translational modifications (PTMs), often sharing protein targets. Numerous studies have reported the phosphorylation of plant viral proteins. In plants, research on O-GlcNAcylation lags behind that of other eukaryotes, and information about O-GlcNAcylated plant viral proteins is extremely scarce. The potyvirus Plum pox virus (PPV) causes sharka disease in Prunus trees and also infects a wide range of experimental hosts. Capsid protein (CP) from virions of PPV-R isolate purified from herbaceous plants can be extensively modified by O-GlcNAcylation and phosphorylation. In this study, a combination of proteomics and biochemical approaches was employed to broaden knowledge of PPV CP PTMs. CP proved to be modified regardless of whether or not it was assembled into mature particles. PTMs of CP occurred in the natural host Prunus persica, similarly to what happens in herbaceous plants. Additionally, we observed that O-GlcNAcylation and phosphorylation were general features of different PPV strains, suggesting that these modifications contribute to general strategies deployed during plant-virus interactions. Interestingly, phosphorylation at a casein kinase II motif conserved among potyviral CPs exhibited strain specificity in PPV; however, it did not display the critical role attributed to the same modification in the CP of another potyvirus, Potato virus A.
Affiliation(s)
- Marta Hervás
  - Department of Plant Molecular Genetics, Centro Nacional de Biotecnología (CNB-CSIC), Campus Universidad Autónoma de Madrid, 28049 Madrid, Spain
- Sergio Ciordia
  - Proteomics Unit, Centro Nacional de Biotecnología (CNB-CSIC), ProteoRed ISCIII, 28049 Madrid, Spain
- Rosana Navajas
  - Proteomics Unit, Centro Nacional de Biotecnología (CNB-CSIC), ProteoRed ISCIII, 28049 Madrid, Spain
- Juan Antonio García
  - Department of Plant Molecular Genetics, Centro Nacional de Biotecnología (CNB-CSIC), Campus Universidad Autónoma de Madrid, 28049 Madrid, Spain
- Sandra Martínez-Turiño
  - Department of Plant Molecular Genetics, Centro Nacional de Biotecnología (CNB-CSIC), Campus Universidad Autónoma de Madrid, 28049 Madrid, Spain
8
Huang KY, Hsu JBK, Lee TY. Characterization and Identification of Lysine Succinylation Sites based on Deep Learning Method. Sci Rep 2019; 9:16175. [PMID: 31700141 PMCID: PMC6838336 DOI: 10.1038/s41598-019-52552-4] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 07/09/2019] [Accepted: 10/18/2019] [Indexed: 12/14/2022]
Abstract
Succinylation is a type of protein post-translational modification (PTM), which can play important roles in a variety of cellular processes. Due to an increasing number of site-specific succinylated peptides obtained from high-throughput mass spectrometry (MS), various tools have been developed for computationally identifying succinylated sites on proteins. However, most of these tools predict succinylation sites based on traditional machine learning methods. Hence, this work aimed to carry out succinylation site prediction based on a deep learning model. The abundance of MS-verified succinylated peptides enabled the investigation of substrate site specificity of succinylation sites through sequence-based attributes, such as position-specific amino acid composition, the composition of k-spaced amino acid pairs (CKSAAP), and position-specific scoring matrix (PSSM). Additionally, maximal dependence decomposition (MDD) was adopted to detect the substrate signatures of lysine succinylation sites by dividing all succinylated sequences into several groups with conserved substrate motifs. According to the results of ten-fold cross-validation, the deep learning model trained using PSSM and informative CKSAAP attributes reached the best predictive performance and also performed better than traditional machine-learning methods. Moreover, an independent testing dataset with no overlap with the training dataset was used to compare the proposed method with six existing prediction tools. The testing dataset comprised 218 positive and 2621 negative instances, and the proposed model yielded a promising performance with 84.40% sensitivity, 86.99% specificity, 86.79% accuracy, and an MCC value of 0.489. Finally, the proposed method has been implemented as a web-based prediction tool (CNN-SuccSite), which is now freely accessible at http://csb.cse.yzu.edu.tw/CNN-SuccSite/.
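The four figures reported for the independent test set can be tied together with a short worked calculation. The Python sketch below is illustrative only: the confusion-matrix counts are not given in the abstract and are back-calculated here (an assumption) from the stated 218 positives, 2621 negatives, 84.40% sensitivity, and 86.99% specificity. The function name `confusion_metrics` is hypothetical.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy, and Matthews correlation
    coefficient (MCC) from the four cells of a binary confusion matrix."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc


# Counts inferred from the abstract (assumption, not reported directly):
# 184 of 218 positives and 2280 of 2621 negatives classified correctly.
tp, fn = 184, 218 - 184
tn, fp = 2280, 2621 - 2280

sens, spec, acc, mcc = confusion_metrics(tp, fn, tn, fp)
print(f"sensitivity={sens:.4f} specificity={spec:.4f} "
      f"accuracy={acc:.4f} MCC={mcc:.3f}")
```

Under these inferred counts the recomputed accuracy matches the reported 86.79% and the MCC comes out at about 0.49, in line with the reported 0.489. The calculation also shows why MCC is moderate despite high accuracy: with roughly twelve negatives per positive, the 341 false positives outnumber the 184 true positives.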
Affiliation(s)
- Kai-Yao Huang
  - Department of Medical Research, Hsinchu Mackay Memorial Hospital, Hsinchu city, 300, Taiwan
- Justin Bo-Kai Hsu
  - Department of Medical Research, Taipei Medical University Hospital, Taipei city, 110, Taiwan
- Tzong-Yi Lee
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 518172, China
  - School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen, 518172, China
9
Huang KY, Kao HJ, Hsu JBK, Weng SL, Lee TY. Characterization and identification of lysine glutarylation based on intrinsic interdependence between positions in the substrate sites. BMC Bioinformatics 2019; 19:384. [PMID: 30717647 PMCID: PMC7394328 DOI: 10.1186/s12859-018-2394-9] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 05/23/2018] [Accepted: 09/25/2018] [Indexed: 01/06/2023]
Abstract
Background: Glutarylation, the addition of a glutaryl group (five carbons) to a lysine residue of a protein molecule, is an important post-translational modification and plays a regulatory role in a variety of physiological and biological processes. As the number of experimentally identified glutarylated peptides increases, it becomes imperative to investigate substrate motifs to enhance the study of protein glutarylation. We carried out a bioinformatics investigation of glutarylation sites based on amino acid composition using a public database containing information on 430 non-homologous glutarylation sites.
Results: The TwoSampleLogo analysis indicates that positively charged and polar amino acids surrounding glutarylated sites may be associated with the specificity in substrate site of protein glutarylation. Additionally, the chi-squared test was utilized to explore the intrinsic interdependence between two positions around glutarylation sites. Further, maximal dependence decomposition (MDD), which consists of partitioning a large-scale dataset into subgroups with statistically significant amino acid conservation, was used to capture motif signatures of glutarylation sites. We considered single features, such as amino acid composition (AAC), amino acid pair composition (AAPC), and composition of k-spaced amino acid pairs (CKSAAP), as well as the effectiveness of incorporating MDD-identified substrate motifs into an integrated prediction model. Evaluation by five-fold cross-validation showed that AAC was most effective in discriminating between glutarylation and non-glutarylation sites when used with a support vector machine (SVM).
Conclusions: The SVM model integrating MDD-identified substrate motifs performed well, with a sensitivity of 0.677, a specificity of 0.619, an accuracy of 0.638, and a Matthews Correlation Coefficient (MCC) value of 0.28. Using an independent testing dataset (46 glutarylated and 92 non-glutarylated sites) obtained from the literature, we demonstrated that the integrated SVM model could improve the predictive performance effectively, yielding a balanced sensitivity and specificity of 0.652 and 0.739, respectively. This integrated SVM model has been implemented as a web-based system (MDDGlutar), which is now freely available at http://csb.cse.yzu.edu.tw/MDDGlutar/.
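Two of the sequence encodings named in this abstract, AAC and CKSAAP, are simple enough to sketch. The Python example below shows one common way to compute them for a fixed-length window around a candidate lysine. It is not the MDDGlutar code: the function names, the 11-residue toy window, the `k_max=2` setting, and the decision to skip non-standard characters (for example padding symbols) are assumptions for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(window):
    """Amino acid composition: fraction of each standard residue."""
    residues = [r for r in window if r in AMINO_ACIDS]
    n = len(residues)
    return {a: (residues.count(a) / n if n else 0.0) for a in AMINO_ACIDS}


def cksaap(window, k_max=2):
    """Composition of k-spaced amino acid pairs.

    For each gap k in 0..k_max, count pairs (window[i], window[i+k+1])
    and normalise by the number of valid pairs at that gap, giving
    400 features per gap value.
    """
    features = {}
    for k in range(k_max + 1):
        counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
        total = 0
        for i in range(len(window) - k - 1):
            a, b = window[i], window[i + k + 1]
            if a in AMINO_ACIDS and b in AMINO_ACIDS:  # skip padding etc.
                counts[a + b] += 1
                total += 1
        for pair, count in counts.items():
            features[f"{pair}.gap{k}"] = count / total if total else 0.0
    return features


# Hypothetical 11-residue window centred on a lysine (K); not a real site.
window = "AKRLGKSDEKA"
composition = aac(window)
pair_features = cksaap(window, k_max=2)
print(len(pair_features), "CKSAAP features;",
      "K fraction =", round(composition["K"], 3))
```

The resulting dictionaries can be flattened into fixed-length numeric vectors (20 values for AAC, 400 per gap for CKSAAP) and passed to a classifier such as an SVM.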
Affiliation(s)
- Kai-Yao Huang
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 518172, China
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 518172, China
- Hui-Ju Kao
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 518172, China
  - Department of Computer Science and Engineering, Yuan Ze University, Taoyuan city, 320, Taiwan
- Justin Bo-Kai Hsu
  - Department of Medical Research, Taipei Medical University Hospital, Taipei city, 110, Taiwan
- Shun-Long Weng
  - Department of Medicine, Mackay Medical College, New Taipei City, 252, Taiwan
  - Mackay Medicine, Nursing and Management College, Taipei, 112, Taiwan
  - Department of Obstetrics and Gynecology, Hsinchu Mackay Memorial Hospital, Hsin-Chu, 300, Taiwan
- Tzong-Yi Lee
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 518172, China
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 518172, China
10
Ziegler CM, Eisenhauer P, Manuelyan I, Weir ME, Bruce EA, Ballif BA, Botten J. Host-Driven Phosphorylation Appears to Regulate the Budding Activity of the Lassa Virus Matrix Protein. Pathogens 2018; 7:97. [PMID: 30544850 PMCID: PMC6313517 DOI: 10.3390/pathogens7040097] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 11/20/2018] [Revised: 12/06/2018] [Accepted: 12/06/2018] [Indexed: 12/17/2022]
Abstract
Lassa mammarenavirus (LASV) is an enveloped RNA virus that can cause Lassa fever, an acute hemorrhagic fever syndrome associated with significant morbidity and high rates of fatality in endemic regions of western Africa. The arenavirus matrix protein Z has several functions during the virus life cycle, including coordinating viral assembly, driving the release of new virus particles, regulating viral polymerase activity, and antagonizing the host antiviral response. There is limited knowledge regarding how the various functions of Z are regulated. To investigate possible means of regulation, mass spectrometry was used to identify potential sites of phosphorylation in the LASV Z protein. This analysis revealed that two serines (S18, S98) and one tyrosine (Y97) are phosphorylated in the flexible N- and C-terminal regions of the protein. Notably, two of these sites, Y97 and S98, are located in (Y97) or directly adjacent to (S98) the PPXY late domain, an important motif for virus release. Studies with non-phosphorylatable and phosphomimetic Z proteins revealed that these sites are important regulators of the release of LASV particles and that host-driven, reversible phosphorylation may play an important role in the regulation of LASV Z protein function.
Affiliation(s)
- Christopher M Ziegler
- Department of Medicine, Division of Immunobiology, University of Vermont, Burlington, VT 05405, USA.
| | - Philip Eisenhauer
- Department of Medicine, Division of Immunobiology, University of Vermont, Burlington, VT 05405, USA.
| | - Inessa Manuelyan
- Department of Medicine, Division of Immunobiology, University of Vermont, Burlington, VT 05405, USA.
- Cellular, Molecular and Biomedical Sciences Graduate Program, University of Vermont, Burlington, VT 05405, USA.
| | - Marion E Weir
- Department of Biology, University of Vermont, Burlington, VT 05405, USA.
| | - Emily A Bruce
- Department of Medicine, Division of Immunobiology, University of Vermont, Burlington, VT 05405, USA.
| | - Bryan A Ballif
- Department of Biology, University of Vermont, Burlington, VT 05405, USA.
| | - Jason Botten
- Department of Medicine, Division of Immunobiology, University of Vermont, Burlington, VT 05405, USA.
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT 05405, USA.
| |
|
11
|
Kao HJ, Weng SL, Huang KY, Kaunang FJ, Hsu JBK, Huang CH, Lee TY. MDD-carb: a combinatorial model for the identification of protein carbonylation sites with substrate motifs. BMC Systems Biology 2017; 11:137. [PMID: 29322938 PMCID: PMC5763492 DOI: 10.1186/s12918-017-0511-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 12/23/2022]
Abstract
Background: Carbonylation, which takes place through oxidation by reactive oxygen species (ROS) on specific residues, is an irreversible oxidative modification of proteins. It has been reported that carbonylation is related to a number of metabolic or aging diseases including diabetes, chronic lung disease, Parkinson’s disease, and Alzheimer’s disease. Due to the lack of computational methods dedicated to exploring motif signatures of protein carbonylation sites, we were motivated to exploit an iterative statistical method to characterize and identify carbonylated sites with motif signatures. Results: By manually curating experimental data from research articles, we obtained 332, 144, 135, and 140 verified substrate sites for K (lysine), R (arginine), T (threonine), and P (proline) residues, respectively, from 241 carbonylated proteins. In order to examine the informative attributes for classifying between carbonylated and non-carbonylated sites, multifarious features including composition of twenty amino acids (AAC), composition of amino acid pairs (AAPC), position-specific scoring matrix (PSSM), and positional weighted matrix (PWM) were investigated in this study. Additionally, in an attempt to explore the motif signatures of carbonylation sites, an iterative statistical method was adopted to detect statistically significant dependencies of amino acid compositions between specific positions around substrate sites. A profile hidden Markov model (HMM) was then trained for each motif signature. Moreover, a support vector machine (SVM) was used to construct an integrative model by combining the bit scores obtained from the profile HMMs. The combinatorial model provided enhanced performance, with balanced sensitivity and specificity, in cross-validation and independent testing.
Conclusion: This study provides a new scheme for exploring potential motif signatures at substrate sites of protein carbonylation. The usefulness of the revealed motifs in the identification of carbonylated sites is demonstrated by their effective performance in cross-validation and independent testing. Finally, these substrate motifs were adopted to build an available online resource (MDD-Carb, http://csb.cse.yzu.edu.tw/MDDCarb/) and are also anticipated to facilitate the study of large-scale carbonylated proteomes.
Affiliation(s)
- Hui-Ju Kao
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, city, 320, Taiwan
| | - Shun-Long Weng
- Department of Medicine, Mackay Medical College, New Taipei City, 252, Taiwan
- Department of Obstetrics and Gynecology, Hsinchu Mackay Memorial Hospital, Hsinchu, city, 300, Taiwan
- Mackay Junior College of Medicine, Nursing and Management, Taipei, city, 112, Taiwan
| | - Kai-Yao Huang
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, city, 320, Taiwan
- Department of Medical Research, Hsinchu Mackay Memorial Hospital, Hsinchu, city, 300, Taiwan
| | - Fergie Joanda Kaunang
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, city, 320, Taiwan
| | - Justin Bo-Kai Hsu
- Department of Medical Research, Taipei Medical University Hospital, Taipei, city, 110, Taiwan
| | - Chien-Hsun Huang
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, city, 320, Taiwan
- Tao-Yuan Hospital, Ministry of Health & Welfare, Taoyuan, 320, Taiwan
| | - Tzong-Yi Lee
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, city, 320, Taiwan
- Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan, 320, Taiwan
| |
|
12
|
Weng SL, Kao HJ, Huang CH, Lee TY. MDD-Palm: Identification of protein S-palmitoylation sites with substrate motifs based on maximal dependence decomposition. PLoS One 2017; 12:e0179529. [PMID: 28662047 PMCID: PMC5491019 DOI: 10.1371/journal.pone.0179529] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 11/04/2016] [Accepted: 05/31/2017] [Indexed: 12/14/2022] Open
Abstract
S-palmitoylation, the covalent attachment of 16-carbon palmitic acids to a cysteine residue via a thioester linkage, is an important reversible lipid modification that plays a regulatory role in a variety of physiological and biological processes. As the number of experimentally identified S-palmitoylated peptides increases, it is imperative to investigate substrate motifs to facilitate the study of protein S-palmitoylation. Based on 710 non-homologous S-palmitoylation sites obtained from published databases and the literature, we carried out a bioinformatics investigation of S-palmitoylation sites based on amino acid composition. Two Sample Logo indicates that positively charged and polar amino acids surrounding S-palmitoylated sites may be associated with the substrate site specificity of protein S-palmitoylation. Additionally, maximal dependence decomposition (MDD) was applied to explore the motif signatures of S-palmitoylation sites by categorizing a large-scale dataset into subgroups with statistically significant conservation of amino acids. Single features such as amino acid composition (AAC), amino acid pair composition (AAPC), position specific scoring matrix (PSSM), position weight matrix (PWM), amino acid substitution matrix (BLOSUM62), and accessible surface area (ASA) were considered, along with the effectiveness of incorporating MDD-identified substrate motifs into a two-layered prediction model. Evaluation by five-fold cross-validation showed that a hybrid of AAC and PSSM performs best at discriminating between S-palmitoylation and non-S-palmitoylation sites, according to the support vector machine (SVM). The two-layered SVM model integrating MDD-identified substrate motifs performed well, with a sensitivity of 0.79, specificity of 0.80, accuracy of 0.80, and Matthews Correlation Coefficient (MCC) value of 0.45. 
Using an independent testing dataset (613 S-palmitoylated and 5412 non-S-palmitoylated sites) obtained from the literature, we demonstrated that the two-layered SVM model could outperform other prediction tools, yielding a balanced sensitivity and specificity of 0.690 and 0.694, respectively. This two-layered SVM model has been implemented as a web-based system (MDD-Palm), which is now freely available at http://csb.cse.yzu.edu.tw/MDDPalm/.
Affiliation(s)
- Shun-Long Weng
- Department of Medicine, Mackay Medical College, New Taipei City, Taiwan
- Department of Obstetrics and Gynecology, Hsinchu Mackay Memorial Hospital, Hsinchu city, Taiwan
- Mackay Junior College of Medicine, Nursing and Management, Taipei, Taiwan
| | - Hui-Ju Kao
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan
| | - Chien-Hsun Huang
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan
- Tao-Yuan Hospital, Ministry of Health & Welfare, Taoyuan, Taiwan
| | - Tzong-Yi Lee
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan
- Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan, Taiwan
| |
|
13
|
Weng SL, Huang KY, Kaunang FJ, Huang CH, Kao HJ, Chang TH, Wang HY, Lu JJ, Lee TY. Investigation and identification of protein carbonylation sites based on position-specific amino acid composition and physicochemical features. BMC Bioinformatics 2017; 18:66. [PMID: 28361707 PMCID: PMC5374553 DOI: 10.1186/s12859-017-1472-8] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 01/07/2023] Open
Abstract
Background: Protein carbonylation, an irreversible and non-enzymatic post-translational modification (PTM), is often used as a marker of oxidative stress. When reactive oxygen species (ROS) oxidize amino acid side chains, carbonyl (CO) groups are produced, especially on lysine (K), arginine (R), threonine (T), and proline (P). Nevertheless, due to the lack of information about carbonylated substrate specificity, we were encouraged to develop a systematic method for a comprehensive investigation of protein carbonylation sites. Results: After the removal of redundant data from multiple carbonylation-related articles, a total of 226 human carbonylated proteins were used as the training dataset, which consisted of 307, 126, 128, and 129 carbonylation sites for K, R, T and P residues, respectively. To identify the useful features in predicting carbonylation sites, the linear amino acid sequence was adopted not only to build the predictive model from the training dataset, but also to compare the effectiveness of prediction with other types of features including amino acid composition (AAC), amino acid pair composition (AAPC), position-specific scoring matrix (PSSM), positional weighted matrix (PWM), solvent-accessible surface area (ASA), and physicochemical properties. The investigation of position-specific amino acid composition revealed that the positively charged amino acids (K and R) are remarkably enriched surrounding the carbonylated sites, which may play a functional role in discriminating between carbonylation and non-carbonylation sites. A variety of predictive models were built using various features and three different machine learning methods. Based on the evaluation by five-fold cross-validation, the models trained with the PWM feature provided better sensitivity on the positive training dataset, while the models trained with the AAindex feature achieved higher specificity on the negative training dataset.
Additionally, the model trained using hybrid features, including PWM, AAC and AAindex, obtained the best MCC values of 0.432, 0.472, 0.443 and 0.467 on K, R, T and P residues, respectively. Conclusion: When compared with an existing prediction tool, the selected models trained with hybrid features provided a promising accuracy on an independent testing dataset. In short, this work not only characterized the carbonylated substrate preference, but also demonstrated that the proposed method could provide a feasible means for accelerating preliminary discovery of protein carbonylation.
Affiliation(s)
- Shun-Long Weng
- Department of Obstetrics and Gynecology, Hsinchu Mackay Memorial Hospital, Hsin-Chu, 300, Taiwan
- Mackay Medicine, Nursing and Management College, Taipei, 112, Taiwan
- Department of Medicine, Mackay Medical College, New Taipei City, 252, Taiwan
| | - Kai-Yao Huang
- Department of Medical Research, Hsinchu Mackay Memorial Hospital, Hsin-Chu, 300, Taiwan
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan
| | - Fergie Joanda Kaunang
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan
| | - Chien-Hsun Huang
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan
- Tao-Yuan Hospital, Ministry of Health & Welfare, Taoyuan, 320, Taiwan
| | - Hui-Ju Kao
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan
| | - Tzu-Hao Chang
- Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei, 110, Taiwan
| | - Hsin-Yao Wang
- Department of Laboratory Medicine, Chang Gung Memorial Hospital at Linkou, Taoyuan, 333, Taiwan
| | - Jang-Jih Lu
- Department of Laboratory Medicine, Chang Gung Memorial Hospital at Linkou, Taoyuan, 333, Taiwan
- Department of Medical Biotechnology and Laboratory Science, Chang Gung University, Taoyuan, 333, Taiwan
| | - Tzong-Yi Lee
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan
- Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan, 320, Taiwan
| |
|
14
|
Nguyen VN, Huang KY, Huang CH, Lai KR, Lee TY. A New Scheme to Characterize and Identify Protein Ubiquitination Sites. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:393-403. [PMID: 26887002 DOI: 10.1109/tcbb.2016.2520939] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 05/26/2023]
Abstract
Protein ubiquitination, involving the conjugation of ubiquitin on lysine residue, serves as an important modulator of many cellular functions in eukaryotes. Recent advancements in proteomic technology have stimulated increasing interest in identifying ubiquitination sites. However, most computational tools for predicting ubiquitination sites are focused on small-scale data. With an increasing number of experimentally verified ubiquitination sites, we were motivated to design a predictive model for identifying lysine ubiquitination sites for large-scale proteome dataset. This work assessed not only single features, such as amino acid composition (AAC), amino acid pair composition (AAPC) and evolutionary information, but also the effectiveness of incorporating two or more features into a hybrid approach to model construction. The support vector machine (SVM) was applied to generate the prediction models for ubiquitination site identification. Evaluation by five-fold cross-validation showed that the SVM models learned from the combination of hybrid features delivered a better prediction performance. Additionally, a motif discovery tool, MDDLogo, was adopted to characterize the potential substrate motifs of ubiquitination sites. The SVM models integrating the MDDLogo-identified substrate motifs could yield an average accuracy of 68.70 percent. Furthermore, the independent testing result showed that the MDDLogo-clustered SVM models could provide a promising accuracy (78.50 percent) and perform better than other prediction tools. Two cases have demonstrated the effective prediction of ubiquitination sites with corresponding substrate motifs.
|
15
|
Lateef Z, Gimenez G, Baker ES, Ward VK. Transcriptomic analysis of human norovirus NS1-2 protein highlights a multifunctional role in murine monocytes. BMC Genomics 2017; 18:39. [PMID: 28056773 PMCID: PMC5217272 DOI: 10.1186/s12864-016-3417-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 08/19/2016] [Accepted: 12/12/2016] [Indexed: 12/22/2022] Open
Abstract
Background: The GII.4 Sydney 2012 strain of human norovirus (HuNoV) is a pandemic strain that is responsible for the majority of norovirus outbreaks in healthcare settings. The function of the non-structural (NS)1-2 protein from HuNoV is unknown. Results: In silico analysis of human norovirus NS1-2 protein showed that it shares features with the murine NS1-2 protein, including a disordered region, a transmembrane domain and H-box and NC sequence motifs. The proteins also contain caspase cleavage and phosphorylation sites, indicating that processing and phosphorylation may be a conserved feature of norovirus NS1-2 proteins. In this study, RNA transcripts of the full-length NS1-2 and of its disordered region, from both human and murine norovirus, were transfected into monocytes, and next generation sequencing was used to analyse the transcriptomic profile of cells expressing virus proteins. The profiles were then compared to the transcriptomic profile of MNV-infected cells. Conclusions: RNAseq analysis showed that NS1-2 proteins from human and murine noroviruses affect multiple immune systems (chemokine, cytokine, and Toll-like receptor signaling) and intracellular pathways (NFκB, MAPK, PI3K-Akt signaling) in murine monocytes. Comparison to the transcriptomic profile of MNV-infected cells indicated the pathways that NS1-2 may affect during norovirus infection.
Affiliation(s)
- Zabeen Lateef
- Department of Microbiology and Immunology, Otago School of Medical Sciences, University of Otago, 720 Cumberland St, Dunedin, 9054, New Zealand.
| | - Gregory Gimenez
- Otago Genomics and Bioinformatics Facility, University of Otago, Dunedin, 9054, New Zealand
| | - Estelle S Baker
- Department of Microbiology and Immunology, Otago School of Medical Sciences, University of Otago, 720 Cumberland St, Dunedin, 9054, New Zealand
| | - Vernon K Ward
- Department of Microbiology and Immunology, Otago School of Medical Sciences, University of Otago, 720 Cumberland St, Dunedin, 9054, New Zealand
| |
|
16
|
Nguyen VN, Huang KY, Weng JTY, Lai KR, Lee TY. UbiNet: an online resource for exploring the functional associations and regulatory networks of protein ubiquitylation. Database: The Journal of Biological Databases and Curation 2016; 2016:baw054. [PMID: 27114492 PMCID: PMC4843525 DOI: 10.1093/database/baw054] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 08/18/2015] [Accepted: 03/20/2016] [Indexed: 12/19/2022]
Abstract
Protein ubiquitylation catalyzed by E3 ubiquitin ligases is crucial in the regulation of many cellular processes. Owing to the high throughput of mass spectrometry-based proteomics, a number of methods have been developed for the experimental determination of ubiquitylation sites, leading to a large collection of ubiquitylation data. However, there exist no resources for the exploration of E3-ligase-associated regulatory networks for ubiquitylated proteins in humans. Therefore, the UbiNet database was developed to provide a full investigation of protein ubiquitylation networks by incorporating experimentally verified E3 ligases, ubiquitylated substrates and protein-protein interactions (PPIs). To date, UbiNet has accumulated 43 948 experimentally verified ubiquitylation sites from 14 692 ubiquitylated proteins of humans. Additionally, we have manually curated 499 E3 ligases as well as two E1 activating and 46 E2 conjugating enzymes. To delineate the regulatory networks among E3 ligases and ubiquitylated proteins, a total of 430 530 PPIs were integrated into UbiNet for the exploration of ubiquitylation networks with an interactive network viewer. A case study demonstrated that UbiNet was able to decipher a scheme for the ubiquitylation of tumor proteins p63 and p73 that is consistent with their functions. Although the essential role of Mdm2 in p53 regulation is well studied, UbiNet revealed that Mdm2 and additional E3 ligases might be implicated in the regulation of other tumor proteins by protein ubiquitylation. Moreover, UbiNet could identify potential substrates for a specific E3 ligase based on PPIs and substrate motifs. With limited knowledge about the mechanisms through which ubiquitylated proteins are regulated by E3 ligases, UbiNet offers users an effective means for conducting preliminary analyses of protein ubiquitylation.
The UbiNet database is now freely accessible via http://csb.cse.yzu.edu.tw/UbiNet/. The content is regularly updated with the literature and newly released data. Database URL: http://csb.cse.yzu.edu.tw/UbiNet/.
Affiliation(s)
- Van-Nui Nguyen
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan
- University of Information and Communication Technology, Thai Nguyen University, Vietnam
| | - Kai-Yao Huang
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan
| | - Julia Tzu-Ya Weng
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan
- Innovation Center for Big Data and Digital Convergence, Yuan Ze University, 320, Taiwan
| | - K Robert Lai
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan
- Innovation Center for Big Data and Digital Convergence, Yuan Ze University, 320, Taiwan
| | - Tzong-Yi Lee
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan
- Innovation Center for Big Data and Digital Convergence, Yuan Ze University, 320, Taiwan
| |
|
17
|
Huang CH, Su MG, Kao HJ, Jhong JH, Weng SL, Lee TY. UbiSite: incorporating two-layered machine learning method with substrate motifs to predict ubiquitin-conjugation site on lysines. BMC Systems Biology 2016; 10 Suppl 1:6. [PMID: 26818456 PMCID: PMC4895383 DOI: 10.1186/s12918-015-0246-z] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Indexed: 01/20/2023]
Abstract
Background: The conjugation of ubiquitin to a substrate protein (protein ubiquitylation), which involves a sequential process of E1 activation, E2 conjugation and E3 ligation, is crucial to the regulation of protein function and activity in eukaryotes. This ubiquitin-conjugation process typically binds the last amino acid of ubiquitin (glycine 76) to a lysine residue of a target protein. High-throughput mass spectrometry-based proteomics has stimulated the large-scale identification of ubiquitin-conjugated peptides. Hence, a new web resource, UbiSite, was developed to identify ubiquitin-conjugation sites on lysines based on a large-scale proteome dataset. Results: Given a total of 37,647 ubiquitin-conjugated proteins, including 128,026 ubiquitylated peptides, obtained from various resources, this study carries out a large-scale investigation of ubiquitin-conjugation sites based on sequence and structural characteristics. A TwoSampleLogo reveals that a significant depletion of histidine (H), arginine (R) and cysteine (C) residues around ubiquitylation sites may impact the conjugation of ubiquitins in closed three-dimensional environments. Based on the large-scale ubiquitylation dataset, a motif discovery tool, MDDLogo, has been adopted to characterize the potential substrate motifs for ubiquitin conjugation. Not only are single features such as amino acid composition (AAC), positional weighted matrix (PWM), position-specific scoring matrix (PSSM) and solvent-accessible surface area (SASA) considered, but the effectiveness of incorporating MDDLogo-identified substrate motifs into a two-layered prediction model is also taken into account. Evaluation by five-fold cross-validation showed that PSSM is the best feature for discriminating between ubiquitylation and non-ubiquitylation sites, based on a support vector machine (SVM).
Additionally, the two-layered SVM model integrating MDDLogo-identified substrate motifs obtained a promising accuracy and Matthews Correlation Coefficient (MCC) of 81.06 % and 0.586, respectively. Furthermore, the independent testing showed that the two-layered SVM model could outperform other prediction tools, reaching 85.10 % sensitivity, 69.69 % specificity, 73.69 % accuracy and an MCC value of 0.483. Conclusion: The independent testing result indicated the effectiveness of incorporating MDDLogo-identified motifs into the prediction of ubiquitylation sites. In order to provide meaningful assistance to researchers interested in large-scale ubiquitinome data, the two-layered SVM model has been implemented as a web-based system (UbiSite), which is freely available at http://csb.cse.yzu.edu.tw/UbiSite/. Two cases given in UbiSite demonstrate the effective identification of ubiquitylation sites with reference to substrate motifs.
Affiliation(s)
- Chien-Hsun Huang
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan.
- Ministry of Health & Welfare, Tao-Yuan Hospital, Taoyuan, 320, Taiwan.
| | - Min-Gang Su
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan.
| | - Hui-Ju Kao
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan.
| | - Jhih-Hua Jhong
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan.
| | - Shun-Long Weng
- Department of Obstetrics and Gynecology, Hsinchu Mackay Memorial Hospital, Hsin-Chu, 300, Taiwan.
- Mackay Junior College of Medicine, Nursing and Management, Taipei, 112, Taiwan.
- Department of Medicine, Mackay Medical College, New Taipei City, 252, Taiwan.
| | - Tzong-Yi Lee
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan.
- Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan, 320, Taiwan.
| |
|
18
|
Huang KY, Weng JTY, Lee TY, Weng SL. A new scheme to discover functional associations and regulatory networks of E3 ubiquitin ligases. BMC Systems Biology 2016; 10 Suppl 1:3. [PMID: 26818115 PMCID: PMC4895279 DOI: 10.1186/s12918-015-0244-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Abstract
Background: Protein ubiquitination catalyzed by E3 ubiquitin ligases plays important modulatory roles in various biological processes. With the emergence of high-throughput mass spectrometry technology, the proteomics research community embraced the development of numerous experimental methods for the determination of ubiquitination sites. The result is an accumulation of ubiquitinome data, coupled with a lack of available resources for investigating the regulatory networks among E3 ligases and ubiquitinated proteins. In this study, by integrating existing ubiquitinome data, experimentally validated E3 ligases and established protein-protein interactions, we have devised a strategy to construct a comprehensive map of protein ubiquitination networks. Results: In total, 41,392 experimentally verified ubiquitination sites from 12,786 ubiquitinated proteins of humans have been obtained for this study. An additional 494 E3 ligases along with 1,220 functional annotations and 28,588 protein domains were manually curated. To characterize the regulatory networks among E3 ligases and ubiquitinated proteins, a well-established network viewer was utilized for the exploration of ubiquitination networks from 40,892 protein-protein interactions. The effectiveness of the proposed approach was demonstrated in a case study examining E3 ligases involved in the ubiquitination of tumor suppressor p53. In addition to Mdm2, a known regulator of p53, the investigation also revealed other potential E3 ligases that may participate in the ubiquitination of p53. Conclusion: Aside from the ability to facilitate comprehensive investigations of protein ubiquitination networks, by integrating information regarding protein-protein interactions and substrate specificities, the proposed method could discover potential E3 ligases for ubiquitinated proteins.
Our strategy presents an efficient means for the preliminary screening of ubiquitination networks and overcomes the challenge posed by limited knowledge about E3 ligase-regulated ubiquitination.
Affiliation(s)
- Kai-Yao Huang
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan.
| | - Julia Tzu-Ya Weng
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan.
- Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan, 320, Taiwan.
| | - Tzong-Yi Lee
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan.
- Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan, 320, Taiwan.
| | - Shun-Long Weng
- Department of Obstetrics and Gynecology, Hsinchu Mackay Memorial Hospital, Hsin-Chu, 300, Taiwan.
- Mackay Junior College of Medicine, Nursing and Management, Taipei, 112, Taiwan.
- Department of Medicine, Mackay Medical College, New Taipei City, 252, Taiwan.
| |
|
19
|
Huang KY, Su MG, Kao HJ, Hsieh YC, Jhong JH, Cheng KH, Huang HD, Lee TY. dbPTM 2016: 10-year anniversary of a resource for post-translational modification of proteins. Nucleic Acids Res 2015; 44:D435-46. [PMID: 26578568 PMCID: PMC4702878 DOI: 10.1093/nar/gkv1240] [Citation(s) in RCA: 131] [Impact Index Per Article: 14.6] [Received: 09/16/2015] [Accepted: 11/02/2015] [Indexed: 01/23/2023] Open
Abstract
Owing to the importance of the post-translational modifications (PTMs) of proteins in regulating biological processes, the dbPTM (http://dbPTM.mbc.nctu.edu.tw/) was developed as a comprehensive database of experimentally verified PTMs from several databases with annotations of potential PTMs for all UniProtKB protein entries. For this 10th anniversary of dbPTM, the updated resource provides not only a comprehensive dataset of experimentally verified PTMs, supported by the literature, but also an integrative interface for accessing all available databases and tools that are associated with PTM analysis. As well as collecting experimental PTM data from 14 public databases, this update manually curates over 12 000 modified peptides, including the emerging S-nitrosylation, S-glutathionylation and succinylation, from approximately 500 research articles, which were retrieved by text mining. As the number of available PTM prediction methods increases, this work compiles a non-homologous benchmark dataset to evaluate the predictive power of online PTM prediction tools. An increasing interest in the structural investigation of PTM substrate sites motivated the mapping of all experimental PTM peptides to protein entries of Protein Data Bank (PDB) based on database identifier and sequence identity, which enables users to examine spatially neighboring amino acids, solvent-accessible surface area and side-chain orientations for PTM substrate sites on tertiary structures. Since drug binding in PDB is annotated, this update identified over 1100 PTM sites that are associated with drug binding. The update also integrates metabolic pathways and protein-protein interactions to support the PTM network analysis for a group of proteins. Finally, the web interface is redesigned and enhanced to facilitate access to this resource.
Affiliation(s)
- Kai-Yao Huang
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan
| | - Min-Gang Su
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan
| | - Hui-Ju Kao
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan
| | - Yun-Chung Hsieh
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan
| | - Jhih-Hua Jhong
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan
| | - Kuang-Hao Cheng
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan
| | - Hsien-Da Huang
- Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 300, Taiwan; Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan
| | - Tzong-Yi Lee
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan; Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan 320, Taiwan
| |
|
20
|
Phosphorylation of Single Stranded RNA Virus Proteins and Potential for Novel Therapeutic Strategies. Viruses 2015; 7:5257-73. [PMID: 26473910 PMCID: PMC4632380 DOI: 10.3390/v7102872] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 06/27/2015] [Revised: 09/23/2015] [Accepted: 09/29/2015] [Indexed: 12/31/2022]
Abstract
Post translational modification of proteins is a critical requirement that regulates function. Among the diverse kinds of protein post translational modifications, phosphorylation plays essential roles in protein folding, protein:protein interactions, signal transduction, intracellular localization, transcription regulation, cell cycle progression, survival and apoptosis. Protein phosphorylation is also essential for many intracellular pathogens to establish a productive infection cycle. Preservation of protein phosphorylation moieties in pathogens in a manner that mirrors the host components underscores the co-evolutionary trajectory of pathogens and hosts, and sheds light on how successful pathogens have usurped, either in part or as a whole, the host enzymatic machinery. Phosphorylation of viral proteins for many acute RNA viruses including Flaviviruses and Alphaviruses has been demonstrated to be critical for protein functionality. This review focuses on phosphorylation modifications that have been documented to occur on viral proteins with emphasis on acutely infectious, single stranded RNA viruses. The review additionally explores the possibility of repurposing Food and Drug Administration (FDA) approved inhibitors as antivirals for the treatment of acute RNA viral infections.
|
21
|
Bui VM, Lu CT, Ho TT, Lee TY. MDD-SOH: exploiting maximal dependence decomposition to identify S-sulfenylation sites with substrate motifs. Bioinformatics 2015; 32:165-72. [PMID: 26411868 DOI: 10.1093/bioinformatics/btv558] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 06/22/2015] [Accepted: 09/18/2015] [Indexed: 01/12/2023]
Abstract
S-sulfenylation (S-sulphenylation, or sulfenic acid), the covalent attachment of S-hydroxyl (-SOH) to cysteine thiol, plays a significant role in redox regulation of protein functions. Although sulfenic acid is transient and labile, most of its physiological activities occur under control of S-hydroxylation. Therefore, discriminating the substrate sites of S-sulfenylated proteins is an essential task in computational biology for furthering the understanding of protein structures and functions. Research into S-sulfenylated proteins is currently very limited, and no dedicated tools are available for the computational identification of SOH sites. Given a total of 1096 experimentally verified S-sulfenylated proteins from humans, this study carries out a bioinformatics investigation of SOH sites based on amino acid composition and solvent-accessible surface area. A TwoSampleLogo indicates that the positively and negatively charged amino acids flanking the SOH sites may impact the formation of S-sulfenylation in closed three-dimensional environments. In addition, the substrate motifs of SOH sites are studied using maximal dependence decomposition (MDD). Based on the concept of binary classification between SOH and non-SOH sites, a support vector machine (SVM) is applied to learn the predictive model from MDD-identified substrate motifs. According to the evaluation results of 5-fold cross-validation, the integrated SVM model learned from substrate motifs yields an average accuracy of 0.87, significantly improving the prediction of SOH sites. Furthermore, the integrated SVM model also effectively improves the predictive performance on an independent testing set. Finally, the integrated SVM model is applied to implement an effective web resource, named MDD-SOH, to identify SOH sites with their corresponding substrate motifs. Availability and implementation: MDD-SOH is freely available to all interested users at http://csb.cse.yzu.edu.tw/MDDSOH/. All data sets used in this work are also available for download from the website. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: francis@saturn.yzu.edu.tw.
Affiliation(s)
- Van-Minh Bui
- Department of Computer Science and Engineering and
| | | | - Thi-Trang Ho
- Department of Computer Science and Engineering and
| | - Tzong-Yi Lee
- Department of Computer Science and Engineering and Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan 320, Taiwan
| |
|
22
|
Erban T, Harant K, Hubalek M, Vitamvas P, Kamler M, Poltronieri P, Tyl J, Markovic M, Titera D. In-depth proteomic analysis of Varroa destructor: Detection of DWV-complex, ABPV, VdMLV and honeybee proteins in the mite. Sci Rep 2015; 5:13907. [PMID: 26358842 PMCID: PMC4566121 DOI: 10.1038/srep13907] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 05/13/2015] [Accepted: 08/10/2015] [Indexed: 11/09/2022]
Abstract
We investigated pathogens in the parasitic honeybee mite Varroa destructor using nanoLC-MS/MS (TripleTOF) and 2D-E-MS/MS proteomics approaches supplemented with affinity-chromatography to concentrate trace target proteins. Peptides were detected from the currently uncharacterized Varroa destructor Macula-like virus (VdMLV), the deformed wing virus (DWV)-complex and the acute bee paralysis virus (ABPV). Peptide alignments revealed detection of complete structural DWV-complex block VP2-VP1-VP3, VDV-1 helicase and single-amino-acid substitution A/K/Q in VP1, the ABPV structural block VP1-VP4-VP2-VP3 including uncleaved VP4/VP2, and VdMLV coat protein. Isoforms of viral structural proteins of highest abundance were localized via 2D-E. The presence of all types of capsid/coat proteins of a particular virus suggested the presence of virions in Varroa. Also, matches between the MWs of viral structural proteins on 2D-E and their theoretical MWs indicated that viruses were not digested. The absent or scarce detection of non-structural proteins compared with high-abundance structural proteins suggests that the viruses did not replicate in the mite; hence, virions accumulate in the Varroa gut via hemolymph feeding. Hemolymph feeding also resulted in the detection of a variety of honeybee proteins. The advantages of MS-based proteomics for pathogen detection, false-positive pathogen detection, virus replication, posttranslational modifications, and the presence of honeybee proteins in Varroa are discussed.
Affiliation(s)
| | - Karel Harant
- Laboratory of Mass Spectrometry, Charles University in Prague, Faculty of Science, Prague 2, Czechia
| | - Martin Hubalek
- Institute of Organic Chemistry and Biochemistry, Prague 6, Czechia
| | | | - Martin Kamler
- Bee Research Institute at Dol, Libcice nad Vltavou, Czechia
| | | | - Jan Tyl
- Bee Research Institute at Dol, Libcice nad Vltavou, Czechia
| | | | - Dalibor Titera
- Bee Research Institute at Dol, Libcice nad Vltavou, Czechia
| |
|
23
|
Abstract
The Arenaviridae are enveloped, negative-sense RNA viruses with several family members that cause hemorrhagic fevers. This work provides immunofluorescence evidence that, unlike those of New World arenaviruses, the replication and transcription complexes (RTC) of lymphocytic choriomeningitis virus (LCMV) colocalize with eukaryotic initiation factor 4E (eIF4E) and that eIF4E may participate in the translation of LCMV mRNA. Additionally, we identify two residues in the LCMV nucleoprotein (NP) that are conserved in every mammalian arenavirus and are required for recombinant LCMV recovery. One of these sites, Y125, was confirmed to be phosphorylated by using liquid chromatography-tandem mass spectrometry (LC-MS/MS). NP Y125 is located in the N-terminal region of NP that is disordered when RNA is bound. The other site, NP T206, was predicted to be a phosphorylation site. Immunofluorescence analysis demonstrated that NP T206 is required for the formation of the punctate RTC that are typically observed during LCMV infection. A minigenome reporter assay using NP mutants, as well as Northern blot analysis, demonstrated that although NP T206A does not form punctate RTC, it can transcribe and replicate a minigenome. However, in the presence of matrix protein (Z) and glycoprotein (GP), translation of the minigenome message with NP T206A was inhibited, suggesting that punctate RTC formation is required to regulate viral replication. Together, these results highlight a significant difference between New and Old World arenaviruses and demonstrate the importance of RTC formation and translation priming in RTC for Old World arenaviruses.
Several members of the Arenaviridae cause hemorrhagic fevers and are classified as category A pathogens. Arenavirus replication-transcription complexes (RTC) are nucleated by the viral nucleoprotein. This study demonstrates that the formation of these complexes is required for virus viability and suggests that RTC nucleation is regulated by the phosphorylation of a single nucleoprotein residue. This work adds to the body of knowledge about how these key viral structures are formed and participate in virus replication. Additionally, the fact that Old World arenavirus complexes colocalize with the eukaryotic initiation factor 4E, while New World arenaviruses do not, is only the second notable difference observed between New and Old World arenaviruses, the first being the difference in the glycoprotein receptor.
|
24
|
Chen YJ, Lu CT, Huang KY, Wu HY, Chen YJ, Lee TY. GSHSite: exploiting an iteratively statistical method to identify s-glutathionylation sites with substrate specificity. PLoS One 2015; 10:e0118752. [PMID: 25849935 PMCID: PMC4388702 DOI: 10.1371/journal.pone.0118752] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 05/12/2014] [Accepted: 01/06/2015] [Indexed: 01/13/2023]
Abstract
S-glutathionylation, the covalent attachment of a glutathione (GSH) to the sulfur atom of cysteine, is a selective and reversible protein post-translational modification (PTM) that regulates protein activity, localization, and stability. Despite its implication in the regulation of protein functions and cell signaling, the substrate specificity of cysteine S-glutathionylation remains unknown. Based on a total of 1783 experimentally identified S-glutathionylation sites from mouse macrophages, this work presents an informatics investigation of S-glutathionylation sites, including structural factors such as the flanking amino acid composition and the accessible surface area (ASA). TwoSampleLogo analysis shows that positively charged amino acids flanking the S-glutathionylated cysteine may influence the formation of S-glutathionylation in a closed three-dimensional environment. A statistical method is further applied to iteratively detect the conserved substrate motifs with statistical significance. A support vector machine (SVM) is then applied to generate a predictive model that considers the substrate motifs. According to five-fold cross-validation, the SVMs trained with substrate motifs achieve enhanced sensitivity, specificity, and accuracy, and provide promising performance on an independent test set. The effectiveness of the proposed method is demonstrated by the correct identification of previously reported S-glutathionylation sites of mouse thioredoxin (TXN) and human protein tyrosine phosphatase 1b (PTP1B). Finally, the constructed models are adopted to implement an effective web-based tool, named GSHSite (http://csb.cse.yzu.edu.tw/GSHSite/), for identifying uncharacterized GSH substrate sites on protein sequences.
Affiliation(s)
- Yi-Ju Chen
- Institute of Chemistry, Academia Sinica, Taipei, Taiwan
| | - Cheng-Tsung Lu
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan
| | - Kai-Yao Huang
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan
| | - Hsin-Yi Wu
- Institute of Chemistry, Academia Sinica, Taipei, Taiwan
| | - Yu-Ju Chen
- Institute of Chemistry, Academia Sinica, Taipei, Taiwan
- * E-mail: (TYL); (YJC)
| | - Tzong-Yi Lee
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan
- Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan, Taiwan
- * E-mail: (TYL); (YJC)
| |
|
25
|
Huang SY, Shi SP, Qiu JD, Liu MC. Using support vector machines to identify protein phosphorylation sites in viruses. J Mol Graph Model 2015; 56:84-90. [DOI: 10.1016/j.jmgm.2014.12.005] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 04/18/2014] [Revised: 12/13/2014] [Accepted: 12/16/2014] [Indexed: 10/24/2022]
|
26
|
Wu HY, Lu CT, Kao HJ, Chen YJ, Chen YJ, Lee TY. Characterization and identification of protein O-GlcNAcylation sites with substrate specificity. BMC Bioinformatics 2014; 15 Suppl 16:S1. [PMID: 25521204 PMCID: PMC4290634 DOI: 10.1186/1471-2105-15-s16-s1] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 01/23/2023]
Abstract
Background: Protein O-GlcNAcylation involves the attachment of a single N-acetylglucosamine (GlcNAc) to the hydroxyl group of serine or threonine residues. Elucidation of O-GlcNAcylation sites on proteins is required in order to decipher its crucial roles in regulating cellular processes and to aid in drug design. With an increasing number of O-GlcNAcylation sites identified by mass spectrometry (MS)-based proteomics, several methods have been proposed for the computational identification of O-GlcNAcylation sites. However, no existing method focuses on the investigation of O-GlcNAcylated substrate motifs. Thus, we were motivated to design a new method for the identification of protein O-GlcNAcylation sites that considers substrate site specificity. Results: In this study, 375 experimentally verified O-GlcNAcylation sites were collected from dbOGAP, which is an integrated resource for protein O-GlcNAcylation. Due to the difficulty in characterizing the substrate motifs by conventional sequence logo analysis, a recursive statistical method was applied to obtain significant conserved motifs. To construct the predictive models learned from the identified substrate motifs, we adopted support vector machines (SVMs). A five-fold cross-validation was used to evaluate the predictive model, achieving sensitivity, specificity, and accuracy of 0.76, 0.80, and 0.78, respectively. Additionally, an independent testing set, which was truly blind to the training data of the predictive model, was used to demonstrate that the proposed method could provide a promising accuracy (0.94) and outperform three other O-GlcNAcylation site prediction tools. Conclusion: This work proposed a computational method to identify informative substrate motifs for O-GlcNAcylation sites. The evaluation by cross-validation and independent testing indicated that the identified motifs were effective in the identification of O-GlcNAcylation sites. A case study demonstrated that the proposed method could be a feasible means of conducting preliminary analyses of protein O-GlcNAcylation. We also anticipated that the revealed substrate motifs may facilitate the study of extensive crosstalk between O-GlcNAcylation and phosphorylation. This method may help unravel their mechanisms and roles in signaling, transcription, chronic disease, and cancer.
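The five-fold cross-validation mentioned in this abstract can be sketched as follows. This is a minimal illustration of the general technique, not the authors' pipeline: the function name `k_fold_indices`, the shuffling seed, and the sample count are hypothetical, and the abstract does not say whether the folds were stratified.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Hypothetical helper: indices are shuffled once, dealt into k folds,
    and each fold serves as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


# Illustrative sample count only; not taken from the paper.
for train_idx, test_idx in k_fold_indices(20, k=5):
    print(len(train_idx), len(test_idx))
```

A model would be trained on each `train` list and scored on the matching `test` list, with the reported sensitivity, specificity, and accuracy averaged over the five folds.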
|
27
|
Foka P, Dimitriadis A, Kyratzopoulou E, Giannimaras DA, Sarno S, Simos G, Georgopoulou U, Mamalaki A. A complex signaling network involving protein kinase CK2 is required for hepatitis C virus core protein-mediated modulation of the iron-regulatory hepcidin gene expression. Cell Mol Life Sci 2014; 71:4243-58. [PMID: 24718935 PMCID: PMC11114079 DOI: 10.1007/s00018-014-1621-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 07/30/2013] [Revised: 02/25/2014] [Accepted: 03/24/2014] [Indexed: 12/18/2022]
Abstract
Hepatitis C virus (HCV) infection is associated with hepatic iron overload and elevated serum iron that correlate with poor antiviral responses. Hepcidin (HAMP), a 25-aa cysteine-rich liver-specific peptide, controls iron homeostasis. Its expression is up-regulated in inflammation and iron excess. HCV-mediated hepcidin regulation remains controversial. Chronic HCV patients possess relatively low hepcidin levels; however, elevated HAMP mRNA has been reported in HCV core transgenic mice and HCV replicon-expressing cells. We investigated the effect of HCV core protein on HAMP gene expression and delineated the complex interplay of molecular mechanisms involved. HCV core protein up-regulated HAMP promoter activity, mRNA, and secreted protein levels. Enhanced promoter activity was abolished by co-transfections of core with HAMP promoter constructs containing mutated/deleted BMP and STAT binding sites. Dominant negative constructs, pharmacological inhibitors, and silencing experiments against STAT3 and SMAD4 confirmed the participation of both pathways in HAMP gene regulation by core protein. STAT3 and SMAD4 expression levels were found to be increased in the presence of HCV core, which orchestrated SMAD4 translocation into the nucleus and STAT3 phosphorylation. To further understand the mechanisms governing the core effect, the role of the JAK/STAT-activating kinase CK2 was investigated. A CK2-dominant negative construct, a CK2-specific inhibitor, and RNA interference abrogated the core-induced increase in HAMP promoter activity, mRNA, and protein levels, while CK2 acted in synergy with core to significantly enhance HAMP gene expression. Therefore, HCV core up-regulates HAMP gene transcription via a complex signaling network that requires both SMAD/BMP and STAT3 pathways and CK2 involvement.
Affiliation(s)
- Pelagia Foka
- Laboratory of Molecular Biology and Immunobiotechnology, Department of Biochemistry, Hellenic Pasteur Institute, Athens, Greece
- Laboratory of Molecular Virology, Hellenic Pasteur Institute, Athens, Greece
| | - Alexios Dimitriadis
- Laboratory of Molecular Biology and Immunobiotechnology, Department of Biochemistry, Hellenic Pasteur Institute, Athens, Greece
| | - Eleni Kyratzopoulou
- Laboratory of Molecular Biology and Immunobiotechnology, Department of Biochemistry, Hellenic Pasteur Institute, Athens, Greece
| | - Dionysios A. Giannimaras
- Laboratory of Molecular Biology and Immunobiotechnology, Department of Biochemistry, Hellenic Pasteur Institute, Athens, Greece
| | - Stefania Sarno
- Department of Biomedical Sciences, University of Padova, Padua, Italy
| | - George Simos
- Laboratory of Biochemistry, Faculty of Medicine, University of Thessaly, Larissa, Greece
| | - Urania Georgopoulou
- Laboratory of Molecular Virology, Hellenic Pasteur Institute, Athens, Greece
| | - Avgi Mamalaki
- Laboratory of Molecular Biology and Immunobiotechnology, Department of Biochemistry, Hellenic Pasteur Institute, Athens, Greece
| |
|
28
|
An intelligent system for identifying acetylated lysine on histones and nonhistone proteins. BIOMED RESEARCH INTERNATIONAL 2014; 2014:528650. [PMID: 25147802 PMCID: PMC4132336 DOI: 10.1155/2014/528650] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 04/11/2014] [Revised: 06/23/2014] [Accepted: 06/24/2014] [Indexed: 01/15/2023]
Abstract
Lysine acetylation is an important and ubiquitous posttranslational modification conserved in prokaryotes and eukaryotes. This process, which is dynamically and temporally regulated by histone acetyltransferases and deacetylases, is crucial for numerous essential biological processes such as transcriptional regulation, cellular signaling, and stress response. Since the experimental identification of lysine acetylation sites within proteins is time-consuming and laboratory-intensive, several computational approaches have been developed to identify candidates for experimental validation. In this work, acetylated protein data collected from UniProtKB were categorized into histone or nonhistone proteins. Support vector machines (SVMs) were applied to build predictive models by using amino acid pair composition (AAPC) as a feature in a histone model. We combined BLOSUM62 and AAPC features in a nonhistone model. Furthermore, using maximal dependence decomposition (MDD) clustering can enhance the performance of the model on a fivefold cross-validation evaluation to yield a sensitivity of 0.863, specificity of 0.885, accuracy of 0.880, and MCC of 0.706. Additionally, the proposed method is evaluated using independent test sets resulting in a predictive accuracy of 74%. This indicates that the performance of our method is comparable with that of other acetylation prediction methods.
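The four evaluation measures quoted in this abstract (sensitivity, specificity, accuracy, and MCC) are standard functions of the binary confusion matrix. The sketch below shows how they are commonly computed; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, accuracy and MCC from confusion-matrix counts.

    Hypothetical helper using the standard definitions; any ratio with a
    zero denominator is reported as 0.0.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),            # true positive rate
        "specificity": ratio(tn, tn + fp),            # true negative rate
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),     # Matthews correlation coefficient
    }


# Made-up counts for illustration only.
metrics = binary_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

MCC is often reported alongside accuracy because it stays informative when positive and negative sites are unbalanced, which is typical for modification-site data sets.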
|
29
|
Huang KY, Wu HY, Chen YJ, Lu CT, Su MG, Hsieh YC, Tsai CM, Lin KI, Huang HD, Lee TY, Chen YJ. RegPhos 2.0: an updated resource to explore protein kinase-substrate phosphorylation networks in mammals. Database (Oxford) 2014; 2014:bau034. [PMID: 24771658 PMCID: PMC3999940 DOI: 10.1093/database/bau034] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 10/15/2014] [Revised: 03/27/2014] [Accepted: 03/30/2014] [Indexed: 11/13/2022]
Abstract
Protein phosphorylation catalyzed by kinases plays crucial roles in regulating a variety of intracellular processes. Owing to the increasing number of in vivo phosphorylation sites identified by mass spectrometry (MS)-based proteomics, RegPhos, available online at http://csb.cse.yzu.edu.tw/RegPhos2/, was developed to explore protein phosphorylation networks in human. In this update, we not only enhance the data content for human but also investigate kinase-substrate phosphorylation networks in mouse and rat. The experimentally validated phosphorylation sites as well as their catalytic kinases were extracted from public resources, and MS/MS phosphopeptides were manually curated from research articles. RegPhos 2.0 aims to provide a more comprehensive view of intracellular signaling networks by integrating information on metabolic pathways and protein-protein interactions. In a case study analyzing the phosphoproteome profile of time-dependent cell activation obtained from liquid chromatography-mass spectrometry (LC-MS/MS) analysis, RegPhos deciphered not only the consistent scheme of the B cell receptor (BCR) signaling pathway but also novel regulatory molecules that may be involved in it. To help users efficiently identify candidate biomarkers in cancers, 30 microarray experiments, including 39 cancerous versus normal cells, were analyzed to detect cancer-specific expressed genes coding for kinases and their substrates. Furthermore, this update features an improved web interface that facilitates convenient exploration of phosphorylation networks for a group of genes/proteins. Database URL: http://csb.cse.yzu.edu.tw/RegPhos2/
Affiliation(s)
- Kai-Yao Huang
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan, Institute of Chemistry, Academia Sinica, Taipei 115, Taiwan, Genomics Research Center, Academia Sinica, Taipei 115, Taiwan, Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsin-Chu 300, Taiwan and Department of Biological Science and Technology, National Chiao Tung University, Hsin-Chu 300, Taiwan
| | - Hsin-Yi Wu
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan, Institute of Chemistry, Academia Sinica, Taipei 115, Taiwan, Genomics Research Center, Academia Sinica, Taipei 115, Taiwan, Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsin-Chu 300, Taiwan and Department of Biological Science and Technology, National Chiao Tung University, Hsin-Chu 300, Taiwan
| | - Yi-Ju Chen
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan, Institute of Chemistry, Academia Sinica, Taipei 115, Taiwan, Genomics Research Center, Academia Sinica, Taipei 115, Taiwan, Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsin-Chu 300, Taiwan and Department of Biological Science and Technology, National Chiao Tung University, Hsin-Chu 300, Taiwan
| | - Cheng-Tsung Lu
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan, Institute of Chemistry, Academia Sinica, Taipei 115, Taiwan, Genomics Research Center, Academia Sinica, Taipei 115, Taiwan, Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsin-Chu 300, Taiwan and Department of Biological Science and Technology, National Chiao Tung University, Hsin-Chu 300, Taiwan
| | - Min-Gang Su
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan, Institute of Chemistry, Academia Sinica, Taipei 115, Taiwan, Genomics Research Center, Academia Sinica, Taipei 115, Taiwan, Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsin-Chu 300, Taiwan and Department of Biological Science and Technology, National Chiao Tung University, Hsin-Chu 300, Taiwan
| | - Yun-Chung Hsieh
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan, Institute of Chemistry, Academia Sinica, Taipei 115, Taiwan, Genomics Research Center, Academia Sinica, Taipei 115, Taiwan, Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsin-Chu 300, Taiwan and Department of Biological Science and Technology, National Chiao Tung University, Hsin-Chu 300, Taiwan
| | - Chih-Ming Tsai
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan, Institute of Chemistry, Academia Sinica, Taipei 115, Taiwan, Genomics Research Center, Academia Sinica, Taipei 115, Taiwan, Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsin-Chu 300, Taiwan and Department of Biological Science and Technology, National Chiao Tung University, Hsin-Chu 300, Taiwan
| | - Kuo-I Lin
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan, Institute of Chemistry, Academia Sinica, Taipei 115, Taiwan, Genomics Research Center, Academia Sinica, Taipei 115, Taiwan, Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsin-Chu 300, Taiwan and Department of Biological Science and Technology, National Chiao Tung University, Hsin-Chu 300, Taiwan
| | - Hsien-Da Huang
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan, Institute of Chemistry, Academia Sinica, Taipei 115, Taiwan, Genomics Research Center, Academia Sinica, Taipei 115, Taiwan, Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsin-Chu 300, Taiwan and Department of Biological Science and Technology, National Chiao Tung University, Hsin-Chu 300, Taiwan
| | - Tzong-Yi Lee
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan, Institute of Chemistry, Academia Sinica, Taipei 115, Taiwan, Genomics Research Center, Academia Sinica, Taipei 115, Taiwan, Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsin-Chu 300, Taiwan and Department of Biological Science and Technology, National Chiao Tung University, Hsin-Chu 300, Taiwan
| | - Yu-Ju Chen
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan, Institute of Chemistry, Academia Sinica, Taipei 115, Taiwan, Genomics Research Center, Academia Sinica, Taipei 115, Taiwan, Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsin-Chu 300, Taiwan and Department of Biological Science and Technology, National Chiao Tung University, Hsin-Chu 300, Taiwan
| |
|
30
|
Dapat C, Saito R, Suzuki H, Horigome T. Quantitative phosphoproteomic analysis of host responses in human lung epithelial (A549) cells during influenza virus infection. Virus Res 2014; 179:53-63. [DOI: 10.1016/j.virusres.2013.11.012] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 06/28/2013] [Revised: 11/05/2013] [Accepted: 11/11/2013] [Indexed: 10/26/2022]
|
31
|
Cáceres A, Perdiguero B, Gómez CE, Cepeda MV, Caelles C, Sorzano CO, Esteban M. Involvement of the cellular phosphatase DUSP1 in vaccinia virus infection. PLoS Pathog 2013; 9:e1003719. [PMID: 24244156 PMCID: PMC3828168 DOI: 10.1371/journal.ppat.1003719] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 12/26/2012] [Accepted: 09/05/2013] [Indexed: 12/30/2022]
Abstract
Poxviruses encode a large variety of proteins that mimic, block or enhance host cell signaling pathways for their own benefit. It has been reported that mitogen-activated protein kinases (MAPKs) are specifically upregulated during vaccinia virus (VACV) infection. Here, we have evaluated the role of the MAPK negative regulator dual specificity phosphatase 1 (DUSP1) in VACV infection. We demonstrated that DUSP1 expression is enhanced upon infection with the replicative WR virus and with the attenuated VACV viruses MVA and NYVAC. This upregulation is dependent on early viral gene expression. In the absence of DUSP1 in cultured cells, there is an increased activation of its molecular targets JNK and ERK and an enhanced WR replication. Moreover, DUSP1 knock-out (KO) mice are more susceptible to WR infection as a result of enhanced virus replication in the lungs. Significantly, MVA, which is known to produce non-permissive infections in most mammalian cell lines, is able to grow in DUSP1 KO immortalized murine embryo fibroblasts (MEFs). By confocal and electron microscopy assays, we showed that in the absence of DUSP1 MVA morphogenesis is similar to that in permissive cell lines and demonstrated that DUSP1 is involved at the stage of transition between IVN and MV in VACV morphogenesis. In addition, we have observed that the secretion of pro-inflammatory cytokines at early times post-infection in KO mice infected with MVA and NYVAC is increased and that the adaptive immune response is enhanced in comparison with WT-infected mice. Altogether, these findings reveal that DUSP1 is involved in the replication and host range of VACV and in the regulation of host immune responses through the modulation of MAPKs. Thus, in this study we demonstrate that DUSP1 is actively involved in the antiviral host defense mechanism against a poxvirus infection.
Phosphorylation is a post-translational modification that is highly conserved throughout the animal kingdom. Viruses have evolved to acquire their own kinases and phosphatases and to be able to modulate host phosphorylation mechanisms for their benefit. DUSP1 is an early induced gene that belongs to the superfamily of dual-specificity phosphatases and provides an essential negative feedback regulation of MAPKs. DUSP1 is involved in innate and adaptive immune responses against different bacterial and parasitic infections. The use of knock-out technology has allowed us to understand the role of DUSP1 in the context of VACV infection both in cultured cells and in the in vivo mouse model. Here, we have shown that DUSP1 expression is upregulated during VACV infection and that DUSP1 plays an important role in VACV replication. Interestingly, we have demonstrated that the VACV attenuated virus MVA is able to grow in immortalized murine embryo fibroblasts in the absence of DUSP1. In vivo results showed that VACV replication-competent WR pathogenesis is enhanced in the absence of DUSP1. Furthermore, we have demonstrated that DUSP1 is involved in the host innate and adaptive responses against VACV. Altogether, we have presented a novel role for DUSP1 in VACV replication and anti-VACV host immune response.
Affiliation(s)
- Ana Cáceres
- Department of Molecular and Cellular Biology, National Centre of Biotechnology, Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
| | - Beatriz Perdiguero
- Department of Molecular and Cellular Biology, National Centre of Biotechnology, Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
| | - Carmen E. Gómez
- Department of Molecular and Cellular Biology, National Centre of Biotechnology, Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
| | - Maria Victoria Cepeda
- Department of Molecular and Cellular Biology, National Centre of Biotechnology, Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
| | - Carme Caelles
- Department of Biochemistry and Molecular Biology, School of Pharmacy, University of Barcelona, Barcelona, Spain
| | - Carlos Oscar Sorzano
- Biocomputing Unit, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
| | - Mariano Esteban
- Department of Molecular and Cellular Biology, National Centre of Biotechnology, Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
|
32
|
Huang KY, Lu CT, Bretaña N, Lee TY, Chang TH. ViralPhos: incorporating a recursively statistical method to predict phosphorylation sites on virus proteins. BMC Bioinformatics 2013; 14 Suppl 16:S10. [PMID: 24564381 PMCID: PMC3853219 DOI: 10.1186/1471-2105-14-s16-s10] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 11/17/2022] Open
Abstract
Background The phosphorylation of virus proteins by host kinases is linked to viral replication. This leads to an inhibition of normal host-cell functions. Further elucidation of phosphorylation in virus proteins is required in order to aid in drug design and treatment. However, only a few studies have investigated substrate motifs in identifying virus phosphorylation sites. Additionally, existing bioinformatics tools do not consider potential host kinases that may initiate the phosphorylation of a virus protein. Results 329 experimentally verified phosphorylation fragments on 111 virus proteins were collected from virPTM. These were clustered into subgroups of significantly conserved motifs using a recursively statistical method. Two-layered Support Vector Machines (SVMs) were then applied to train a predictive model for the identified substrate motifs. The SVM models were evaluated using five-fold cross-validation, which yielded an average accuracy of 0.86 for serine and 0.81 for threonine. Furthermore, the proposed method is shown to perform on par with three other phosphorylation site prediction tools: PPSP, KinasePhos 2.0 and GPS 2.1. Conclusion In this study, we propose a computational method, ViralPhos, which aims to investigate virus substrate site motifs and identify potential phosphorylation sites on virus proteins. We identified informative substrate motifs that matched several well-studied kinase groups as potential catalytic kinases for virus protein substrates. The identified substrate motifs were further exploited to identify potential virus phosphorylation sites. The proposed method is shown to be capable of predicting virus phosphorylation sites and has been implemented as a web server at http://csb.cse.yzu.edu.tw/ViralPhos/.
|
33
|
Su MG, Lee TY. Incorporating substrate sequence motifs and spatial amino acid composition to identify kinase-specific phosphorylation sites on protein three-dimensional structures. BMC Bioinformatics 2013; 14 Suppl 16:S2. [PMID: 24564522 PMCID: PMC3853090 DOI: 10.1186/1471-2105-14-s16-s2] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND Protein phosphorylation catalyzed by kinases plays crucial regulatory roles in cellular processes. Given the high-throughput mass spectrometry-based experiments, the desire to annotate the catalytic kinases for in vivo phosphorylation sites has grown. Thus, a variety of computational methods have been developed for performing large-scale prediction of kinase-specific phosphorylation sites. However, most of the proposed methods rely solely on the local amino acid sequences surrounding the phosphorylation sites. An increasing number of three-dimensional structures makes it possible to physically investigate the structural environment of phosphorylation sites. RESULTS In this work, all of the experimental phosphorylation sites are mapped to the protein entries of the Protein Data Bank by sequence identity. This resulted in a total of 4508 phosphorylation sites with protein three-dimensional (3D) structures. To identify phosphorylation sites on protein 3D structures, this work incorporates support vector machines (SVMs) with the information of linear motifs and spatial amino acid composition, which is determined for each kinase group by calculating the relative frequencies of the 20 amino acid types within a specific radial distance from the central phosphorylated amino acid residue. After the cross-validation evaluation, most of the kinase-specific models trained with the consideration of structural information outperform the models considering only the sequence information. Furthermore, an independent testing set, which is not included in the training set, has demonstrated that the proposed method could provide a comparable performance to other popular tools. CONCLUSION The proposed method is shown to be capable of predicting kinase-specific phosphorylation sites on 3D structures and has been implemented as a web server which is freely accessible at http://csb.cse.yzu.edu.tw/PhosK3D/.
Due to the difficulty of identifying the kinase-specific phosphorylation sites with similar sequence motifs, this work also integrates the 3D structural information to improve the cross-classifying specificity.
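The spatial amino acid composition described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `spatial_composition`, the use of one representative coordinate per residue (for example the C-alpha atom), and the radius value are all assumptions made for illustration.

```python
from collections import Counter
from math import dist

# The 20 standard amino acid types, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def spatial_composition(center, residues, radius):
    """Relative frequency of each amino acid type within `radius` of `center`.

    center   -- (x, y, z) of the central phosphorylated residue (assumed input)
    residues -- iterable of (one_letter_code, (x, y, z)) for neighbouring
                residues; the central residue itself is assumed to be excluded
    radius   -- radial cutoff, in the same units as the coordinates
    """
    counts = Counter(
        aa
        for aa, xyz in residues
        if aa in AMINO_ACIDS and dist(center, xyz) <= radius
    )
    total = sum(counts.values())
    # Return all 20 types so every site yields a fixed-length feature vector.
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    neighbours = [
        ("K", (1.0, 0.0, 0.0)),
        ("K", (0.0, 2.0, 0.0)),
        ("E", (0.0, 0.0, 3.0)),
        ("L", (10.0, 0.0, 0.0)),  # outside the cutoff
    ]
    comp = spatial_composition((0.0, 0.0, 0.0), neighbours, radius=5.0)
    print({aa: round(f, 3) for aa, f in comp.items() if f > 0})
```

A 20-element vector of this kind could be concatenated with sequence-motif features before training a classifier such as an SVM, which is the combination the abstract describes in general terms.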
|
34
|
Berard A, Kroeker AL, Coombs KM. Transcriptomics and quantitative proteomics in virology. Future Virol 2012. [DOI: 10.2217/fvl.12.112] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/26/2022]
|
35
|
Lu CT, Huang KY, Su MG, Lee TY, Bretaña NA, Chang WC, Chen YJ, Chen YJ, Huang HD. DbPTM 3.0: an informative resource for investigating substrate site specificity and functional association of protein post-translational modifications. Nucleic Acids Res 2012. [PMID: 23193290 PMCID: PMC3531199 DOI: 10.1093/nar/gks1229] [Citation(s) in RCA: 165] [Impact Index Per Article: 13.8] [Indexed: 01/10/2023] Open
Abstract
Protein modification is an extremely important post-translational regulation that adjusts the physical and chemical properties, conformation, stability and activity of a protein, thus altering protein function. Due to the high throughput of mass spectrometry (MS)-based methods in identifying site-specific post-translational modifications (PTMs), dbPTM (http://dbPTM.mbc.nctu.edu.tw/) is updated to integrate experimental PTMs obtained from public resources as well as manually curated MS/MS peptides associated with PTMs from research articles. Version 3.0 of dbPTM aims to be an informative resource for investigating the substrate specificity of PTM sites and functional association of PTMs between substrates and their interacting proteins. In order to investigate the substrate specificity for modification sites, a newly developed statistical method has been applied to identify the significant substrate motifs for each type of PTM containing sufficient experimental data. According to the data statistics in dbPTM, >60% of PTM sites are located in the functional domains of proteins. It is known that most PTMs can create binding sites for specific protein-interaction domains that work together for cellular function. Thus, this update integrates protein–protein interaction and domain–domain interaction to determine the functional association of PTM sites located in protein-interacting domains. Additionally, the information of structural topologies on transmembrane (TM) proteins is integrated in dbPTM in order to delineate the structural correlation between the reported PTM sites and TM topologies. To facilitate the investigation of PTMs on TM proteins, the PTM substrate sites and the structural topology are graphically represented. Literature information related to PTMs, orthologous conservation and substrate motifs of PTMs are also provided in the resource. Finally, this version features an improved web interface to facilitate convenient access to the resource.
Affiliation(s)
- Cheng-Tsung Lu
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 320, Taiwan
|