1
Zhu F, Yang S, Meng F, Zheng Y, Ku X, Luo C, Hu G, Liang Z. Leveraging Protein Dynamics to Identify Functional Phosphorylation Sites using Deep Learning Models. J Chem Inf Model 2022; 62:3331-3345. [PMID: 35816597 DOI: 10.1021/acs.jcim.2c00484] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/05/2023]
Abstract
Accurate prediction of post-translational modifications (PTMs) is of great significance for understanding cellular processes, as PTMs modulate protein structure and dynamics. With the rapid growth of protein data at different "omics" levels, machine learning models have greatly advanced the prediction of PTMs. However, most machine learning models rely only on protein sequence, with little structural information. The lack of systematic analysis of the dynamics underlying PTMs largely limits PTM functional prediction. In this research, we present two dynamics-centric deep learning models, namely, cDL-PAU and cDL-FuncPhos, which incorporate sequence, structure, and dynamics-based features to elucidate the molecular basis and underlying functional landscape of PTMs. cDL-PAU achieved satisfactory area under the curve (AUC) scores of 0.804-0.888 for predicting phosphorylation, acetylation, and ubiquitination (PAU) sites, while cDL-FuncPhos achieved an AUC of 0.771 for predicting functional phosphorylation (FuncPhos) sites, displaying reliable improvements. Feature selection showed that the dynamics-based coupling and commute ability features contribute substantially to discovering PAU sites and FuncPhos sites, suggesting an allosteric propensity for important PTMs. The application of cDL-FuncPhos to three oncoproteins not only corroborates its strong performance in FuncPhos prioritization but also provides insight into the physical basis of these functions. The source code and data set of cDL-PAU and cDL-FuncPhos are available at https://github.com/ComputeSuda/PTM_ML.
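The AUC values reported above summarize how well a classifier ranks true PTM sites above non-sites. As a generic illustration (not the authors' code), the sketch below computes ROC AUC directly from its probabilistic definition: the fraction of positive/negative pairs in which the positive site receives the higher score, with ties counted as one half. The labels and scores in the usage line are hypothetical; on real data a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative).

    labels: 1 for a true site, 0 for a non-site (hypothetical example data).
    scores: classifier outputs; higher means "more likely a site".
    Ties between a positive and a negative contribute 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:  # compare every positive with every negative
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# One positive (0.2) is outranked by one negative (0.5): 3 of 4 pairs are correct.
print(roc_auc([1, 1, 0, 0], [0.9, 0.2, 0.5, 0.1]))  # 0.75
```

This pairwise form is O(P·N) and is meant only to make the metric's meaning concrete; rank-based implementations compute the same value in O(n log n).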
Affiliation(s)
- Fei Zhu: Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China; School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Sijie Yang: School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Fanwang Meng: Department of Chemistry and Chemical Biology, McMaster University, Hamilton L8S 4L8, Ontario, Canada
- Yuxiang Zheng: Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
- Xin Ku: Key Laboratory of Systems Biomedicine (Ministry of Education), Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Cheng Luo: State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Guang Hu: Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
- Zhongjie Liang: Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China; Key Laboratory of Systems Biomedicine (Ministry of Education), Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China; State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
2
de Brevern AG, Rebehmed J. Current status of PTMs structural databases: applications, limitations and prospects. Amino Acids 2022; 54:575-590. [PMID: 35020020 DOI: 10.1007/s00726-021-03119-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/26/2021] [Accepted: 12/20/2021] [Indexed: 12/11/2022]
Abstract
Protein 3D structures, determined by their amino acid sequences, underpin crucial biological functions. Post-translational modifications (PTMs) play an essential role in regulating these functions by altering the physicochemical properties of proteins. Given their importance, several PTM databases have been developed and released over the past decades, but very few of them incorporate real 3D structural data. Since PTMs influence protein function and their aberrant states are frequently implicated in human diseases, providing structural insights into the influence and dynamics of PTMs is crucial for unraveling the underlying processes. This review is dedicated to the current status of databases providing 3D structural data on PTM sites in proteins. Some of these databases are general, covering multiple types of PTMs in different organisms, while others are specific to one particular type of PTM, class of proteins or organism. The importance of these databases is illustrated with two major types of in silico applications: predicting PTM sites in proteins using machine learning approaches and investigating protein structure-function relationships involving PTMs. Finally, these databases suffer from multiple problems, and care must be taken when analyzing PTM data.
Affiliation(s)
- Alexandre G de Brevern: Université de Paris, INSERM, UMR_S 1134, DSIMB, 75739, Paris, France; Université de la Réunion, INSERM, UMR_S 1134, DSIMB, 97715, Saint-Denis de La Réunion, France; Laboratoire d'Excellence GR-Ex, 75739, Paris, France
- Joseph Rebehmed: Department of Computer Science and Mathematics, Lebanese American University, Beirut, Lebanon
3
Jhong JH, Yao L, Pang Y, Li Z, Chung CR, Wang R, Li S, Li W, Luo M, Ma R, Huang Y, Zhu X, Zhang J, Feng H, Cheng Q, Wang C, Xi K, Wu LC, Chang TH, Horng JT, Zhu L, Chiang YC, Wang Z, Lee TY. dbAMP 2.0: updated resource for antimicrobial peptides with an enhanced scanning method for genomic and proteomic data. Nucleic Acids Res 2021; 50:D460-D470. [PMID: 34850155 PMCID: PMC8690246 DOI: 10.1093/nar/gkab1080] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 08/26/2021] [Revised: 10/16/2021] [Accepted: 10/25/2021] [Indexed: 12/26/2022]
Abstract
The last 18 months or more have seen a profound shift in our global experience, with many of us navigating a once-in-100-year pandemic. To date, COVID-19 remains a life-threatening pandemic with little to no targeted therapeutic recourse. The discovery of novel antiviral agents, such as vaccines and drugs, can provide therapeutic solutions to protect people from severe infection; however, no specifically effective antiviral treatment has yet been confirmed. Thus, great attention has been paid to natural and artificial antimicrobial peptides (AMPs), which are widely regarded as promising agents against harmful microorganisms. Given the biological significance of AMPs, there was a clear need for a single platform for identifying and engaging with AMP data. This led to the creation of the dbAMP platform, which provides comprehensive information about AMPs and facilitates their investigation and analysis. To date, dbAMP has accumulated 26 447 AMPs and 2262 antimicrobial proteins from 3044 organisms through both database integration and manual curation of more than 4579 articles. In addition, dbAMP facilitates the evaluation of AMP structures using I-TASSER for automated protein structure prediction and structure-based functional annotation, providing predicted structural information for clinical drug development. Next-generation sequencing (NGS) and third-generation sequencing have been applied to generate large-scale sequencing reads from various environments, greatly improving the analysis of genome structure. In this update, we launch an efficient online tool that can rapidly identify AMPs from genome/metagenome and proteome data of all species. In conclusion, these improvements establish dbAMP as one of the most abundant and comprehensively annotated resources for AMPs. The updated dbAMP is now freely accessible at http://awi.cuhk.edu.cn/dbAMP.
Affiliation(s)
- Jhih-Hua Jhong: Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
- Lantian Yao: Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China; School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
- Yuxuan Pang: Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China; School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
- Zhongyan Li: Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China; School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen 518172, China
- Chia-Ru Chung: Department of Computer Science and Information Engineering, National Central University, Taoyuan 32001, Taiwan
- Rulan Wang: Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China; School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
- Shangfu Li: Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
- Wenshuo Li: Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China; School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
- Mengqi Luo: Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
- Renfei Ma: Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
- Yuqi Huang: School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen 518172, China
- Xiaoning Zhu: School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen 518172, China
- Jiahong Zhang: School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen 518172, China
- Hexiang Feng: School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen 518172, China
- Qifan Cheng: School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen 518172, China
- Chunxuan Wang: Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
- Kun Xi: Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
- Li-Ching Wu: Department of Biomedical Sciences and Engineering, National Central University, Taoyuan 32001, Taiwan
- Tzu-Hao Chang: Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei 10675, Taiwan
- Jorng-Tzong Horng: Department of Computer Science and Information Engineering, National Central University, Taoyuan 32001, Taiwan
- Lizhe Zhu: Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China; School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen 518172, China
- Ying-Chih Chiang: School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen 518172, China
- Zhuo Wang: Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
- Tzong-Yi Lee: Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China; School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen 518172, China
4
Zhang H, He J, Hu G, Zhu F, Jiang H, Gao J, Zhou H, Lin H, Wang Y, Chen K, Meng F, Hao M, Zhao K, Luo C, Liang Z. Dynamics of Post-Translational Modification Inspires Drug Design in the Kinase Family. J Med Chem 2021; 64:15111-15125. [PMID: 34668699 DOI: 10.1021/acs.jmedchem.1c01076] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 02/08/2023]
Abstract
Post-translational modifications (PTMs) of proteins play important roles in the regulation of cellular function and disease pathogenesis. Systematic analysis of PTM dynamics presents great opportunities to enlarge the target space through PTM allosteric regulation. Here, we present a framework that integrates sequence, structural topology, and particular dynamics features to characterize the functional context and druggability of PTMs in the well-known kinase family. Machine learning models built on these biophysical features successfully predicted PTMs. Moreover, PTMs were found to be significantly enriched in reported allosteric pockets, and the allosteric potential of PTM pockets was thus proposed on the basis of these biophysical features. Finally, the covalent inhibitor DC-Srci-6668, which targets the PTM pocket in c-Src kinase, was identified; it inhibited phosphorylation and locked c-Src in the inactive state. Our findings represent a crucial step toward PTM-inspired drug design in the kinase family.
Affiliation(s)
- Huimin Zhang: Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China; Drug Discovery and Design Center, the Center for Chemical Biology, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; School of Life Science and Technology, Shanghai Tech University, 100 Haike Road, Shanghai 201210, China; University of Chinese Academy of Sciences (UCAS), 19 Yuquan Road, Beijing 100049, China
- Jixiao He: Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
- Guang Hu: Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
- Fei Zhu: Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
- Hao Jiang: Drug Discovery and Design Center, the Center for Chemical Biology, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences (UCAS), 19 Yuquan Road, Beijing 100049, China
- Jing Gao: Drug Discovery and Design Center, the Center for Chemical Biology, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences (UCAS), 19 Yuquan Road, Beijing 100049, China
- Hu Zhou: Drug Discovery and Design Center, the Center for Chemical Biology, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences (UCAS), 19 Yuquan Road, Beijing 100049, China
- Hua Lin: Biomedical Research Center of South China, College of Life Sciences, Fujian Normal University, 1 Keji Road, Fuzhou 350117, China
- Yingjuan Wang: Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
- Kaixian Chen: Drug Discovery and Design Center, the Center for Chemical Biology, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; School of Life Science and Technology, Shanghai Tech University, 100 Haike Road, Shanghai 201210, China; University of Chinese Academy of Sciences (UCAS), 19 Yuquan Road, Beijing 100049, China
- Fanwang Meng: Department of Chemistry and Chemical Biology, McMaster University, Hamilton, ON L8S 4L8, Canada
- Minghong Hao: Ensem Therapeutics, Inc., 200 Boston Avenue, Medford, Massachusetts 02155, United States
- Kehao Zhao: School of Pharmacy, Key Laboratory of Molecular Pharmacology and Drug Evaluation (Yantai University), Ministry of Education, Collaborative Innovation Center of Advanced Drug Delivery System and Biotech Drugs in Universities of Shandong, Yantai University, Yantai 264005, China
- Cheng Luo: Drug Discovery and Design Center, the Center for Chemical Biology, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; School of Life Science and Technology, Shanghai Tech University, 100 Haike Road, Shanghai 201210, China; University of Chinese Academy of Sciences (UCAS), 19 Yuquan Road, Beijing 100049, China; School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 310024, China
- Zhongjie Liang: Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
5
Wang HY, Chung CR, Wang Z, Li S, Chu BY, Horng JT, Lu JJ, Lee TY. A large-scale investigation and identification of methicillin-resistant Staphylococcus aureus based on peaks binning of matrix-assisted laser desorption ionization-time of flight MS spectra. Brief Bioinform 2021; 22:bbaa138. [PMID: 32672791 PMCID: PMC8138823 DOI: 10.1093/bib/bbaa138] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 03/12/2020] [Revised: 06/01/2020] [Accepted: 06/05/2020] [Indexed: 12/21/2022]
Abstract
Recent studies have demonstrated that matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) can be used to detect superbugs, such as methicillin-resistant Staphylococcus aureus (MRSA). Given the increasing clinical need to distinguish MRSA from methicillin-sensitive Staphylococcus aureus (MSSA) efficiently and effectively, we were motivated to develop a systematic pipeline based on a large-scale dataset of MS spectra. However, peak shifting in MS spectra reduced the effectiveness of classification between MRSA and MSSA isolates. Unlike previous works emphasizing specific peaks, this study employs a binning method to cluster shifted MS ions into several representative peaks. A variety of bin sizes were evaluated to coalesce drifted or shifted MS peaks into well-defined structured data. Various machine learning methods were then applied to classify MRSA and MSSA samples. In total, 4858 MS spectra of unique S. aureus isolates, including 2500 MRSA and 2358 MSSA instances, were collected at the Linkou and Kaohsiung branches of Chang Gung Memorial Hospital, Taiwan. Based on the evaluation of Pearson correlation coefficients and a forward feature selection strategy, a total of 200 peaks (with a bin size of 10 Da) were identified as marker attributes for constructing predictive models. These selected peaks, such as bins 2410-2419, 2450-2459 and 6590-6599 Da, showed marked differences between MRSA and MSSA and were effective in predicting MRSA. Independent testing revealed that the random forest model achieved a promising area under the receiver operating characteristic curve (AUC) of 0.8450.
Compared with previous works conducted with hundreds of MS spectra, the proposed scheme demonstrates that combining machine learning with a large-scale dataset of clinical MS spectra may be a feasible means for clinicians to administer the correct antibiotics with a shorter turnaround time, which could reduce mortality, avoid drug resistance and shorten hospital stays in the future.
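The peak-binning idea above, coalescing slightly shifted peaks into fixed-width m/z bins so that every spectrum yields the same feature columns, can be sketched as follows. This is an illustrative sketch, not the study's pipeline. It assumes each spectrum is a list of (m/z, intensity) pairs, uses the 10 Da bin width and bin labels quoted in the abstract (e.g. "2410-2419"), picks an m/z range of 2000-20000 purely for illustration, and keeps the maximum intensity within each bin. The abstract states neither the range nor the aggregation rule, so both are assumptions.

```python
from typing import Dict, List, Tuple


def bin_spectrum(
    peaks: List[Tuple[float, float]],
    bin_size: int = 10,
    mz_min: int = 2000,   # assumed lower m/z bound (illustrative only)
    mz_max: int = 20000,  # assumed upper m/z bound (illustrative only)
) -> Dict[str, float]:
    """Collapse (m/z, intensity) peaks into fixed-width bins.

    Every spectrum gets the same bin labels ("2410-2419", ...), so shifted
    peaks that land in the same bin map to the same feature. The maximum
    intensity per bin is kept (an assumed aggregation rule).
    """
    # Pre-create every bin so all spectra share one feature layout.
    features: Dict[str, float] = {
        f"{start}-{start + bin_size - 1}": 0.0
        for start in range(mz_min, mz_max, bin_size)
    }
    for mz, intensity in peaks:
        if not (mz_min <= mz < mz_max):
            continue  # drop peaks outside the assumed range
        start = mz_min + int((mz - mz_min) // bin_size) * bin_size
        key = f"{start}-{start + bin_size - 1}"
        features[key] = max(features[key], intensity)
    return features


# Two slightly shifted peaks near 2412-2418 fall into the same bin.
f = bin_spectrum([(2412.3, 5.0), (2417.9, 8.0), (6595.0, 2.0)])
print(f["2410-2419"], f["6590-6599"])  # 8.0 2.0
```

Because every spectrum maps onto the same bin labels, a peak whose shift stays within one bin no longer changes the feature vector. That is what makes the data usable by standard classifiers such as random forests. Peaks that drift across a bin boundary still move, which is presumably one reason several bin sizes were compared.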
Affiliation(s)
- Hsin-Yao Wang: Department of Laboratory Medicine, Chang Gung Memorial Hospital at Linkou, Taoyuan City, Taiwan
- Chia-Ru Chung: Department of Computer Science and Information Engineering, National Central University
- Zhuo Wang: Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, China
- Shangfu Li: Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, China
- Bo-Yu Chu: Department of Computer Science & Engineering, Yuan Ze University, Taoyuan City, Taiwan
- Jorng-Tzong Horng: Department of Laboratory Medicine, Chang Gung Memorial Hospital at Linkou, Taoyuan City, Taiwan
- Jang-Jih Lu: Department of Computer Science and Information Engineering, National Central University, Taiwan
- Tzong-Yi Lee: Warshel Institute for Computational Biology, School of Life and Health Sciences
6
Aggarwal S, Tolani P, Gupta S, Yadav AK. Posttranslational modifications in systems biology. Adv Protein Chem Struct Biol 2021; 127:93-126. [PMID: 34340775 DOI: 10.1016/bs.apcsb.2021.03.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/28/2022]
Abstract
Biological complexity cannot be captured by genes or proteins alone. Protein posttranslational modifications (PTMs) impart functional diversity to the proteome and regulate protein structure, activity, localization and interactions. Their dynamics drive cellular signaling, growth and development, while their dysregulation causes many diseases. Mass spectrometry-based quantitative profiling of PTMs and bioinformatics analysis tools allow systems-level insights into their network architecture. High-resolution profiling of PTM networks will advance disease understanding and precision medicine, and can accelerate the discovery of biomarkers and drug targets. This requires better tools for unbiased, high-throughput and accurate PTM identification, site localization and automated annotation at the systems level.
Affiliation(s)
- Suruchi Aggarwal: Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India; Department of Molecular Biology and Biotechnology, Cotton University, Guwahati, Assam, India
- Priya Tolani: Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India
- Srishti Gupta: Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India; School of Biosciences and Technology, Vellore Institute of Technology, Vellore, India
- Amit Kumar Yadav: Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India
7
Yang Y, Wang H, Li W, Wang X, Wei S, Liu Y, Xu Y. Prediction and analysis of multiple protein lysine modified sites based on conditional wasserstein generative adversarial networks. BMC Bioinformatics 2021; 22:171. [PMID: 33789579 PMCID: PMC8010967 DOI: 10.1186/s12859-021-04101-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/28/2020] [Accepted: 03/23/2021] [Indexed: 01/05/2023]
Abstract
BACKGROUND Protein post-translational modification (PTM) is key to investigating the mechanisms of protein function. With the rapid development of proteomics technology, a large amount of protein sequence data has been generated, which highlights the importance of in-depth study and analysis of PTMs in proteins. METHOD We proposed a new multi-classification machine learning pipeline, MultiLyGAN, to identify seven types of lysine modified sites. Using eight different sequential and five structural construction methods, 1497 valid features remained after filtering by Pearson correlation coefficient. To solve the data imbalance problem, two influential deep generative methods, Conditional Generative Adversarial Network (CGAN) and Conditional Wasserstein Generative Adversarial Network (CWGAN), were leveraged and compared to generate new samples for the types with fewer samples. Finally, a random forest algorithm was used to predict the seven categories. RESULTS In tenfold cross-validation, accuracy (Acc) and Matthews correlation coefficient (MCC) were 0.8589 and 0.8376, respectively. In the independent test, Acc and MCC were 0.8549 and 0.8330, respectively. The results indicated that CWGAN better resolved the existing data imbalance and stabilized the training error. In addition, an accumulated feature importance analysis showed that CKSAAP, PWM and structural features were the three most important feature-encoding schemes. MultiLyGAN can be found at https://github.com/Lab-Xu/MultiLyGAN . CONCLUSIONS CWGAN greatly improved predictive performance in all experiments. Features derived from the CKSAAP, PWM and structure schemes are the most informative and contributed most to the prediction of PTMs.
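CKSAAP (composition of k-spaced amino acid pairs), named above as one of the most informative encodings, records how often each of the 400 residue pairs occurs with exactly k residues between them. The sketch below is a generic version of that encoding and is not MultiLyGAN's implementation. The example windows, the maximum gap `k_max`, the `"{pair}.gap{k}"` key format and the choice to skip pairs containing non-standard residues or padding characters (such as `X`) are all assumptions made for illustration.

```python
from itertools import product
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def cksaap(seq: str, k_max: int = 3) -> Dict[str, float]:
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    For each gap k, pairs (seq[i], seq[i + k + 1]) are counted and divided
    by the number of such pairs in the window, len(seq) - k - 1.
    Returns 400 * (k_max + 1) features keyed like "AC.gap0" (hypothetical naming).
    """
    features: Dict[str, float] = {}
    for k in range(k_max + 1):
        counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
        n_pairs = len(seq) - k - 1
        for i in range(max(n_pairs, 0)):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # skip pairs with padding or non-standard residues
                counts[pair] += 1
        for pair, c in counts.items():
            features[f"{pair}.gap{k}"] = c / n_pairs if n_pairs > 0 else 0.0
    return features


f = cksaap("ACDA", k_max=1)
print(len(f), f["AD.gap1"])  # 800 0.5
```

In a site-prediction setting, `seq` would typically be a fixed-length window centered on the candidate lysine. The resulting vector would then be concatenated with the other encodings and filtered before classification. The window length and `k_max` used by MultiLyGAN are not given in the abstract.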
Affiliation(s)
- Yingxi Yang: Department of Information and Computer Science, University of Science and Technology Beijing, Beijing, 100083, China
- Hui Wang: Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100080, China
- Wen Li: Department of Information and Computer Science, University of Science and Technology Beijing, Beijing, 100083, China
- Xiaobo Wang: Department of Information and Computer Science, University of Science and Technology Beijing, Beijing, 100083, China
- Shizhao Wei: No. 15 Research Institute, China Electronics Technology Group Corporation, Beijing, 100083, China
- Yulong Liu: No. 15 Research Institute, China Electronics Technology Group Corporation, Beijing, 100083, China
- Yan Xu: Department of Information and Computer Science, University of Science and Technology Beijing, Beijing, 100083, China
8
Di Fiore A, Supuran CT, Scaloni A, De Simone G. Human carbonic anhydrases and post-translational modifications: a hidden world possibly affecting protein properties and functions. J Enzyme Inhib Med Chem 2021; 35:1450-1461. [PMID: 32648529 PMCID: PMC7470082 DOI: 10.1080/14756366.2020.1781846] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 02/06/2023]
Abstract
Human carbonic anhydrases (CAs) have become a well-recognized target for the design of inhibitors and activators with biomedical applications. Accordingly, an enormous amount of literature is available on their biochemical, functional and structural aspects. Nevertheless, post-translational modifications (PTMs) occurring on these enzymes and their functional implications have been poorly investigated so far. To fill this gap, in this review we have analysed all PTMs occurring on human CAs, as retrieved from dedicated databases, showing a widespread occurrence of modification events in this enzyme family. By combining these data with sequence alignments, inspection of 3D structures and the available literature, we have summarised the possible functional implications of these PTMs. Although in some cases a clear correlation between a specific PTM and CA function has been highlighted, many modification events still deserve further dedicated studies.
Affiliation(s)
- Anna Di Fiore: Istituto di Biostrutture e Bioimmagini-National Research Council, Napoli, Italy
- Claudiu T Supuran: NEUROFARBA Department, Pharmaceutical and Nutraceutical Section, University of Firenze, Sesto Fiorentino, Italy
- Andrea Scaloni: Proteomics and Mass Spectrometry Laboratory, ISPAAM, National Research Council, Napoli, Italy
9
The tip of the iceberg for diagnostic dilemmas: Performance of current diagnostics and future complementary screening approaches. Eur J Med Genet 2020; 63:104089. [PMID: 33069933 DOI: 10.1016/j.ejmg.2020.104089] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/08/2020] [Revised: 09/15/2020] [Accepted: 10/12/2020] [Indexed: 11/24/2022]
Abstract
Genetic testing is currently the leading edge of clinical care when it comes to diagnostics. However, many questions remain unanswered even with next-generation sequencing techniques, owing to our inability to decode genetic variations and our limited repertoire of available diagnoses. Accordingly, diagnostic yields for current genomic screenings are <50% and fail to provide the whole picture, leaving the remaining patients without a definitive diagnosis. Human phenotypic/disease expression is explained by alterations not only at the genome level, but also at the transcriptome, proteome and metabolome levels. These "higher" complexity levels represent a wealth of information, and diagnostic screening tests at these levels have been shown to significantly improve diagnostic yields in specific populations compared with conventional diagnostic workup or the gold standards in use (7-30% increase in diagnostic yields, depending on the population, approach and gold standard being compared against). However, these tests are not yet routinely available to clinicians. Given the dynamic and modifiable nature of these levels, tapping into data from different omics will improve our understanding of the pathophysiological bases underlying (many yet to be characterized) human disorders. We herein review how alterations at these levels (e.g. post-transcriptional and post-translational) may be pathogenic, how such tests may be implemented and in which situations they are of significant utility.
10
Watanabe Costa R, Batista MF, Meneghelli I, Vidal RO, Nájera CA, Mendes AC, Andrade-Lima IA, da Silveira JF, Lopes LR, Ferreira LRP, Antoneli F, Bahia D. Comparative Analysis of the Secretome and Interactome of Trypanosoma cruzi and Trypanosoma rangeli Reveals Species Specific Immune Response Modulating Proteins. Front Immunol 2020; 11:1774. [PMID: 32973747 PMCID: PMC7481403 DOI: 10.3389/fimmu.2020.01774] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/09/2020] [Accepted: 07/02/2020] [Indexed: 12/04/2022]
Abstract
Chagas disease, a zoonosis caused by the flagellate protozoan Trypanosoma cruzi, is a chronic and systemic parasitic infection that affects ~5–7 million people worldwide, mainly in Latin America. Chagas disease is an emerging public health problem due to the lack of vaccines and effective treatments. According to recent studies, several T. cruzi secreted proteins interact with the human host during cell invasion. Moreover, some comparative studies with T. rangeli, which is non-pathogenic in humans, have been performed to identify proteins directly involved in the pathogenesis of the disease. In this study, we present an integrated analysis of canonical putative secreted proteins (PSPs) from both species. Additionally, we propose an interactome with the human host and gene family clusters, and a phylogenetic inference of a selected protein. In total, we identified 322 PSPs exclusive to T. cruzi and 202 exclusive to T. rangeli. Among the PSPs identified in T. cruzi, we found several trans-sialidases, mucins, MASPs, proteins with phospholipase 2 domains (PLA2-like), and proteins with Hsp70 domains (Hsp70-like), which have been previously characterized and shown to be related to T. cruzi virulence. PSPs found in T. rangeli were related to protozoan metabolism, specifically carboxylases and phosphatases. Furthermore, we also identified PSPs that may interact with the human immune system, including heat shock and MASP proteins, but in lower numbers than in T. cruzi. Interestingly, we describe a hypothetical hybrid interactome of PSPs which reveals that T. cruzi secreted molecules may be down-regulating IL-17 whilst T. rangeli may enhance the production of IL-15. These results will pave the way for a better understanding of the pathophysiology of Chagas disease and may ultimately lead to the identification of molecular targets, such as key PSPs, that could be used to minimize the health consequences of Chagas disease by modulating the immune response triggered by T. cruzi infection.
Affiliation(s)
- Renata Watanabe Costa
- Departamento de Microbiologia, Imunologia e Parasitologia, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo, Brazil
- Marina Ferreira Batista
- Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Isabela Meneghelli
- Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Ramon Oliveira Vidal
- The Berlin Institute for Medical Systems Biology-Max Delbrück Center for Molecular Medicine in the Helmholtz Association in Berlin, Berlin, Germany; Laboratorio Nacional de Biociências (LNBio), Campinas, São Paulo, Brazil
- Carlos Alcides Nájera
- Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Ana Clara Mendes
- Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Izabela Augusta Andrade-Lima
- Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- José Franco da Silveira
- Departamento de Microbiologia, Imunologia e Parasitologia, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo, Brazil
- Luciano Rodrigo Lopes
- Departamento de Informática em Saúde, Universidade Federal de São Paulo, São Paulo, Brazil
- Ludmila Rodrigues Pinto Ferreira
- RNA Systems Biology Lab (RSBL), Departamento de Morfologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Fernando Antoneli
- Departamento de Informática em Saúde, Universidade Federal de São Paulo, São Paulo, Brazil
- Diana Bahia
- Departamento de Microbiologia, Imunologia e Parasitologia, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo, Brazil; Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
11
Kao HJ, Nguyen VN, Huang KY, Chang WC, Lee TY. SuccSite: Incorporating Amino Acid Composition and Informative k-spaced Amino Acid Pairs to Identify Protein Succinylation Sites. Genomics Proteomics Bioinformatics 2020; 18:208-219. [PMID: 32592791 PMCID: PMC7647693 DOI: 10.1016/j.gpb.2018.10.010] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 03/08/2018] [Revised: 10/01/2018] [Accepted: 10/11/2018] [Indexed: 12/14/2022]
Abstract
Protein succinylation is a biochemical reaction in which a succinyl group (-CO-CH2-CH2-CO-) is attached to a lysine residue of a protein. Lysine succinylation plays important regulatory roles in living cells. However, research in this field is limited by the difficulty of experimentally identifying the substrate site specificity of lysine succinylation. To facilitate this process, several tools have been proposed for the computational identification of succinylated lysine sites. In this study, we developed an approach to investigate the substrate specificity of lysine succinylation sites based on amino acid composition. Using experimentally verified lysine succinylation sites collected from public resources, we represented the significant differences in position-specific amino acid composition between succinylated and non-succinylated sites with the Two Sample Logo program. These findings guided the training of a support vector machine (SVM) predictive model on not only amino acid composition but also the composition of k-spaced amino acid pairs. The best model, selected by ten-fold cross-validation, significantly outperformed existing tools on an independent dataset manually extracted from published research articles. Finally, the selected model was used to develop a web-based tool, SuccSite, to aid the study of protein succinylation. Two proteins are used as case studies on the website to demonstrate effective prediction of succinylation sites. We will regularly update SuccSite by integrating more experimental datasets. SuccSite is freely accessible at http://csb.cse.yzu.edu.tw/SuccSite/.
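As a rough illustration of the two feature types named above, the sketch below computes amino acid composition (AAC) and the composition of k-spaced amino acid pairs (CKSAAP) for a peptide window centred on a lysine. It is a minimal sketch, not SuccSite's code: the function names, the `k_max` default, the "X" padding character, and the choice to normalise each gap's counts by the number of valid pairs are all assumptions.

```python
from itertools import product

# The 20 standard amino acids; anything else (e.g. "X" padding) is ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so feature vectors line up.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(window):
    """Amino acid composition: fraction of each standard residue in the window."""
    valid = [a for a in window if a in AMINO_ACIDS]
    return [valid.count(a) / len(valid) if valid else 0.0 for a in AMINO_ACIDS]


def cksaap(window, k_max=3):
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    For each k, count pairs (window[i], window[i + k + 1]) and divide by the
    number of valid pairs (assumed normalisation), giving 400 values per k.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_pairs = 0
        for i in range(len(window) - k - 1):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skip pairs involving padding or non-standard residues
                counts[pair] += 1
                n_pairs += 1
        features.extend(counts[p] / n_pairs if n_pairs else 0.0 for p in PAIRS)
    return features


# Hypothetical 11-residue window centred on a lysine, padded with "X".
window = "XXAKRKLSVEQ"
features = aac(window) + cksaap(window)  # 20 + 4 * 400 = 1620 values
```

A vector like `features` would then be one row of the training matrix passed to an SVM; the gap range and window length are tuning choices not specified here.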
Affiliation(s)
- Hui-Ju Kao
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 32003, Taiwan, China
- Van-Nui Nguyen
- Department of Information Technology, University of Information and Communication Technology, Thai Nguyen 1000, Vietnam
- Kai-Yao Huang
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China; Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
- Wen-Chi Chang
- Institute of Tropical Plant Sciences, Cheng Kung University, Tainan 701, Taiwan, China
- Tzong-Yi Lee
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China; Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China.
12
Huang KY, Lee TY, Kao HJ, Ma CT, Lee CC, Lin TH, Chang WC, Huang HD. dbPTM in 2019: exploring disease association and cross-talk of post-translational modifications. Nucleic Acids Res 2019; 47:D298-D308. [PMID: 30418626 PMCID: PMC6323979 DOI: 10.1093/nar/gky1074] [Citation(s) in RCA: 138] [Impact Index Per Article: 34.5] [Received: 09/15/2018] [Accepted: 10/19/2018] [Indexed: 12/25/2022]
Abstract
dbPTM (http://dbPTM.mbc.nctu.edu.tw/) has been maintained for over 10 years to provide functional and structural analyses of post-translational modifications (PTMs). In this update, dbPTM not only integrates more experimentally validated PTMs from available databases and from manual literature curation but also provides PTM-disease associations based on non-synonymous single nucleotide polymorphisms (nsSNPs). High-throughput deep sequencing technology has led to a surge, in both volume and scope, of data from analyses of associations between SNPs and diseases. This update therefore integrates disease-associated nsSNPs from dbSNP based on genome-wide association studies. PTM substrate sites located within a specified distance, in amino acids, of residues affected by nsSNPs were deemed to be associated with the involved diseases. In recent years, increasing evidence of crosstalk between PTMs has been reported. Although mass spectrometry-based proteomics has substantially improved our knowledge of the substrate site specificity of single PTMs, the possibility that combinatorial PTMs act in concert to regulate protein function and activity has been largely neglected. Because information on the co-occurrence frequency and functional relevance of PTM crosstalk is relatively limited, in this update PTM sites neighboring other PTM sites within a specified window length were subjected to motif discovery and functional enrichment analysis. This update highlights the current challenges in PTM crosstalk investigation and addresses the bottleneck of how proteomics may contribute to understanding PTM codes, revealing the next level of data complexity and proteomic limitation in prospective PTM research.
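The two window-based rules described above (a PTM site is linked to a disease when an nsSNP lies within a specified distance, and PTM sites neighbouring other PTM sites within a window are treated as candidate crosstalk) can be sketched as simple position comparisons along one protein. This is an illustrative sketch only; the function names, the inclusive cut-offs, and the decision to count only pairs of different PTM types as crosstalk are assumptions, not dbPTM's implementation.

```python
def sites_near(ptm_sites, variant_positions, max_distance):
    """PTM site positions lying within max_distance residues of any nsSNP position."""
    return sorted(
        s for s in ptm_sites
        if any(abs(s - v) <= max_distance for v in variant_positions)
    )


def crosstalk_pairs(ptm_sites, window):
    """Pairs of different PTM types whose positions differ by at most `window`.

    ptm_sites: list of (position, ptm_type) tuples for one protein (assumed layout).
    """
    ordered = sorted(ptm_sites)
    pairs = []
    for i, (pos1, type1) in enumerate(ordered):
        for pos2, type2 in ordered[i + 1:]:
            if pos2 - pos1 > window:
                break  # sorted by position, so later sites are even farther away
            if type1 != type2:
                pairs.append(((pos1, type1), (pos2, type2)))
    return pairs


# Hypothetical positions on a single protein.
disease_linked = sites_near([10, 50, 100], variant_positions=[12, 200], max_distance=5)
candidates = crosstalk_pairs(
    [(10, "phospho"), (13, "acetyl"), (40, "ubiq"), (12, "phospho")], window=5
)
```

Sorting first lets the inner loop stop as soon as the window is exceeded, so each site is compared only with its near neighbours.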
Affiliation(s)
- Kai-Yao Huang
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China; School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China; School of Life and Health Science, The Chinese University of Hong Kong, Shenzhen 518172, China
- Tzong-Yi Lee
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China; School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China; School of Life and Health Science, The Chinese University of Hong Kong, Shenzhen 518172, China
- Hui-Ju Kao
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 32003, Taiwan
- Chen-Tse Ma
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 32003, Taiwan
- Chao-Chun Lee
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 32003, Taiwan
- Tsai-Hsuan Lin
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 32003, Taiwan
- Wen-Chi Chang
- Institute of Tropical Plant Sciences, College of Biosciences and Biotechnology, National Cheng Kung University, Tainan 70101, Taiwan
- Hsien-Da Huang
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China; School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China; School of Life and Health Science, The Chinese University of Hong Kong, Shenzhen 518172, China
13
Zuo Y, Jia CZ. CarSite: identifying carbonylated sites of human proteins based on a one-sided selection resampling method. Mol Biosyst 2017; 13:2362-2369. [PMID: 28937156 DOI: 10.1039/c7mb00363c] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 01/12/2023]
Abstract
Protein carbonylation is one of the most important biomarkers of oxidative protein damage, which is linked to various diseases and to aging. Accurate identification of carbonylation sites is therefore vital. In this study, a novel bioinformatics tool, CarSite, was established to identify carbonylation sites in human proteins. The one-sided selection (OSS) resampling method was used to build balanced training datasets and was shown to perform better than a Monte Carlo resampling method in 10-fold cross-validation tests on the Jia dataset. Moreover, a hybrid combination of position-specific amino acid propensity (PSAAP), composition of k-spaced amino acid pairs (CKSAAP), amino acid composition (AAC), and composition of hydrophobic and hydrophilic amino acids (CHHAA) was selected to optimize predictor performance. In 10-fold cross-validation on the Jia dataset, CarSite's sensitivities for K-, P-, R-, and T-type peptides were ∼21%, 22%, 19%, and 18% higher, respectively, than those of iCar-PseCp, which was previously considered the best predictor for identifying carbonylation sites in human proteins. Furthermore, compared with other existing predictors, CarSite obtained much higher sensitivity and accuracy when tested on the same dataset.
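The sketch below follows the commonly described OSS procedure on numeric feature vectors: keep every minority sample plus one random majority sample, add the majority samples that this initial set misclassifies under 1-nearest-neighbour, then drop majority samples that form Tomek links. It is a from-scratch illustration, not CarSite's code; Euclidean distance, the function names, the `seed` parameter, and the toy data are assumptions.

```python
import math
import random


def _nearest(i, pool, X):
    """Index in `pool` (excluding i itself) closest to X[i] in Euclidean distance."""
    return min((j for j in pool if j != i), key=lambda j: math.dist(X[i], X[j]))


def one_sided_selection(X, y, majority_label, seed=0):
    """Return indices of samples kept by one-sided selection (illustrative sketch)."""
    rng = random.Random(seed)  # assumed parameter, for reproducibility
    majority = [i for i, lab in enumerate(y) if lab == majority_label]
    minority = [i for i, lab in enumerate(y) if lab != majority_label]

    # Step 1: start from every minority sample plus one random majority sample.
    first = rng.choice(majority)
    reference = minority + [first]
    kept = list(reference)

    # Step 2: 1-NN classification of the other majority samples against that
    # initial set; misclassified (boundary-near) majority samples are kept.
    for i in majority:
        if i != first and y[_nearest(i, reference, X)] != y[i]:
            kept.append(i)

    # Step 3: remove majority samples that form a Tomek link, i.e. a pair of
    # mutual nearest neighbours with different labels, within the kept set.
    drop = set()
    for i in kept:
        if y[i] == majority_label:
            j = _nearest(i, kept, X)
            if y[j] != y[i] and _nearest(j, kept, X) == i:
                drop.add(i)
    return sorted(i for i in kept if i not in drop)


# Toy 2-D data: label 1 = carbonylated (minority), 0 = non-carbonylated.
X = [(0, 0), (0, 1), (10, 10), (10, 11), (11, 10), (0.4, 0.4)]
y = [1, 1, 0, 0, 0, 0]
kept = one_sided_selection(X, y, majority_label=0)
```

In this toy set the majority sample at (0.4, 0.4) sits right next to a minority sample, so it is discarded as a Tomek link whichever majority sample is drawn in step 1, while all minority samples survive.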
Affiliation(s)
- Yun Zuo
- Department of Mathematics, Dalian Maritime University, No. 1 Linghai Road, Dalian, 116026, China.
14
Su MG, Weng JTY, Hsu JBK, Huang KY, Chi YH, Lee TY. Investigation and identification of functional post-translational modification sites associated with drug binding and protein-protein interactions. BMC Syst Biol 2017; 11:132. [PMID: 29322920 PMCID: PMC5763307 DOI: 10.1186/s12918-017-0506-1] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Indexed: 12/31/2022]
Abstract
Background: Protein post-translational modification (PTM) plays an essential role in various cellular processes by modulating the physical and chemical properties, folding, conformation, stability, and activity of proteins, thereby modifying their functions. The improved throughput of mass spectrometry (MS) and MS/MS technology has not only brought about a surge in proteome-scale studies but also produced a growing list of identified PTMs. However, as the number of identified PTMs increases, perhaps the more crucial question is which biological mechanisms these PTMs are involved in. This is particularly important given that most protein-based pharmaceuticals deliver their therapeutic effects through some form of PTM. Yet our understanding of the local effects and frequency of PTM sites near pharmaceutical binding sites and protein-protein interaction (PPI) interfaces remains limited. Understanding PTM function is critical to our ability to manipulate the biological mechanisms of proteins. Results: In this study, to understand the regulation of protein functions by PTMs, we mapped 25,835 PTM sites to proteins with available three-dimensional (3D) structural information in the Protein Data Bank (PDB), including 1785 PTM sites that are modified in the 3D structure. Based on the acquired structural PTM sites, we used five properties to characterize PTM substrate sites structurally: the spatial composition of amino acids, residues, and side-chain orientations surrounding the PTM substrate sites, as well as the secondary structure, the distribution of acidic and basic residues, and solvent-accessible surface area. We further mapped the structural PTM sites to the structures of drug-binding and PPI sites, identifying a total of 1917 PTM sites that may affect PPI and 3951 PTM sites associated with drug-target binding.
An integrated analytical platform (CruxPTM), with a variety of methods and online molecular docking tools for exploring the structural characteristics of PTMs, is presented. In addition, all tertiary structures of PTM sites on proteins can be visualized using the JSmol program. Conclusion: Resolving the function of PTM sites is important for understanding the roles that proteins play in biological mechanisms. Our work attempted to delineate the structural correlation between PTM sites and PPI or drug-target binding. CruxPTM could help scientists narrow the scope of their PTM research and enhance the efficiency of PTM identification in the face of big proteome data. CruxPTM is now available at http://csb.cse.yzu.edu.tw/CruxPTM/. Electronic supplementary material: The online version of this article (10.1186/s12918-017-0506-1) contains supplementary material, which is available to authorized users.
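The first structural property listed above (the amino acid composition around a PTM site) and the mapping of sites onto PPI or drug-binding residues both reduce to distance queries on 3D coordinates. The sketch below assumes one representative coordinate per residue (for example the Cα atom), an 8 Å neighbourhood radius, and a 6 Å contact cut-off; these values, the dictionary layout, and the function names are hypothetical and not taken from CruxPTM.

```python
import math
from collections import Counter


def spatial_composition(site, residues, radius=8.0):
    """Fraction of each residue type within `radius` Å of `site` (site excluded).

    residues: dict mapping residue number -> (one-letter code, (x, y, z)),
    using one representative atom per residue (assumed, e.g. Cα).
    """
    centre = residues[site][1]
    neighbours = [
        aa for num, (aa, xyz) in residues.items()
        if num != site and math.dist(centre, xyz) <= radius
    ]
    total = len(neighbours)
    return {aa: n / total for aa, n in Counter(neighbours).items()} if total else {}


def near_site_set(site, residues, partner_residues, cutoff=6.0):
    """True if `site` lies within `cutoff` Å of any residue in a PPI or drug-binding set."""
    centre = residues[site][1]
    return any(
        math.dist(centre, residues[r][1]) <= cutoff
        for r in partner_residues if r in residues
    )


# Hypothetical toy coordinates (Å); residue 1 is a serine PTM site.
residues = {
    1: ("S", (0.0, 0.0, 0.0)),
    2: ("K", (3.0, 0.0, 0.0)),
    3: ("K", (0.0, 4.0, 0.0)),
    4: ("D", (0.0, 0.0, 5.0)),
    5: ("E", (20.0, 0.0, 0.0)),
}
composition = spatial_composition(1, residues)
```

Real structures would be read from PDB files, and the radius and cut-off would need tuning; the point here is only that both analyses are neighbourhood queries around the site.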
Affiliation(s)
- Min-Gang Su
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan
- Julia Tzu-Ya Weng
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan
- Justin Bo-Kai Hsu
- Department of Medical Research, Taipei Medical University Hospital, Taipei, 110, Taiwan
- Kai-Yao Huang
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan; Department of Medical Research, Hsinchu Mackay Memorial Hospital, Hsinchu City, 300, Taiwan
- Yu-Hsiang Chi
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan
- Tzong-Yi Lee
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan; Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan, 320, Taiwan.
15
Weng SL, Huang KY, Kaunang FJ, Huang CH, Kao HJ, Chang TH, Wang HY, Lu JJ, Lee TY. Investigation and identification of protein carbonylation sites based on position-specific amino acid composition and physicochemical features. BMC Bioinformatics 2017; 18:66. [PMID: 28361707 PMCID: PMC5374553 DOI: 10.1186/s12859-017-1472-8] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 01/07/2023]
Abstract
BACKGROUND Protein carbonylation, an irreversible and non-enzymatic post-translational modification (PTM), is often used as a marker of oxidative stress. When reactive oxygen species (ROS) oxidize amino acid side chains, carbonyl (CO) groups are produced, especially on lysine (K), arginine (R), threonine (T), and proline (P). Because information about carbonylated substrate specificity is lacking, we were motivated to develop a systematic method for a comprehensive investigation of protein carbonylation sites. RESULTS After removing redundant data from multiple carbonylation-related articles, a total of 226 human carbonylated proteins were used as the training dataset, comprising 307, 126, 128, and 129 carbonylation sites for K, R, T, and P residues, respectively. To identify features useful for predicting carbonylation sites, the linear amino acid sequence was used not only to build the predictive model from the training dataset but also to compare prediction effectiveness with other types of features, including amino acid composition (AAC), amino acid pair composition (AAPC), position-specific scoring matrix (PSSM), positional weighted matrix (PWM), solvent-accessible surface area (ASA), and physicochemical properties. Investigation of position-specific amino acid composition revealed that the positively charged amino acids (K and R) are markedly enriched around carbonylated sites, which may play a functional role in discriminating between carbonylation and non-carbonylation sites. A variety of predictive models were built using various features and three different machine learning methods. In five-fold cross-validation, models trained with the PWM feature provided better sensitivity on the positive training data, while models trained with the AAindex feature achieved higher specificity on the negative training data.
Additionally, the model trained using hybrid features, including PWM, AAC, and AAindex, obtained the best MCC values of 0.432, 0.472, 0.443, and 0.467 for K, R, T, and P residues, respectively. CONCLUSION Compared with an existing prediction tool, the selected models trained with hybrid features provided promising accuracy on an independent testing dataset. In short, this work not only characterized the carbonylated substrate preference but also demonstrated that the proposed method could provide a feasible means of accelerating preliminary discovery of protein carbonylation sites.
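The enrichment of positively charged residues (K and R) around carbonylated sites can be illustrated by comparing, position by position, how often K or R occurs in positive versus negative windows. This is a simplified sketch: it reports raw frequency differences without any significance test, and the function names and toy windows are hypothetical rather than the authors' data or code.

```python
def residue_frequency_by_position(windows, residues="KR"):
    """Fraction of windows carrying one of `residues` at each position."""
    length = len(windows[0])
    return [sum(w[p] in residues for w in windows) / len(windows) for p in range(length)]


def positional_enrichment(positive_windows, negative_windows, residues="KR"):
    """Positive-minus-negative frequency of `residues` at each window position."""
    pos = residue_frequency_by_position(positive_windows, residues)
    neg = residue_frequency_by_position(negative_windows, residues)
    return [p - n for p, n in zip(pos, neg)]


# Hypothetical 5-residue windows centred on a lysine (position 2).
positives = ["KAKRA", "RAKAK"]
negatives = ["AAKAA", "DAKEA"]
enrichment = positional_enrichment(positives, negatives)
```

Positive values flag flanking positions where K or R is over-represented around modified sites; the centre position cancels out because both sets are centred on the same residue type.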
Affiliation(s)
- Shun-Long Weng
- Department of Obstetrics and Gynecology, Hsinchu Mackay Memorial Hospital, Hsin-Chu, 300, Taiwan; Mackay Medicine, Nursing and Management College, Taipei, 112, Taiwan; Department of Medicine, Mackay Medical College, New Taipei City, 252, Taiwan
- Kai-Yao Huang
- Department of Medical Research, Hsinchu Mackay Memorial Hospital, Hsin-Chu, 300, Taiwan; Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan
- Fergie Joanda Kaunang
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan
- Chien-Hsun Huang
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan; Tao-Yuan Hospital, Ministry of Health & Welfare, Taoyuan, 320, Taiwan
- Hui-Ju Kao
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan
- Tzu-Hao Chang
- Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei, 110, Taiwan
- Hsin-Yao Wang
- Department of Laboratory Medicine, Chang Gung Memorial Hospital at Linkou, Taoyuan, 333, Taiwan
- Jang-Jih Lu
- Department of Laboratory Medicine, Chang Gung Memorial Hospital at Linkou, Taoyuan, 333, Taiwan; Department of Medical Biotechnology and Laboratory Science, Chang Gung University, Taoyuan, 333, Taiwan.
- Tzong-Yi Lee
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan; Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan, 320, Taiwan.
16
Vitrac H, MacLean DM, Karlstaedt A, Taegtmeyer H, Jayaraman V, Bogdanov M, Dowhan W. Dynamic Lipid-dependent Modulation of Protein Topology by Post-translational Phosphorylation. J Biol Chem 2017; 292:1613-1624. [PMID: 27974465 DOI: 10.1074/jbc.m116.765719] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 10/31/2016] [Revised: 12/13/2016] [Indexed: 01/01/2023]
Abstract
Membrane protein topology and folding are governed by structural principles and topogenic signals that are recognized and decoded by the protein insertion and translocation machineries at the time of initial membrane insertion and folding. We previously demonstrated that the lipid environment is also a determinant of initial protein topology, which is dynamically responsive to post-assembly changes in membrane lipid composition. However, the effect on protein topology of post-assembly phosphorylation of amino acids localized within initially cytoplasmically oriented extramembrane domains has never been investigated. Here, we show in a controlled in vitro system that phosphorylation of a membrane protein can trigger a change in topological arrangement. The rate of change occurred on a scale of seconds, comparable with the rates observed upon changes in the protein lipid environment. The rate and extent of topological rearrangement were dependent on the charges of extramembrane domains and the lipid bilayer surface. Using model membranes mimicking the lipid compositions of eukaryotic organelles, we determined that anionic lipids, cholesterol, sphingomyelin, and membrane fluidity play critical roles in these processes. Our results demonstrate how post-translational modifications may influence membrane protein topology in a lipid-dependent manner, both along the organelle trafficking pathway and at their final destination. The results provide further evidence that membrane protein topology is dynamic, integrating for the first time the effect of changes in lipid composition and regulators of cellular processes. The discovery of a new topology regulatory mechanism opens additional avenues for understanding unexplored structure-function relationships and the development of optimized topology prediction tools.
Affiliation(s)
- Heidi Vitrac
- Department of Biochemistry and Molecular Biology and Center for Membrane Biology, University of Texas McGovern Medical School, Houston, Texas 77030
- David M MacLean
- Department of Biochemistry and Molecular Biology and Center for Membrane Biology, University of Texas McGovern Medical School, Houston, Texas 77030
- Anja Karlstaedt
- Department of Internal Medicine, Division of Cardiology, University of Texas McGovern Medical School, Houston, Texas 77030
- Heinrich Taegtmeyer
- Department of Internal Medicine, Division of Cardiology, University of Texas McGovern Medical School, Houston, Texas 77030
- Vasanthi Jayaraman
- Department of Biochemistry and Molecular Biology and Center for Membrane Biology, University of Texas McGovern Medical School, Houston, Texas 77030
- Mikhail Bogdanov
- Department of Biochemistry and Molecular Biology and Center for Membrane Biology, University of Texas McGovern Medical School, Houston, Texas 77030
- William Dowhan
- Department of Biochemistry and Molecular Biology and Center for Membrane Biology, University of Texas McGovern Medical School, Houston, Texas 77030
17
Brandes N, Ofer D, Linial M. ASAP: a machine learning framework for local protein properties. Database (Oxford) 2016; 2016:baw133. [PMID: 27694209 PMCID: PMC5045867 DOI: 10.1093/database/baw133] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 06/06/2016] [Revised: 08/08/2016] [Accepted: 08/28/2016] [Indexed: 11/14/2022]
Abstract
Determining residue-level protein properties, such as sites of post-translational modifications (PTMs), is vital to understanding protein function. Experimental methods are costly and time-consuming, while traditional rule-based computational methods fail to annotate sites lacking substantial similarity. Machine learning (ML) methods are becoming fundamental to annotating unknown proteins and their heterogeneous properties. We present ASAP (Amino-acid Sequence Annotation Prediction), a universal ML framework for predicting residue-level properties. ASAP extracts numerous features from raw sequences and supports easy integration of external features such as secondary structure, solvent accessibility, intrinsic disorder, or PSSM profiles. These features are then used to train ML classifiers. ASAP can create new classifiers within minutes for a variety of tasks, including PTM prediction (e.g. cleavage sites by convertases, phosphoserine modification). We present a detailed case study for ASAP: CleavePred, an ASAP-based model that predicts protein precursor cleavage sites with state-of-the-art results. Protein cleavage is a PTM shared by a wide variety of proteins with minimal sequence similarity. Current rule-based methods suffer from high false positive rates, making them suboptimal. The high performance of CleavePred makes it suitable for analyzing new proteomes at a genomic scale. The tool is attractive for protein design, mass spectrometry search engines, and the discovery of new bioactive peptides from precursors. ASAP serves as a baseline approach for residue-level protein sequence prediction. CleavePred is freely accessible as a web-based application. Both ASAP and CleavePred are open-source with a flexible Python API. Database URL: the source code, web tool, and tutorials for ASAP and CleavePred are available at https://github.com/ddofer/asap and http://protonet.cs.huji.ac.il/cleavepred.
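Residue-level prediction of the kind described above is typically framed as classifying a fixed-length window centred on each residue. The sketch below shows that framing with plain one-hot encoding and padding at the sequence termini; it does not use or reproduce ASAP's Python API, and the flank size, padding character, and function names are assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_windows(sequence, flank=5, pad="-"):
    """One window of length 2*flank + 1 centred on every residue, padded at the termini."""
    padded = pad * flank + sequence + pad * flank
    return [padded[i:i + 2 * flank + 1] for i in range(len(sequence))]


def one_hot(window):
    """20 binary indicators per position; padding and unknown residues encode as all zeros."""
    vector = []
    for aa in window:
        vector.extend(1 if aa == ref else 0 for ref in AMINO_ACIDS)
    return vector


def featurize(sequence, flank=5):
    """Feature matrix with one row per residue, ready for any tabular classifier."""
    return [one_hot(w) for w in residue_windows(sequence, flank)]


# Hypothetical short precursor sequence.
rows = featurize("MKRSLLVA", flank=3)  # 8 rows of 7 * 20 = 140 features each
```

Per-residue labels (for example "cleaved here" or not) would be aligned with these rows; richer encodings such as PSSM columns or disorder scores can be appended to each row in the same way.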
Affiliation(s)
- Nadav Brandes
- Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University, Jerusalem 91904, Israel
- Dan Ofer
- Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University, Jerusalem 91904, Israel
- Michal Linial
- Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University, Jerusalem 91904, Israel
18
Bui VM, Weng SL, Lu CT, Chang TH, Weng JTY, Lee TY. SOHSite: incorporating evolutionary information and physicochemical properties to identify protein S-sulfenylation sites. BMC Genomics 2016; 17 Suppl 1:9. [PMID: 26819243 PMCID: PMC4895302 DOI: 10.1186/s12864-015-2299-1] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Indexed: 12/21/2022]
Abstract
Background: Protein S-sulfenylation is a type of post-translational modification (PTM) involving the covalent attachment of a hydroxyl group to the thiol of a cysteine residue. Recent evidence has shown the importance of S-sulfenylation in various biological processes, including transcriptional regulation, apoptosis, and cytokine signaling. Determining the specific sites of S-sulfenylation is fundamental to understanding the structures and functions of S-sulfenylated proteins. However, the current lack of reliable tools often forces researchers to rely on expensive and time-consuming laboratory techniques to identify S-sulfenylation sites. We were therefore motivated to develop a bioinformatics method for investigating S-sulfenylation sites based on amino acid composition and physicochemical properties. Results: In this work, physicochemical properties were used not only to identify S-sulfenylation sites from 1,096 experimentally verified S-sulfenylated proteins but also to compare prediction effectiveness with other characteristics such as amino acid composition (AAC), amino acid pair composition (AAPC), solvent-accessible surface area (ASA), amino acid substitution matrix (BLOSUM62), position-specific scoring matrix (PSSM), and positional weighted matrix (PWM). Various prediction models were built using support vector machines (SVMs) and evaluated by five-fold cross-validation. The model constructed from hybrid features, including PSSM and physicochemical properties, yielded the best performance, with sensitivity, specificity, accuracy, and MCC of 0.746, 0.737, 0.738, and 0.337, respectively. The selected model also achieved promising accuracy (0.693) on an independent testing dataset. Additionally, we used TwoSampleLogo to examine differences in amino acid composition among S-sulfenylation, S-glutathionylation, and S-nitrosylation sites.
Conclusion: This work proposed a computational method to explore informative features and functions for protein S-sulfenylation. Five-fold cross-validation indicated that the selected features were effective in identifying S-sulfenylation sites. Moreover, the independent testing results demonstrated that the proposed method could provide a feasible means of conducting preliminary analyses of protein S-sulfenylation. We also anticipate that the uncovered differences in amino acid composition may facilitate future studies of the extensive crosstalk among S-sulfenylation, S-glutathionylation, and S-nitrosylation. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-2299-1) contains supplementary material, which is available to authorized users.
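The four reported measures follow the standard definitions from confusion-matrix counts. The sketch below computes them and uses hypothetical counts (not SOHSite's data) to show why MCC can sit far below accuracy when negatives greatly outnumber positives.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 when undefined
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


# Hypothetical imbalanced test set: 100 positives, 1,000 negatives.
m = binary_metrics(tp=75, fp=263, tn=737, fn=25)
```

With these counts, sensitivity is 0.75 and specificity 0.737, giving an accuracy of about 0.74, yet MCC is only about 0.30 because the 263 false positives far outnumber the 75 true positives.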
Affiliation(s)
- Van-Minh Bui
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan.
- Shun-Long Weng
- Department of Obstetrics and Gynecology, Hsinchu Mackay Memorial Hospital, Hsin-Chu, 300, Taiwan; Mackay Junior College of Medicine, Nursing and Management, Taipei, 112, Taiwan; Department of Medicine, Mackay Medical College, New Taipei City, 252, Taiwan.
- Cheng-Tsung Lu
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan.
- Tzu-Hao Chang
- Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei, 110, Taiwan.
- Julia Tzu-Ya Weng
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan; Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan, 320, Taiwan.
- Tzong-Yi Lee
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan; Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan, 320, Taiwan.
19
Huang KY, Weng JTY, Lee TY, Weng SL. A new scheme to discover functional associations and regulatory networks of E3 ubiquitin ligases. BMC Syst Biol 2016; 10 Suppl 1:3. [PMID: 26818115 PMCID: PMC4895279 DOI: 10.1186/s12918-015-0244-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Abstract
Background: Protein ubiquitination catalyzed by E3 ubiquitin ligases plays important modulatory roles in various biological processes. With the emergence of high-throughput mass spectrometry technology, the proteomics research community has developed numerous experimental methods for determining ubiquitination sites. The result is an accumulation of ubiquitinome data, coupled with a lack of resources for investigating the regulatory networks among E3 ligases and ubiquitinated proteins. In this study, by integrating existing ubiquitinome data, experimentally validated E3 ligases and established protein-protein interactions, we devised a strategy to construct a comprehensive map of protein ubiquitination networks. Results: In total, 41,392 experimentally verified ubiquitination sites from 12,786 human ubiquitinated proteins were obtained for this study. An additional 494 E3 ligases, along with 1220 functional annotations and 28,588 protein domains, were manually curated. To characterize the regulatory networks among E3 ligases and ubiquitinated proteins, a well-established network viewer was used to explore ubiquitination networks built from 40,892 protein-protein interactions. The effectiveness of the proposed approach was demonstrated in a case study examining E3 ligases involved in the ubiquitination of the tumor suppressor p53. In addition to Mdm2, a known regulator of p53, the investigation also revealed other potential E3 ligases that may participate in p53 ubiquitination. Conclusion: Besides facilitating comprehensive investigations of protein ubiquitination networks, the proposed method, by integrating information on protein-protein interactions and substrate specificities, could discover potential E3 ligases for ubiquitinated proteins.
Our strategy provides an efficient means for the preliminary screening of ubiquitination networks and overcomes the challenge posed by limited knowledge of E3 ligase-regulated ubiquitination.
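The network-based screen described above can be illustrated with a minimal Python sketch. It assumes the simplest possible rule: curated E3 ligases that directly interact with an experimentally ubiquitinated substrate in a protein-protein interaction (PPI) list are proposed as candidates. The function name `candidate_e3_ligases` and the toy identifiers are hypothetical, and the sketch ignores the substrate-specificity information the authors also integrate.

```python
from collections import defaultdict


def candidate_e3_ligases(ppi_edges, e3_ligases, substrate, ubiquitinated):
    """Return curated E3 ligases that directly interact with a ubiquitinated substrate.

    ppi_edges     -- iterable of (protein_a, protein_b) interaction pairs (undirected)
    e3_ligases    -- set of curated E3 ligase identifiers
    substrate     -- protein whose candidate E3 ligases are wanted
    ubiquitinated -- set of proteins with experimentally verified ubiquitination sites
    """
    # Only substrates with verified ubiquitination sites are screened.
    if substrate not in ubiquitinated:
        return []
    # Build an undirected adjacency map from the interaction list.
    neighbors = defaultdict(set)
    for a, b in ppi_edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Interacting partners that are also curated E3 ligases become candidates.
    return sorted(neighbors[substrate] & set(e3_ligases))


# Hypothetical toy network, not data from the study.
edges = [("SUB1", "E3_A"), ("SUB1", "KIN1"), ("E3_B", "SUB1"), ("E3_C", "SUB2")]
print(candidate_e3_ligases(edges, {"E3_A", "E3_B", "E3_C"}, "SUB1", {"SUB1"}))  # ['E3_A', 'E3_B']
```

A real screen would rank these candidates further, for example by whether the substrate's ubiquitination sites match a ligase's recognition motif.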
Affiliation(s)
- Kai-Yao Huang
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan.
- Julia Tzu-Ya Weng
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan; Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan, 320, Taiwan.
- Tzong-Yi Lee
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan; Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan, 320, Taiwan.
- Shun-Long Weng
- Department of Obstetrics and Gynecology, Hsinchu Mackay Memorial Hospital, Hsin-Chu, 300, Taiwan; Mackay Junior College of Medicine, Nursing and Management, Taipei, 112, Taiwan; Department of Medicine, Mackay Medical College, New Taipei City, 252, Taiwan.
20
Kao HJ, Huang CH, Bretaña NA, Lu CT, Huang KY, Weng SL, Lee TY. A two-layered machine learning method to identify protein O-GlcNAcylation sites with O-GlcNAc transferase substrate motifs. BMC Bioinformatics 2015; 16 Suppl 18:S10. [PMID: 26680539 PMCID: PMC4682369 DOI: 10.1186/1471-2105-16-s18-s10] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Indexed: 12/21/2022]
Abstract
Protein O-GlcNAcylation, the β-attachment of a single N-acetylglucosamine (GlcNAc) to the hydroxyl group of serine or threonine residues, is an O-linked glycosylation catalyzed by O-GlcNAc transferase (OGT). Molecular-level investigation of the basis for OGT's substrate specificity should aid understanding of how O-GlcNAc contributes to diverse cellular processes. Because an increasing number of O-GlcNAcylated peptides with site-specific information have been identified by mass spectrometry (MS)-based proteomics, we were motivated to characterize the substrate site motifs of O-GlcNAc transferases. In this investigation, a non-redundant dataset of 410 experimentally verified O-GlcNAcylation sites was manually extracted from dbOGAP, OGlycBase and UniProtKB. After conserved motifs were detected using maximal dependence decomposition, a profile hidden Markov model (profile HMM) was adopted to learn a first-layer model for each identified OGT substrate motif. A support vector machine (SVM) was then used to generate a second-layer model learned from the output values of the first-layer profile HMMs. The two-layered predictive model was evaluated using five-fold cross-validation, which yielded a sensitivity of 85.4%, a specificity of 84.1%, and an accuracy of 84.7%. Additionally, an independent testing set from PhosphoSitePlus, which was non-homologous to the training data of the predictive model, was used to demonstrate that the proposed method could provide a promising accuracy (84.05%) and outperform other O-GlcNAcylation site prediction tools. A case study indicated that the proposed method could be a feasible means of conducting preliminary analyses of protein O-GlcNAcylation; it has been implemented as a web-based system, OGTSite, which is freely available at http://csb.cse.yzu.edu.tw/OGTSite/.
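Maximal dependence decomposition (MDD), used above to detect conserved motifs before profile HMM training, recursively splits a set of aligned fragments at the position whose consensus residue most strongly co-varies with the other positions. The Python sketch below is a simplified, hypothetical illustration, not the OGTSite implementation: it uses raw residues rather than amino-acid groups, scores dependence with a summed Pearson chi-square statistic compared against a user-chosen `threshold`, and recurses into both the consensus and non-consensus subgroups.

```python
from collections import Counter


def chi_square(xs, ys):
    """Pearson chi-square statistic for the contingency table of two paired label lists."""
    n = len(xs)
    x_tot, y_tot = Counter(xs), Counter(ys)
    observed = Counter(zip(xs, ys))
    stat = 0.0
    for x, xt in x_tot.items():
        for y, yt in y_tot.items():
            expected = xt * yt / n  # always > 0 because both marginals are > 0
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


def consensus(seqs, i):
    """Most frequent residue at position i (ties go to the residue seen first)."""
    return Counter(s[i] for s in seqs).most_common(1)[0][0]


def mdd(seqs, min_size=10, threshold=20.0, used=frozenset(), path=()):
    """Recursively split aligned, equal-length fragments by their most dependent position.

    Returns a list of leaves, each a (path, members) pair, where `path` records the
    (position, consensus residue, matches_consensus) splits that led to the leaf.
    """
    length = len(seqs[0])
    free = [i for i in range(length) if i not in used]
    if len(seqs) < min_size or not free:
        return [(path, seqs)]

    best_pos, best_res, best_score = None, None, 0.0
    for i in free:
        cons = consensus(seqs, i)
        match = [s[i] == cons for s in seqs]
        # Summed dependence between "consensus at i" and the residue at every other position.
        score = sum(chi_square(match, [s[j] for s in seqs]) for j in range(length) if j != i)
        if score > best_score:
            best_pos, best_res, best_score = i, cons, score

    # Stop when no position shows enough dependence to justify a split.
    if best_pos is None or best_score < threshold:
        return [(path, seqs)]

    used = used | {best_pos}
    inside = [s for s in seqs if s[best_pos] == best_res]
    outside = [s for s in seqs if s[best_pos] != best_res]
    return (mdd(inside, min_size, threshold, used, path + ((best_pos, best_res, True),))
            + mdd(outside, min_size, threshold, used, path + ((best_pos, best_res, False),)))


leaves = mdd(["AKS"] * 10 + ["GLS"] * 10, min_size=5, threshold=1.0)
for split_path, members in leaves:
    print(split_path, len(members))
```

In this toy input, positions 0 and 1 co-vary perfectly, so the twenty fragments are split once at position 0; each resulting subgroup is uniform and becomes a leaf from which a motif model could then be trained.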
21
Huang KY, Su MG, Kao HJ, Hsieh YC, Jhong JH, Cheng KH, Huang HD, Lee TY. dbPTM 2016: 10-year anniversary of a resource for post-translational modification of proteins. Nucleic Acids Res 2015; 44:D435-46. [PMID: 26578568 PMCID: PMC4702878 DOI: 10.1093/nar/gkv1240] [Citation(s) in RCA: 131] [Impact Index Per Article: 14.6] [Received: 09/16/2015] [Accepted: 11/02/2015] [Indexed: 01/23/2023]
Abstract
Owing to the importance of the post-translational modifications (PTMs) of proteins in regulating biological processes, dbPTM (http://dbPTM.mbc.nctu.edu.tw/) was developed as a comprehensive database of experimentally verified PTMs from several databases, with annotations of potential PTMs for all UniProtKB protein entries. For this 10th anniversary of dbPTM, the updated resource provides not only a comprehensive dataset of experimentally verified PTMs supported by the literature, but also an integrative interface for accessing all available databases and tools associated with PTM analysis. In addition to collecting experimental PTM data from 14 public databases, this update manually curates over 12 000 modified peptides, including those carrying the emerging modifications S-nitrosylation, S-glutathionylation and succinylation, from approximately 500 research articles retrieved by text mining. As the number of available PTM prediction methods increases, this work compiles a non-homologous benchmark dataset to evaluate the predictive power of online PTM prediction tools. Growing interest in the structural investigation of PTM substrate sites motivated the mapping of all experimental PTM peptides to Protein Data Bank (PDB) entries based on database identifier and sequence identity, which enables users to examine spatially neighboring amino acids, solvent-accessible surface area and side-chain orientations for PTM substrate sites on tertiary structures. Because drug binding is annotated in PDB, this update identified over 1100 PTM sites associated with drug binding. The update also integrates metabolic pathways and protein-protein interactions to support PTM network analysis for a group of proteins. Finally, the web interface has been redesigned and enhanced to facilitate access to this resource.
Affiliation(s)
- Kai-Yao Huang
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan
- Min-Gang Su
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan
- Hui-Ju Kao
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan
- Yun-Chung Hsieh
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan
- Jhih-Hua Jhong
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan
- Kuang-Hao Cheng
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan
- Hsien-Da Huang
- Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 300, Taiwan; Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan
- Tzong-Yi Lee
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan; Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan 320, Taiwan
22
Nguyen VN, Huang KY, Huang CH, Chang TH, Bretaña N, Lai K, Weng J, Lee TY. Characterization and identification of ubiquitin conjugation sites with E3 ligase recognition specificities. BMC Bioinformatics 2015; 16 Suppl 1:S1. [PMID: 25707307 PMCID: PMC4331700 DOI: 10.1186/1471-2105-16-s1-s1] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Abstract
Background: In eukaryotes, ubiquitin conjugation is an important mechanism underlying proteasome-mediated protein degradation and, as such, plays an essential role in regulating many cellular processes. In the ubiquitin-proteasome pathway, E3 ligases play important roles by recognizing specific protein substrates and catalyzing the attachment of ubiquitin to a lysine (K) residue. As more experimental data on ubiquitin conjugation sites become available, it becomes possible to develop prediction models that scale to big data. However, no method had yet focused on investigating the substrate specificities of ubiquitination. Herein, we present an approach that exploits an iterative statistical method to identify ubiquitin conjugation sites with substrate site specificities. Results: In this investigation, a total of 6259 experimentally validated ubiquitinated proteins were obtained from dbPTM. After homologous fragments were filtered out at 40% sequence identity, the training data set contained 2658 ubiquitination sites (positive data) and 5532 non-ubiquitinated sites (negative data). Because the substrate site specificities of E3 ligases are difficult to characterize by conventional sequence logo analysis, a recursive statistical method was applied to obtain significantly conserved motifs. A profile hidden Markov model (profile HMM) was adopted to construct predictive models learned from the identified substrate motifs. Five-fold cross-validation was then used to evaluate the predictive model, yielding a sensitivity, specificity, and accuracy of 73.07%, 65.46%, and 67.93%, respectively. Additionally, an independent testing set, completely blind to the training data of the predictive model, was used to demonstrate that the proposed method could provide a promising accuracy (76.13%) and outperform other ubiquitination site prediction tools.
Conclusion: A case study demonstrated the effectiveness of the characterized substrate motifs for identifying ubiquitination sites. The proposed method provides a practical means of preliminary analysis and greatly reduces the number of potential targets requiring further experimental confirmation. This method may help unravel the mechanisms and roles of ubiquitination sites in E3 recognition and ubiquitin-mediated protein degradation.
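The cross-validation figures reported above are mutually consistent. If every positive and negative training site is scored exactly once across the five folds (an assumption, since fold composition is not described), overall accuracy is the class-size-weighted average of sensitivity and specificity. A small Python check with the reported counts, using a hypothetical helper `accuracy_from_rates`:

```python
def accuracy_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Overall accuracy implied by sensitivity and specificity when each site is scored once."""
    true_pos = sensitivity * n_pos  # correctly predicted ubiquitination sites
    true_neg = specificity * n_neg  # correctly rejected non-ubiquitinated sites
    return (true_pos + true_neg) / (n_pos + n_neg)


# Rates from five-fold cross-validation and class sizes of the training set, as reported.
acc = accuracy_from_rates(0.7307, 0.6546, n_pos=2658, n_neg=5532)
print(f"{acc:.4f}")  # 0.6793
```

Because negatives outnumber positives roughly two to one, the overall accuracy sits closer to the specificity than to the sensitivity.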
23
Wu HY, Lu CT, Kao HJ, Chen YJ, Chen YJ, Lee TY. Characterization and identification of protein O-GlcNAcylation sites with substrate specificity. BMC Bioinformatics 2014; 15 Suppl 16:S1. [PMID: 25521204 PMCID: PMC4290634 DOI: 10.1186/1471-2105-15-s16-s1] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 01/23/2023]
Abstract
Background: Protein O-GlcNAcylation involves the attachment of a single N-acetylglucosamine (GlcNAc) to the hydroxyl group of serine or threonine residues. Elucidating O-GlcNAcylation sites on proteins is required to decipher its crucial roles in regulating cellular processes and to aid drug design. With an increasing number of O-GlcNAcylation sites identified by mass spectrometry (MS)-based proteomics, several methods have been proposed for the computational identification of O-GlcNAcylation sites. However, no method had focused on investigating O-GlcNAcylated substrate motifs. We were therefore motivated to design a new method for identifying protein O-GlcNAcylation sites that takes substrate site specificity into account. Results: In this study, 375 experimentally verified O-GlcNAcylation sites were collected from dbOGAP, an integrated resource for protein O-GlcNAcylation. Because substrate motifs are difficult to characterize by conventional sequence logo analysis, a recursive statistical method was applied to obtain significantly conserved motifs. Support vector machines (SVMs) were adopted to construct predictive models learned from the identified substrate motifs. Five-fold cross-validation was used to evaluate the predictive model, yielding a sensitivity, specificity, and accuracy of 0.76, 0.80, and 0.78, respectively. Additionally, an independent testing set, blind to the training data of the predictive model, was used to demonstrate that the proposed method could provide a promising accuracy (0.94) and outperform three other O-GlcNAcylation site prediction tools. Conclusion: This work proposed a computational method to identify informative substrate motifs for O-GlcNAcylation sites. Cross-validation and independent testing indicated that the identified motifs were effective for identifying O-GlcNAcylation sites.
A case study demonstrated that the proposed method could be a feasible means of conducting preliminary analyses of protein O-GlcNAcylation. We also anticipate that the revealed substrate motifs may facilitate the study of extensive crosstalk between O-GlcNAcylation and phosphorylation. This method may help unravel their mechanisms and roles in signaling, transcription, chronic disease, and cancer.
24
Chen YJ, Lu CT, Su MG, Huang KY, Ching WC, Yang HH, Liao YC, Chen YJ, Lee TY. dbSNO 2.0: a resource for exploring structural environment, functional and disease association and regulatory network of protein S-nitrosylation. Nucleic Acids Res 2014; 43:D503-11. [PMID: 25399423 PMCID: PMC4383970 DOI: 10.1093/nar/gku1176] [Citation(s) in RCA: 63] [Impact Index Per Article: 6.3] [Indexed: 11/13/2022]
Abstract
Given the increasing number of proteins reported to be regulated by S-nitrosylation (SNO), SNO is considered to act, in a manner analogous to phosphorylation, as a pleiotropic regulator that elicits dual effects in regulating diverse pathophysiological processes by altering protein function, stability, and conformation in various cancers and human disorders. Owing to its importance in regulating protein functions and cell signaling, dbSNO (http://dbSNO.mbc.nctu.edu.tw) has been extended into a resource for exploring the structural environment of SNO substrate sites and the regulatory networks of S-nitrosylated proteins. Growing interest in the structural environment of PTM substrate sites motivated us to map all manually curated SNO peptides (4165 SNO sites within 2277 proteins) to PDB protein entries by sequence identity, which provides information on spatial amino acid composition, solvent-accessible surface area, spatially neighboring amino acids, and side-chain orientation for 298 substrate cysteine residues. Additionally, annotations of protein molecular functions, biological processes, functional domains and human diseases are integrated to explore the functional and disease associations of the S-nitrosoproteome. In this update, users can search a group of proteins or genes of interest, and the system reconstructs the SNO regulatory network from metabolic pathway and protein-protein interaction information. Most importantly, an endogenous yet pathophysiological S-nitrosoproteomic dataset from colorectal cancer patients was used to demonstrate that dbSNO could discover potential SNO proteins involved in the regulation of NO signaling in cancer pathways.
Affiliation(s)
- Yi-Ju Chen
- Institute of Chemistry, Academia Sinica, Taipei 115, Taiwan
- Cheng-Tsung Lu
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan
- Min-Gang Su
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan
- Kai-Yao Huang
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan
- Wei-Chieh Ching
- Graduate Institute of Life Sciences, National Defense Medical Center, Taipei 114, Taiwan
- Hsiao-Hsiang Yang
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan
- Yen-Chen Liao
- Institute of Chemistry, Academia Sinica, Taipei 115, Taiwan; Department of Chemistry, National Taiwan University, Taipei 114, Taiwan
- Yu-Ju Chen
- Institute of Chemistry, Academia Sinica, Taipei 115, Taiwan; Department of Chemistry, National Taiwan University, Taipei 114, Taiwan
- Tzong-Yi Lee
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan; Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan 320, Taiwan
25