1
Shazia, Ullah FUM, Rho S, Lee MY. Predictive modeling for ubiquitin proteins through advanced machine learning technique. Heliyon 2024; 10:e32517. [PMID: 38975176 PMCID: PMC11225741 DOI: 10.1016/j.heliyon.2024.e32517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Accepted: 06/05/2024] [Indexed: 07/09/2024] Open
Abstract
Ubiquitination is an essential post-translational modification in which the ubiquitin protein is bonded to a substrate protein. It is crucial to a variety of physiological activities, including cell survival and differentiation and innate and adaptive immunity. Alterations in the ubiquitin system lead to the development of various human diseases. Numerous studies show that the ubiquitin system is highly reversible and dynamic, which makes experimental identification difficult. To address this issue, this article develops a machine learning model intended to improve the precision of ubiquitin protein prediction. We investigate ubiquitination data processed through different feature extraction methods, followed by classification. Evaluation is conducted with Jackknife tests and 10-fold cross-validation. The proposed method achieved 100 %, 99.88 %, and 99.84 % accuracy on Dataset-I, Dataset-II, and Dataset-III, respectively. Using the Jackknife test, the method achieves 100 %, 99.91 %, and 99.99 % on Dataset-I, Dataset-II, and Dataset-III, respectively. The analysis concludes that the proposed method outperforms state-of-the-art methods in identifying ubiquitination sites and is helpful in the development of current clinical therapies. The source code and datasets will be made available on GitHub.
Affiliation(s)
- Shazia
  - Mardan College of Nursing, Bacha Khan Medical College, Mardan, Pakistan
- Fath U Min Ullah
  - Department of Computing, School of Engineering and Computing, University of Central Lancashire, Preston, United Kingdom
- Seungmin Rho
  - Department of Industrial Security, Chung-Ang University, Seoul 06974, Republic of Korea
- Mi Young Lee
  - Chung-Ang University, Seoul 06974, Republic of Korea
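The abstract of entry 1 names two validation schemes, 10-fold cross-validation and the Jackknife test, without describing how the data are partitioned. A minimal sketch of the two index-splitting schemes follows; the function names are illustrative and not taken from the paper, and real pipelines usually shuffle (and often stratify) the samples first, which is omitted here.

```python
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def k_fold_splits(n_samples: int, k: int = 10) -> Iterator[Split]:
    """Yield k (train, test) index splits over range(n_samples)."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def jackknife_splits(n_samples: int) -> Iterator[Split]:
    """Jackknife (leave-one-out): every sample is the test set exactly once."""
    return k_fold_splits(n_samples, k=n_samples)


# Hypothetical sample counts, chosen only for illustration.
folds = list(k_fold_splits(25, k=10))
loo = list(jackknife_splits(5))
```

The Jackknife test is the limiting case of k-fold cross-validation with k equal to the number of samples, so it needs one model fit per sample.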
2
Pourmirzaei M, Ramazi S, Esmaili F, Shojaeilangari S, Allahvardi A. Machine learning-based approaches for ubiquitination site prediction in human proteins. BMC Bioinformatics 2023; 24:449. [PMID: 38017391 PMCID: PMC10683244 DOI: 10.1186/s12859-023-05581-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 11/23/2023] [Indexed: 11/30/2023] Open
Abstract
Protein ubiquitination is a critical post-translational modification (PTM) involved in numerous cellular processes. Identifying ubiquitination sites (Ubi-sites) on proteins offers valuable insights into their function and regulatory mechanisms. Due to the cost- and time-consuming nature of traditional approaches for Ubi-site detection, there has been a growing interest in leveraging artificial intelligence for computer-aided Ubi-site prediction. In this study, we collected experimentally verified Ubi-sites of human proteins from the dbPTM database, then applied a comprehensive set of state-of-the-art computational methods, along with standard evaluation metrics and a proper validation strategy, to Ubi-site prediction. We demonstrated the effectiveness of our framework by comparing ten machine learning (ML) based approaches in three different categories: feature-based conventional ML methods, end-to-end sequence-based deep learning (DL) techniques, and hybrid feature-based DL models. Our results revealed that DL approaches outperformed the classical ML methods, achieving a 0.902 F1-score, 0.8198 accuracy, 0.8786 precision, and 0.9147 recall as the best performance for a DL model using both raw amino acid sequences and hand-crafted features. Interestingly, our experimental results disclosed that the performance of DL methods had a positive correlation with the length of amino acid fragments, suggesting that utilizing the entire sequence can lead to more accurate predictions in future research endeavors. Additionally, we developed a meticulously curated benchmark for Ubi-site prediction in human proteins. This benchmark serves as a valuable resource for future studies, enabling fair and accurate comparisons between different methods. Overall, our work highlights the potential of ML, particularly DL techniques, in predicting Ubi-sites and furthering our knowledge of protein regulation through ubiquitination in cells.
Affiliation(s)
- Mahdi Pourmirzaei
  - Department of Information Technology, Tarbiat Modares University, 14115-111, Tehran, Iran
- Shahin Ramazi
  - Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, 14115-111, Tehran, Iran
- Farzaneh Esmaili
  - Department of Information Technology, Tarbiat Modares University, 14115-111, Tehran, Iran
- Seyedehsamaneh Shojaeilangari
  - Biomedical Engineering Group, Department of Electrical and Information Technology, Iranian Research Organization for Science and Technology (IROST), 33535111, Tehran, Iran
- Abdollah Allahvardi
  - Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, 14115-111, Tehran, Iran
3
Yerukala Sathipati S, Tsai MJ, Shukla SK, Ho SY. Artificial intelligence-driven pan-cancer analysis reveals miRNA signatures for cancer stage prediction. HGG ADVANCES 2023; 4:100190. [PMID: 37124139 PMCID: PMC10130501 DOI: 10.1016/j.xhgg.2023.100190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Accepted: 03/30/2023] [Indexed: 05/02/2023] Open
Abstract
The ability to detect cancer at an early stage in patients who would benefit from effective therapy is a key factor in increasing survivability. This work proposes an evolutionary supervised learning method called CancerSig to identify cancer stage-specific microRNA (miRNA) signatures for early cancer predictions. CancerSig established a compact panel of miRNA signatures as potential markers from 4,667 patients with 15 different types of cancers for the cancer stage prediction, and achieved a mean performance: 10-fold cross-validation accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve of 84.27% ± 6.31%, 0.81 ± 0.12, 0.80 ± 0.10, and 0.80 ± 0.06, respectively. The pan-cancer analysis of miRNA signatures suggested that three miRNAs, hsa-let-7i-3p, hsa-miR-362-3p, and hsa-miR-3651, contributed significantly toward stage prediction across 8 cancers, and each of the 67 miRNAs of the panel was a biomarker of stage prediction in more than one cancer. CancerSig may serve as the basis for cancer screening and therapeutic selection.
Affiliation(s)
- Srinivasulu Yerukala Sathipati
  - Center for Precision Medicine Research, Marshfield Clinic Research Institute, Marshfield, WI 54449, USA
  - Corresponding author
- Ming-Ju Tsai
  - Hinda and Arthur Marcus Institute for Aging Research at Hebrew Senior Life, Boston, MA, USA
  - Department of Medicine, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, MA, USA
- Sanjay K. Shukla
  - Center for Precision Medicine Research, Marshfield Clinic Research Institute, Marshfield, WI 54449, USA
- Shinn-Ying Ho
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
  - Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
  - College of Health Sciences, Kaohsiung Medical University, Kaohsiung, Taiwan
  - Center for Intelligent Drug Systems and Smart Bio-devices (IDSB), National Yang Ming Chiao Tung University, Hsinchu, Taiwan
  - Corresponding author
4
Wang J, Dai H, Chen T, Liu H, Zhang X, Zhong Q, Lu R. Toward surface defect detection in electronics manufacturing by an accurate and lightweight YOLO-style object detector. Sci Rep 2023; 13:7062. [PMID: 37127646 PMCID: PMC10151317 DOI: 10.1038/s41598-023-33804-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2022] [Accepted: 04/19/2023] [Indexed: 05/03/2023] Open
Abstract
In electronics manufacturing, surface defect detection is very important for product quality control, and defective products can cause severe customer complaints. At the same time, in the manufacturing process, the cycle time of each product is usually very short. Furthermore, high-resolution input images from high-resolution industrial cameras are necessary to meet the requirements for high quality control standards. Hence, how to design an accurate object detector with real-time inference speed that can accept high-resolution input is an important task. In this work, an accurate YOLO-style object detector was designed, ATT-YOLO, which uses only one self-attention module, many-scale feature extraction and integration in the backbone and feature pyramid, and an improved auto-anchor design to address this problem. There are few datasets for surface detection in electronics manufacturing. Hence, we curated a dataset consisting of 14,478 laptop surface defects, on which ATT-YOLO achieved 92.8% mAP0.5 for the binary-class object detection task. We also further verified our design on the COCO benchmark dataset. Considering both computation costs and the performance of object detectors, ATT-YOLO outperforms several state-of-the-art and lightweight object detectors on the COCO dataset. It achieves a 44.9% mAP score and 21.8 GFLOPs, which is better than the compared models including YOLOv8-small (44.9%, 28.6G), YOLOv7-tiny-SiLU (38.7%, 13.8G), YOLOv6-small (43.1%, 44.2G), pp-YOLOE-small (42.7%, 17.4G), YOLOX-small (39.6%, 26.8G), and YOLOv5-small (36.7%, 17.2G). We hope that this work can serve as a useful reference for the utilization of attention-based networks in real-world situations.
Affiliation(s)
- Jyunrong Wang
  - Hefei University of Technology, Anhui, Hefei, China
  - LCFC (Hefei) Electronics Technology Co., Ltd., Anhui, Hefei, China
  - Hefei LCFC Information Technology Co., Ltd., Anhui, Hefei, China
- Huafeng Dai
  - LCFC (Hefei) Electronics Technology Co., Ltd., Anhui, Hefei, China
  - Hefei LCFC Information Technology Co., Ltd., Anhui, Hefei, China
  - Tsinghua University, Beijing, China
- Taogen Chen
  - LCFC (Hefei) Electronics Technology Co., Ltd., Anhui, Hefei, China
  - Hefei LCFC Information Technology Co., Ltd., Anhui, Hefei, China
- Hao Liu
  - LCFC (Hefei) Electronics Technology Co., Ltd., Anhui, Hefei, China
  - Hefei LCFC Information Technology Co., Ltd., Anhui, Hefei, China
- Xuegang Zhang
  - LCFC (Hefei) Electronics Technology Co., Ltd., Anhui, Hefei, China
  - Hefei LCFC Information Technology Co., Ltd., Anhui, Hefei, China
- Quan Zhong
  - LCFC (Hefei) Electronics Technology Co., Ltd., Anhui, Hefei, China
  - Hefei LCFC Information Technology Co., Ltd., Anhui, Hefei, China
- Rongsheng Lu
  - Hefei University of Technology, Anhui, Hefei, China
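The abstract of entry 4 compares detectors on two axes at once, COCO mAP and GFLOPs, and calls ATT-YOLO "better" without saying how the two are weighed. One possible reading, assumed here and not stated by the authors, is Pareto dominance: a model is better if it is at least as accurate and at least as cheap, and strictly better on one axis. The sketch below applies that test to the figures quoted in the abstract.

```python
from typing import Dict, List, Tuple

# (mAP in %, GFLOPs) on COCO, copied from the abstract of entry 4.
detectors: Dict[str, Tuple[float, float]] = {
    "ATT-YOLO": (44.9, 21.8),
    "YOLOv8-small": (44.9, 28.6),
    "YOLOv7-tiny-SiLU": (38.7, 13.8),
    "YOLOv6-small": (43.1, 44.2),
    "pp-YOLOE-small": (42.7, 17.4),
    "YOLOX-small": (39.6, 26.8),
    "YOLOv5-small": (36.7, 17.2),
}


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    (map_a, flops_a), (map_b, flops_b) = a, b
    no_worse = map_a >= map_b and flops_a <= flops_b
    strictly_better = map_a > map_b or flops_a < flops_b
    return no_worse and strictly_better


reference = detectors["ATT-YOLO"]
others = [name for name in detectors if name != "ATT-YOLO"]
dominated: List[str] = [n for n in others if dominates(reference, detectors[n])]
trade_off: List[str] = [n for n in others if not dominates(reference, detectors[n])]
```

Under this reading, ATT-YOLO dominates YOLOv8-small, YOLOv6-small, and YOLOX-small. The remaining three models use fewer GFLOPs at lower mAP, so against them the comparison is an accuracy-for-compute trade-off, not strict dominance.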
5
Wang W, Zhang Y, Liu D, Zhang H, Wang X, Zhou Y. PseAraUbi: predicting arabidopsis ubiquitination sites by incorporating the physico-chemical and structural features. PLANT MOLECULAR BIOLOGY 2022; 110:81-92. [PMID: 35773617 DOI: 10.1007/s11103-022-01288-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2022] [Accepted: 05/09/2022] [Indexed: 06/15/2023]
Abstract
We construct three kinds of important features from Arabidopsis thaliana: protein secondary structure based on the Chou-Fasman parameter, amino acid hydrophobicity, and polarity information, and analyze their properties. Ubiquitination is an important post-translational modification of proteins that participates in the regulation of many important life activities in cells. At present, ubiquitination proteomics research is mostly concentrated on animals and yeasts, while relatively few studies have been carried out in plants. The computational prediction of Arabidopsis thaliana ubiquitination sites is still in its infancy. We therefore describe a computational method, PseAraUbi (Prediction of Arabidopsis thaliana ubiquitination sites using pseudo amino acid composition), that can effectively detect ubiquitination sites in Arabidopsis thaliana using support vector machine classifiers. Features are extracted from protein sequence information using the Chou-Fasman parameter, amino acid hydrophobicity, and polarity information, and are then selected for classification with the Boruta algorithm. PseAraUbi achieves promising performance, with an AUC score of 0.953 under fivefold cross-validation on the training dataset, which is significantly better than that of the pioneering Arabidopsis thaliana ubiquitination site prediction method. We also demonstrated the ability of our proposed method on independent test sets, thus gaining a competitive advantage. In addition, we analyzed in depth the physicochemical properties of amino acids in the region adjacent to the ubiquitination site. To facilitate the community, the source code, optimal feature subset, and ubiquitination site dataset for the Arabidopsis proteome are available at GitHub ( https://github.com/HNUBioinformatics/PseAraUbi.git ) for interested users.
Affiliation(s)
- Wei Wang
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, 453000, China
  - Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province, Xinxiang, China
- Yu Zhang
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, 453000, China
- Dong Liu
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, 453000, China
- HongJun Zhang
  - School of Computer Science and Technology, Anyang University, Anyang, 455000, China
- XianFang Wang
  - College of Computer Science and Technology Engineering, Henan Institute of Technology, Xinxiang, 453000, China
- Yun Zhou
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, 453000, China
6
Iannetta AA, Hicks LM. Maximizing Depth of PTM Coverage: Generating Robust MS Datasets for Computational Prediction Modeling. Methods Mol Biol 2022; 2499:1-41. [PMID: 35696073 DOI: 10.1007/978-1-0716-2317-6_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Post-translational modifications (PTMs) regulate complex biological processes through the modulation of protein activity, stability, and localization. Insights into the specific modification type and localization within a protein sequence can help ascertain functional significance. Computational models are increasingly demonstrated to offer a low-cost, high-throughput method for comprehensive PTM predictions. Algorithms are optimized using existing experimental PTM data, thus accurate prediction performance relies on the creation of robust datasets. Herein, advancements in mass spectrometry-based proteomics technologies to maximize PTM coverage are reviewed. Further, requisite experimental validation approaches for PTM predictions are explored to ensure that follow-up mechanistic studies are focused on accurate modification sites.
Affiliation(s)
- Anthony A Iannetta
  - Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Leslie M Hicks
  - Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
7
Luo Y, Jiang J, Zhu J, Huang Q, Li W, Wang Y, Gao Y. A Caps-Ubi Model for Protein Ubiquitination Site Prediction. FRONTIERS IN PLANT SCIENCE 2022; 13:884903. [PMID: 35693166 PMCID: PMC9175003 DOI: 10.3389/fpls.2022.884903] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/27/2022] [Accepted: 04/26/2022] [Indexed: 05/12/2023]
Abstract
Ubiquitination, a widespread mechanism of regulating cellular responses in plants, is one of the most important post-translational modifications of proteins in many biological processes and is involved in the regulation of plant disease resistance responses. Predicting ubiquitination is an important technical method for plant protection. Traditional ubiquitination site determination methods are costly and time-consuming, while computational-based prediction methods can accurately and efficiently predict ubiquitination sites. At present, capsule networks and deep learning are used alone for prediction, and the effect is not obvious. The capsule network reflects the spatial position relationship of the internal features of the neural network, but it cannot identify long-distance dependencies or focus on amino acids in protein sequences or their degree of importance. In this study, we investigated the use of convolutional neural networks and capsule networks in deep learning to design a novel model "Caps-Ubi," first using the one-hot and amino acid continuous type hybrid encoding method to characterize ubiquitination sites. The sequence patterns, the dependencies between the encoded protein sequences and the important amino acids in the captured sequences, were then focused on the importance of amino acids in the sequences through the proposed Caps-Ubi model and used for multispecies ubiquitination site prediction. Through relevant experiments, the proposed Caps-Ubi method is superior to other similar methods in predicting ubiquitination sites.
Affiliation(s)
- Yin Luo
  - School of Life Sciences, East China Normal University, Shanghai, China
- Jiulei Jiang
  - School of Computer Science and Engineering, Changshu Institute of Technology, Suzhou, China
  - Correspondence: Jiulei Jiang
- Jiajie Zhu
  - School of Computer Engineering and Science, Shanghai University, Shanghai, China
- Qiyi Huang
  - School of Life Sciences, East China Normal University, Shanghai, China
  - School of Computer Science and Engineering, North Minzu University, Yinchuan, China
- Weimin Li
  - School of Computer Engineering and Science, Shanghai University, Shanghai, China
  - Correspondence: Weimin Li
- Ying Wang
  - School of Computer Science and Engineering, Changshu Institute of Technology, Suzhou, China
- Yamin Gao
  - School of Computer Science and Engineering, Changshu Institute of Technology, Suzhou, China
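The abstract of entry 7 mentions a one-hot encoding of the residues around a candidate ubiquitination site but gives no detail. The following is a minimal sketch of the one-hot half only, under stated assumptions: the window half-width `flank`, the padding symbol `X`, and both function names are hypothetical, and the "continuous type" half of the hybrid encoding is not reproduced because the abstract does not describe it.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # hypothetical padding symbol for positions beyond the sequence ends


def lysine_window(sequence: str, site: int, flank: int) -> str:
    """Return the 2*flank+1 residues centered on sequence[site], padded with PAD."""
    if sequence[site] != "K":
        raise ValueError("site must point at a lysine (K)")
    padded = PAD * flank + sequence + PAD * flank
    # In `padded`, the original index `site` sits at `site + flank`,
    # so the window starts at `site` and is centered on the lysine.
    return padded[site:site + 2 * flank + 1]


def one_hot(window: str) -> List[List[int]]:
    """Encode each residue as a 20-dim indicator vector; PAD or unknown is all zeros."""
    return [[1 if residue == ref else 0 for ref in AMINO_ACIDS] for residue in window]


# Toy sequence, not from any dataset in the paper.
window = lysine_window("MKTAY", site=1, flank=2)
encoded = one_hot(window)
```

Each candidate site becomes a matrix with one row per window position and 20 columns, which is the shape a convolutional layer over the sequence axis would typically consume.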
8
Isolated cancer stem cells from human liver cancer: morphological and functional characteristics in primary culture. Clin Transl Oncol 2021; 24:48-56. [PMID: 34169442 DOI: 10.1007/s12094-021-02667-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/31/2021] [Accepted: 06/08/2021] [Indexed: 12/24/2022]
Abstract
BACKGROUND: Primary liver cancer cells (PLCs) could more directly simulate the human tumor microenvironment. Compared with liver cancer cell lines, PLCs could better reflect the human situation. In previous studies, tumor stem cells were found to be a small fraction of the cancer cells in the microenvironment and were considered to be one of the origins of liver cancer. This study aimed to screen stem cells in PLCs, analyze their biological characteristics, and propose the possibility that liver cancer originates from stem cells. METHODS: Liver cancer tissues of 17 patients were taken from the Affiliated Hospital of Guangdong Medical College, and PLCs were isolated by the tissue slice method. The proliferation, tumor formation in nude mice, and stem protein expression of PLCs were observed. C-kit+ liver cancer cells were screened and their biological characteristics were analyzed. RESULTS: PLCs could be stably passaged. Transmission electron microscopy indicated that the nucleus was irregular, there were many mitochondria, and the endoplasmic reticulum was irregularly distributed. PLCs could express E-Cadherin, Oct-4, β-Catenin, Sox2, CD326, C-kit, GPC3, and Nanog. The proliferation curves of PLCs and Hep3B cells were similar, and both could form tumors in nude mice. Flow-sorted C-kit+ PLCs, as well as C-kit+ Hep3B cells, could highly express Bmi1, Sox2, Oct4, Notch1, Nanog, C-kit, β-Catenin, Smo, Nestin, ABCG2, and ABCB1. They could also clone and form tumors in vivo. However, C-kit+ PLCs were more sensitive to chemotherapy drugs than C-kit+ liver cancer cell lines. CONCLUSION: C-kit+ PLCs had the characteristics of tumor stem cells and were more sensitive to chemotherapy drugs.
9
Siraj A, Lim DY, Tayara H, Chong KT. UbiComb: A Hybrid Deep Learning Model for Predicting Plant-Specific Protein Ubiquitylation Sites. Genes (Basel) 2021; 12:genes12050717. [PMID: 34064731 PMCID: PMC8151217 DOI: 10.3390/genes12050717] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 04/15/2021] [Revised: 05/06/2021] [Accepted: 05/07/2021] [Indexed: 12/11/2022] Open
Abstract
Protein ubiquitylation is an essential post-translational modification process that performs a critical role in a wide range of biological functions, even a degenerative role in certain diseases, and is consequently used as a promising target for the treatment of various diseases. Owing to the significant role of protein ubiquitylation, these sites can be identified by enzymatic approaches, mass spectrometry analysis, and combinations of multidimensional liquid chromatography and tandem mass spectrometry. However, these large-scale experimental screening techniques are time consuming, expensive, and laborious. To overcome the drawbacks of experimental methods, machine learning and deep learning-based predictors were considered for prediction in a timely and cost-effective manner. In the literature, several computational predictors have been published across species; however, predictors are species-specific because of the unclear patterns in different species. In this study, we proposed a novel approach for predicting plant ubiquitylation sites using a hybrid deep learning model by utilizing convolutional neural network and long short-term memory. The proposed method uses the actual protein sequence and physicochemical properties as inputs to the model and provides more robust predictions. The proposed predictor achieved the best result with accuracy values of 80% and 81% and F-scores of 79% and 82% on the 10-fold cross-validation and an independent dataset, respectively. Moreover, we also compared the testing of the independent dataset with popular ubiquitylation predictors; the results demonstrate that our model significantly outperforms the other methods in prediction classification results.
Affiliation(s)
- Arslan Siraj
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Korea
- Dae Yeong Lim
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Korea
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, Korea
  - Corresponding author
- Kil To Chong
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Korea
  - Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, Korea
  - Corresponding author
10
Patwardhan A, Cheng N, Trejo J. Post-Translational Modifications of G Protein-Coupled Receptors Control Cellular Signaling Dynamics in Space and Time. Pharmacol Rev 2021; 73:120-151. [PMID: 33268549 PMCID: PMC7736832 DOI: 10.1124/pharmrev.120.000082] [Citation(s) in RCA: 75] [Impact Index Per Article: 25.0] [Indexed: 12/14/2022] Open
Abstract
G protein-coupled receptors (GPCRs) are a large family comprising >800 signaling receptors that regulate numerous cellular and physiologic responses. GPCRs have been implicated in numerous diseases and represent the largest class of drug targets. Although advances in GPCR structure and pharmacology have improved drug discovery, the regulation of GPCR function by diverse post-translational modifications (PTMs) has received minimal attention. Over 200 PTMs are known to exist in mammalian cells, yet only a few have been reported for GPCRs. Early studies revealed phosphorylation as a major regulator of GPCR signaling, whereas later reports implicated a function for ubiquitination, glycosylation, and palmitoylation in GPCR biology. Although our knowledge of GPCR phosphorylation is extensive, our knowledge of the modifying enzymes, regulation, and function of other GPCR PTMs is limited. In this review we provide a comprehensive overview of GPCR post-translational modifications with a greater focus on new discoveries. We discuss the subcellular location and regulatory mechanisms that control post-translational modifications of GPCRs. The functional implications of newly discovered GPCR PTMs on receptor folding, biosynthesis, endocytic trafficking, dimerization, compartmentalized signaling, and biased signaling are also provided. Methods to detect and study GPCR PTMs as well as PTM crosstalk are further highlighted. Finally, we conclude with a discussion of the implications of GPCR PTMs in human disease and their importance for drug discovery. SIGNIFICANCE STATEMENT: Post-translational modification of G protein-coupled receptors (GPCRs) controls all aspects of receptor function; however, the detection and study of diverse types of GPCR modifications are limited. A thorough understanding of the role and mechanisms by which diverse post-translational modifications regulate GPCR signaling and trafficking is essential for understanding dysregulated mechanisms in disease and for improving and refining drug development for GPCRs.
Affiliation(s)
- Anand Patwardhan
  - Department of Pharmacology and the Biomedical Sciences Graduate Program, School of Medicine, University of California, San Diego, La Jolla, California
- Norton Cheng
  - Department of Pharmacology and the Biomedical Sciences Graduate Program, School of Medicine, University of California, San Diego, La Jolla, California
- JoAnn Trejo
  - Department of Pharmacology and the Biomedical Sciences Graduate Program, School of Medicine, University of California, San Diego, La Jolla, California
11
Wang H, Wang Z, Li Z, Lee TY. Incorporating Deep Learning With Word Embedding to Identify Plant Ubiquitylation Sites. Front Cell Dev Biol 2020; 8:572195. [PMID: 33102477 PMCID: PMC7554246 DOI: 10.3389/fcell.2020.572195] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 06/13/2020] [Accepted: 08/24/2020] [Indexed: 12/17/2022] Open
Abstract
Protein ubiquitylation is an important posttranslational modification (PTM), which is involved in diverse biological processes and plays an essential role in the regulation of physiological mechanisms and diseases. The Protein Lysine Modifications Database (PLMD) has accumulated abundant ubiquitylated proteins with their substrate sites for more than 20 kinds of species. Numerous works have consequently developed a variety of ubiquitylation site prediction tools across all species, mainly relying on the predefined sequence features and machine learning algorithms. However, the difference in ubiquitylated patterns between these species stays unclear. In this work, the sequence-based characterization of ubiquitylated substrate sites has revealed remarkable differences among plants, animals, and fungi. Then an improved word-embedding scheme based on the transfer learning strategy was incorporated with the multilayer convolutional neural network (CNN) for identifying protein ubiquitylation sites. For the prediction of plant ubiquitylation sites, the proposed deep learning scheme could outperform the machine learning-based methods, with the accuracy of 75.6%, precision of 73.3%, recall of 76.7%, F-score of 0.7493, and 0.82 AUC on the independent testing set. Although the ubiquitylated specificity of substrate sites is complicated, this work has demonstrated that the application of the word-embedding method can enable the extraction of informative features and help the identification of ubiquitylated sites. To accelerate the investigation of protein ubiquitylation, the data sets and source code used in this study are freely available at https://github.com/wang-hong-fei/DL-plant-ubsites-prediction.
Affiliation(s)
- Hongfei Wang
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, China
- Zhuo Wang
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, China
  - School of Life Sciences, University of Science and Technology of China, Hefei, China
- Zhongyan Li
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, China
  - School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen, China
- Tzong-Yi Lee
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, China
  - School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen, China
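The abstract of entry 11 reports precision, recall, and an F-score together. Assuming the F-score is the usual F1 (the harmonic mean of precision and recall, which the abstract does not state explicitly), the relationship can be checked with a short sketch; the function name is illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """F1: harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision 73.3% and recall 76.7%, as quoted in the abstract of entry 11.
f1 = f_score(0.733, 0.767)
```

With the rounded inputs this gives roughly 0.7496, close to the reported 0.7493; the small gap is plausibly due to rounding of the published precision and recall.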
12
Luo F, Wang M, Liu Y, Zhao XM, Li A. DeepPhos: prediction of protein phosphorylation sites with deep learning. Bioinformatics 2020; 35:2766-2773. [PMID: 30601936 PMCID: PMC6691328 DOI: 10.1093/bioinformatics/bty1051] [Citation(s) in RCA: 105] [Impact Index Per Article: 26.3] [Received: 09/14/2018] [Revised: 11/19/2018] [Accepted: 12/12/2018] [Indexed: 11/28/2022] Open
Abstract
Motivation: Phosphorylation is the most studied post-translational modification and is crucial for multiple biological processes. Recently, many efforts have been made to develop computational predictors for phosphorylation site prediction, but most of them are based on feature selection and discriminative classification. Thus, it is useful to develop a novel and highly accurate predictor that can unveil intricate patterns automatically for protein phosphorylation sites. Results: In this study we present DeepPhos, a novel deep learning architecture for prediction of protein phosphorylation. Unlike multi-layer convolutional neural networks, DeepPhos consists of densely connected convolutional neural network blocks, which can capture multiple representations of sequences to make the final phosphorylation prediction through intra-block and inter-block concatenation layers. DeepPhos can also be used for kinase-specific prediction at the group, family, subfamily, and individual kinase levels. The experimental results demonstrated that DeepPhos outperforms competitive predictors in general and kinase-specific phosphorylation site prediction. Availability and implementation: The source code of DeepPhos is publicly deposited at https://github.com/USTCHIlab/DeepPhos. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Fenglin Luo
- School of Information Science and Technology
- Minghui Wang
- School of Information Science and Technology; Centers for Biomedical Engineering, University of Science and Technology of China, Hefei AH, China
- Yu Liu
- School of Information Science and Technology
- Xing-Ming Zhao
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Ao Li
- School of Information Science and Technology; Centers for Biomedical Engineering, University of Science and Technology of China, Hefei AH, China
13
Chen Z, Liu X, Li F, Li C, Marquez-Lago T, Leier A, Akutsu T, Webb GI, Xu D, Smith AI, Li L, Chou KC, Song J. Large-scale comparative assessment of computational predictors for lysine post-translational modification sites. Brief Bioinform 2019; 20:2267-2290. [PMID: 30285084 PMCID: PMC6954452 DOI: 10.1093/bib/bby089] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Received: 06/14/2018] [Revised: 08/17/2018] [Accepted: 08/18/2018] [Indexed: 12/22/2022] Open
Abstract
Lysine post-translational modifications (PTMs) play a crucial role in regulating diverse functions and biological processes of proteins. However, because of the large volumes of sequencing data generated from genome-sequencing projects, systematic identification of different types of lysine PTM substrates and PTM sites in the entire proteome remains a major challenge. In recent years, a number of computational methods for lysine PTM identification have been developed. These methods show high diversity in their core algorithms, features extracted and feature selection techniques and evaluation strategies. There is therefore an urgent need to revisit these methods and summarize their methodologies, to improve and further develop computational techniques to identify and characterize lysine PTMs from the large amounts of sequence data. With this goal in mind, we first provide a comprehensive survey on a large collection of 49 state-of-the-art approaches for lysine PTM prediction. We cover a variety of important aspects that are crucial for the development of successful predictors, including operating algorithms, sequence and structural features, feature selection, model performance evaluation and software utility. We further provide our thoughts on potential strategies to improve the model performance. Second, in order to examine the feasibility of using deep learning for lysine PTM prediction, we propose a novel computational framework, termed MUscADEL (Multiple Scalable Accurate Deep Learner for lysine PTMs), using deep, bidirectional, long short-term memory recurrent neural networks for accurate and systematic mapping of eight major types of lysine PTMs in the human and mouse proteomes. Extensive benchmarking tests show that MUscADEL outperforms current methods for lysine PTM characterization, demonstrating the potential and power of deep learning techniques in protein PTM prediction. 
The web server of MUscADEL, together with all the data sets assembled in this study, is freely available at http://muscadel.erc.monash.edu/. We anticipate that this comprehensive review and the application of deep learning will provide a practical guide and useful insights into PTM prediction and inspire future bioinformatics studies in related fields.
Affiliation(s)
- Zhen Chen
- School of Basic Medical Science, Qingdao University, Dengzhou Road, Qingdao, Shandong, China
- Xuhan Liu
- Medicinal Chemistry, Leiden Academic Centre for Drug Research, Einsteinweg, Leiden, The Netherlands
- Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
- Chen Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
- Institute of Molecular Systems Biology, ETH Zürich, Auguste-Piccard-Hof, Zürich, Switzerland
- Tatiana Marquez-Lago
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- André Leier
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, Japan
- Geoffrey I Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
- Dakang Xu
- Faculty of Medical Laboratory Science, Ruijin Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Department of Molecular and Translational Science, Faculty of Medicine, Hudson Institute of Medical Research, Monash University, Melbourne, VIC, Australia
- Alexander Ian Smith
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
- Lei Li
- School of Basic Medical Science, Qingdao University, Dengzhou Road, Qingdao, Shandong, China
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA, USA
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
14
Dai Y, Holland PWH. The Interaction of Natural Selection and GC Skew May Drive the Fast Evolution of a Sand Rat Homeobox Gene. Mol Biol Evol 2019; 36:1473-1480. [PMID: 30968125 PMCID: PMC6573468 DOI: 10.1093/molbev/msz080] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 12/27/2022] Open
Abstract
Several processes can lead to strong GC skew in localized genomic regions. In most cases, GC skew should not affect conserved amino acids because natural selection will purge deleterious alleles. However, in the gerbil subfamily of rodents, several conserved genes have undergone radical alteration in association with strong GC skew. An extreme example concerns the highly conserved homeobox gene Pdx1, which is uniquely divergent and GC rich in the sand rat Psammomys obesus and close relatives. Here, we investigate the antagonistic interplay between very rare amino acid changes driven by GC skew and the force of natural selection. Using ectopic protein expression in cell culture, pulse-chase labeling, in vitro mutagenesis, and drug treatment, we compare properties of mouse and sand rat Pdx1 proteins. We find that amino acid change driven by GC skew resulted in altered protein stability, with a significantly longer protein half-life for sand rat Pdx1. Using a reversible inhibitor of the 26S proteasome, MG132, we find that sand rat and mouse Pdx1 are both degraded through the ubiquitin proteasome pathway. However, in vitro mutagenesis reveals this pathway operates through different amino acid residues. We propose that GC skew caused loss of a key ubiquitination site, conserved through vertebrate evolution, and that sand rat Pdx1 evolved or fixed a new ubiquitination site to compensate. Our results give molecular insight into the power of natural selection in the face of maladaptive changes driven by strong GC skew.
Affiliation(s)
- Yichen Dai
- Department of Zoology, University of Oxford, Oxford, United Kingdom
15
Fu H, Yang Y, Wang X, Wang H, Xu Y. DeepUbi: a deep learning framework for prediction of ubiquitination sites in proteins. BMC Bioinformatics 2019; 20:86. [PMID: 30777029 PMCID: PMC6379983 DOI: 10.1186/s12859-019-2677-9] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 11/08/2018] [Accepted: 02/12/2019] [Indexed: 01/22/2023] Open
Abstract
Background: Protein ubiquitination occurs when the ubiquitin protein binds to a lysine (K) residue of a target protein, and it is an important regulator of many cellular functions in eukaryotes, such as signal transduction, cell division, and immune reactions. Experimental and clinical studies have shown that ubiquitination plays a key role in several human diseases, and recent advances in proteomic technology have spurred interest in identifying ubiquitination sites. However, most current computational tools for predicting target sites are based on small-scale data and shallow machine learning algorithms. Results: As more experimentally validated ubiquitination sites emerge, we need to design a predictor that can identify lysine ubiquitination sites in large-scale proteome data. In this work, we propose a deep learning predictor, DeepUbi, based on convolutional neural networks. Four different features are adopted from the sequences and physicochemical properties. In 10-fold cross-validation, DeepUbi obtains an AUC (area under the receiver operating characteristic curve) of 0.9, and the accuracy, sensitivity and specificity all exceed 85%. The more comprehensive indicator, MCC, reaches 0.78. We also develop a software package that can be freely downloaded from https://github.com/Sunmile/DeepUbi. Conclusion: Our results show that DeepUbi has excellent performance in predicting ubiquitination based on large data. Electronic supplementary material: The online version of this article (10.1186/s12859-019-2677-9) contains supplementary material, which is available to authorized users.
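The evaluation measures quoted in the abstract above (accuracy, sensitivity, specificity and MCC) all derive from the four cells of a binary confusion matrix. The following is a minimal Python sketch of those standard definitions, not code from DeepUbi; the function name `binary_metrics` and the example counts are illustrative assumptions, and the counts are not the paper's data.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts.

    tp/fn count true sites predicted as positive/negative;
    tn/fp count non-sites predicted as negative/positive.
    (Hypothetical helper for illustration only.)
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must contain at least one sample")

    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class

    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Made-up counts, chosen only to exercise the function.
print(binary_metrics(tp=90, fp=10, tn=90, fn=10))
```

Because MCC uses all four cells, it stays informative when positive and negative sites are imbalanced, which is presumably why the abstract calls it the more comprehensive indicator.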
Affiliation(s)
- Hongli Fu
- Department of Information and Computing Science, University of Science and Technology Beijing, Beijing, 100083, China
- Yingxi Yang
- Department of Information and Computing Science, University of Science and Technology Beijing, Beijing, 100083, China
- Xiaobo Wang
- Department of Information and Computing Science, University of Science and Technology Beijing, Beijing, 100083, China
- Hui Wang
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
- Yan Xu
- Department of Information and Computing Science, University of Science and Technology Beijing, Beijing, 100083, China; Beijing Key Laboratory for Magneto-photoelectrical Composite and Interface Science, University of Science and Technology Beijing, Beijing, 100083, China
16
He F, Wang R, Li J, Bao L, Xu D, Zhao X. Large-scale prediction of protein ubiquitination sites using a multimodal deep architecture. BMC Syst Biol 2018; 12:109. [PMID: 30463553 PMCID: PMC6249717 DOI: 10.1186/s12918-018-0628-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Indexed: 02/06/2023]
Abstract
Background: Ubiquitination, also called "lysine ubiquitination", occurs when a ubiquitin is attached to lysine (K) residues in target proteins. As one of the most important post-translational modifications (PTMs), it plays a significant role not only in protein degradation but also in other cellular functions. Thus, systematic anatomy of the ubiquitination proteome is an appealing and challenging research topic. The existing methods for identifying protein ubiquitination sites can be divided into two kinds: mass spectrometry and computational methods. Mass spectrometry-based experimental methods can discover ubiquitination sites from eukaryotes, but are time-consuming and expensive. Therefore, it is a priority to develop computational approaches that can effectively and accurately identify protein ubiquitination sites. Results: The existing computational methods usually require feature engineering, which may lead to redundancy and biased representations. In contrast, deep learning can extract underlying characteristics from large-scale training data via multiple-layer networks and non-linear mapping operations. In this paper, we propose a deep architecture with multiple modalities to identify ubiquitination sites. First, guided by prior biological knowledge, we encoded protein sequence fragments around candidate ubiquitination sites into three modalities, namely raw protein sequence fragments, physico-chemical properties and sequence profiles, and designed different deep network layers to extract the hidden representations from them. Then, the generative deep representations corresponding to the three modalities were merged to build the final model. We applied our algorithm to PLMD, the largest available protein ubiquitination site database, and achieved 66.4% specificity, 66.7% sensitivity, 66.43% accuracy, and an MCC of 0.221. A number of comparative experiments also indicated that our multimodal deep architecture outperformed several popular protein ubiquitination site prediction tools. Conclusion: The results of comparative experiments validated the effectiveness of our deep network and also showed that our method outperformed several popular protein ubiquitination site prediction tools. The source code of our proposed method is available at https://github.com/jiagenlee/deepUbiquitylation.
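The abstract above describes encoding protein sequence fragments around candidate ubiquitination sites. As a rough illustration of that general idea, the Python sketch below cuts a fixed-length window around every lysine (K) and one-hot encodes it. It is not the authors' pipeline: the function names, the flank length, the `X` padding symbol and the toy sequence are all assumptions made for the example.

```python
# Hypothetical helpers for illustration; not taken from the paper's code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"                              # assumed padding symbol for sequence ends
ALPHABET = AMINO_ACIDS + PAD


def lysine_windows(sequence, flank=3):
    """Return (1-based position, window) for every lysine in the sequence."""
    padded = PAD * flank + sequence + PAD * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # In the padded string the residue sits at index i + flank,
            # so a centred window spans padded[i : i + 2 * flank + 1].
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows


def one_hot(window):
    """Encode a window as a list of 21-dimensional indicator vectors.

    Symbols outside ALPHABET produce an all-zero row.
    """
    return [[1 if symbol == letter else 0 for letter in ALPHABET]
            for symbol in window]


# Toy sequence, not real data.
for position, window in lysine_windows("MKTAYK", flank=2):
    print(position, window, one_hot(window)[2])
```

Padding the ends keeps every window the same length, so lysines near a terminus can still be fed to a fixed-size model input.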
Affiliation(s)
- Fei He
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China; Institution of Computational Biology, Northeast Normal University, Changchun, 130117, China
- Rui Wang
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
- Jiagen Li
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
- Lingling Bao
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
- Dong Xu
- Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, 65211, USA
- Xiaowei Zhao
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China; Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, People's Republic of China
17
Identifying a miRNA signature for predicting the stage of breast cancer. Sci Rep 2018; 8:16138. [PMID: 30382159 PMCID: PMC6208346 DOI: 10.1038/s41598-018-34604-3] [Citation(s) in RCA: 79] [Impact Index Per Article: 13.2] [Received: 06/15/2018] [Accepted: 10/12/2018] [Indexed: 12/13/2022] Open
Abstract
Breast cancer is a heterogeneous disease and one of the most common cancers among women. Recently, microRNAs (miRNAs) have been used as biomarkers due to their effective role in cancer diagnosis. This study proposes a support vector machine (SVM)-based classifier SVM-BRC to categorize patients with breast cancer into early and advanced stages. SVM-BRC uses an optimal feature selection method, inheritable bi-objective combinatorial genetic algorithm, to identify a miRNA signature which is a small set of informative miRNAs while maximizing prediction accuracy. MiRNA expression profiles of a 386-patient cohort of breast cancer were retrieved from The Cancer Genome Atlas. SVM-BRC identified 34 of 503 miRNAs as a signature and achieved a 10-fold cross-validation mean accuracy, sensitivity, specificity, and Matthews correlation coefficient of 80.38%, 0.79, 0.81, and 0.60, respectively. Functional enrichment of the 10 highest ranked miRNAs was analysed in terms of Kyoto Encyclopedia of Genes and Genomes and Gene Ontology annotations. Kaplan-Meier survival analysis of the highest ranked miRNAs revealed that four miRNAs, hsa-miR-503, hsa-miR-1307, hsa-miR-212 and hsa-miR-592, were significantly associated with the prognosis of patients with breast cancer.
18
Liu Y, Wang M, Xi J, Luo F, Li A. PTM-ssMP: A Web Server for Predicting Different Types of Post-translational Modification Sites Using Novel Site-specific Modification Profile. Int J Biol Sci 2018; 14:946-956. [PMID: 29989096 PMCID: PMC6036757 DOI: 10.7150/ijbs.24121] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Received: 12/01/2017] [Accepted: 01/24/2018] [Indexed: 12/26/2022] Open
Abstract
Protein post-translational modifications (PTMs) are chemical modifications of a protein after its translation. Because it plays an important role in understanding various biological processes and in developing effective drugs, PTM site prediction has become a hot topic in bioinformatics. Recently, many online tools have been developed to predict various types of PTM sites, most of which are based on local sequence and some biological information. However, few existing tools consider the relations between different PTMs in their prediction task. Here, we develop a web server called PTM-ssMP to predict PTM sites, which adopts a site-specific modification profile (ssMP) to efficiently extract and encode the information of both proximal PTMs and local sequence simultaneously. PTM-ssMP provides efficient prediction of multiple types of PTM sites, including phosphorylation, lysine acetylation, ubiquitination, sumoylation, methylation, O-GalNAc, O-GlcNAc, sulfation and proteolytic cleavage. To assess the performance of PTM-ssMP, a large number of experimentally verified PTM sites were collected from several sources and used to train and test the prediction models. Our results suggest that ssMP consistently contributes to a remarkable improvement in prediction performance. In addition, results of independent tests demonstrate that PTM-ssMP compares favorably with other existing tools for different PTM types. PTM-ssMP is implemented as an online web server with a user-friendly interface, which is freely available at http://bioinformatics.ustc.edu.cn/PTM-ssMP/index/.
Affiliation(s)
- Yu Liu
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH230027, China
- Minghui Wang
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH230027, China; Centers for Biomedical Engineering, University of Science and Technology of China, Hefei AH230027, China
- Jianing Xi
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH230027, China
- Fenglin Luo
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH230027, China
- Ao Li
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH230027, China; Centers for Biomedical Engineering, University of Science and Technology of China, Hefei AH230027, China
19
Tsai MJ, Wang JR, Yang CD, Kao KC, Huang WL, Huang HY, Tseng CP, Huang HD, Ho SY. PredCRP: predicting and analysing the regulatory roles of CRP from its binding sites in Escherichia coli. Sci Rep 2018; 8:951. [PMID: 29343727 PMCID: PMC5772556 DOI: 10.1038/s41598-017-18648-5] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 07/04/2017] [Accepted: 12/13/2017] [Indexed: 02/04/2023] Open
Abstract
Cyclic AMP receptor protein (CRP), a global regulator in Escherichia coli, regulates more than 180 genes via two roles: activation and repression. Few methods are available for predicting the regulatory roles from the binding sites of transcription factors. This work proposes an accurate method, PredCRP, to derive an optimised model (named PredCRP-model) and a set of four interpretable rules (named PredCRP-ruleset) for predicting and analysing the regulatory roles of CRP from sequences of CRP-binding sites. A dataset consisting of 169 CRP-binding sites with regulatory roles strongly supported by evidence was compiled. The PredCRP-model, which combines 12 informative features of CRP-binding sites with a support vector machine, achieved a training and test accuracy of 0.98 and 0.93, respectively. PredCRP-ruleset has two activation rules and two repression rules derived using the 12 features and the decision tree method C4.5. This work further screened and identified 23 previously unobserved regulatory interactions in Escherichia coli. Using quantitative PCR for validation, PredCRP-model and PredCRP-ruleset achieved a test accuracy of 0.96 (=22/23) and 0.91 (=21/23), respectively. The proposed method is suitable for designing predictors for regulatory roles of all global regulators in Escherichia coli. PredCRP can be accessed at https://github.com/NctuICLab/PredCRP.
Affiliation(s)
- Ming-Ju Tsai
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Jyun-Rong Wang
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Chi-Dung Yang
- Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan; Institute of Population Health Sciences, National Health Research Institutes, Miaoli, Taiwan
- Kuo-Ching Kao
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Wen-Lin Huang
- Department and Institute of Industrial Engineering and Management, Minghsin University of Science and Technology, Hsinchu, Taiwan
- Hsi-Yuan Huang
- Department of Laboratory Medicine, China Medical University Hospital, Taichung, Taiwan
- Ching-Ping Tseng
- Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan
- Hsien-Da Huang
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan; Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan
- Shinn-Ying Ho
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan; Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan
20
Li GXH, Vogel C, Choi H. PTMscape: an open source tool to predict generic post-translational modifications and map modification crosstalk in protein domains and biological processes. Mol Omics 2018; 14:197-209. [PMID: 29876573 PMCID: PMC6115748 DOI: 10.1039/c8mo00027a] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/14/2022]
Abstract
PTMscape predicts PTM sites using descriptors of sequence and physico-chemical microenvironment, and tests enrichment of single or pairs of PTMs in protein domains.
While tandem mass spectrometry can detect post-translational modifications (PTMs) at the proteome scale, reported PTM sites are often incomplete and include false positives. Computational approaches can complement these datasets with additional predictions, but most available tools use prediction models pre-trained by the developers for a single PTM type, and it remains difficult to perform large-scale batch prediction for multiple PTMs with flexible user control, including the choice of training data. We developed an R package called PTMscape, which predicts PTM sites across the proteome based on a unified and comprehensive set of descriptors of the physico-chemical microenvironment of modified sites, with additional downstream analysis modules to test enrichment of individual PTMs or pairs of PTMs in protein domains. PTMscape can process any major modification, such as phosphorylation and ubiquitination, while achieving sensitivity and specificity comparable to single-PTM methods and outperforming other multi-PTM tools. Applying this framework, we expanded proteome-wide coverage of five major PTMs affecting different residues by prediction, especially for lysine and arginine modifications. Using a combination of experimentally acquired sites (PSP) and newly predicted sites, we discovered that crosstalk among multiple PTMs occurs more frequently than expected by chance in key protein domains such as histone, protein kinase, and RNA recognition motifs, spanning various biological processes such as RNA processing, DNA damage response, signal transduction, and regulation of the cell cycle. These results provide a proteome-scale analysis of crosstalk among major PTMs and can be easily extended to other types of PTM.
Affiliation(s)
- Ginny X H Li
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore.