1. Wang M, Jia J, Xu F, Zhou H, Liu Y, Yu B. Res-GCN: Identification of protein phosphorylation sites using graph convolutional network and residual network. Comput Biol Chem 2024; 112:108183. [PMID: 39208554 DOI: 10.1016/j.compbiolchem.2024.108183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2024] [Revised: 08/02/2024] [Accepted: 08/22/2024] [Indexed: 09/04/2024]
Abstract
Phosphorylation, an essential post-translational modification, is closely related to a wide range of biological activities. Effective computational methods for correctly recognizing phosphorylation sites are important for an in-depth understanding of various physiological phenomena. However, the traditional approach of identifying phosphorylation sites experimentally is time-consuming and laborious, which makes it difficult to meet the processing demands of today's big data. This research proposes a novel model, Res-GCN, to recognize the phosphorylation sites of SARS-CoV-2. First, eight feature extraction strategies are used to digitize the protein sequence from multiple viewpoints: amino acid property encodings (AAindex), pseudo-amino acid composition (PseAAC), adapted normal distribution bi-profile Bayes (ANBPB), dipeptide composition (DC), binary encoding (BE), enhanced amino acid composition (EAAC), Word2Vec, and BLOSUM62 matrices. Second, elastic net is used to eliminate redundant information in the fused feature matrix. Finally, a combination of a graph convolutional network (GCN) and a residual network (ResNet) classifies the phosphorylated sites, and a fully connected (FC) layer outputs the predictions. The performance of Res-GCN is tested by 5-fold cross-validation and independent testing, and excellent results are obtained on the S/T and Y datasets. This demonstrates that the Res-GCN model exhibits exceptional predictive performance and generalizability.
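To make two of the eight encodings named in this abstract concrete, the following is a minimal sketch of binary encoding (BE) and dipeptide composition (DC) for a peptide window. It is an illustration under stated assumptions, not the authors' implementation: the function names, the 20-letter alphabet ordering, and the handling of unknown residues are all hypothetical choices, and the abstract does not state the window length used in the paper.

```python
from itertools import product

# Assumption: the 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binary_encoding(peptide):
    """One-hot encode each residue; unknown residues become all-zero blocks."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    vector = []
    for residue in peptide:
        one_hot = [0] * len(AMINO_ACIDS)
        if residue in index:
            one_hot[index[residue]] = 1
        vector.extend(one_hot)
    return vector


def dipeptide_composition(peptide):
    """Return the frequency of each of the 400 adjacent residue pairs."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    for i in range(len(peptide) - 1):
        pair = peptide[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    # Normalize by the number of adjacent pairs in the window.
    total = max(len(peptide) - 1, 1)
    return [counts[p] / total for p in pairs]


if __name__ == "__main__":
    window = "ACAC"  # hypothetical toy window, far shorter than a real one
    print(len(binary_encoding(window)))        # 4 residues x 20 positions
    print(len(dipeptide_composition(window)))  # 400 dipeptide frequencies
```

In a pipeline like the one described, vectors from several such encoders would be concatenated into a fused feature matrix before feature selection.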
Affiliation(s)
- Minghui Wang: College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Jihua Jia: College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China; School of Data Science, Qingdao University of Science and Technology, Qingdao 266061, China
- Fei Xu: College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Hongyan Zhou: College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Yushuang Liu: College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bin Yu: School of Data Science, Qingdao University of Science and Technology, Qingdao 266061, China; School of Artificial Intelligence and Data Science, University of Science and Technology of China, Hefei 230026, China
2. Hu F, Li W, Li Y, Hou C, Ma J, Jia C. O-GlcNAcPRED-DL: Prediction of Protein O-GlcNAcylation Sites Based on an Ensemble Model of Deep Learning. J Proteome Res 2024; 23:95-106. [PMID: 38054441 DOI: 10.1021/acs.jproteome.3c00458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/07/2023]
Abstract
O-linked β-N-acetylglucosamine (O-GlcNAc) is a post-translational modification (i.e., O-GlcNAcylation) on serine/threonine residues of proteins, regulating a plethora of physiological and pathological events. As a dynamic process, O-GlcNAc functions in a site-specific manner. However, the experimental identification of O-GlcNAc sites remains challenging in many scenarios. Herein, by leveraging the recent progress in cataloguing experimentally identified O-GlcNAc sites and advanced deep learning approaches, we establish O-GlcNAcPRED-DL, a deep learning-based ensemble model for the prediction of O-GlcNAc sites. In brief, to build a benchmark O-GlcNAc data set, we extracted the information on O-GlcNAc from the recently constructed database O-GlcNAcAtlas, which contains thousands of experimentally identified and curated O-GlcNAc sites on proteins from multiple species. To overcome the imbalance between positive and negative data sets, we selected five groups of negative data sets in humans and mice to construct an ensemble predictor based on a convolutional neural network connected to a bidirectional long short-term memory network. By taking into account three types of sequence information, we constructed four network frameworks, with systematically optimized parameters used for the models. A thorough comparison on two independent data sets of humans and mice and six independent data sets from other species demonstrated remarkably increased sensitivity and accuracy of the O-GlcNAcPRED-DL models, outperforming other existing tools. Moreover, a user-friendly Web server for O-GlcNAcPRED-DL has been constructed, which is freely available at http://oglcnac.org/pred_dl.
Affiliation(s)
- Fengzhu Hu: School of Science, Dalian Maritime University, Dalian 116026, China
- Weiyu Li: Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, District of Columbia 20007, United States
- Yaoxiang Li: Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, District of Columbia 20007, United States
- Chunyan Hou: Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, District of Columbia 20007, United States
- Junfeng Ma: Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, District of Columbia 20007, United States
- Cangzhi Jia: School of Science, Dalian Maritime University, Dalian 116026, China
3. Yin H, Zheng X, Tang X, Zang Z, Li B, He S, Shen R, Yang H, Li S. Potential biomarkers and lncRNA-mRNA regulatory networks in invasive growth hormone-secreting pituitary adenomas. J Endocrinol Invest 2021; 44:1947-1959. [PMID: 33559847 DOI: 10.1007/s40618-021-01510-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/14/2020] [Accepted: 01/15/2021] [Indexed: 02/06/2023]
Abstract
PURPOSE: Growth hormone-secreting pituitary adenomas (GH-PAs) are common subtypes of functional PAs. Invasive GH-PAs play a key role in restricting poor outcomes. The transcriptional changes in GH-PAs were evaluated.
METHODS: In this study, transcriptome analysis of six different GH-PA samples was performed. The functional roles, co-regulatory network, and chromosome location of differentially expressed (DE) genes in invasive GH-PAs were explored.
RESULTS: Bioinformatic analysis revealed 101 DE mRNAs and 70 DE long non-coding RNAs (lncRNAs) between invasive and non-invasive GH-PAs. Functional enrichment analysis showed that epithelial cell differentiation and development pathways were suppressed in invasive GH-PAs, whereas the pathways of olfactory transduction, retinol metabolism, drug metabolism-cytochrome P450, and metabolism of xenobiotics by cytochrome P450 showed an active trend. In the protein-protein interaction network, 11 main communities were characterized by cell adhesion, cell motility, and cell cycle; transport process; and phosphorus and hormone metabolic processes. The SGK1 gene was suggested to play a role in the invasiveness of GH-PAs. Furthermore, the up-regulated genes OR51B6, OR52E4, OR52E8, OR52E6, OR52N2, MAGEA6, MAGEC1, and ST8SIA6-AS1 and the down-regulated genes GAD1-AS1 and SPINT1-AS1 were identified in the competing endogenous RNA network. The RT-qPCR results further supported the aberrant expression of those genes. Finally, enrichment of DE genes in the chromosome 11p15 and 12p13 regions was detected.
CONCLUSION: Our findings provide a new perspective for studies evaluating the underlying mechanism of invasive GH-PAs.
Affiliation(s)
- H Yin: Department of Neurosurgery, Xinqiao Hospital, The Army Medical University, Chongqing, China
- X Zheng: Department of Neurosurgery, Xinqiao Hospital, The Army Medical University, Chongqing, China
- X Tang: Department of Neurosurgery, Xinqiao Hospital, The Army Medical University, Chongqing, China
- Z Zang: Department of Neurosurgery, Xinqiao Hospital, The Army Medical University, Chongqing, China
- B Li: College of Life Sciences, Chongqing Normal University, Chongqing, China
- S He: Department of Neurosurgery, Xinqiao Hospital, The Army Medical University, Chongqing, China
- R Shen: Department of Endocrinology, Xinqiao Hospital, The Army Medical University, Chongqing, China
- H Yang: Department of Neurosurgery, Xinqiao Hospital, The Army Medical University, Chongqing, China
- S Li: Department of Neurosurgery, Xinqiao Hospital, The Army Medical University, Chongqing, China
4. MicroRNA-34a: the bad guy in age-related vascular diseases. Cell Mol Life Sci 2021; 78:7355-7378. [PMID: 34698884 PMCID: PMC8629897 DOI: 10.1007/s00018-021-03979-4] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 07/14/2021] [Revised: 09/08/2021] [Accepted: 10/12/2021] [Indexed: 12/12/2022]
Abstract
Age-related vasculature alteration is the prominent risk factor for vascular diseases (VD), namely atherosclerosis, abdominal aortic aneurysm, vascular calcification (VC) and pulmonary arterial hypertension (PAH). The chronic sterile low-grade inflammation state, alias inflammaging, characterizes elderly people and participates in VD development. MicroRNA-34a (miR-34a) is emerging as an important mediator of inflammaging and VD. miR-34a increases with aging in vessels and induces senescence and the acquisition of the senescence-associated secretory phenotype (SASP) in vascular smooth muscle cells (VSMCs) and endothelial cells (ECs). Similarly, other VD risk factors, including dyslipidemia, hyperglycemia and hypertension, modify miR-34a expression to promote vascular senescence and inflammation. miR-34a upregulation causes endothelial dysfunction by affecting EC nitric oxide bioavailability, adhesion molecule expression and inflammatory cell recruitment. miR-34a-induced senescence facilitates the VSMC osteoblastic switch and VC development in hyperphosphatemia conditions. Conversely, atherogenic and hypoxic stimuli downregulate miR-34a levels and promote VSMC proliferation and migration during atherosclerosis and PAH. miR-34a genetic ablation or miR-34a inhibition by anti-miR-34a molecules in different experimental models of VD reduces vascular inflammation, senescence and apoptosis through modulation of sirtuin 1, Notch1, and B-cell lymphoma 2. Notably, pleiotropic drugs, like statins, liraglutide and metformin, affect miR-34a expression. Finally, human studies report that miR-34a levels are associated with atherosclerosis and diabetes and correlate with inflammatory factors during aging.
Herein, we comprehensively review the current knowledge about miR-34a-dependent molecular and cellular mechanisms activated by VD risk factors and highlight the diagnostic and therapeutic potential of modulating its expression in order to reduce inflammaging and VD burden and extend healthy lifespan.
5. Lv H, Dao FY, Guan ZX, Yang H, Li YW, Lin H. Deep-Kcr: accurate detection of lysine crotonylation sites using deep learning method. Brief Bioinform 2020; 22:5937175. [PMID: 33099604 DOI: 10.1093/bib/bbaa255] [Citation(s) in RCA: 80] [Impact Index Per Article: 20.0] [Received: 08/13/2020] [Revised: 08/31/2020] [Accepted: 09/08/2020] [Indexed: 12/23/2022]
Abstract
As a newly discovered protein posttranslational modification, histone lysine crotonylation (Kcr) is involved in cellular regulation and human diseases. Various proteomics technologies have been developed to detect Kcr sites. However, experimental approaches for identifying Kcr sites are often time-consuming and labor-intensive, which makes them difficult to apply widely across many species. Computational approaches are cost-effective and can be used in a high-throughput manner to generate relatively precise identification. In this study, we develop a deep learning-based method termed Deep-Kcr for Kcr site prediction by combining sequence-based features, physicochemical property-based features and numerical space-derived information with information gain feature selection. We investigate the performance of a convolutional neural network (CNN) and five commonly used classifiers (long short-term memory network, random forest, LogitBoost, naive Bayes and logistic regression) using 10-fold cross-validation and an independent set test. Results show that the CNN consistently displays the best performance with high computational efficiency on large datasets. We also compare Deep-Kcr with other existing tools to demonstrate the excellent predictive power and robustness of our method. Based on the proposed model, a webserver called Deep-Kcr was established and is freely accessible at http://lin-group.cn/server/Deep-Kcr.
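The information gain feature selection mentioned in this abstract can be illustrated with a small sketch. This is a generic textbook formulation and not the Deep-Kcr code: the function names are hypothetical, and the sketch assumes each feature has already been discretized, since the abstract does not say how continuous features were handled.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return feature indices sorted from most to least informative."""
    gains = [information_gain(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda i: gains[i], reverse=True)


if __name__ == "__main__":
    labels = [1, 1, 0, 0]        # hypothetical Kcr / non-Kcr labels
    columns = [
        [1, 0, 1, 0],            # uninformative toy feature
        [1, 1, 0, 0],            # perfectly informative toy feature
    ]
    print(rank_features(columns, labels))
```

A selection step would then keep only the top-ranked features before training the classifier; the cutoff is a tuning choice that the abstract does not specify.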
Affiliation(s)
- Hao Lv: Center for Informational Biology at the University of Electronic Science and Technology of China
- Fu-Ying Dao: Center for Informational Biology at the University of Electronic Science and Technology of China
- Zheng-Xing Guan: Center for Informational Biology at the University of Electronic Science and Technology of China
- Hui Yang: Center for Informational Biology at the University of Electronic Science and Technology of China
- Hao Lin: Center for Informational Biology at the University of Electronic Science and Technology of China