1
Yang S, Fan X, Yu W. Regulatory Mechanism of Protein Crotonylation and Its Relationship with Cancer. Cells 2024; 13:1812. [PMID: 39513918 PMCID: PMC11545499 DOI: 10.3390/cells13211812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2024] [Revised: 10/26/2024] [Accepted: 10/31/2024] [Indexed: 11/16/2024]
Abstract
Crotonylation is a recently discovered protein acyl modification that shares many enzymes with acetylation. However, it possesses a distinct regulatory mechanism and biological function due to its unique crotonyl structure. Since the discovery of crotonylation in 2011, numerous crotonylation sites have been identified in both histones and other proteins. In recent studies, crotonylation was found to play a role in various diseases and biological processes. This paper reviews the initial discovery and regulatory mechanisms of crotonylation, including various writer, reader, and eraser proteins. Finally, we emphasize the relationship of dysregulated protein crotonylation with eight common malignancies, including cervical, prostate, liver, and lung cancer, providing new potential therapeutic targets.
Affiliation(s)
- Siyi Yang
- Institute of Biochemistry, College of Life Sciences and Medicine, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Zhejiang Provincial Key Laboratory of Silkworm Bioreactor and Biomedicine, Hangzhou 310018, China
- Xinyi Fan
- Faculty of Arts and Science, University of Toronto, Toronto, ON M5S 1A1, Canada
- Wei Yu
- Institute of Biochemistry, College of Life Sciences and Medicine, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Zhejiang Provincial Key Laboratory of Silkworm Bioreactor and Biomedicine, Hangzhou 310018, China
2
Qin Z, Ren H, Zhao P, Wang K, Liu H, Miao C, Du Y, Li J, Wu L, Chen Z. Current computational tools for protein lysine acylation site prediction. Brief Bioinform 2024; 25:bbae469. [PMID: 39316944 PMCID: PMC11421846 DOI: 10.1093/bib/bbae469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2024] [Revised: 08/20/2024] [Accepted: 09/07/2024] [Indexed: 09/26/2024]
Abstract
As a main subtype of post-translational modification (PTM), protein lysine acylations (PLAs) play crucial roles in regulating diverse protein functions. With recent advances in proteomics technology, PTM identification has become a data-rich field, and the large amount of experimentally verified data urgently needs to be translated into valuable biological insights. With computational approaches, PLA sites can be accurately detected across the whole proteome, even for organisms with small-scale datasets. Herein, a comprehensive summary of 166 in silico PLA prediction methods is presented, covering predictors for a single type of PLA site and for multiple types of PLA sites. This recapitulation covers the aspects critical to developing a robust predictor, including data collection and preparation, sample selection, feature representation, classification algorithm design, model evaluation, and method availability. Notably, we discuss the application of protein language models and transfer learning to solve the small-sample learning issue. We also highlight prediction methods developed for functionally relevant PLA sites and for species-, substrate-, and cell-type-specific PLA sites. In conclusion, this systematic review could facilitate the development of novel PLA predictors and offer useful insights to researchers from various disciplines.
Affiliation(s)
- Zhaohui Qin
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Haoran Ren
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Pei Zhao
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences (CAAS), Anyang 455000, China
- Kaiyuan Wang
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Huixia Liu
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Chunbo Miao
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Yanxiu Du
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Junzhou Li
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Liuji Wu
- National Key Laboratory of Wheat and Maize Crop Science, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Zhen Chen
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
3
Ansar Khawaja S, Alturise F, Alkhalifah T, Khan SA, Khan YD. Gluconeogenesis unraveled: A proteomic Odyssey with machine learning. Methods 2024; 232:29-42. [PMID: 39276958 DOI: 10.1016/j.ymeth.2024.09.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2024] [Revised: 08/05/2024] [Accepted: 09/01/2024] [Indexed: 09/17/2024]
Abstract
The metabolic pathway known as gluconeogenesis, which produces glucose from non-carbohydrate substrates, is essential for maintaining balanced blood sugar levels during fasting. Accurately predicting gluconeogenesis rates is important for recognizing metabolic disorders and creating efficient treatment strategies. Deep learning and machine learning methods for forecasting complex biological processes have gained popularity in recent years. Understanding both the regulation of the pathway and the possible therapeutic applications of proteins depends on accurately identifying the proteins associated with gluconeogenesis. This article analyzes the use of machine learning and deep learning models to predict gluconeogenesis efficiency. The study also discusses the challenges of restricted data availability and model interpretability, as well as possible applications in personalized healthcare, metabolic disease treatment, and drug discovery. The predictor uses statistical moments computed on the structures of gluconeogenesis proteins and their enzymes, with Random Forest as the classifier. The method was validated using the independent test, self-consistency test, 10-fold cross-validation, and jackknife test, which achieved 92.33%, 91.87%, 87.88%, and 87.02%, respectively. Accurate prediction of gluconeogenesis has significant implications for understanding metabolic disorders and developing targeted therapies. This study contributes to the growing field of predictive biology by combining deep learning and machine learning algorithms with the study of metabolic pathways.
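One way statistical-moment features can be computed from a protein sequence is to lay the sequence out as a square 2D matrix and take image-style moments of it. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: the residue-to-number mapping (the 20 standard one-letter codes numbered 1 to 20 alphabetically), the row-major zero-padded layout, 0-based indices, and the maximum order of 3 are all hypothetical choices, and only raw and central moments are shown (other moment families are omitted).

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical numeric code: each residue's 1-based position in the string above.
CODE = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def to_square_matrix(seq):
    """Lay seq out row by row in the smallest square matrix that holds it, zero-padded."""
    n = len(seq)
    side = math.ceil(math.sqrt(n))
    values = [CODE[aa] for aa in seq] + [0] * (side * side - n)
    return [values[r * side:(r + 1) * side] for r in range(side)]


def raw_moment(mat, p, q):
    """M_pq = sum over cells of i**p * j**q * value (0-based row i, column j)."""
    return sum(i ** p * j ** q * v
               for i, row in enumerate(mat)
               for j, v in enumerate(row))


def central_moment(mat, p, q):
    """mu_pq: like M_pq, but with i and j measured from the matrix centroid."""
    m00 = raw_moment(mat, 0, 0)
    i_bar = raw_moment(mat, 1, 0) / m00
    j_bar = raw_moment(mat, 0, 1) / m00
    return sum((i - i_bar) ** p * (j - j_bar) ** q * v
               for i, row in enumerate(mat)
               for j, v in enumerate(row))


def moment_features(seq, max_order=3):
    """Raw and central moments for every order p + q <= max_order."""
    mat = to_square_matrix(seq)
    features = []
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            features.append(raw_moment(mat, p, q))
            features.append(central_moment(mat, p, q))
    return features
```

With `max_order=3` there are 10 (p, q) pairs, so each sequence yields a 20-value vector; non-standard residue letters would raise a `KeyError` under this hypothetical mapping.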
Affiliation(s)
- Seher Ansar Khawaja
- Department of Computer Science, University of Management and Technology, Lahore, Pakistan
- Fahad Alturise
- Department of Cybersecurity, College of Computer, Qassim University, Buraydah, Saudi Arabia
- Tamim Alkhalifah
- Department of Computer Engineering, College of Computer, Qassim University, Buraydah, Saudi Arabia
- Sher Afzal Khan
- Department of Computer Sciences, Abdul Wali Khan University, Mardan, Pakistan
- Yaser Daanial Khan
- Department of Computer Science, University of Management and Technology, Lahore, Pakistan
4
Naseem A, Khan YD. An intelligent model for prediction of abiotic stress-responsive microRNAs in plants using statistical moments based features and ensemble approaches. Methods 2024; 228:65-79. [PMID: 38768931 DOI: 10.1016/j.ymeth.2024.05.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2024] [Revised: 04/30/2024] [Accepted: 05/10/2024] [Indexed: 05/22/2024]
Abstract
This study proposes an intelligent model for predicting abiotic stress-responsive microRNAs in plants. MicroRNAs (miRNAs) are short RNA molecules that regulate stress-related gene expression. Experimental methods are costly and time-consuming compared with in silico prediction. To address this gap, the study seeks to develop an efficient computational model for predicting plant stress responses. Two benchmark datasets, a miRNA dataset and a pre-miRNA dataset, were used. Four ensemble approaches were employed: bagging, boosting, stacking, and blending. The classifiers used were Random Forest (RF), Extra Trees (ET), AdaBoost (ADB), Light Gradient Boosting Machine (LGBM), and Support Vector Machine (SVM). Stacking and blending used all of these classifiers as base learners and Logistic Regression (LR) as the meta-classifier. Four types of testing were used: independent set, self-consistency, 5- and 10-fold cross-validation, and jackknife. The evaluation metrics were accuracy, specificity, sensitivity, Matthews correlation coefficient (MCC), and AUC. In independent set testing, the proposed methodology outperformed the existing state-of-the-art study on both datasets. The SVM-based approach achieved an accuracy of 0.659 on the miRNA dataset, which is better than the previous study. On the pre-miRNA dataset, the ET classifier surpassed the existing benchmark study with an accuracy of 0.67. The proposed method can be used in future research to predict abiotic stress responses in plants.
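The abstract lists self-consistency, independent-set, 5- and 10-fold cross-validation, and jackknife testing. Self-consistency simply trains and tests on the same data; the sketch below illustrates how the last two relate by generating k-fold splits and treating the jackknife as the special case k = n. It assumes the samples were shuffled beforehand, the function names are hypothetical, and it is not the authors' evaluation code.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds of n samples.

    Assumes the samples were shuffled beforehand; no shuffling happens here.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    start = 0
    for fold in range(k):
        # The first n % k folds receive one extra sample.
        size = n // k + (1 if fold < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def jackknife_indices(n):
    """Jackknife (leave-one-out) testing is k-fold cross-validation with k = n."""
    return k_fold_indices(n, n)


for train, test in k_fold_indices(10, 3):
    print(len(train), test)
# 6 [0, 1, 2, 3]
# 7 [4, 5, 6]
# 7 [7, 8, 9]
```

Each sample lands in exactly one test fold, so a score averaged over folds uses every sample once for testing; the jackknife does the same with single-sample folds, at the cost of n training runs.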
Affiliation(s)
- Ansar Naseem
- Department of Artificial Intelligence, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Yaser Daanial Khan
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
5
Gao J, Zhao Y, Chen C, Ning Q. MVNN-HNHC: A multi-view neural network for identification of human non-histone crotonylation sites. Anal Biochem 2024; 687:115426. [PMID: 38141798 DOI: 10.1016/j.ab.2023.115426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 11/21/2023] [Accepted: 12/06/2023] [Indexed: 12/25/2023]
Abstract
Crotonylation on lysine sites in human non-histone proteins plays a crucial role in biological activities. However, because traditional experimental methods for identifying crotonylation sites are time-consuming and labor-intensive, computational prediction methods have become increasingly popular in recent years. Despite its significance, crotonylation site prediction has received less attention in non-histone proteins than in histones. In this study, we propose a Multi-View Neural Network for identification of Human Non-Histone Crotonylation sites, named MVNN-HNHC. MVNN-HNHC integrates multi-view encoding features and adaptive encoding features through a multi-channel neural network to learn, from various aspects, the attribute differences between crotonylation and non-crotonylation sites. In MVNN-HNHC, convolutional neural networks obtain local information from these features, and bidirectional long short-term memory networks extract sequence information. An attention mechanism then fuses the outputs of the feature extraction modules. Finally, a fully connected network acts as the classifier to predict whether a lysine site is a crotonylation site or a non-crotonylation site. On the independent test set, sensitivity, specificity, accuracy, Matthews correlation coefficient, and area under the curve (AUC) reach 80.06%, 75.77%, 77.06%, 0.5203, and 0.7792, respectively. A series of experiments verifies that MVNN-HNHC is an effective tool for predicting crotonylation sites in non-histone proteins. The data and code are available at https://github.com/xbbxhbc/junjun0612.git.
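The reported metrics follow the standard confusion-matrix definitions: Sn = TP/(TP+FN), Sp = TN/(TN+FP), Acc = (TP+TN)/(TP+TN+FP+FN), and MCC = (TP·TN - FP·FN)/sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); AUC can be computed as the probability that a randomly chosen positive site scores above a randomly chosen negative one. The sketch below implements these textbook definitions; the counts in the example are hypothetical and are not MVNN-HNHC results.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)                      # recall on crotonylation (positive) sites
    sp = tn / (tn + fp)                      # recall on non-crotonylation (negative) sites
    acc = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts: 100 positive and 100 negative test sites.
m = binary_metrics(tp=80, fp=25, tn=75, fn=20)
print({k: round(v, 4) for k, v in m.items()})
# {'Sn': 0.8, 'Sp': 0.75, 'Acc': 0.775, 'MCC': 0.5507}
```

The pairwise AUC loop is quadratic in the number of sites; it is written this way for clarity, and a rank-based formula gives the same value more efficiently on large test sets.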
Affiliation(s)
- Jun Gao
- Department of Information Science and Technology, Dalian Maritime University, Dalian, 116026, China
- Yaomiao Zhao
- Department of Information Science and Technology, Dalian Maritime University, Dalian, 116026, China
- Chen Chen
- Naval Architecture and Ocean Engineering College, Dalian Maritime University, Dalian, 116026, China
- Qiao Ning
- Department of Information Science and Technology, Dalian Maritime University, Dalian, 116026, China
6
Jiang Y, Yan R, Wang X. PlantNh-Kcr: a deep learning model for predicting non-histone crotonylation sites in plants. Plant Methods 2024; 20:28. [PMID: 38360730 PMCID: PMC10870457 DOI: 10.1186/s13007-024-01157-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 02/07/2024] [Indexed: 02/17/2024]
Abstract
BACKGROUND Lysine crotonylation (Kcr) is a crucial protein post-translational modification found in histone and non-histone proteins. It plays a pivotal role in regulating diverse biological processes in both animals and plants, including gene transcription and replication, cell metabolism and differentiation, and photosynthesis. Despite the significance of Kcr, detecting Kcr sites through biological experiments is often time-consuming and expensive, and only a fraction of crotonylated peptides can be identified. This highlights the need for efficient and rapid computational prediction of Kcr sites. Several machine learning models exist for predicting Kcr sites in humans, yet models tailored to plants are rare. Furthermore, no downloadable Kcr site predictors or datasets have been developed specifically for plants. To address this gap, it is imperative to integrate the Kcr sites detected in plant experiments and establish a dedicated computational model for plants. RESULTS Most plant Kcr sites are located on non-histones. In this study, we collected non-histone Kcr sites from five plants: wheat, tobacco, rice, peanut, and papaya. We then conducted a comprehensive analysis of the amino acid distribution surrounding these sites. To develop a predictive model for plant non-histone Kcr sites, we combined a convolutional neural network (CNN), a bidirectional long short-term memory network (BiLSTM), and an attention mechanism to build a deep learning model called PlantNh-Kcr. In both five-fold cross-validation and independent tests, PlantNh-Kcr outperformed multiple conventional machine learning models and other deep learning models. Furthermore, we analyzed species-specific effects on the PlantNh-Kcr model and found that a general model trained on data from multiple species outperforms species-specific models.
CONCLUSION PlantNh-Kcr represents a valuable tool for predicting plant non-histone Kcr sites. We expect that this model will aid in addressing key challenges and tasks in the study of plant crotonylation sites.
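Kcr site predictors of this kind typically operate on fixed-length peptide windows centred on each candidate lysine, which is also the unit over which amino acid distributions around sites are analysed. The abstract does not state PlantNh-Kcr's window length or padding scheme, so the half-width of 15 residues (a 31-residue window) and the padding character "X" below are assumptions, and the function name is hypothetical.

```python
def lysine_windows(seq, half_width=15, pad="X"):
    """Return (index, window) for every lysine (K) in seq.

    Each window has length 2 * half_width + 1 with the lysine at its centre;
    positions beyond either end of the protein are filled with the pad character.
    """
    padded = pad * half_width + seq + pad * half_width
    windows = []
    for i, aa in enumerate(seq):
        if aa == "K":
            # seq[i] sits at padded[i + half_width], so the window starts at padded[i].
            windows.append((i, padded[i:i + 2 * half_width + 1]))
    return windows


print(lysine_windows("MKAK", half_width=2))
# [(1, 'XMKAK'), (3, 'KAKXX')]
```

Padding keeps every window the same length, which a CNN or BiLSTM input layer requires; lysines near the protein termini then carry visible "X" filler that the model can learn to discount.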
Affiliation(s)
- Yanming Jiang
- College of Mathematics and Computer Sciences, Shanxi Normal University, Taiyuan, 030031, China
- Renxiang Yan
- The Key Laboratory of Marine Enzyme Engineering of Fujian Province, Fuzhou University, Fuzhou, 350002, China
- College of Biological Science and Engineering, Fuzhou University, Fuzhou, 350002, China
- Xiaofeng Wang
- College of Mathematics and Computer Sciences, Shanxi Normal University, Taiyuan, 030031, China
7
Perveen G, Alturise F, Alkhalifah T, Daanial Khan Y. Hemolytic-Pred: A machine learning-based predictor for hemolytic proteins using position and composition-based features. Digit Health 2023; 9:20552076231180739. [PMID: 37434723 PMCID: PMC10331097 DOI: 10.1177/20552076231180739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Accepted: 05/22/2023] [Indexed: 07/13/2023]
Abstract
Objective The objective of this study is to propose a novel in silico method, called Hemolytic-Pred, for identifying hemolytic proteins from their sequences using statistical moment-based features along with position-relative and frequency-relative information. Methods Primary sequences were transformed into feature vectors using statistical and position-relative moment-based features. Several machine learning algorithms were employed for classification. Computational models were rigorously evaluated using four different validation tests. The Hemolytic-Pred webserver is available for further analysis at http://ec2-54-160-229-10.compute-1.amazonaws.com/. Results XGBoost outperformed the other six classifiers, with accuracies of 0.99, 0.98, 0.97, and 0.98 for the self-consistency test, 10-fold cross-validation, jackknife test, and independent set test, respectively. The proposed method with the XGBoost classifier is a workable and robust solution for predicting hemolytic proteins efficiently and accurately. Conclusions Hemolytic-Pred with the XGBoost classifier is a reliable tool for the timely identification of hemolytic cells and the diagnosis of various related severe disorders. The application of Hemolytic-Pred can yield profound benefits in the medical field.
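A common frequency-based descriptor in sequence predictors of this type is amino acid composition (AAC), the relative frequency of each of the 20 standard residues. The abstract does not specify Hemolytic-Pred's exact frequency-relative features, so the following is a generic sketch rather than the authors' feature vector; ignoring non-standard letters is an assumption.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return the 20 relative residue frequencies in AMINO_ACIDS order.

    Letters outside the 20 standard amino acids (e.g. X, B, Z) are ignored.
    """
    counts = Counter(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


vec = amino_acid_composition("MAAKX")
print(len(vec), vec[AMINO_ACIDS.index("A")])  # 20 0.5
```

AAC discards residue order entirely, which is why such predictors pair it with position-relative or moment-based features that retain positional information.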
Affiliation(s)
- Gulnaz Perveen
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Punjab, Pakistan
- Fahad Alturise
- Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Buraidah, Qassim, Saudi Arabia
- Tamim Alkhalifah
- Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Buraidah, Qassim, Saudi Arabia
- Yaser Daanial Khan
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Punjab, Pakistan
8
Butt AH, Alkhalifah T, Alturise F, Khan YD. A machine learning technique for identifying DNA enhancer regions utilizing CIS-regulatory element patterns. Sci Rep 2022; 12:15183. [PMID: 36071071 PMCID: PMC9452539 DOI: 10.1038/s41598-022-19099-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/28/2022] [Accepted: 08/24/2022] [Indexed: 11/26/2022]
Abstract
Enhancers regulate gene expression by playing a crucial role in the synthesis of RNAs and proteins, although they do not directly encode proteins or RNA molecules. To control gene expression, it is important to predict enhancers and their strength. Because of their distance from the target gene, lack of common motifs, and tissue/cell specificity, enhancer regions are thought to be difficult to predict in DNA sequences. Recently, a number of bioinformatics tools have been created to distinguish enhancers from other regulatory components and to predict their strength. However, the prediction quality of these tools, and hence their practical value, still needs to be improved. Based on nucleotide composition and statistical moment-based features, the current study proposes a novel method for distinguishing enhancers from non-enhancers and evaluating their strength. In terms of accuracy, the proposed method outperformed state-of-the-art techniques under fivefold and tenfold cross-validation, achieving 86.5% for enhancer identification and 72.3% for enhancer strength prediction. The results suggest that statistical moment-based features can lead to more efficient and successful predictions. The current study's source code is available to the research community at https://github.com/csbioinfopk/enpred.
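Nucleotide composition is typically captured as normalised k-mer frequencies. The sketch below is a minimal version, assuming k = 2 (16 dinucleotides) by default and skipping any k-mer that contains a symbol other than A, C, G, or T; the study's actual k values and its statistical-moment features are not reproduced here.

```python
from itertools import product


def kmer_composition(seq, k=2):
    """Normalised frequencies of all 4**k DNA k-mers, in lexicographic ACGT order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        idx = index.get(seq[i:i + k])
        if idx is not None:          # skip k-mers containing N or other symbols
            counts[idx] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


features = kmer_composition("AACGTTAGCN", k=2)   # 16 values summing to 1
```

Because the vector length depends only on k, sequences of different lengths map to features of the same size, which is what a fixed-input classifier needs.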
Affiliation(s)
- Ahmad Hassan Butt
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Tamim Alkhalifah
- Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Saudi Arabia
- Fahad Alturise
- Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Saudi Arabia
- Yaser Daanial Khan
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
9
Dou L, Zhang Z, Xu L, Zou Q. iKcr_CNN: A novel computational tool for imbalance classification of human nonhistone crotonylation sites based on convolutional neural networks with focal loss. Comput Struct Biotechnol J 2022; 20:3268-3279. [PMID: 35832615 PMCID: PMC9251780 DOI: 10.1016/j.csbj.2022.06.032] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/20/2022] [Revised: 06/13/2022] [Accepted: 06/13/2022] [Indexed: 11/26/2022]
Abstract
Lysine crotonylation (Kcr) is a newly discovered protein post-translational modification that has been shown to be widely involved in various biological processes and human diseases. Accurate and fast identification of this modification has therefore become a preliminary task in investigating its biological functions. Because traditional high-throughput experimental techniques are slow, costly, and labor-intensive, constructing bioinformatics predictors based on machine learning algorithms has become a popular solution. Although dozens of predictors have been reported to identify Kcr sites, only two, nhKcr and DeepKcrot, focus on human nonhistone protein sequences. Moreover, because of the imbalanced data distribution, detection performance is severely biased towards the majority negative samples and leaves much room for improvement. In this research, we developed a convolutional neural network framework, dubbed iKcr_CNN, to identify human nonhistone Kcr modifications. To overcome the imbalance issue (Kcr: 15,274; non-Kcr: 74,018; imbalance ratio 1:4), we applied the focal loss function instead of the standard cross-entropy to optimize the model; focal loss not only assigns different weights to samples of different categories but also distinguishes easy- from hard-to-classify samples. The resulting model yields more balanced prediction scores between real-world positive and negative samples than existing tools. The user-friendly web server is accessible at ikcrcnn.webmalab.cn/, and the Python scripts can be downloaded from github.com/lijundou/iKcr_CNN/. The proposed model may serve as an efficient tool to assist researchers with their experimental work.
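For a predicted positive-class probability p and label y, the binary focal loss is FL = -α_t (1 - p_t)^γ log(p_t), where p_t = p if y = 1 and 1 - p otherwise, and α_t = α for positives and 1 - α for negatives. The factor (1 - p_t)^γ shrinks the loss of well-classified (easy) samples, while α_t reweights the two classes; with γ = 0 and α = 0.5 it reduces to half the ordinary cross-entropy. The abstract does not give iKcr_CNN's γ and α, so the defaults below (γ = 2, α = 0.25, the values commonly used in the original focal-loss work) are assumptions, and this per-sample function is a sketch rather than the authors' training code. For reference, the quoted counts give a negative-to-positive ratio of 74,018 / 15,274 ≈ 4.85.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for a single sample.

    p: predicted probability that the site is positive (Kcr); y: 1 for Kcr, 0 for non-Kcr.
    """
    p = min(max(p, eps), 1.0 - eps)          # clip to keep log() finite
    p_t = p if y == 1 else 1.0 - p           # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, **kwargs):
    """Average focal loss over a batch of (probability, label) pairs."""
    return sum(focal_loss(p, y, **kwargs) for p, y in zip(probs, labels)) / len(labels)


# Ratio of focal loss to plain cross-entropy for an easy (0.95) and a hard (0.3) positive.
for p in (0.95, 0.3):
    print(p, round(focal_loss(p, 1) / -math.log(p), 6))
# 0.95 0.000625
# 0.3 0.1225
```

The easy positive keeps well under 0.1% of its cross-entropy loss while the hard one keeps about 12%, so gradient updates are dominated by the hard, often minority-class, samples.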
10
βLact-Pred: A Predictor Developed for Identification of Beta-Lactamases Using Statistical Moments and PseAAC via 5-Step Rule. Computational Intelligence and Neuroscience 2021; 2021:8974265. [PMID: 34956358 PMCID: PMC8709780 DOI: 10.1155/2021/8974265] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/22/2021] [Accepted: 11/22/2021] [Indexed: 12/02/2022]
Abstract
Beta-lactamase (β-lactamase) produced by various bacteria confers resistance to β-lactam-containing drugs. The gene encoding β-lactamase is plasmid-borne and can easily be transferred from one bacterium to another during conjugation. Through such transfer, the recipient also acquires resistance to drugs of the β-lactam family. β-Lactam antibiotics are vital in the clinical treatment of serious diseases such as soft tissue infections, gonorrhoea, skin infections, urinary tract infections, and bronchitis. Herein, we report a prediction classifier named βLact-Pred for identifying β-lactamase proteins. The computational model takes the primary amino acid sequence as input. Various metrics are derived from the primary structure to form a feature vector. Experimentally determined data of positive and negative beta-lactamases are collected and transformed into feature vectors. An artificial neural network is trained on features that integrate position-relative features and sequence statistical moments into PseAAC. The proposed model was validated using self-consistency testing, jackknife testing, cross-validation, and independent testing, with overall accuracies of 99.76%, 96.07%, 94.20%, and 91.65%, respectively. These experimental results demonstrate that the proposed predictor βLact-Pred surpasses existing methods.
11
Tng SS, Le NQK, Yeh HY, Chua MCH. Improved Prediction Model of Protein Lysine Crotonylation Sites Using Bidirectional Recurrent Neural Networks. J Proteome Res 2021; 21:265-273. [PMID: 34812044 DOI: 10.1021/acs.jproteome.1c00848] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Indexed: 01/23/2023]
Abstract
Histone lysine crotonylation (Kcr) is a post-translational modification of histone proteins that is involved in the regulation of gene transcription, acute and chronic kidney injury, spermatogenesis, depression, cancer, and other processes. Identifying Kcr sites in proteins is important for characterizing and regulating primary biological mechanisms. Computational approaches such as machine learning and deep learning algorithms have emerged in recent years because traditional wet-lab experiments are time-consuming and costly. In this study, we propose a deep learning model based on a recurrent neural network (RNN), termed Sohoko-Kcr, for predicting Kcr sites. Using embedded encoding of the peptide sequences, we investigate the efficiency of RNN-based models such as long short-term memory (LSTM), bidirectional LSTM (BiLSTM), and bidirectional gated recurrent unit (BiGRU) networks with cross-validation and independent tests. We also compared Sohoko-Kcr with other published tools to verify the efficiency of our model, based on 3-fold, 5-fold, and 10-fold cross-validation and independent set tests. The results show that the BiGRU model consistently displayed outstanding performance and computational efficiency. Based on the proposed model, a webserver called Sohoko-Kcr was deployed for free use and is accessible at https://sohoko-research-9uu23.ondigitalocean.app.
Affiliation(s)
- Sian Soo Tng
- Institute of Systems Science, National University of Singapore, 29 Heng Mui Keng Terrace, Singapore 119620, Singapore
- Nguyen Quoc Khanh Le
- Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei 106, Taiwan
- Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei 106, Taiwan
- Translational Imaging Research Center, Taipei Medical University Hospital, Taipei 110, Taiwan
- Hui-Yuan Yeh
- Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Avenue, Singapore 639818, Singapore
- Matthew Chin Heng Chua
- Institute of Systems Science, National University of Singapore, 29 Heng Mui Keng Terrace, Singapore 119620, Singapore
12
Alzahrani E, Alghamdi W, Ullah MZ, Khan YD. Identification of stress response proteins through fusion of machine learning models and statistical paradigms. Sci Rep 2021; 11:21767. [PMID: 34741132 PMCID: PMC8571424 DOI: 10.1038/s41598-021-99083-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 06/18/2021] [Accepted: 09/13/2021] [Indexed: 11/08/2022]
Abstract
Proteins are a vital component of cells and perform physiological functions that keep the body operating smoothly. Identifying a protein's function requires a detailed understanding of protein structure. Stress proteins are essential mediators of several responses to cellular stress and are categorized based on their structural characteristics. These proteins are conserved across many eukaryotic and prokaryotic lineages and perform varied crucial functions inside the cell. The in vivo, ex vivo, and in vitro identification of stress proteins is time-consuming and costly. This study aims to identify stress protein sequences with the aid of mathematical modelling and machine learning methods to supplement these wet-lab methods. The model developed using Random Forest showed remarkable results with 91.1% accuracy, while models based on a neural network and a support vector machine showed 87.7% and 47.0% accuracy, respectively. Based on these evaluation results, the random-forest-based classifier surpassed all other predictors and is suitable for practical applications in identifying stress proteins. A live web server is available at http://biopred.org/stressprotiens, and the web server code is available at https://github.com/abdullah5naveed/SRP_WebServer.git.
Affiliation(s)
- Ebraheem Alzahrani
- Department of Mathematics, Faculty of Science, King Abdulaziz University, P. O. Box 80203, Jeddah, 21589, Saudi Arabia
- Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, P. O. Box 80221, Jeddah, 21589, Saudi Arabia
- Malik Zaka Ullah
- Department of Mathematics, Faculty of Science, King Abdulaziz University, P. O. Box 80203, Jeddah, 21589, Saudi Arabia
- Yaser Daanial Khan
- Department of Computer Science, University of Management and Technology, Lahore, 54770, Pakistan
| |
13
Sohrawordi M, Hossain MA. Prediction of lysine formylation sites using support vector machine based on the sample selection from majority classes and synthetic minority over-sampling techniques. Biochimie 2021; 192:125-135. [PMID: 34627982 DOI: 10.1016/j.biochi.2021.10.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/15/2021] [Revised: 10/03/2021] [Accepted: 10/05/2021] [Indexed: 12/22/2022]
Abstract
Lysine formylation is a newly discovered post-translational modification (PTM) of growing interest that is generally found on core and linker histone proteins of prokaryotes and eukaryotes and plays important roles in the regulation of various cellular mechanisms. Accurately identifying formylation sites in proteins is therefore important for a deeper understanding of the molecular mechanism of formylation and for developing drugs for related diseases. Because experimental identification of formylation sites by traditional methods is expensive and time-consuming, a simple, fast, and accurate computational model for predicting lysine formylation sites is highly desirable. In this study, a computational model named PLF_SVM is designed and proposed that uses binary encoding (BE), amino acid composition (AAC), reverse position relative incidence matrix (RPRIM), position relative incidence matrix (PRIM), and position specific amino acid propensity (PSAAP) feature generation methods to predict formylated and non-formylated lysine sites. In addition, the Synthetic Minority Oversampling Technique (SMOTE) and a proposed sample selection strategy named EnSVM are applied to handle the imbalanced training dataset. The optimal number of features is then selected by the F-score method to train the model. PLF_SVM outperforms state-of-the-art approaches in validation and independent testing, with accuracies of 98.61% and 98.77%, respectively. A user-friendly web tool for identifying formylation sites is also available at https://plf-svm.herokuapp.com/. The proposed method may therefore serve as a helpful guide for the analysis and prediction of formylated lysines and for understanding cellular regulation.
Affiliation(s)
- Md Sohrawordi
- Dept. of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh; Dept. of Computer Science and Engineering, Hajee Mohammad Danesh Science and Technology University, Dinajpur, Bangladesh
- Md Ali Hossain
- Dept. of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
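Two of the feature types named in the PLF_SVM abstract, binary encoding (BE) and amino acid composition (AAC), are simple enough to sketch. The example below is a minimal illustration, not PLF_SVM's own code: it assumes the 20 standard amino acids in alphabetical one-letter order, a lysine-centred window padded at the sequence ends with a hypothetical placeholder residue 'X', and hypothetical function names and window size.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes (assumed order)


def extract_window(sequence: str, site: int, half_width: int) -> str:
    """Return 2*half_width+1 residues centred on `site` (0-based), padded with 'X'."""
    padded = "X" * half_width + sequence + "X" * half_width
    return padded[site : site + 2 * half_width + 1]


def binary_encoding(window: str) -> list[int]:
    """One-hot encode each residue as a 20-bit vector; padding 'X' becomes all zeros."""
    features = []
    for residue in window:
        features.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return features


def amino_acid_composition(window: str) -> list[float]:
    """Fraction of each standard amino acid in the window, ignoring padding."""
    counted = [r for r in window if r in AMINO_ACIDS]
    if not counted:
        return [0.0] * len(AMINO_ACIDS)
    return [counted.count(aa) / len(counted) for aa in AMINO_ACIDS]
```

For example, `extract_window("MKTAYIAKQR", 1, 3)` returns `"XXMKTAY"`, with the lysine at the centre position; its binary encoding has 7 × 20 = 140 entries, and its AAC vector assigns 0.2 to each of A, K, M, T, and Y.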
14
Subba P, Prasad TSK. Protein Crotonylation Expert Review: A New Lens to Take Post-Translational Modifications and Cell Biology to New Heights. OMICS: A Journal of Integrative Biology 2021; 25:617-625. [PMID: 34582706 DOI: 10.1089/omi.2021.0132] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022]
Abstract
Genome regulation, including temporal and spatial variations in cell function, continues to puzzle and interest life scientists who aim to unravel the molecular basis of human health and disease, not to mention plant biology and ecosystem diversity. Despite important advances in epigenomics and protein post-translational modifications over the past decade, new conceptual lenses are needed to understand the biological mechanisms that can help answer fundamental regulatory questions about genomes and the cell. In this context, lysine crotonylation (Kcr) is a reversible protein modification added by protein crotonyltransferases and removed by decrotonylases. First identified on histones, Kcr regulates cellular processes at the chromatin level. Research thus far has revealed that Kcr marks the promoters of active genes and potential enhancers. Kcr was subsequently reported on a number of nonhistone proteins. The abundance of Kcr on ribosomal and myofilament proteins indicates functional roles in protein synthesis and muscle contraction. Kcr has also been associated with pluripotency, spermiogenesis, and DNA repair. In plants, large-scale mass spectrometry-based experiments validated the roles of Kcr in photosynthesis. In this expert review, we present the latest thinking and findings on lysine crotonylation with an eye to the regulation of cell biology. We discuss the enrichment techniques, putative biological functions, and challenges associated with studying this protein modification, which has vast biological implications. Finally, we reflect on the future outlook for the broader relevance of Kcr in animals, microbes, and plant species.
Affiliation(s)
- Pratigya Subba
- Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to be University), Mangalore, India
15
Islam MKB, Rahman J, Hasan MAM, Ahmad S. predForm-Site: Formylation site prediction by incorporating multiple features and resolving data imbalance. Comput Biol Chem 2021; 94:107553. [PMID: 34384997 DOI: 10.1016/j.compbiolchem.2021.107553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2020] [Revised: 06/22/2021] [Accepted: 07/28/2021] [Indexed: 10/20/2022]
Abstract
Formylation is a newly discovered post-translational modification of lysine residues that is implicated in different kinds of diseases. In this work, a novel predictor named predForm-Site has been developed to predict formylation sites with higher accuracy. We integrated multiple sequence features to build a more informative representation of formylation sites. Moreover, the decision function of the underlying classifier was optimized on the skewed formylation dataset during model training to improve prediction quality. On the dataset used by the LFPred and Formator predictors, predForm-Site achieved 99.5% sensitivity, 99.8% specificity, and 99.8% overall accuracy, with an AUC of 0.999, in the jackknife test. In the independent test, it also achieved more than 97% sensitivity and 99% specificity. Similarly, when benchmarked against the recent method CKSAAP_FormSite, the proposed predictor performed significantly better on all measures, particularly sensitivity (by around 20%), specificity (by nearly 30%), and overall accuracy (by more than 22%). These experimental results show that predForm-Site can be used as a complementary tool for the fast exploration of formylation sites. For the convenience of the scientific community, predForm-Site has been deployed as an online tool, accessible at http://103.99.176.239:8080/predForm-Site.
Affiliation(s)
- Md Khaled Ben Islam
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia; Department of Computer Science & Engineering, Pabna University of Science and Technology, Pabna, Bangladesh
- Julia Rahman
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia; Department of Computer Science & Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
- Md Al Mehedi Hasan
- Department of Computer Science & Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
- Shamim Ahmad
- Department of Computer Science & Engineering, Rajshahi University, Rajshahi, Bangladesh
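The predForm-Site abstract states that the classifier's decision function was optimized on the skewed dataset but does not describe how. One common way to do this, assumed here purely for illustration, is to move the decision threshold on the classifier's scores to the value that maximizes the Matthews correlation coefficient (MCC) instead of keeping a fixed cut-off of 0.5. The function names and toy scores are hypothetical.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_threshold(scores: list[float], labels: list[int]) -> tuple[float, float]:
    """Try each observed score as a cut-off (positive if score >= t); keep the MCC-maximizing one."""
    best_t, best_m = 0.5, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        m = mcc(tp, tn, fp, fn)
        if m > best_m:  # strict '>' keeps the lowest threshold among ties
            best_t, best_m = t, m
    return best_t, best_m
```

With toy scores `[0.1, 0.2, 0.3, 0.35, 0.4, 0.9]` and labels `[0, 0, 0, 0, 1, 1]`, the default cut-off of 0.5 misses one positive, while `best_threshold` returns a cut-off of 0.4 with an MCC of 1.0. The threshold should be tuned on training or validation data only; tuning it on the independent test set would make the reported test performance optimistic.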
16
Jiang G, Li C, Lu M, Lu K, Li H. Protein lysine crotonylation: past, present, perspective. Cell Death Dis 2021; 12:703. [PMID: 34262024 PMCID: PMC8280118 DOI: 10.1038/s41419-021-03987-z] [Citation(s) in RCA: 55] [Impact Index Per Article: 18.3] [Received: 02/23/2021] [Revised: 05/28/2021] [Accepted: 05/31/2021] [Indexed: 02/08/2023]
Abstract
Lysine crotonylation has been discovered in histone and non-histone proteins and found to be involved in diverse diseases and biological processes, such as neuropsychiatric disease, carcinogenesis, spermatogenesis, tissue injury, and inflammation. Its unique carbon–carbon π-bond structure indicates that lysine crotonylation may be governed by regulatory mechanisms distinct from those of other, more widely studied types of lysine acylation. In this review, we discuss the regulation of lysine crotonylation by enzymatic and non-enzymatic mechanisms, the recognition of substrate proteins, the physiological functions of lysine crotonylation, and its cross-talk with other types of modification. Tools and methods for the prediction and detection of lysine crotonylation are also described.
Affiliation(s)
- Gaoyue Jiang
- West China Second University Hospital, State Key Laboratory of Biotherapy, and Key Laboratory of Birth Defects and Related Diseases of Women and Children, Ministry of Education, Sichuan University, 610041, Chengdu, China
- Chunxia Li
- Department of Neurosurgery, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and The Research Units of West China, Chinese Academy of Medical Sciences, Chengdu, China
- Meng Lu
- Department of Neurosurgery, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and The Research Units of West China, Chinese Academy of Medical Sciences, Chengdu, China
- Kefeng Lu
- Department of Neurosurgery, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and The Research Units of West China, Chinese Academy of Medical Sciences, Chengdu, China
- Huihui Li
- West China Second University Hospital, State Key Laboratory of Biotherapy, and Key Laboratory of Birth Defects and Related Diseases of Women and Children, Ministry of Education, Sichuan University, 610041, Chengdu, China
17
Chen YZ, Wang ZZ, Wang Y, Ying G, Chen Z, Song J. nhKcr: a new bioinformatics tool for predicting crotonylation sites on human nonhistone proteins based on deep learning. Brief Bioinform 2021; 22:6277413. [PMID: 34002774 DOI: 10.1093/bib/bbab146] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 02/22/2021] [Revised: 03/18/2021] [Accepted: 03/25/2021] [Indexed: 12/20/2022]
Abstract
Lysine crotonylation (Kcr) is a newly discovered type of protein post-translational modification that has been reported to be involved in various pathophysiological processes. High-resolution mass spectrometry is the primary approach for identifying Kcr sites. However, experimental approaches for identifying Kcr sites are often time-consuming and expensive compared with computational approaches. To date, several predictors for Kcr sites have been developed, most of which predict crotonylation sites either on histones alone or on mixed histone and nonhistone proteins. These methods differ widely in their algorithms, encoding schemes, feature selection techniques, and performance assessment strategies. However, none of them were designed specifically for predicting Kcr sites on nonhistone proteins. It is therefore desirable to develop an effective predictor for identifying Kcr sites in the large amount of nonhistone sequence data. For this purpose, we first provide a comprehensive review of six methods for predicting crotonylation sites. Second, we develop a novel deep learning-based computational framework, termed CNNrgb, for Kcr site prediction on nonhistone proteins that integrates different types of features. We benchmark its performance against multiple commonly used machine learning classifiers (including random forest, LogitBoost, naïve Bayes, and logistic regression) using both 10-fold cross-validation and an independent test. The results show that the proposed CNNrgb framework achieves the best performance, with high computational efficiency on large datasets. Moreover, to facilitate investigation of Kcr sites on human nonhistone proteins, we implement an online server called nhKcr and compare it with other existing tools to illustrate the utility and robustness of our method. The nhKcr web server and all datasets used in this study are freely accessible at http://nhKcr.erc.monash.edu/.
Affiliation(s)
- Yong-Zi Chen
- Laboratory of Tumor Cell Biology, Tianjin Medical University Cancer Institute and Hospital, Tianjin 300060, China
- Guoguang Ying
- Laboratory of Tumor Cell Biology in Tianjin Medical University Cancer Institute and Hospital, Tianjin 300060, China
- Zhen Chen
- Collaborative Innovation Center of Henan Grain Crops, Henan Agricultural University, China
- Jiangning Song
- Monash Biomedicine Discovery Institute, Monash University, Australia
18
Li A, Deng Y, Tan Y, Chen M. A Transfer Learning-Based Approach for Lysine Propionylation Prediction. Front Physiol 2021; 12:658633. [PMID: 33967828 PMCID: PMC8096918 DOI: 10.3389/fphys.2021.658633] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/27/2021] [Accepted: 03/15/2021] [Indexed: 12/12/2022]
Abstract
Lysine propionylation is a newly discovered post-translational modification (PTM) that plays a key role in cellular processes. Although proteomic techniques can detect propionylation, large-scale detection remains challenging. To bridge this gap, we present a transfer learning-based method for computationally predicting propionylation sites. A recurrent neural network-based deep learning model was first trained on malonylation data and then fine-tuned on propionylation data. The trained model served as a feature extractor that translated input protein sequences into numerical vectors, and a support vector machine was used as the final classifier. The proposed method reached a Matthews correlation coefficient (MCC) of 0.6615 in 10-fold cross-validation and 0.3174 on the independent test, outperforming state-of-the-art methods. Enrichment analysis indicated that propionylation was associated with the GO terms GO:0016620, GO:0051287, GO:0003735, GO:0006096, and GO:0005737, and with metabolism. We developed a user-friendly online tool for predicting propionylation sites, which is available at http://47.113.117.61/.
Affiliation(s)
- Ang Li
- School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, China
- Yingwei Deng
- School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, China
- Yan Tan
- School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, China
- Min Chen
- School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, China
19
Dou L, Yang F, Xu L, Zou Q. A comprehensive review of the imbalance classification of protein post-translational modifications. Brief Bioinform 2021; 22:6217722. [PMID: 33834199 DOI: 10.1093/bib/bbab089] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 01/10/2021] [Revised: 02/17/2021] [Accepted: 02/24/2021] [Indexed: 12/13/2022]
Abstract
Post-translational modifications (PTMs) play significant roles in regulating protein structure, activity, and function, and they are closely involved in various pathologies. Therefore, identifying the associated PTMs is the foundation of in-depth research on related biological mechanisms, disease treatments, and drug design. Because high-throughput sequencing techniques are costly and time-consuming, developing machine learning-based predictors has been considered an effective approach to rapidly recognize potential modified sites. However, the imbalanced distribution of true and false PTM sites, namely, the data imbalance problem, largely affects the reliability and application of prediction tools. In this article, we conduct a systematic survey of research progress in imbalanced PTM classification. First, we describe the modeling process in detail and outline useful data imbalance solutions. Then, we summarize recently proposed bioinformatics tools based on imbalanced PTM data and build a convenient website, ImClassi_PTMs (available at lab.malab.cn/∼dlj/ImbClassi_PTMs/), where researchers can browse them. Moreover, we analyze the challenges facing current computational predictors and propose suggestions to improve the efficiency of imbalance learning. We hope that this work will provide comprehensive knowledge of imbalanced PTM recognition and contribute to more advanced predictors in the future.
Affiliation(s)
- Lijun Dou
- University of Electronic Science and Technology of China and the Shenzhen Polytechnic, China
- Fenglong Yang
- University of Electronic Science and Technology of China and the Shenzhen Polytechnic, China
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
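Among the data-imbalance solutions this review surveys, SMOTE (also used by PLF_SVM and by the cell wall lytic enzyme predictor elsewhere in this list) creates synthetic minority-class samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. The sketch below is a minimal standard-library illustration, assuming Euclidean distance on numeric feature vectors; the function name, `k`, and `seed` are hypothetical choices, not taken from any of the listed tools.

```python
import math
import random


def smote(minority: list[list[float]], n_new: int, k: int = 5, seed: int = 0) -> list[list[float]]:
    """Generate n_new synthetic samples from a list of minority-class feature vectors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute each sample's k nearest minority neighbours (excluding itself).
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted((math.dist(x, y), j) for j, y in enumerate(minority) if j != i)
        neighbours.append([j for _, j in others[:k]])
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # random position in [0, 1) along the segment from sample i to sample j
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic
```

Each synthetic point lies on the line segment between two real minority samples, so it stays inside the region the minority class already occupies. For real work, the imbalanced-learn library provides a maintained implementation (`imblearn.over_sampling.SMOTE`). Oversampling should be applied to the training folds only, after the data are split, so that cross-validation and independent-test estimates are not inflated by synthetic copies of test samples.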
20
Ahmed S, Rahman A, Hasan MAM, Islam MKB, Rahman J, Ahmad S. predPhogly-Site: Predicting phosphoglycerylation sites by incorporating probabilistic sequence-coupling information into PseAAC and addressing data imbalance. PLoS One 2021; 16:e0249396. [PMID: 33793659 PMCID: PMC8016359 DOI: 10.1371/journal.pone.0249396] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/01/2020] [Accepted: 03/18/2021] [Indexed: 12/14/2022]
Abstract
Post-translational modification (PTM) involves covalent modification after protein biosynthesis and plays an essential role in cell biology. Lysine phosphoglycerylation is a newly discovered, reversible type of PTM that affects glycolytic enzyme activities and is responsible for a wide variety of diseases, such as heart failure, arthritis, and degeneration of the nervous system. Our goal is to computationally characterize potential phosphoglycerylation sites to understand their functionality and causality more accurately. In this study, a novel computational tool, referred to as predPhogly-Site, has been developed to predict phosphoglycerylation sites in proteins. It uses the probabilistic sequence-coupling information among amino acid residues near phosphoglycerylation sites, together with a variable cost adjustment for the skewed training dataset, to enhance prediction performance. It achieved around 99% accuracy, with more than 0.96 MCC and 0.97 AUC, in both 10-fold cross-validation and the independent test. Moreover, the standard deviation across the 10 cross-validation folds is almost negligible. This performance indicates that predPhogly-Site markedly outperforms existing prediction tools and can be used as a promising predictor, preferably through its web interface at http://103.99.176.239/predPhogly-Site.
Affiliation(s)
- Sabit Ahmed
- Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
- Afrida Rahman
- Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
- Md. Al Mehedi Hasan
- Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
- Md Khaled Ben Islam
- Computer Science and Engineering, Pabna University of Science and Technology, Pabna, Bangladesh
- Julia Rahman
- Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
- Shamim Ahmad
- Computer Science and Engineering, University of Rajshahi, Rajshahi, Bangladesh
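The predPhogly-Site abstract mentions a "variable cost adjustment" for the skewed training data without giving its form. One widely used kind of cost-sensitive learning, assumed here only for illustration, weights each class inversely to its frequency so that misclassifying a rare positive site costs more than misclassifying a common negative one. The formula n_samples / (n_classes × class_count) is the heuristic scikit-learn uses for `class_weight="balanced"`; the function names are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels: list[int]) -> dict[int, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def weighted_error(labels: list[int], predictions: list[int], weights: dict[int, float]) -> float:
    """Cost-weighted misclassification rate: each mistake costs the weight of its true class."""
    total = sum(weights[y] for y in labels)
    cost = sum(weights[y] for y, p in zip(labels, predictions) if y != p)
    return cost / total
```

With 90 negatives and 10 positives, the weights are about 0.556 for the negative class and 5.0 for the positive class. Predicting every sample as negative then gives a plain error rate of 10% but a weighted error of 50%, so the trivial majority-class predictor no longer looks attractive during training.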
21
Jing XY, Li FM. Predicting Cell Wall Lytic Enzymes Using Combined Features. Front Bioeng Biotechnol 2021; 8:627335. [PMID: 33585423 PMCID: PMC7874139 DOI: 10.3389/fbioe.2020.627335] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/09/2020] [Accepted: 12/04/2020] [Indexed: 11/13/2022]
Abstract
Because of the overuse of antibiotics and the rapid rise of antibiotic-resistant strains, there is concern that existing antibiotics will become ineffective against pathogens. Using cell wall lytic enzymes to destroy bacteria has become a viable alternative for avoiding the crisis of antimicrobial resistance. In this paper, an improved method for predicting cell wall lytic enzymes is proposed: amino acid composition (AAC), dipeptide composition (DC), position-specific score matrix auto-covariance (PSSM-AC), and auto-covariance average chemical shift (acACS) features were selected to predict cell wall lytic enzymes with a support vector machine (SVM). To overcome the imbalanced data classification problem, the synthetic minority over-sampling technique (SMOTE) was used to balance the dataset, and to remove redundant or irrelevant features, the F-score was used for feature selection. With the optimized feature combination AAC+DC+acACS+PSSM-AC, the jackknife test gave Sn, Sp, MCC, and Acc values of 99.35%, 99.02%, 0.98, and 99.19%, respectively. These Sn, Sp, MCC, and Acc values are higher than those of existing methods. This improved method may be helpful for protein function prediction.
Affiliation(s)
- Xiao-Yang Jing
- College of Science, Inner Mongolia Agricultural University, Hohhot, China
- Feng-Min Li
- College of Science, Inner Mongolia Agricultural University, Hohhot, China
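The four measures reported for this predictor, and for several other entries in this list, follow directly from the binary confusion matrix: Sn = TP/(TP+FN), Sp = TN/(TN+FP), Acc = (TP+TN)/(TP+TN+FP+FN), and MCC = (TP·TN − FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The helper below, whose name is hypothetical, computes them from raw counts and returns 0.0 where a denominator would be zero.

```python
import math


def evaluate(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Sensitivity, specificity, accuracy and MCC from binary confusion-matrix counts."""
    sn = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    sp = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}
```

For example, `evaluate(tp=90, tn=80, fp=20, fn=10)` gives Sn = 0.9, Sp = 0.8, and Acc = 0.85. Because Acc equals (P·Sn + N·Sp)/(P + N), where P and N are the numbers of positive and negative samples, accuracy always lies between sensitivity and specificity; MCC uses all four counts and is therefore often preferred when the classes are imbalanced.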
22
Wang R, Wang Z, Wang H, Pang Y, Lee TY. Characterization and identification of lysine crotonylation sites based on machine learning method on both plant and mammalian. Sci Rep 2020; 10:20447. [PMID: 33235255 PMCID: PMC7686339 DOI: 10.1038/s41598-020-77173-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/14/2020] [Accepted: 11/03/2020] [Indexed: 12/14/2022]
Abstract
Lysine crotonylation (Kcr) is a type of protein post-translational modification (PTM) that plays important roles in a variety of cellular regulatory processes. Several methods have been proposed for identifying crotonylation sites. However, most of these methods predict efficiently only on either histone or non-histone proteins. This work therefore aims to achieve more balanced performance across species, here using plant (non-histone) and mammalian (histone) data. Support vector machine (SVM) and random forest (RF) classifiers were employed. According to the cross-validation results, the RF classifier based on the EGAAC attribute achieved the best predictive performance, performing competitively with existing methods while being more robust on imbalanced datasets. Moreover, an independent test was carried out that compared the performance of this study with that of existing methods based on the same features or the same classifier. The SVM and RF classifiers achieved their best performances of 92% sensitivity, 88% specificity, 90% accuracy, and an MCC of 0.80 on the mammalian dataset, and 77% sensitivity, 83% specificity, 70% accuracy, and an MCC of 0.54 on a relatively small mammalian dataset and a large-scale plant dataset, respectively. Moreover, a cross-species independent test was also carried out, which demonstrated species diversity between the plant and mammalian data.
Affiliation(s)
- Rulan Wang
- School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, Guangdong, People's Republic of China
- Zhuo Wang
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, Guangdong, People's Republic of China; School of Life Sciences, University of Science and Technology of China, Hefei, 230026, Anhui, People's Republic of China
- Hongfei Wang
- Department of Orthopaedics and Traumatology, The University of Hong Kong, Pok Fu Lam, Hong Kong
- Yuxuan Pang
- School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, Guangdong, People's Republic of China
- Tzong-Yi Lee
- School of Life and Health Sciences, The Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, Guangdong, People's Republic of China
23
Mahmood MK, Ehsan A, Khan YD, Chou KC. iHyd-LysSite (EPSV): Identifying Hydroxylysine Sites in Protein Using Statistical Formulation by Extracting Enhanced Position and Sequence Variant Feature Technique. Curr Genomics 2020; 21:536-545. [PMID: 33214770 PMCID: PMC7604750 DOI: 10.2174/1389202921999200831142629] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 12/04/2019] [Revised: 05/14/2020] [Accepted: 05/15/2020] [Indexed: 11/22/2022]
Abstract
Introduction: Hydroxylation is one of the most important post-translational modifications (PTMs) in cellular function and is linked to various diseases. Hydroxylysine is produced when a hydroxyl group (OH) is added to a lysine residue.
Methods: The method used in this study to identify hydroxylysine sites is based on mathematical and statistical methodology that incorporates the sequence-order effect and the composition of each residue within protein sequences. This predictor is called "iHyd-LysSite (EPSV)" (identifying hydroxylysine sites by extracting enhanced position and sequence variant technique). Identifying hydroxylysine sites by experimental methods is difficult, laborious, and highly expensive; in silico techniques are an alternative approach for identifying hydroxylysine sites in proteins.
Results: A useful predictive model should have high sensitivity, specificity, and accuracy. Self-consistency, independent, 10-fold cross-validation, and jackknife tests were performed for validation using three well-known classifiers: Neural Networks (NN), Random Forest (RF), and Support Vector Machine (SVM). The overall predictive outcomes are substantially better than those obtained by previous predictors, and the proposed model achieved an excellent prediction rate with the NN, RF, and SVM classifiers. The jackknife-test sensitivity and specificity results obtained with these classifiers are 96.08%, 94.99%, 98.16% and 97.52%, 98.52%, 80.95%.
Conclusion: The results obtained with the proposed tool show that this method may meet future demand for hydroxylysine site prediction, with a better prediction rate than existing methods.
Affiliation(s)
- Muhammad Khalid Mahmood
- Department of Mathematics, University of the Punjab, Lahore, Pakistan; Faculty of Information Technology, University of Management and Technology, Lahore, Pakistan; Gordon Life Science Institute, Boston, MA 02478, USA
- Asma Ehsan
- Department of Mathematics, University of the Punjab, Lahore, Pakistan; Faculty of Information Technology, University of Management and Technology, Lahore, Pakistan; Gordon Life Science Institute, Boston, MA 02478, USA
- Yaser Daanial Khan
- Department of Mathematics, University of the Punjab, Lahore, Pakistan; Faculty of Information Technology, University of Management and Technology, Lahore, Pakistan; Gordon Life Science Institute, Boston, MA 02478, USA
- Kuo-Chen Chou
- Department of Mathematics, University of the Punjab, Lahore, Pakistan; Faculty of Information Technology, University of Management and Technology, Lahore, Pakistan; Gordon Life Science Institute, Boston, MA 02478, USA
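The jackknife test used for iHyd-LysSite (EPSV), and also reported by predForm-Site and the cell wall lytic enzyme predictor in this list, is leave-one-out cross-validation: each sample is predicted by a model trained on all the other samples. The sketch below assumes a hypothetical pluggable `train` function and uses a simple 1-nearest-neighbour rule (not a neural network) as a stand-in classifier; it is not the NN, RF, or SVM setup of the original study.

```python
import math
from typing import Callable

Sample = list[float]
Model = Callable[[Sample], int]
Trainer = Callable[[list[Sample], list[int]], Model]


def nearest_neighbour_trainer(xs: list[Sample], ys: list[int]) -> Model:
    """Stand-in classifier: predict the label of the closest training sample."""
    def predict(x: Sample) -> int:
        _, label = min((math.dist(x, xi), yi) for xi, yi in zip(xs, ys))
        return label
    return predict


def jackknife(xs: list[Sample], ys: list[int], train: Trainer) -> list[int]:
    """Leave-one-out: predict each sample with a model trained on all remaining samples."""
    predictions = []
    for i in range(len(xs)):
        model = train(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predictions.append(model(xs[i]))
    return predictions
```

The held-out sample never influences its own prediction: with samples `[[0.0], [1.0], [10.0]]` labelled `[0, 0, 1]`, the isolated point at 10.0 is predicted as 0, because its only same-class example is removed before training. Since one model is trained per sample, the jackknife test needs n training runs, which becomes expensive on large datasets; k-fold cross-validation is the usual cheaper alternative.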
24
Lv H, Dao FY, Guan ZX, Yang H, Li YW, Lin H. Deep-Kcr: accurate detection of lysine crotonylation sites using deep learning method. Brief Bioinform 2020; 22:5937175. [PMID: 33099604 DOI: 10.1093/bib/bbaa255] [Citation(s) in RCA: 80] [Impact Index Per Article: 20.0] [Received: 08/13/2020] [Revised: 08/31/2020] [Accepted: 09/08/2020] [Indexed: 12/23/2022]
Abstract
As a newly discovered protein post-translational modification, histone lysine crotonylation (Kcr) is involved in cellular regulation and human diseases. Various proteomics technologies have been developed to detect Kcr sites. However, experimental approaches for identifying Kcr sites are often time-consuming and labor-intensive, which makes them difficult to apply widely across large numbers of species. Computational approaches are cost-effective and can be used in a high-throughput manner to generate relatively precise identifications. In this study, we develop a deep learning-based method, termed Deep-Kcr, for Kcr site prediction that combines sequence-based features, physicochemical property-based features, and numerical space-derived information with information gain feature selection. We investigate the performance of a convolutional neural network (CNN) and five commonly used classifiers (long short-term memory network, random forest, LogitBoost, naive Bayes, and logistic regression) using 10-fold cross-validation and an independent test set. The results show that the CNN consistently displayed the best performance, with high computational efficiency on large datasets. We also compare Deep-Kcr with other existing tools to demonstrate the excellent predictive power and robustness of our method. Based on the proposed model, a web server called Deep-Kcr was established and is freely accessible at http://lin-group.cn/server/Deep-Kcr.
Affiliation(s)
- Hao Lv
- Center for Informational Biology at the University of Electronic Science and Technology of China
- Fu-Ying Dao
- Center for Informational Biology at the University of Electronic Science and Technology of China
- Zheng-Xing Guan
- Center for Informational Biology at the University of Electronic Science and Technology of China
- Hui Yang
- Center for Informational Biology at the University of Electronic Science and Technology of China
- Hao Lin
- Center for Informational Biology at the University of Electronic Science and Technology of China
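Deep-Kcr's abstract mentions information gain feature selection. For a discrete feature X and class label Y, information gain is IG(Y; X) = H(Y) − Σ_v P(X = v) · H(Y | X = v), where H is Shannon entropy; features can then be ranked by IG and only the top ones kept. The sketch below is an illustration under stated assumptions, not Deep-Kcr's own code: it assumes already-discrete feature values (continuous features such as physicochemical property values would need to be discretized first), and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels: list[int]) -> float:
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: list, labels: list[int]) -> float:
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a discrete feature X."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def rank_features(columns: dict[str, list], labels: list[int], top_k: int) -> list[str]:
    """Names of the top_k features with the highest information gain."""
    return sorted(columns, key=lambda name: information_gain(columns[name], labels), reverse=True)[:top_k]
```

With labels `[0, 0, 1, 1]`, a feature whose values `["a", "a", "b", "b"]` match the classes exactly has an IG of 1 bit, while a feature with values `["a", "b", "a", "b"]` carries no information about the label and has an IG of 0.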
25
Some illuminating remarks on molecular genetics and genomics as well as drug development. Mol Genet Genomics 2020; 295:261-274. [PMID: 31894399 DOI: 10.1007/s00438-019-01634-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/21/2019] [Accepted: 12/05/2019] [Indexed: 02/07/2023]
Abstract
Facing the explosive growth of biological sequences unearthed in the post-genomic age, one of the most important but also most difficult problems in computational biology is how to represent a biological sequence with a discrete model or a vector while still retaining considerable sequence-order information or its characteristic pattern. To deal with this challenging problem, the ideas of "pseudo amino acid components" and "pseudo K-tuple nucleotide composition" have been proposed. These ideas and approaches have further stimulated the development of "distorted key theory" and the "wenxing diagram", substantially strengthened the treatment of multi-label systems, and led to the establishment of the well-known "5-steps rule". All these logical developments are useful not only for theoretical scientists but also for experimental scientists conducting genetics/genomics analysis and drug development. This review paper also presents their future perspectives; that is, their impact is expected to become even more significant and profound.
26
Shao YT, Liu XX, Lu Z, Chou KC. pLoc_Deep-mHum: Predict Subcellular Localization of Human Proteins by Deep Learning. ACTA ACUST UNITED AC 2020. [DOI: 10.4236/ns.2020.127042] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/20/2022]
27
Lu Z, Chou KC. pLoc_Deep-mGpos: Predict Subcellular Localization of Gram Positive Bacteria Proteins by Deep Learning. ACTA ACUST UNITED AC 2020. [DOI: 10.4236/jbise.2020.135005] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/20/2022]
28
Shao Y, Chou KC. pLoc_Deep-mEuk: Predict Subcellular Localization of Eukaryotic Proteins by Deep Learning. ACTA ACUST UNITED AC 2020. [DOI: 10.4236/ns.2020.126034] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/20/2022]