1
Zhra M, Qasem RJ, Aldossari F, Saleem R, Aljada A. A Comprehensive Exploration of Caspase Detection Methods: From Classical Approaches to Cutting-Edge Innovations. Int J Mol Sci 2024; 25:5460. [PMID: 38791499 PMCID: PMC11121653 DOI: 10.3390/ijms25105460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2024] [Revised: 05/07/2024] [Accepted: 05/09/2024] [Indexed: 05/26/2024]
Abstract
The activation of caspases is a crucial event in, and an indicator of, programmed cell death (apoptosis). These enzymes play a central role in cancer biology and are considered a promising target for current and future therapeutic interventions. Traditional methods of measuring caspase activity, such as antibody-based methods, provide fundamental insights into the biological functions of caspases and are considered essential tools in cell and cancer biology, pharmacology and toxicology, and drug discovery. However, these traditional methods, though extensively used, are now recognized as having various shortcomings, and they fall short of the needs created by the rapid and expansive progress in the study of caspases. For these reasons, detection methods for caspases and for the network of pathways involved in their activation and downstream signaling have improved continuously. Over the past decade, newer methods based on state-of-the-art technologies have been introduced to the biomedical community. These methods enable both temporal and spatial monitoring of the activity of caspases and their downstream substrates with enhanced accuracy and precision. They include fluorescent-labeled inhibitors (FLIs) for live imaging, single-cell live imaging, fluorescence resonance energy transfer (FRET) sensors, and activatable multifunctional probes for in vivo imaging. Recently, the use of mass spectrometry (MS) techniques to investigate these enzymes has expanded the repertoire of tools available for identifying and quantifying caspase substrates, cleavage products, and post-translational modifications, and for unveiling the complex regulatory networks involved. Collectively, these methods are enabling researchers to unravel many of the complex cellular processes involved in apoptosis and are helping to generate a clearer and more comprehensive understanding of caspase-mediated proteolysis during apoptosis. Herein, we provide a comprehensive review of the various assays and detection methods as they have evolved over the years, so as to encourage further exploration of these enzymes, which should have direct implications for the advancement of therapeutics for cancer and other diseases.
Affiliation(s)
- Mahmoud Zhra
  - Department of Biochemistry and Molecular Medicine, College of Medicine, Alfaisal University, Riyadh 11533, Saudi Arabia
- Rani J. Qasem
  - Department of Pharmacology and Pharmacy Practice, College of Pharmacy, Middle East University, Amman 11831, Jordan
- Fai Aldossari
  - Zoology Department, College of Science, King Saud University, Riyadh 12372, Saudi Arabia
- Rimah Saleem
  - Department of Biochemistry and Molecular Medicine, College of Medicine, Alfaisal University, Riyadh 11533, Saudi Arabia
- Ahmad Aljada
  - Department of Biochemistry and Molecular Medicine, College of Medicine, Alfaisal University, Riyadh 11533, Saudi Arabia
2
Deep Learning-Based Advances In Protein Posttranslational Modification Site and Protein Cleavage Prediction. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2022; 2499:285-322. [PMID: 35696087 DOI: 10.1007/978-1-0716-2317-6_15] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 10/18/2022]
Abstract
Posttranslational modification (PTM) is a ubiquitous phenomenon in both eukaryotes and prokaryotes that gives rise to enormous proteomic diversity. PTM mostly comes in two flavors: covalent modification of the polypeptide chain and proteolytic cleavage. Understanding and characterizing PTM is a fundamental step toward understanding the underpinnings of biology. Recent advances in experimental approaches, mainly mass-spectrometry-based approaches, have helped immensely in obtaining and characterizing PTMs. However, experimental approaches alone are not enough to understand and characterize more than 450 different types of PTMs, and complementary computational approaches are becoming popular. Recently, owing to advances in deep learning (DL) and the explosion of DL applications in various fields, the computational prediction of PTM has also seen the development of a plethora of DL-based approaches. In this book chapter, we first review some recent DL-based approaches to PTM site prediction. We also review recent advances in predicting a less-studied PTM, proteolytic cleavage. We describe advances in PTM prediction by highlighting the deep learning architecture, feature encoding, novelty of the approaches, and availability of the tools/approaches. Finally, we provide an outlook and possible future research directions for DL-based approaches to PTM prediction.
3
Caspase-Mediated Cleavage of the Transcription Factor Sp3: Possible Relevance to Cancer and the Lytic Cycle of Kaposi's Sarcoma-Associated Herpesvirus. Microbiol Spectr 2022; 10:e0146421. [PMID: 35019687 PMCID: PMC8754129 DOI: 10.1128/spectrum.01464-21] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/27/2022]
Abstract
The open reading frame 50 (ORF50) protein of Kaposi's sarcoma-associated herpesvirus (KSHV) is the master regulator essential for initiating the viral lytic cycle. Previously, we have demonstrated that the ORF50 protein can cooperate with Sp3 to synergistically activate a set of viral and cellular gene promoters through highly conserved ORF50-responsive elements that harbor an Sp3-binding motif. Herein, we show that Sp3 undergoes proteolytic cleavage during the viral lytic cycle, and the cleavage of Sp3 is dependent on caspase activation. Since similar cleavage patterns of Sp3 could be detected in both KSHV-positive and KSHV-negative lymphoma cells undergoing apoptosis, the proteolytic cleavage of Sp3 could be a common event during apoptosis. Mutational analysis identifies 12 caspase cleavage sites in Sp3, which are situated at the aspartate (D) positions D17, D19, D180, D273, D275, D293, D304 (or D307), D326, D344, D530, D543, and D565. Importantly, we noticed that three stable Sp3 C-terminal fragments generated through cleavage at D530, D543, or D565 encompass an intact DNA-binding domain. Like the full-length Sp3, the C-terminal fragments of Sp3 retain the ability to cooperate with ORF50 protein to activate specific viral and cellular gene promoters synergistically. Collectively, our findings suggest that despite the proteolytic cleavage of Sp3 under apoptotic conditions, the resultant Sp3 fragments may retain biological activities important for the viral lytic cycle or for cellular apoptosis.
IMPORTANCE: The ORF50 protein of Kaposi's sarcoma-associated herpesvirus (KSHV) is the key viral protein that controls the switch from latency to lytic reactivation. It is a potent transactivator that can activate target gene promoters via interacting with other cellular DNA-binding transcription factors, such as Sp3. In this report, we show that Sp3 is proteolytically cleaved during the viral lytic cycle, and up to 12 caspase cleavage sites are identified in Sp3. Despite the proteolytic cleavage of Sp3, several resulting C-terminal fragments that have intact zinc-finger DNA-binding domains still retain substantial influence in the synergy with ORF50 to activate specific gene promoters. Overall, our studies elucidate the caspase-mediated cleavage of Sp3 and uncover how ORF50 utilizes the cleavage fragments of Sp3 to transactivate specific viral and cellular gene promoters.
4
Chen Z, Jiang Y, Zhang X, Zheng R, Qiu R, Sun Y, Zhao C, Shang H. ResNet18DNN: prediction approach of drug-induced liver injury by deep neural network with ResNet18. Brief Bioinform 2021; 23:6457162. [PMID: 34882224 DOI: 10.1093/bib/bbab503] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 08/12/2021] [Revised: 09/27/2021] [Accepted: 11/02/2021] [Indexed: 01/22/2023]
Abstract
Drug-induced liver injury (DILI) has long been a focus of clinicians and drug researchers. Improving the performance of DILI prediction models so that they predict liver injury accurately is an urgent problem in medical research. To address this problem, this research collected a comprehensive, high-quality DILI dataset of 1446 chemical compounds based on clinically confirmed DILI compound datasets. A model combining an 18-layer residual neural network built with additional 5-layer blocks (ResNet18) and a deep neural network (ResNet18DNN) was then proposed, which predicts DILI from vectorized images of compound structures. In predicting DILI, ResNet18DNN outperformed the existing state-of-the-art DILI predictors. The AUC (area under the curve), accuracy, recall, precision, F1-score and specificity of the ResNet18DNN-based model were 0.973, 0.992, 0.995, 0.994, 0.995 and 0.975, respectively, on the training set and 0.958, 0.976, 0.935, 0.947, 0.926 and 0.913, respectively, on the test set, which is better than the performance of previously published DILI prediction models. The method uses ResNet18 embedding to vectorize molecular structure images, and the evaluation indicators of ResNet18DNN were obtained after 10,000 iterations. This prediction approach will greatly improve the performance of DILI predictive models and provide an accurate and precise early warning method for DILI in drug development and clinical medication.
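The six evaluation indicators listed in this abstract can be computed from a binary classifier's outputs. The following is a minimal sketch, not the authors' evaluation code: the function names, the confusion-matrix counts and the scores are all hypothetical and chosen only for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Threshold-based metrics computed from confusion-matrix counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": recall,
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
        "specificity": tn / (tn + fp),
    }


def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical counts and scores, not taken from the paper.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(metrics, example_auc)
```

Because F1 is the harmonic mean of precision and recall, a value computed this way always lies between the two.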
Affiliation(s)
- Zhao Chen
  - Key Laboratory of Chinese Internal Medicine of Ministry of Education, Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing 100700, China
- Yin Jiang
  - Key Laboratory of Chinese Internal Medicine of Ministry of Education, Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing 100700, China
- Xiaoyu Zhang
  - Key Laboratory of Chinese Internal Medicine of Ministry of Education, Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing 100700, China
- Rui Zheng
  - Key Laboratory of Chinese Internal Medicine of Ministry of Education, Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing 100700, China
- Ruijin Qiu
  - Key Laboratory of Chinese Internal Medicine of Ministry of Education, Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing 100700, China
- Yang Sun
  - Key Laboratory of Chinese Internal Medicine of Ministry of Education, Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing 100700, China
- Chen Zhao
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Hongcai Shang
  - Key Laboratory of Chinese Internal Medicine of Ministry of Education, Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing 100700, China
  - College of Integrated Traditional Chinese and Western Medicine, Hunan University of Chinese Medicine, Changsha, Hunan 410208, China
5
Zhang D, Xu ZC, Su W, Yang YH, Lv H, Yang H, Lin H. iCarPS: a computational tool for identifying protein carbonylation sites by novel encoded features. Bioinformatics 2021; 37:171-177. [PMID: 32766811 DOI: 10.1093/bioinformatics/btaa702] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 05/11/2020] [Revised: 07/12/2020] [Accepted: 07/28/2020] [Indexed: 12/13/2022]
Abstract
MOTIVATION: Protein carbonylation is one of the most important oxidative stress-induced post-translational modifications and is generally characterized by stability, irreversibility and relatively early formation. It plays a significant role in orchestrating various biological processes and has already been shown to be related to many diseases. However, experimental technologies for identifying carbonylation sites are costly and time-consuming, and they cannot process a large number of proteins at a time. Thus, rapidly and effectively identifying carbonylation sites by computational methods will provide key clues for analyzing the occurrence and development of diseases.
RESULTS: In this study, we developed a predictor called iCarPS to identify carbonylation sites based on sequence information. A novel feature encoding scheme, residues conical coordinates combined with their physicochemical properties, was proposed to formulate carbonylated and non-carbonylated protein samples. A feature selection technique was used to remove potentially redundant features and improve the prediction performance. The accuracy and robustness of iCarPS were demonstrated by experiments on training and independent datasets. Comparison with other published methods showed that the proposed method performs strongly in identifying carbonylation sites.
AVAILABILITY AND IMPLEMENTATION: Based on the proposed model, a user-friendly webserver and a software package were constructed, which can be freely accessed at http://lin-group.cn/server/iCarPS.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dan Zhang
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zhao-Chun Xu
  - Computer Department, Jingdezhen Ceramic Institute, Jingdezhen 333403, China
- Wei Su
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yu-He Yang
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lv
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Yang
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lin
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
6
Zhang ZM, Wang JS, Zulfiqar H, Lv H, Dao FY, Lin H. Early Diagnosis of Pancreatic Ductal Adenocarcinoma by Combining Relative Expression Orderings With Machine-Learning Method. Front Cell Dev Biol 2020; 8:582864. [PMID: 33178697 PMCID: PMC7593596 DOI: 10.3389/fcell.2020.582864] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 07/13/2020] [Accepted: 09/15/2020] [Indexed: 12/16/2022]
Abstract
Pancreatic ductal adenocarcinoma (PDAC) is an aggressive and lethal cancer that deeply affects human health. Diagnosing PDAC at an early stage is key to patients' survival. However, the biomarkers for diagnosing early PDAC are inexact in most cases. Therefore, it is highly desirable to identify an effective PDAC diagnostic biomarker. In the current work, we designed a novel computational approach based on within-sample relative expression orderings (REOs). A feature selection technique called minimum redundancy maximum relevance was used to pick out optimal REOs. We then compared the performance of different classification algorithms for discriminating PDAC and its adjacent normal tissues from non-PDAC tissues. The support vector machine algorithm performed best at identifying early PDAC diagnostic biomarkers. First, a signature composed of nine gene pairs was acquired from microarray gene expression data sets. These gene pairs produced a classification accuracy of up to 97.53% in fivefold cross-validation. Subsequently, two types of data from diverse platforms, namely microarray and RNA-Seq, were used to validate this signature. For microarray data, all (100.00%) of 115 PDAC tissues and all (100.00%) of 31 PDAC adjacent normal tissues were correctly recognized as PDAC. In addition, 88.24% of 17 non-PDAC (normal or pancreatitis) tissues were correctly classified. For the RNA-Seq data, all (100.00%) of 177 PDAC tissues and all (100.00%) of 4 PDAC adjacent normal tissues were correctly recognized as PDAC. Validation results demonstrated that the signature had a good cross-platform effect for early detection of PDAC. This work developed a new robust signature that might be a promising biomarker for early PDAC diagnosis.
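A within-sample relative expression ordering reduces each gene pair to a single bit: whether the first gene is expressed above the second in that sample. The sketch below illustrates the idea only. The gene names, expression values and the `reo_features` helper are hypothetical, and the paper's actual nine gene pairs are not reproduced here.

```python
def reo_features(sample, gene_pairs):
    """Return 1 for each pair whose first gene is expressed above the second, else 0."""
    return [1 if sample[a] > sample[b] else 0 for a, b in gene_pairs]


# Hypothetical gene pairs and expression values, for illustration only.
gene_pairs = [("GENE_A", "GENE_B"), ("GENE_C", "GENE_D")]
microarray_sample = {"GENE_A": 8.2, "GENE_B": 5.1, "GENE_C": 3.0, "GENE_D": 6.4}

# A monotone rescaling of the same sample, standing in for a different platform.
rescaled_sample = {gene: 2 ** value for gene, value in microarray_sample.items()}

microarray_bits = reo_features(microarray_sample, gene_pairs)
rescaled_bits = reo_features(rescaled_sample, gene_pairs)
print(microarray_bits, rescaled_bits)
```

Because only the ordering inside one sample matters, any monotone rescaling leaves the features unchanged, which is one plausible reason such signatures transfer between microarray and RNA-Seq data.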
Affiliation(s)
- Zi-Mei Zhang
  - Key Laboratory for Neuro-Information of Ministry of Education, Center for Informational Biology, School of Life Sciences and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Jia-Shu Wang
  - Key Laboratory for Neuro-Information of Ministry of Education, Center for Informational Biology, School of Life Sciences and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Hasan Zulfiqar
  - Key Laboratory for Neuro-Information of Ministry of Education, Center for Informational Biology, School of Life Sciences and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Hao Lv
  - Key Laboratory for Neuro-Information of Ministry of Education, Center for Informational Biology, School of Life Sciences and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Fu-Ying Dao
  - Key Laboratory for Neuro-Information of Ministry of Education, Center for Informational Biology, School of Life Sciences and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Hao Lin
  - Key Laboratory for Neuro-Information of Ministry of Education, Center for Informational Biology, School of Life Sciences and Technology, University of Electronic Science and Technology of China, Chengdu, China
7
Román-Meléndez GD, Venkataraman T, Monaco DR, Larman HB. Protease Activity Profiling via Programmable Phage Display of Comprehensive Proteome-Scale Peptide Libraries. Cell Syst 2020; 11:375-381.e4. [PMID: 33099407 DOI: 10.1016/j.cels.2020.08.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/04/2020] [Revised: 06/10/2020] [Accepted: 08/18/2020] [Indexed: 12/28/2022]
Abstract
Endopeptidases catalyze the internal cleavage of proteins, playing pivotal roles in protein turnover, substrate maturation, and the activation of signaling cascades. A broad range of biological functions in health and disease are controlled by proteases, yet assays to characterize their activities at a proteomic scale do not exist. To address this unmet need, we developed Sensing EndoPeptidase Activity via Release and recapture using flAnking Tag Epitopes (SEPARATE), which uses a monovalent phage display of the human proteome at a 90-aa peptide resolution. We demonstrate that SEPARATE is compatible with several human proteases from distinct catalytic classes, including caspase-1, ADAM17, and thrombin. Both well-characterized and newly identified substrates of these enzymes were detected in the assay. SEPARATE was used to discover a non-canonical caspase-1 substrate, the E3 ubiquitin ligase HUWE1, a key mediator of apoptotic cell death. SEPARATE enables efficient, unbiased assessment of endopeptidase activity by using a phage-displayed proteome. A record of this paper's Transparent Peer Review process is included in the Supplemental Information.
Affiliation(s)
- Gabriel D Román-Meléndez
  - Institute for Cell Engineering, Immunology Division, Department of Pathology, Johns Hopkins University, Baltimore, MD, USA 21205
- Thiagarajan Venkataraman
  - Institute for Cell Engineering, Immunology Division, Department of Pathology, Johns Hopkins University, Baltimore, MD, USA 21205
- Daniel R Monaco
  - Institute for Cell Engineering, Immunology Division, Department of Pathology, Johns Hopkins University, Baltimore, MD, USA 21205
- H Benjamin Larman
  - Institute for Cell Engineering, Immunology Division, Department of Pathology, Johns Hopkins University, Baltimore, MD, USA 21205
8
Dao FY, Lv H, Yang YH, Zulfiqar H, Gao H, Lin H. Computational identification of N6-methyladenosine sites in multiple tissues of mammals. Comput Struct Biotechnol J 2020; 18:1084-1091. [PMID: 32435427 PMCID: PMC7229270 DOI: 10.1016/j.csbj.2020.04.015] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 02/23/2020] [Revised: 04/20/2020] [Accepted: 04/21/2020] [Indexed: 12/12/2022]
Abstract
N6-methyladenosine (m6A) is the methylation of adenosine at the nitrogen-6 position. It is the most abundant RNA methylation modification and is involved in a series of important biological processes. Accurate genome-wide identification of m6A sites is invaluable for better understanding their biological functions. In this work, an ensemble predictor named iRNA-m6A was established to identify m6A sites in multiple tissues of human, mouse and rat based on data from high-throughput sequencing techniques. In the proposed predictor, RNA sequences were encoded by a physical-chemical property matrix, mono-nucleotide binary encoding and nucleotide chemical property. Subsequently, these features were optimized using the minimum Redundancy Maximum Relevance (mRMR) feature selection method. Based on the optimal feature subset, the best m6A classification models were trained by Support Vector Machine (SVM) with a 5-fold cross-validation test. Prediction results on an independent dataset showed that the proposed method generalizes well. We also established a user-friendly webserver called iRNA-m6A, which is freely accessible at http://lin-group.cn/server/iRNA-m6A. This tool will make it more convenient for users to study m6A modification in different tissues.
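Two steps named in this abstract are easy to illustrate: mono-nucleotide binary encoding and the partitioning behind 5-fold cross-validation. The sketch below is an assumption-laden illustration rather than the iRNA-m6A implementation. The A, C, G, U column order, the helper names and the interleaved fold assignment are all choices made here, and a real pipeline would normally shuffle samples before splitting.

```python
NUCLEOTIDES = "ACGU"  # assumed column order for the binary encoding


def binary_encode(rna):
    """Encode each nucleotide as a 4-bit indicator and concatenate the bits."""
    vector = []
    for base in rna.upper().replace("T", "U"):
        vector.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return vector


def k_fold_indices(n_samples, k=5):
    """Return (train, test) index lists; every sample is tested exactly once."""
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    splits = []
    for held_out in range(k):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        splits.append((train, folds[held_out]))
    return splits


encoded = binary_encode("GAU")        # hypothetical 3-nt fragment
splits = k_fold_indices(12, k=5)      # hypothetical dataset of 12 samples
print(encoded)
print([len(test) for _, test in splits])
```

Each (train, test) pair would then be used to fit a classifier on the training indices and score it on the held-out fold.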
Affiliation(s)
- Yu-He Yang
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hasan Zulfiqar
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Gao
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lin
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
9
Marini S, Vitali F, Rampazzi S, Demartini A, Akutsu T. Protease target prediction via matrix factorization. Bioinformatics 2019; 35:923-929. [PMID: 30169576 DOI: 10.1093/bioinformatics/bty746] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/05/2018] [Revised: 08/20/2018] [Accepted: 08/27/2018] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Protein cleavage is an important cellular event, involved in a myriad of processes, from apoptosis to immune response. Bioinformatics provides in silico tools, such as machine learning-based models, to guide the discovery of targets for the proteases responsible for protein cleavage. State-of-the-art models have a scope limited to specific protease families (such as caspases) and do not explicitly include biological or medical knowledge (such as hierarchical protein domain similarity or gene-gene interactions). To fill this gap, we present a novel approach for protease target prediction based on data integration.
RESULTS: By representing protease-protein target information in the form of relational matrices, we design a model that (i) is general and not limited to a single protease family, and (ii) leverages the available knowledge, managing extremely sparse data from heterogeneous data sources, including primary sequence, pathways, domains and interactions. When compared with other algorithms on test data, our approach performs better, even against models that focus specifically on a single protease family.
AVAILABILITY AND IMPLEMENTATION: https://gitlab.com/smarini/MaDDA/ (Matlab code and utilized data).
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
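The core idea of scoring unobserved protease-protein pairs from a sparse relational matrix can be illustrated with plain low-rank factorization. This is a deliberately simpler stand-in and not the data-integration model of the paper, which fuses several relational matrices. The toy matrix, the hyperparameters and the function names below are all hypothetical.

```python
import random


def predict(row_factors, col_factors, i, j):
    """Score for protease i and protein j: dot product of their latent vectors."""
    return sum(p * q for p, q in zip(row_factors[i], col_factors[j]))


def factorize(observed, n_rows, n_cols, rank=2, epochs=300, lr=0.05, reg=0.01, seed=0):
    """Fit latent factors to the observed entries by stochastic gradient descent."""
    rng = random.Random(seed)
    row_factors = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_rows)]
    col_factors = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for (i, j), value in observed.items():
            error = value - predict(row_factors, col_factors, i, j)
            for k in range(rank):
                p, q = row_factors[i][k], col_factors[j][k]
                row_factors[i][k] += lr * (error * q - reg * p)
                col_factors[j][k] += lr * (error * p - reg * q)
    return row_factors, col_factors


def squared_error(observed, row_factors, col_factors):
    """Sum of squared residuals over the observed entries only."""
    return sum(
        (value - predict(row_factors, col_factors, i, j)) ** 2
        for (i, j), value in observed.items()
    )


# Hypothetical 3-protease x 4-protein matrix: 1.0 = known target, 0.0 = known non-target.
observed = {
    (0, 0): 1.0, (0, 1): 1.0, (0, 2): 0.0,
    (1, 0): 1.0, (1, 1): 1.0, (1, 3): 0.0,
    (2, 2): 1.0, (2, 3): 1.0, (2, 0): 0.0,
}
untrained = factorize(observed, 3, 4, epochs=0)
trained = factorize(observed, 3, 4)
print(squared_error(observed, *untrained), squared_error(observed, *trained))
print(predict(*trained, 0, 3))  # score for a pair that was never observed
```

Entries missing from `observed` are treated as unknown rather than negative, which is the property that makes factorization attractive for extremely sparse target data.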
Affiliation(s)
- Simone Marini
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Francesca Vitali
  - Department of Medicine, Center for Biomedical Informatics and Biostatistics, BIO5 Institute, University of Arizona, Tucson, AZ, USA
- Sara Rampazzi
  - Department of Computer Science and Engineering, University of Michigan, Ann Arbor, MI, USA
- Andrea Demartini
  - Department of Electrical Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, Japan
10
Li F, Wang Y, Li C, Marquez-Lago TT, Leier A, Rawlings ND, Haffari G, Revote J, Akutsu T, Chou KC, Purcell AW, Pike RN, Webb GI, Ian Smith A, Lithgow T, Daly RJ, Whisstock JC, Song J. Twenty years of bioinformatics research for protease-specific substrate and cleavage site prediction: a comprehensive revisit and benchmarking of existing methods. Brief Bioinform 2019; 20:2150-2166. [PMID: 30184176 PMCID: PMC6954447 DOI: 10.1093/bib/bby077] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 06/14/2018] [Revised: 07/26/2018] [Accepted: 08/01/2018] [Indexed: 01/06/2023]
Abstract
The roles of proteolytic cleavage have been intensively investigated and discussed during the past two decades. This irreversible chemical process has been frequently reported to influence a number of crucial biological processes (BPs), such as cell cycle, protein regulation and inflammation. A number of advanced studies have been published aiming at deciphering the mechanisms of proteolytic cleavage. Given its significance and the large number of functionally enriched substrates targeted by specific proteases, many computational approaches have been established for accurate prediction of protease-specific substrates and their cleavage sites. Consequently, there is an urgent need to systematically assess the state-of-the-art computational approaches for protease-specific cleavage site prediction to further advance the existing methodologies and to improve the prediction performance. With this goal in mind, in this article, we carefully evaluated a total of 19 computational methods (including 8 scoring function-based methods and 11 machine learning-based methods) in terms of their underlying algorithm, calculated features, performance evaluation and software usability. Then, extensive independent tests were performed to assess the robustness and scalability of the reviewed methods using our carefully prepared independent test data sets with 3641 cleavage sites (specific to 10 proteases). The comparative experimental results demonstrate that PROSPERous is the most accurate generic method for predicting eight protease-specific cleavage sites, while GPS-CCD and LabCaS outperformed other predictors for calpain-specific cleavage sites. Based on our review, we then outlined some potential ways to improve the prediction performance and ease the computational burden by applying ensemble learning, deep learning, positive unlabeled learning and parallel and distributed computing techniques. 
We anticipate that our study will serve as a practical and useful guide for interested readers to further advance next-generation bioinformatics tools for protease-specific cleavage site prediction.
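As a rough illustration of what a scoring function-based cleavage site predictor does, the sketch below builds a position-specific log-odds matrix from aligned windows around known cleavage sites and scores new windows against it. It is not taken from any of the 19 reviewed tools. The training windows, the uniform background frequency and the pseudocount are assumptions made for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(windows, pseudocount=1.0):
    """Per-position log-odds of each residue versus a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for position in range(len(windows[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for window in windows:
            counts[window[position]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log((counts[aa] / total) / background) for aa in AMINO_ACIDS})
    return pssm


def score(pssm, window):
    """Sum the log-odds of the residues observed at each position."""
    return sum(column[residue] for column, residue in zip(pssm, window))


# Hypothetical four-residue windows ending at the cleaved aspartate.
known_sites = ["DEVD", "DEVD", "DETD", "DQTD"]
pssm = build_pssm(known_sites)
motif_score = score(pssm, "DEVD")
unrelated_score = score(pssm, "AAAA")
print(motif_score, unrelated_score)
```

A predictor of this kind slides the window along a substrate sequence and reports positions whose score exceeds a chosen threshold. The machine learning-based methods in the review replace the fixed log-odds table with a trained classifier.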
Affiliation(s)
- Fuyi Li
  - Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Yanan Wang
  - Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China
- Chen Li
  - Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zürich 8093, Switzerland
- Tatiana T Marquez-Lago
  - Department of Genetics and Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- André Leier
  - Department of Genetics and Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- Neil D Rawlings
  - EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Gholamreza Haffari
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Jerico Revote
  - Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
- Kuo-Chen Chou
  - Gordon Life Science Institute, Boston, MA 02478, USA
  - Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Anthony W Purcell
  - Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Robert N Pike
  - La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC 3086, Australia
  - ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
- Geoffrey I Webb
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- A Ian Smith
  - Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
- Trevor Lithgow
  - Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, Victoria 3800, Australia
- Roger J Daly
  - Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- James C Whisstock
  - Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
- Jiangning Song
  - Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
  - ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
| |
|
11
|
Cheng N, Li M, Zhao L, Zhang B, Yang Y, Zheng CH, Xia J. Comparison and integration of computational methods for deleterious synonymous mutation prediction. Brief Bioinform 2019; 21:970-981. [DOI: 10.1093/bib/bbz047] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2019] [Revised: 03/28/2019] [Accepted: 03/29/2019] [Indexed: 01/03/2023] Open
Abstract
Synonymous mutations do not change the encoded amino acids but may alter the structure or function of an mRNA in ways that impact gene function. Advances in next-generation sequencing technologies have detected numerous synonymous mutations in the human genome. Several computational models have been proposed to predict deleterious synonymous mutations, which have greatly facilitated the development of this important field. Consequently, there is an urgent need to assess the state-of-the-art computational methods for deleterious synonymous mutation prediction to further advance the existing methodologies and to improve performance. In this regard, we systematically compared a total of 10 computational methods (including methods specific to deleterious synonymous mutations and general methods for single nucleotide mutations) in terms of the algorithms used, calculated features, performance evaluation and software usability. In addition, we constructed two carefully curated independent test datasets and accordingly assessed the robustness and scalability of these different computational methods for the identification of deleterious synonymous mutations. In an effort to improve predictive performance, we established an ensemble model, named Prediction of Deleterious Synonymous Mutation (PrDSM), which averages the ratings generated by the three most accurate predictors. Our benchmark tests demonstrated that the ensemble model PrDSM outperformed the reviewed tools for the prediction of deleterious synonymous mutations. Using the ensemble model, we developed an accessible online predictor, PrDSM, available at http://bioinfo.ahu.edu.cn:8080/PrDSM/. We hope that this comprehensive survey and the proposed strategy for building more accurate models can serve as a useful guide for inspiring future developments of computational methods for deleterious synonymous mutation prediction.
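The score-averaging idea described in this abstract can be illustrated with a minimal sketch. It is not the PrDSM implementation: the predictor names, the assumption that every score is already scaled to [0, 1], and the 0.5 decision threshold are all hypothetical choices made for illustration.

```python
from statistics import mean


def ensemble_score(scores):
    """Average the scores of several predictors for one mutation.

    `scores` maps a (hypothetical) predictor name to its score, assumed
    to be already rescaled to the range [0, 1].
    """
    if not scores:
        raise ValueError("at least one predictor score is required")
    return mean(scores.values())


def is_deleterious(scores, threshold=0.5):
    """Call a mutation deleterious if the averaged score reaches the
    threshold. The 0.5 default is an illustrative assumption."""
    return ensemble_score(scores) >= threshold


# Hypothetical scores from three predictors for a single mutation.
example = {"predictor_a": 1.0, "predictor_b": 0.5, "predictor_c": 0.75}
print(ensemble_score(example), is_deleterious(example))
```

Averaging treats the three predictors as equally reliable; a weighted mean would be the natural variation if their accuracies differed markedly.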
Affiliation(s)
- Na Cheng
- Institutes of Physical Science and Information Technology, Anhui University
| | - Menglu Li
- School of Computer Science and Technology, Anhui University
| | - Le Zhao
- School of Computer Science and Technology, Anhui University
| | - Bo Zhang
- School of Computer Science and Technology, Anhui University
| | - Yuhua Yang
- School of Computer Science and Technology, Anhui University
| | - Chun-Hou Zheng
- School of Computer Science and Technology, Anhui University
| | - Junfeng Xia
- Institutes of Physical Science and Information Technology, Anhui University
| |
|
12
|
Radchenko T, Fontaine F, Morettoni L, Zamora I. Software-aided workflow for predicting protease-specific cleavage sites using physicochemical properties of the natural and unnatural amino acids in peptide-based drug discovery. PLoS One 2019; 14:e0199270. [PMID: 30620739 PMCID: PMC6324806 DOI: 10.1371/journal.pone.0199270] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2018] [Accepted: 12/18/2018] [Indexed: 12/03/2022] Open
Abstract
Peptide drugs have been used in the treatment of multiple pathologies. During peptide discovery, it is crucially important to be able to map the potential protease cleavage sites. This knowledge is later used to chemically modify the peptide drug to adapt it for therapeutic use, making the peptide stable against individual proteases or in complex media. In other cases, the peptide needs to be made specifically unstable to certain proteases, as peptides can be used as a system for targeted drug delivery to specific tissues or cells. Information about proteases, their cleavage sites and substrates is widely spread across publications and collected in databases such as MEROPS. Therefore, it is possible to develop models to improve the understanding of potential peptide drug proteolysis. We propose a new workflow to derive protease specificity rules and predict the potential scissile bonds in peptides for individual proteases. WebMetabase stores the information from experimental or external sources in a chemically aware database where each peptide and cleavage site is represented as a sequence of structural blocks connected by amide bonds and characterized by its physicochemical properties described by Volsurf descriptors. Thus, this methodology could be applied to non-standard amino acids. A frequency analysis can be performed in WebMetabase to discover the most frequent cleavage sites. These results were used to train several models using logistic regression, support vector machine and ensemble tree classifiers to map cleavage sites for several human proteases from four different families (serine, cysteine, aspartic and matrix metalloproteases). Finally, we compared the predictive performance of the developed models with that of other publicly available tools, PROSPERous and SitePrediction.
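The frequency-analysis step mentioned in this abstract can be sketched in a few lines. This is a simplified illustration and not the WebMetabase workflow: it assumes peptides written in one-letter codes for standard amino acids (the abstract describes structural blocks with physicochemical descriptors instead), and the function name and input layout are hypothetical.

```python
from collections import Counter


def cleavage_pair_frequencies(observations):
    """Relative frequency of the residue pair flanking each scissile bond.

    `observations` is an iterable of (peptide, bond_index) tuples, where
    bond_index i denotes the bond between peptide[i - 1] and peptide[i].
    Returns a dict mapping (left_residue, right_residue) to its share of
    all observed cleavages.
    """
    counts = Counter()
    for peptide, i in observations:
        if not 0 < i < len(peptide):
            raise ValueError(f"bond index {i} is outside peptide {peptide!r}")
        counts[(peptide[i - 1], peptide[i])] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical cleavage observations for a single protease.
observed = [("DEVDG", 4), ("AEVDS", 4), ("DEVDG", 4), ("KRAL", 2)]
for pair, freq in sorted(cleavage_pair_frequencies(observed).items()):
    print(pair, freq)
```

Frequencies of this kind could then serve as input features or sanity checks for the classifiers the abstract lists, though the source does not specify how the two steps are connected in detail.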
Affiliation(s)
- Tatiana Radchenko
- Pompeu Fabra University, Barcelona, Spain
- Lead Molecular Design, S.L., Sant Cugat del Vallés, Spain
| | - Ismael Zamora
- Pompeu Fabra University, Barcelona, Spain
- Lead Molecular Design, S.L., Sant Cugat del Vallés, Spain
| |
|