1
Radchenko T, Fontaine F, Morettoni L, Zamora I. Software-aided workflow for predicting protease-specific cleavage sites using physicochemical properties of the natural and unnatural amino acids in peptide-based drug discovery. PLoS One 2019; 14:e0199270. [PMID: 30620739 PMCID: PMC6324806 DOI: 10.1371/journal.pone.0199270] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/25/2018] [Accepted: 12/18/2018] [Indexed: 12/03/2022]
Abstract
Peptide drugs have been used in the treatment of multiple pathologies. During peptide discovery, it is crucial to be able to map the potential protease cleavage sites. This knowledge is later used to chemically modify the peptide drug and adapt it for therapeutic use, making the peptide stable against individual proteases or in complex media. In other cases, the peptide needs to be made specifically unstable toward some proteases, as peptides can be used as a system for targeted drug delivery to specific tissues or cells. Information about proteases, their cleavage sites and their substrates is widely spread across publications and collected in databases such as MEROPS. Therefore, it is possible to develop models that improve the understanding of potential peptide drug proteolysis. We propose a new workflow to derive protease specificity rules and predict the potential scissile bonds in peptides for individual proteases. WebMetabase stores the information from experimental or external sources in a chemically aware database where each peptide and cleavage site is represented as a sequence of structural blocks connected by amide bonds and characterized by its physicochemical properties, described by Volsurf descriptors. Thus, this methodology can also be applied to non-standard amino acids. A frequency analysis can be performed in WebMetabase to discover the most frequent cleavage sites. These results were used to train several models using logistic regression, support vector machine and ensemble tree classifiers to map cleavage sites for several human proteases from four different families (serine, cysteine, aspartic and matrix metalloproteases). Finally, we compared the predictive performance of the developed models with that of other publicly available tools, PROSPERous and SitePrediction.
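The frequency analysis mentioned in this abstract can be illustrated with a minimal sketch. It is not the WebMetabase implementation: it assumes peptides written as one-letter amino acid strings rather than structural blocks with physicochemical descriptors, and the function name `cleavage_pair_frequencies` and the example peptides are hypothetical.

```python
from collections import Counter


def cleavage_pair_frequencies(observations):
    """Return the relative frequency of each (P1, P1') residue pair.

    `observations` is an iterable of (sequence, bond_index) tuples, where
    bond_index i denotes the scissile bond between sequence[i - 1] (P1)
    and sequence[i] (P1'). This input format is an assumption made for
    illustration only.
    """
    counts = Counter()
    for sequence, bond_index in observations:
        if not 0 < bond_index < len(sequence):
            raise ValueError(f"bond index {bond_index} outside {sequence!r}")
        counts[(sequence[bond_index - 1], sequence[bond_index])] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}


# Hypothetical cleavage observations, not taken from MEROPS or the paper.
observations = [("AAFLGK", 3), ("GGFLAA", 3), ("KRVAAG", 2)]
frequencies = cleavage_pair_frequencies(observations)
most_frequent = max(frequencies, key=frequencies.get)
```

With these toy observations, the F-L bond is the most frequent pair. A real analysis would use longer windows (for example P4 to P4') and the curated cleavage data the abstract refers to.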
Affiliation(s)
- Tatiana Radchenko
  - Pompeu Fabra University, Barcelona, Spain
  - Lead Molecular Design, S. L, Sant Cugat del Vallés, Spain
- Ismael Zamora
  - Pompeu Fabra University, Barcelona, Spain
  - Lead Molecular Design, S. L, Sant Cugat del Vallés, Spain
* E-mail: (TR); (IZ)
2
Abstract
Calpain, an intracellular Ca2+-dependent cysteine protease, is known to play a role in a wide range of metabolic pathways through limited proteolysis of its substrates. However, only a limited number of these substrates are currently known, with the exact mechanism of substrate recognition and cleavage by calpain still largely unknown. Current sequencing technologies have made it possible to compile large amounts of cleavage data and brought greater understanding of the underlying protein interactions. However, the practical impossibility of exhaustively retrieving substrate sequences through experimentation alone has created the need for efficient computational prediction methods. Such methods must be able to quickly mark substrate candidates and putative cleavage sites for further analysis. While many methods exist for both calpain and other types of proteolytic actions, the expected reliability of these methods depends heavily on the type and complexity of proteolytic action, as well as the availability of well-labeled experimental datasets, which both vary greatly across enzyme families. This chapter introduces CalCleaveMKL: a tool for calpain cleavage prediction based on multiple kernel learning, an extension to the classic support vector machine framework that is able to train complex models based on rich, heterogeneous feature sets, leading to significantly improved prediction quality. Along with its improved accuracy, the method used by CalCleaveMKL provided numerous insights on the respective importance of sequence-related features, such as solvent accessibility and secondary structure. It notably demonstrated there existed significant specificity differences across calpain subtypes, despite previous assumption to the contrary. An online implementation of this prediction tool is available at http://calpain.org.
3
Kumar P, Bhadauria AS, Singh AK, Saha S. Betulinic acid as apoptosis activator: Molecular mechanisms, mathematical modeling and chemical modifications. Life Sci 2018; 209:24-33. [PMID: 30076920 DOI: 10.1016/j.lfs.2018.07.056] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 05/11/2018] [Revised: 07/16/2018] [Accepted: 07/30/2018] [Indexed: 01/11/2023]
Abstract
The natural product betulinic acid (BA) has gained considerable significance in recent years for its strong cytotoxicity. Surprisingly, although it is an interesting anticancer agent against a variety of tumor cells, normal cells and tissues are rarely affected by BA. Betulinic acid and its analogues (BAs) generally act through mechanisms that provoke direct cell death and bypass resistance to standard chemotherapeutics. Although the major mechanism associated with its ability to induce direct cell death is mitochondrial apoptosis, several other mechanisms have been explored recently. Importantly, mathematical modeling of apoptosis has been an important tool for exploring the precise mechanism involved in mitochondrial apoptosis. Thus, this review endeavors to summarize the molecular mechanisms underlying the action of BA and future directions for applying mathematical modeling techniques to better understand the precise mechanism of BA-induced apoptosis. The last section of the review covers plausible structural modifications and formulations to enhance the therapeutic efficacy of BA.
Affiliation(s)
- Pranesh Kumar
  - Department of Pharmaceutical Sciences, Babasaheb Bhimrao Ambedkar University, Vidya Vihar, Raebareli Road, Lucknow 226025, India
- Archana S Bhadauria
  - Department of Mathematics and Statistics, Deen Dayal Upadhyaya Gorakhpur University, Gorakhpur 273009, India
- Ashok K Singh
  - Department of Pharmaceutical Sciences, Babasaheb Bhimrao Ambedkar University, Vidya Vihar, Raebareli Road, Lucknow 226025, India
- Sudipta Saha
  - Department of Pharmaceutical Sciences, Babasaheb Bhimrao Ambedkar University, Vidya Vihar, Raebareli Road, Lucknow 226025, India
4
Bianco F, Eisenman ST, Colmenares Aguilar MG, Bonora E, Clavenzani P, Linden DR, De Giorgio R, Farrugia G, Gibbons SJ. Expression of RAD21 immunoreactivity in myenteric neurons of the human and mouse small intestine. Neurogastroenterol Motil 2018; 30:e13429. [PMID: 30069982 PMCID: PMC6150808 DOI: 10.1111/nmo.13429] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/28/2018] [Revised: 06/20/2018] [Accepted: 06/22/2018] [Indexed: 12/23/2022]
Abstract
BACKGROUND: RAD21 is a double-strand-break repair protein and component of the cohesin complex with key roles in cellular functions. A RAD21 loss-of-function mutation was found in cases of chronic intestinal pseudo-obstruction (CIPO) with associated enteric neuronal loss. Analysis of RAD21 expression in the enteric nervous system is lacking, thus we aimed to characterize RAD21 immunoreactivity (IR) in myenteric ganglia. METHODS: Double labeling immunofluorescence in mouse and human jejunum was used to determine colocalization of RAD21 with HuC/D, PGP9.5, neuronal nitric oxide synthase (nNOS), neuropeptide Y (NPY), choline acetyl transferase (ChAT), Kit, platelet-derived growth factor receptor-α (PDGFRα), and glial fibrillary acidic protein (GFAP) IRs. RESULTS: A subset of PGP9.5- and HuC/D-IR neuronal cell bodies and nerve fibers in the myenteric plexus of human and mouse small intestine also displayed cytoplasmic RAD21-IR. Cytoplasmic RAD21-IR was found in 43% of HuC/D-IR neurons in adult and neonatal mice but did not colocalize with nNOS. A subset of ChAT-positive neurons had cytoplasmic RAD21-IR. Punctate RAD21-IR was restricted to the nucleus in most cell types, consistent with labeling of the cohesin complex. Cytoplasmic RAD21-IR was not detected in interstitial cells of Cajal, fibroblast-like cells or glia. Subsets of neurons in primary culture exhibited cytoplasmic RAD21-IR. Suppression of RAD21 expression by shRNA knockdown abolished RAD21-IR in cultured neurons. CONCLUSIONS: Our data showing cytoplasmic RAD21 expression in enteric neurons provide a basis toward understanding how mutations of this gene may contribute to altered neuronal function/survival, thus leading to gut-motor abnormalities.
Affiliation(s)
- F Bianco
  - Department of Medical and Surgical Sciences (DIMEC), University of Bologna, Bologna, Italy
  - Department of Veterinary Medical Sciences (DIMEVET), University of Bologna, Bologna, Italy
- S T Eisenman
  - Enteric NeuroScience Program, Mayo Clinic, Rochester, MN, USA
  - Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, MN, USA
- M G Colmenares Aguilar
  - Enteric NeuroScience Program, Mayo Clinic, Rochester, MN, USA
  - Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, MN, USA
- E Bonora
  - Department of Medical and Surgical Sciences (DIMEC), University of Bologna, Bologna, Italy
- P Clavenzani
  - Department of Veterinary Medical Sciences (DIMEVET), University of Bologna, Bologna, Italy
- D R Linden
  - Enteric NeuroScience Program, Mayo Clinic, Rochester, MN, USA
  - Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, MN, USA
- R De Giorgio
  - Department of Medical Sciences, Nuovo Arcispedale S.Anna, University of Ferrara, Ferrara, Italy
- G Farrugia
  - Enteric NeuroScience Program, Mayo Clinic, Rochester, MN, USA
  - Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, MN, USA
- S J Gibbons
  - Enteric NeuroScience Program, Mayo Clinic, Rochester, MN, USA
  - Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, MN, USA
5
Fu L, Zhang S, Zhang L, Tong X, Zhang J, Zhang Y, Ouyang L, Liu B, Huang J. Systems biology network-based discovery of a small molecule activator BL-AD008 targeting AMPK/ZIPK and inducing apoptosis in cervical cancer. Oncotarget 2015; 6:8071-88. [PMID: 25797270 PMCID: PMC4480736 DOI: 10.18632/oncotarget.3513] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 01/16/2015] [Accepted: 02/03/2015] [Indexed: 02/05/2023]
Abstract
The aim of this study was to discover a small molecule activator, BL-AD008, targeting AMPK/ZIPK and inducing apoptosis in cervical cancer. In this study, we systematically constructed the global protein-protein interaction (PPI) network and predicted apoptosis-related protein connections using a Naïve Bayesian model. Then, we identified some classical apoptotic PPIs and other previously unrecognized PPIs between apoptotic kinases, such as AMPK and ZIPK. Subsequently, we screened a series of candidate compounds targeting AMPK/ZIPK, synthesized some compounds and eventually discovered a novel dual-target activator (BL-AD008). Moreover, we found that BL-AD008 exhibited remarkable anti-proliferative activity toward cervical cancer cells and could induce apoptosis by death-receptor and mitochondrial pathways. Additionally, we found that BL-AD008-induced apoptosis was affected by the combination of AMPK and ZIPK. We further found that BL-AD008 exerted its anti-tumor activity and induced apoptosis by targeting AMPK/ZIPK in vivo. In conclusion, these results demonstrate the ability of a systems biology network approach to identify the key apoptotic kinase targets AMPK and ZIPK, thus providing a dual-target small molecule activator (BL-AD008) as a potential new apoptosis-modulating drug for future cervical cancer therapy.
Affiliation(s)
- Leilei Fu
  - State Key Laboratory of Biotherapy, Collaborative Innovation Center of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Shouyue Zhang
  - State Key Laboratory of Biotherapy, Collaborative Innovation Center of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Lan Zhang
  - State Key Laboratory of Biotherapy, Collaborative Innovation Center of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
  - School of Traditional Chinese Materia Medica, Shenyang Pharmaceutical University, Shenyang, China
- Xupeng Tong
  - State Key Laboratory of Biotherapy, Collaborative Innovation Center of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
  - School of Pharmacy, China Pharmaceutical University, Nanjing, China
- Jin Zhang
  - School of Traditional Chinese Materia Medica, Shenyang Pharmaceutical University, Shenyang, China
- Yonghui Zhang
  - State Key Laboratory of Biotherapy, Collaborative Innovation Center of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
  - Collaborative Innovation Center for Biotherapy, Department of Pharmacology & Pharmaceutical Sciences, School of Medicine, Tsinghua University, Beijing, China
- Liang Ouyang
  - State Key Laboratory of Biotherapy, Collaborative Innovation Center of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Bo Liu
  - State Key Laboratory of Biotherapy, Collaborative Innovation Center of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Jian Huang
  - School of Traditional Chinese Materia Medica, Shenyang Pharmaceutical University, Shenyang, China
6
Nafis S, Kalaiarasan P, Brojen Singh RK, Husain M, Bamezai RNK. Apoptosis regulatory protein-protein interaction demonstrates hierarchical scale-free fractal network. Brief Bioinform 2014; 16:675-99. [PMID: 25256288 DOI: 10.1093/bib/bbu036] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 05/26/2014] [Accepted: 08/21/2014] [Indexed: 12/29/2022]
Abstract
Dysregulation or inhibition of apoptosis favors cancer and many other diseases. Understanding the network interactions of the genes involved in the apoptotic pathway is therefore essential in the search for targets of therapeutic intervention. Here we applied network theory methods to 25 experimentally validated apoptosis regulatory proteins and identified important genes for apoptosis regulation, which demonstrated a hierarchical scale-free fractal protein-protein interaction network. TP53, BRCA1, UBIQ and CASP3 were recognized as four key regulators. BRCA1 and UBIQ were also individually found to control highly clustered modules and play an important role in the stability of the overall network. The connection among the BRCA1, UBIQ and TP53 proteins was found to be important for regulation, which controlled their own respective communities and the overall network topology. The feedback loop regulation motif was identified among NPM1, BRCA1 and TP53, and these crucial motif topologies were also reflected in high frequency. The propagation of the perturbed signal from hubs was found to be active up to some distance, after which propagation started decreasing, and TP53 was the most efficient signal propagator. From the functional enrichment analysis, most of the apoptosis regulatory genes associated with cardiovascular diseases and highly expressed in brain tissues were identified. Apart from TP53, BRCA1 was observed to regulate apoptosis by influencing motif, propagation of signals and module regulation, reflecting their biological significance. In the future, biochemical investigation of the observed hub-interacting partners could provide further understanding of their role in the pathophysiology of cancer.
7
Schleich K, Lavrik IN. Mathematical modeling of apoptosis. Cell Commun Signal 2013; 11:44. [PMID: 23803157 PMCID: PMC3699383 DOI: 10.1186/1478-811x-11-44] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 03/06/2013] [Accepted: 06/17/2013] [Indexed: 12/27/2022]
Abstract
Apoptosis is a form of programmed cell death, which is fundamental to all multicellular organisms. Deregulation of apoptosis leads to a number of severe diseases including cancer. Apoptosis is initiated either by extrinsic signals via stimulation of receptors at the cellular surface or intrinsic signals, such as DNA damage or growth factor withdrawal. Apoptosis has been extensively studied using systems biology which substantially contributed to the understanding of this death signaling network. This review gives an overview of mathematical models of apoptosis and the potential of systems biology to contribute to the development of novel therapies for cancer or other apoptosis-related diseases.
Affiliation(s)
- Kolja Schleich
  - Division of Immunogenetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Inna N Lavrik
  - Department of Translational Inflammation, Institute of Experimental Internal Medicine, Otto von Guericke University, Magdeburg, Germany
8
PROSPER: an integrated feature-based tool for predicting protease substrate cleavage sites. PLoS One 2012; 7:e50300. [PMID: 23209700 PMCID: PMC3510211 DOI: 10.1371/journal.pone.0050300] [Citation(s) in RCA: 222] [Impact Index Per Article: 18.5] [Received: 07/16/2012] [Accepted: 10/18/2012] [Indexed: 12/04/2022]
Abstract
The ability to catalytically cleave protein substrates after synthesis is fundamental for all forms of life. Accordingly, site-specific proteolysis is one of the most important post-translational modifications. The key to understanding the physiological role of a protease is to identify its natural substrate(s). Knowledge of the substrate specificity of a protease can dramatically improve our ability to predict its target protein substrates, but this information must be utilized in an effective manner in order to efficiently identify protein substrates by in silico approaches. To address this problem, we present PROSPER, an integrated feature-based server for in silico identification of protease substrates and their cleavage sites for twenty-four different proteases. PROSPER utilizes established specificity information for these proteases (derived from the MEROPS database) with a machine learning approach to predict protease cleavage sites by using different, but complementary sequence and structure characteristics. Features used by PROSPER include local amino acid sequence profile, predicted secondary structure, solvent accessibility and predicted native disorder. Thus, for proteases with known amino acid specificity, PROSPER provides a convenient, pre-prepared tool for use in identifying protein substrates for the enzymes. Systematic prediction analysis for the twenty-four proteases thus far included in the database revealed that the features we have included in the tool strongly improve performance in terms of cleavage site prediction, as evidenced by their contribution to performance improvement in terms of identifying known cleavage sites in substrates for these enzymes. In comparison with two state-of-the-art prediction tools, PoPS and SitePrediction, PROSPER achieves greater accuracy and coverage. 
To our knowledge, PROSPER is the first comprehensive server capable of predicting cleavage sites of multiple proteases within a single substrate sequence using machine learning techniques. It is freely available at http://lightning.med.monash.edu.au/PROSPER/.
9
duVerle DA, Mamitsuka H. A review of statistical methods for prediction of proteolytic cleavage. Brief Bioinform 2011; 13:337-49. [PMID: 22138323 DOI: 10.1093/bib/bbr059] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Indexed: 11/14/2022]
Abstract
A fundamental component of systems biology, proteolytic cleavage is involved in nearly all aspects of cellular activities: from gene regulation to cell lifecycle regulation. Current sequencing technologies have made it possible to compile large amounts of cleavage data and brought greater understanding of the underlying protein interactions. However, the practical impossibility of exhaustively retrieving substrate sequences through experimentation alone has long highlighted the need for efficient computational prediction methods. Such methods must be able to quickly mark substrate candidates and putative cleavage sites for further analysis. Available methods and expected reliability depend heavily on the type and complexity of proteolytic action, as well as the availability of well-labelled experimental data sets: factors varying greatly across enzyme families. For this review, we chose to give a quick overview of the general issues and challenges in cleavage prediction methods followed by a more in-depth presentation of major techniques and implementations, with a focus on two particular families of cysteine proteases: caspases and calpains. Through their respective differences in proteolytic specificity (high for caspases, broader for calpains) and data availability (much lower for calpains), we aimed to illustrate the strengths and limitations of techniques ranging from position-based matrices and decision trees to more flexible machine-learning methods such as hidden Markov models and Support Vector Machines. In addition to a technical overview for each family of algorithms, we tried to provide elements of evaluation and performance comparison across methods.
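As a rough illustration of the simplest technique this abstract names, a position-based matrix, the sketch below builds log-odds scores from aligned cleavage-site windows and scores a candidate window. It is a generic textbook construction, not code from the review: the uniform background, the pseudocount, the function names `build_pssm` and `score_window`, and the four tetrapeptide windows are all assumptions chosen for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_pssm(windows, pseudocount=1.0):
    """Build a log-odds position-specific scoring matrix.

    `windows` are equal-length sequences aligned on the cleavage site.
    A uniform background distribution is assumed for simplicity.
    """
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    background = 1.0 / len(ALPHABET)
    denominator = len(windows) + pseudocount * len(ALPHABET)
    pssm = []
    for position in range(length):
        counts = Counter(w[position] for w in windows)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / denominator) / background)
            for aa in ALPHABET
        })
    return pssm


def score_window(pssm, window):
    """Sum the per-position log-odds scores of a candidate window."""
    if len(window) != len(pssm):
        raise ValueError("window length does not match the matrix")
    return sum(column[aa] for column, aa in zip(pssm, window))


# Illustrative training windows only; not a curated data set.
training_windows = ["DEVD", "DEVD", "DQTD", "DEHD"]
pssm = build_pssm(training_windows)
motif_score = score_window(pssm, "DEVD")
unrelated_score = score_window(pssm, "AAAA")
```

A window resembling the training set scores higher than an unrelated one. Because such a matrix treats positions independently, it tends to suit proteases with narrow specificity better than those with broad specificity.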
Affiliation(s)
- David A duVerle
  - Bioinformatics Center, Kyoto University, Uji, Kyoto 611-0011, Japan
10
Ono Y, Sorimachi H. Calpains: an elaborate proteolytic system. Biochimica et Biophysica Acta - Proteins and Proteomics 2011; 1824:224-36. [PMID: 21864727 DOI: 10.1016/j.bbapap.2011.08.005] [Citation(s) in RCA: 246] [Impact Index Per Article: 18.9] [Received: 04/14/2011] [Revised: 08/03/2011] [Accepted: 08/05/2011] [Indexed: 01/26/2023]
Abstract
Calpain is an intracellular Ca(2+)-dependent cysteine protease (EC 3.4.22.17; Clan CA, family C02). Recent expansion of sequence data across the species definitively shows that calpain has been present throughout evolution; calpains are found in almost all eukaryotes and some bacteria, but not in archaebacteria. Fifteen genes within the human genome encode a calpain-like protease domain. Interestingly, some human calpains, particularly those with non-classical domain structures, are very similar to calpain homologs identified in evolutionarily distant organisms. Three-dimensional structural analyses have helped to identify calpain's unique mechanism of activation; the calpain protease domain comprises two core domains that fuse to form a functional protease only when bound to Ca(2+) via well-conserved amino acids. This finding highlights the mechanistic characteristics shared by the numerous calpain homologs, despite the fact that they have divergent domain structures. In other words, calpains function through the same mechanism but are regulated independently. This article reviews the recent progress in calpain research, focusing on those studies that have helped to elucidate its mechanism of action. This article is part of a Special Issue entitled: Proteolysis 50 years after the discovery of lysosome.
Affiliation(s)
- Yasuko Ono
  - Calpain Project, Department of Advanced Science for Biomolecules, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
11
Song J, Tan H, Boyd SE, Shen H, Mahmood K, Webb GI, Akutsu T, Whisstock JC, Pike RN. Bioinformatic approaches for predicting substrates of proteases. J Bioinform Comput Biol 2011; 9:149-78. [PMID: 21328711 DOI: 10.1142/s0219720011005288] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Received: 07/15/2010] [Revised: 10/08/2010] [Accepted: 10/09/2010] [Indexed: 11/18/2022]
Abstract
Proteases have central roles in "life and death" processes due to their important ability to catalytically hydrolyze protein substrates, usually altering the function and/or activity of the target in the process. Knowledge of the substrate specificity of a protease should, in theory, dramatically improve the ability to predict target protein substrates. However, experimental identification and characterization of protease substrates is often difficult and time-consuming. Thus solving the "substrate identification" problem is fundamental to both understanding protease biology and the development of therapeutics that target specific protease-regulated pathways. In this context, bioinformatic prediction of protease substrates may provide useful and experimentally testable information about novel potential cleavage sites in candidate substrates. In this article, we provide an overview of recent advances in developing bioinformatic approaches for predicting protease substrate cleavage sites and identifying novel putative substrates. We discuss the advantages and drawbacks of the current methods and detail how more accurate models can be built by deriving multiple sequence and structural features of substrates. We also provide some suggestions about how future studies might further improve the accuracy of protease substrate specificity prediction.
Affiliation(s)
- Jiangning Song
  - Department of Biochemistry and Molecular Biology, Monash University, Victoria 3800, Australia
12
DuVerle DA, Ono Y, Sorimachi H, Mamitsuka H. Calpain cleavage prediction using multiple kernel learning. PLoS One 2011; 6:e19035. [PMID: 21559271 PMCID: PMC3086883 DOI: 10.1371/journal.pone.0019035] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.8] [Received: 01/14/2011] [Accepted: 03/23/2011] [Indexed: 11/19/2022]
Abstract
Calpain, an intracellular Ca²⁺-dependent cysteine protease, is known to play a role in a wide range of metabolic pathways through limited proteolysis of its substrates. However, only a limited number of these substrates are currently known, with the exact mechanism of substrate recognition and cleavage by calpain still largely unknown. While previous research has successfully applied standard machine-learning algorithms to accurately predict substrate cleavage by other similar types of proteases, their approach does not extend well to calpain, possibly due to its particular mode of proteolytic action and limited amount of experimental data. Through the use of Multiple Kernel Learning, a recent extension to the classic Support Vector Machine framework, we were able to train complex models based on rich, heterogeneous feature sets, leading to significantly improved prediction quality (6% over highest AUC score produced by state-of-the-art methods). In addition to producing a stronger machine-learning model for the prediction of calpain cleavage, we were able to highlight the importance and role of each feature of substrate sequences in defining specificity: primary sequence, secondary structure and solvent accessibility. Most notably, we showed there existed significant specificity differences across calpain sub-types, despite previous assumption to the contrary. Prediction accuracy was further successfully validated using, as an unbiased test set, mutated sequences of calpastatin (endogenous inhibitor of calpain) modified to no longer block calpain's proteolytic action. An online implementation of our prediction tool is available at http://calpain.org.
13
Song J, Matthews AY, Reboul CF, Kaiserman D, Pike RN, Bird PI, Whisstock JC. Predicting serpin/protease interactions. Methods Enzymol 2011; 501:237-73. [PMID: 22078538 DOI: 10.1016/b978-0-12-385950-1.00012-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 01/05/2023]
Abstract
Proteases are tightly regulated by specific inhibitors, such as serpins, which are able to undergo considerable and irreversible conformational changes in order to trap their targets. There has been a considerable effort to investigate serpin structure and functions in the past few decades; however, the specific interactions between proteases and serpins remain elusive. In this chapter, we describe detailed experimental protocols to determine and characterize the extended substrate specificity of proteases based on a substrate phage display technique. We also describe how to employ a bioinformatics system to analyze the substrate specificity data obtained from this technique and predict the potential inhibitory serpin partners of a protease (in this case, the immune protease, granzyme B) in a step-by-step manner. The method described here could also be applied to other proteases for more generalized substrate specificity analysis and substrate discovery.
Affiliation(s)
- Jiangning Song
  - Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, Australia
14
Maji P, Das C. Efficient design of bio-basis function to predict protein functional sites using kernel-based classifiers. IEEE Trans Nanobioscience 2010; 9:242-9. [PMID: 20889438 DOI: 10.1109/tnb.2010.2080684] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/06/2022]
Abstract
In order to apply the powerful kernel-based pattern recognition algorithms such as support vector machines to predict functional sites in proteins, amino acids need encoding prior to input. In this regard, a new string kernel function, termed as the modified bio-basis function, is proposed that maps a nonnumerical sequence space to a numerical feature space. The proposed string kernel function is developed based on the conventional bio-basis function and needs a bio-basis string as a support like conventional kernel function. The concept of zone of influence of a bio-basis string is introduced in the proposed kernel function to take into account the influence of each bio-basis string in nonnumerical sequence space. An efficient method is described to select a set of bio-basis strings for the proposed kernel function, integrating the Fisher ratio and a novel concept of degree of resemblance. The integration enables the method to select a reduced set of relevant and nonredundant bio-basis strings.
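For orientation, the conventional bio-basis function that this abstract builds on is usually written as K(x, b) = exp(γ (s(x, b) − s(b, b)) / s(b, b)), where s sums pairwise substitution scores between a query string x and a bio-basis string b. The sketch below follows that form under loud assumptions: it replaces the amino acid substitution matrix with a toy match/mismatch score, the names `similarity` and `bio_basis` are hypothetical, and it does not implement the modified function or the zone-of-influence concept proposed in the paper.

```python
import math


def similarity(a, b, match=2.0, mismatch=-1.0):
    """Toy pairwise similarity between two equal-length strings.

    Stands in for a sum of substitution-matrix scores; the match and
    mismatch values are arbitrary illustrative choices.
    """
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def bio_basis(x, basis, gamma=1.0):
    """Map a sequence to a number relative to one bio-basis string."""
    self_score = similarity(basis, basis)
    if self_score <= 0:
        raise ValueError("bio-basis string must have a positive self-similarity")
    return math.exp(gamma * (similarity(x, basis) - self_score) / self_score)


# One feature per bio-basis string turns a sequence into a numeric vector.
basis_strings = ["DEVD", "IETD"]
features = [bio_basis("DEVA", b) for b in basis_strings]
```

The value is 1 when the query equals the bio-basis string and decays as the two diverge, so a set of bio-basis strings yields a numeric feature vector that a support vector machine can consume.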
Affiliation(s)
- Pradipta Maji
- Machine Intelligence Unit, Indian Statistical Institute, 203 B. T. Road, Kolkata, 700 108, India.
|
15
|
Lavrik IN. Systems biology of apoptosis signaling networks. Curr Opin Biotechnol 2010; 21:551-5. [PMID: 20674332 DOI: 10.1016/j.copbio.2010.07.001] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2010] [Revised: 06/29/2010] [Accepted: 07/05/2010] [Indexed: 11/19/2022]
Abstract
Apoptosis is a complex but highly defined cellular program of cell demolition. Apoptosis can be triggered by the extrinsic or the intrinsic death pathways. Deregulation of apoptosis leads to a number of serious diseases, including cancer. Substantial progress in understanding apoptotic signaling networks has recently been achieved using systems biology. This review gives an overview of the contemporary models of apoptotic signaling networks. The potential of dynamic models that include all known components of the network is discussed and weighed against the importance of searching for new network components using different screening techniques. The further development of apoptotic signaling networks should provide ways to sensitize cells toward apoptosis and provide new therapies for cancer treatment.
Affiliation(s)
- Inna N Lavrik
- Division of Immunogenetics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany.
|
16
|
Piippo M, Lietzén N, Nevalainen OS, Salmi J, Nyman TA. Pripper: prediction of caspase cleavage sites from whole proteomes. BMC Bioinformatics 2010; 11:320. [PMID: 20546630 PMCID: PMC2893604 DOI: 10.1186/1471-2105-11-320] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2010] [Accepted: 06/15/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Caspases are a family of proteases that have central functions in programmed cell death (apoptosis) and inflammation. Caspases mediate their effects through aspartate-specific cleavage of their target proteins, and at present almost 400 caspase substrates are known. There are several methods developed to predict caspase cleavage sites from individual proteins, but currently none of them can be used to predict caspase cleavage sites from multiple proteins or entire proteomes, or to use several classifiers in combination. The possibility to create a database from predicted caspase cleavage products for the whole genome could significantly aid in identifying novel caspase targets from tandem mass spectrometry based proteomic experiments. RESULTS Three different pattern recognition classifiers were developed for predicting caspase cleavage sites from protein sequences. Evaluation of the classifiers with quality measures indicated that all of the three classifiers performed well in predicting caspase cleavage sites, and when combining different classifiers the accuracy increased further. A new tool, Pripper, was developed to utilize the classifiers and predict the caspase cut sites from an arbitrary number of input sequences. A database was constructed with the developed tool, and it was used to identify caspase target proteins from tandem mass spectrometry data from two different proteomic experiments. Both known caspase cleavage products as well as novel cleavage products were identified using the database demonstrating the usefulness of the tool. Pripper is not restricted to predicting only caspase cut sites, but it gives the possibility to scan protein sequences for any given motif(s) and predict cut sites once a suitable cut site prediction model for any other protease has been developed. Pripper is freely available and can be downloaded from http://users.utu.fi/mijopi/Pripper. 
CONCLUSIONS We have developed Pripper, a tool for reading an arbitrary number of proteins in FASTA format, predicting their caspase cleavage sites and outputting the cleaved sequences to a new FASTA format sequence file. We show that Pripper is a valuable tool in identifying novel caspase target proteins from modern proteomics experiments.
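The FASTA-in, FASTA-out workflow described above can be sketched as follows. This is a minimal illustration and not Pripper's implementation: it replaces the trained classifiers with a plain motif scan, and the default motif `DEVD`, the fragment naming scheme and all function names are assumptions made for the example.

```python
import re
from typing import Dict, List


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def cut_positions(seq: str, motif: str = "DEVD") -> List[int]:
    """Return cut positions, placed directly after each motif occurrence.
    The lookahead lets overlapping occurrences be found as well."""
    return [m.end(1) for m in re.finditer(f"(?=({motif}))", seq)]


def cleave(seq: str, cuts: List[int]) -> List[str]:
    """Split a sequence at the given cut positions, dropping empty fragments."""
    bounds = [0] + sorted(set(cuts)) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


def to_fasta(records: Dict[str, str], motif: str = "DEVD") -> str:
    """Write the cleaved fragments of every record as FASTA text."""
    lines: List[str] = []
    for name, seq in records.items():
        for i, frag in enumerate(cleave(seq, cut_positions(seq, motif)), 1):
            lines.append(f">{name}_frag{i}")
            lines.append(frag)
    return "\n".join(lines)
```

Because `motif` is an ordinary regular expression, the same scan can be pointed at any other motif, which mirrors the abstract's remark that the tool is not restricted to caspase cut sites.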
Affiliation(s)
- Mirva Piippo
- Department of Information Technology, University of Turku, Turku, Finland.
|
17
|
Abstract
Neural networks are a class of intelligent learning machines establishing the relationships between descriptors of real-world objects. As optimisation tools they are also a class of computational algorithms implemented using statistical/numerical techniques for parameter estimate, model selection, and generalisation enhancement. In bioinformatics applications, neural networks have played an important role for classification, function approximation, knowledge discovery, and data visualisation. This chapter will focus on supervised neural networks and discuss their applications to bioinformatics.
|
18
|
Barkan DT, Hostetter DR, Mahrus S, Pieper U, Wells JA, Craik CS, Sali A. Prediction of protease substrates using sequence and structure features. Bioinformatics 2010; 26:1714-22. [PMID: 20505003 DOI: 10.1093/bioinformatics/btq267] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
MOTIVATION Granzyme B (GrB) and caspases cleave specific protein substrates to induce apoptosis in virally infected and neoplastic cells. While substrates for both types of proteases have been determined experimentally, there are many more yet to be discovered in humans and other metazoans. Here, we present a bioinformatics method based on support vector machine (SVM) learning that identifies sequence and structural features important for protease recognition of substrate peptides and then uses these features to predict novel substrates. Our approach can act as a convenient hypothesis generator, guiding future experiments by high-confidence identification of peptide-protein partners. RESULTS The method is benchmarked on the known substrates of both protease types, including our literature-curated GrB substrate set (GrBah). On these benchmark sets, the method outperforms a number of other methods that consider sequence only, predicting at a 0.87 true positive rate (TPR) and a 0.13 false positive rate (FPR) for caspase substrates, and a 0.79 TPR and a 0.21 FPR for GrB substrates. The method is then applied to approximately 25 000 proteins in the human proteome to generate a ranked list of predicted substrates of each protease type. Two of these predictions, AIF-1 and SMN1, were selected for further experimental analysis, and each was validated as a GrB substrate. AVAILABILITY All predictions for both protease types are publicly available at http://salilab.org/peptide. A web server at the same site allows a user to train new SVM models to make predictions for any protein that recognizes specific oligopeptide ligands.
Affiliation(s)
- David T Barkan
- Graduate Group in Bioinformatics, Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA
|
19
|
Song J, Tan H, Shen H, Mahmood K, Boyd SE, Webb GI, Akutsu T, Whisstock JC. Cascleave: towards more accurate prediction of caspase substrate cleavage sites. Bioinformatics 2010; 26:752-60. [PMID: 20130033 DOI: 10.1093/bioinformatics/btq043] [Citation(s) in RCA: 132] [Impact Index Per Article: 9.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/21/2023]
Abstract
MOTIVATION The caspase family of cysteine proteases plays essential roles in key biological processes such as programmed cell death, differentiation, proliferation, necrosis and inflammation. The complete repertoire of caspase substrates remains to be fully characterized. Accordingly, systematic computational screening studies of caspase substrate cleavage sites may provide insight into the substrate specificity of caspases and further facilitate the discovery of putative novel substrates. RESULTS In this article we develop an approach (termed Cascleave) to predict both classical (i.e. following a P(1) Asp) and non-typical caspase cleavage sites. When using local sequence-derived profiles, Cascleave successfully predicted 82.2% of the known substrate cleavage sites, with a Matthews correlation coefficient (MCC) of 0.667. We found that prediction performance could be further improved by incorporating information such as predicted solvent accessibility and whether a cleavage sequence lies in a region that is most likely natively unstructured. Novel bi-profile Bayesian signatures were found to significantly improve the prediction performance and yielded the best performance with an overall accuracy of 87.6% and an MCC of 0.747, which is higher accuracy than published methods that essentially rely on amino acid sequence alone. It is anticipated that Cascleave will be a powerful tool for predicting novel substrate cleavage sites of caspases and providing new insights into the unknown caspase-substrate interactivity relationship. AVAILABILITY http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/Cascleave/ CONTACT jiangning.song@med.monash.edu.au; takutsu@kuicr.kyoto-u.ac.jp; james.whisstock@med.monash.edu.au SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
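For readers unfamiliar with the two performance measures quoted in this abstract, the following minimal sketch shows how overall accuracy and the Matthews correlation coefficient are computed from a binary confusion matrix. The function names are chosen for illustration, and any counts passed in are hypothetical; the paper's own confusion matrices are not given here.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, ranging from -1 to 1.
    Returns 0.0 when a row or column of the confusion matrix is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

Unlike accuracy, the MCC stays informative when cleaved and non-cleaved sites are strongly imbalanced, which is the usual situation in cleavage site prediction.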
Affiliation(s)
- Jiangning Song
- Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia.
|
20
|
Abstract
Peptides scanned from whole protein sequences are the core information for much peptide bioinformatics research, such as functional site prediction, protein structure identification, and protein function recognition. In these applications, we normally need to assign a peptide to one of the given categories using a computer model. They are therefore referred to as peptide classification applications. Among various machine learning approaches, including neural networks, peptide machines have demonstrated excellent performance in many applications. This chapter discusses the basic concepts of peptide classification, commonly used feature extraction methods, three peptide machines, and some important issues in peptide classification.
|
21
|
Wee LJK, Tan TW, Ranganathan S. CASVM: web server for SVM-based prediction of caspase substrates cleavage sites. Bioinformatics 2007; 23:3241-3. [PMID: 17599937 DOI: 10.1093/bioinformatics/btm334] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Caspases belong to a unique class of cysteine proteases which function as critical effectors of apoptosis, inflammation and other important cellular processes. Caspases cleave substrates at specific tetrapeptide sites after a highly conserved aspartic acid residue. Prediction of such cleavage sites will complement structural and functional studies on substrates cleavage as well as discovery of new substrates. We have recently developed a support vector machines (SVM) method to address this issue. Our algorithm achieved an accuracy ranging from 81.25 to 97.92%, making it one of the best methods currently available. CASVM is the web server implementation of our SVM algorithms, written in Perl and hosted on a Linux platform. The server can be used for predicting non-canonical caspase substrate cleavage sites. We have also included a relational database containing experimentally verified caspase substrates retrievable using accession IDs, keywords or sequence similarity. AVAILABILITY http://www.casbase.org/casvm/index.html
Affiliation(s)
- Lawrence J K Wee
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
|
22
|
Wee LJK, Tan TW, Ranganathan S. SVM-based prediction of caspase substrate cleavage sites. BMC Bioinformatics 2006; 7 Suppl 5:S14. [PMID: 17254298 PMCID: PMC1764470 DOI: 10.1186/1471-2105-7-s5-s14] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023] Open
Abstract
BACKGROUND Caspases belong to a class of cysteine proteases which function as critical effectors in apoptosis and inflammation by cleaving substrates immediately after unique sites. Prediction of such cleavage sites will complement structural and functional studies on substrates cleavage as well as discovery of new substrates. Recently, different computational methods have been developed to predict the cleavage sites of caspase substrates with varying degrees of success. As the support vector machines (SVM) algorithm has been shown to be useful in several biological classification problems, we have implemented an SVM-based method to investigate its applicability to this domain. RESULTS A set of unique caspase substrates cleavage sites were obtained from literature and used for evaluating the SVM method. Datasets containing (i) the tetrapeptide cleavage sites, (ii) the tetrapeptide cleavage sites, augmented by two adjacent residues, P1' and P2' amino acids and (iii) the tetrapeptide cleavage sites with ten additional upstream and downstream flanking sequences (where available) were tested. The SVM method achieved an accuracy ranging from 81.25% to 97.92% on independent test sets. The SVM method successfully predicted the cleavage of a novel caspase substrate and its mutants. CONCLUSION This study presents an SVM approach for predicting caspase substrate cleavage sites based on the cleavage sites and the downstream and upstream flanking sequences. The method shows an improvement over existing methods and may be useful for predicting hitherto undiscovered cleavage sites.
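The three kinds of input window listed in the results can be sketched as below. This is an illustration of the window definitions only, not the authors' code: the 0-based `p1` index convention, the dictionary keys, and the reading of "ten additional upstream and downstream flanking sequences" as ten residues on each side of the tetrapeptide are assumptions.

```python
from typing import Dict


def cleavage_windows(seq: str, p1: int, flank: int = 10) -> Dict[str, str]:
    """Extract sequence windows around a cleavage site.

    p1 is the 0-based index of the P1 residue; cleavage occurs right after it.
    Windows are truncated at the sequence ends ("where available").
    """
    if p1 < 3 or p1 >= len(seq):
        raise ValueError("P4-P1 tetrapeptide does not fit inside the sequence")
    start, end = p1 - 3, p1 + 1  # half-open slice covering P4..P1
    return {
        "P4-P1": seq[start:end],
        "P4-P2'": seq[start:end + 2],
        "flanked": seq[max(0, start - flank):min(len(seq), end + flank)],
    }
```

Each window would still have to be turned into a numerical vector before it can be passed to an SVM; that encoding step is outside the scope of this sketch.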
Affiliation(s)
- Lawrence JK Wee
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Tin Wee Tan
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Shoba Ranganathan
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Department of Chemistry and Biomolecular Sciences & Biotechnology Research Institute, Macquarie University, Sydney, Australia
|
23
|
Abstract
The relatively common occurrence of sequences within proteins that match the consensus substrate specificity of caspases in intracellular proteins suggests a multitude of substrates in vivo - somewhere in the order of several hundred in humans alone. Indeed, the list of proteins that are reported to be cleaved by caspases in vitro proliferates rapidly. However, only a few of these proteins have been rigorously established as biologically or pathologically relevant, bona fide substrates in vivo. Many of them probably simply represent 'innocent bystanders' or erroneous assignments. In this review we discuss concepts of caspase substrate recognition and specificity, give resources for the discovery and annotation of caspase substrates, and highlight some specific human or mouse proteins where there is strong evidence for biologic or pathologic relevance.
Affiliation(s)
- J C Timmer
- Graduate Program in Molecular Pathology, University of California San Diego, La Jolla, CA 92037, USA
|
24
|
Sidhu A, Yang ZR. Prediction of signal peptides using bio-basis function neural networks and decision trees. Appl Bioinformatics 2006; 5:13-9. [PMID: 16539533 DOI: 10.2165/00822942-200605010-00002] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
Abstract
Signal peptide identification is of immense importance in drug design. Accurate identification of signal peptides is the first critical step to be able to change the direction of the targeting proteins and use the designed drug to target a specific organelle to correct a defect. Because experimental identification is the most accurate method, but is expensive and time-consuming, an efficient and affordable automated system is of great interest. In this article, we propose using an adapted neural network, called a bio-basis function neural network, and decision trees for predicting signal peptides. The bio-basis function neural network model and decision trees achieved 97.16% and 97.63% accuracy respectively, demonstrating that the methods work well for the prediction of signal peptides. Moreover, decision trees revealed that position P(1'), which is important in forming signal peptides, most commonly comprises either leucine or alanine. This concurs with the (P(3)-P(1)-P(1')) coupling model.
Affiliation(s)
- Ateesh Sidhu
- Biological Science, University of Warwick, Coventry, UK.
|
25
|
Yang ZR. Mining SARS-CoV protease cleavage data using non-orthogonal decision trees: a novel method for decisive template selection. Bioinformatics 2005; 21:2644-50. [PMID: 15797903 PMCID: PMC7197706 DOI: 10.1093/bioinformatics/bti404] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2004] [Revised: 02/07/2005] [Accepted: 03/22/2005] [Indexed: 12/02/2022] Open
Abstract
MOTIVATION Although the outbreak of the severe acute respiratory syndrome (SARS) is currently over, it is expected that it will return to attack human beings. A critical challenge to scientists from various disciplines worldwide is to study the specificity of cleavage activity of SARS-related coronavirus (SARS-CoV) and use the knowledge obtained from the study for effective inhibitor design to fight the disease. The most commonly used inductive programming methods for knowledge discovery from data assume that the elements of input patterns are orthogonal to each other. If a sub-sequence is denoted as P2-P1-P1'-P2', the conventional inductive programming method may result in a rule like 'if P1 = Q, then the sub-sequence is cleaved, otherwise non-cleaved'. If the site P1 is not orthogonal to the others (for instance, P2, P1' and P2'), the predictive power of these kinds of rules may be limited. Therefore this study is aimed at developing a novel method for constructing non-orthogonal decision trees for mining protease data. RESULT Eighteen sequences of coronavirus polyprotein were downloaded from NCBI (http://www.ncbi.nlm.nih.gov). Among these sequences, 252 cleavage sites were experimentally determined. These sequences were scanned using a sliding window with size k to generate about 50,000 k-mer sub-sequences (for short, k-mers). The value of k varies from 4 to 12 in steps of two. The bio-basis function proposed by Thomson et al. is used to transform the k-mers to a high-dimensional numerical space on which an inductive programming method is applied for the purpose of deriving a decision tree for decision-making. The process of this transform is referred to as a bio-mapping. The constructed decision trees select about 10 out of 50,000 k-mers. This small set of selected k-mers is regarded as a set of decisive templates. By doing so, non-orthogonal decision trees are constructed using the selected templates and the prediction accuracy is significantly improved.
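The sliding-window scan described in the results can be sketched as follows. This is a minimal illustration with hypothetical function names; it covers only the k-mer generation step for k = 4, 6, 8, 10 and 12, not the bio-mapping or the decision tree construction.

```python
from typing import Dict, Iterable, List


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping sub-sequences of length k, moving the window one residue at a time."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def scan(sequences: Iterable[str], ks: Iterable[int] = range(4, 13, 2)) -> Dict[int, List[str]]:
    """Collect the k-mers of every sequence for each window size in ks."""
    seqs = list(sequences)
    return {k: [km for s in seqs for km in kmers(s, k)] for k in ks}
```

A sequence of length n yields n - k + 1 k-mers for each window size, so the pool of candidate templates grows roughly linearly with the total sequence length.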
Affiliation(s)
- Zheng Rong Yang
- Department of Computer Science, Exeter University, United Kingdom.
|