1
Nair S, Baker NE. Extramacrochaetae regulates Notch signaling in the Drosophila eye through non-apoptotic caspase activity. bioRxiv: The Preprint Server for Biology 2024:2023.10.04.560841. [PMID: 39131389 PMCID: PMC11312471 DOI: 10.1101/2023.10.04.560841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/13/2024]
Abstract
Many cell fate decisions are determined transcriptionally. Accordingly, some fate specification is prevented by Inhibitor of DNA binding (Id) proteins that interfere with DNA binding by master regulatory transcription factors. We show that the Drosophila Id protein Extra macrochaetae (Emc) also affects developmental decisions by regulating caspase activity. Emc, which prevents proneural bHLH transcription factors from specifying neural cell fate, also prevents homodimerization of another bHLH protein, Daughterless (Da), and thereby maintains expression of the Death-Associated Inhibitor of Apoptosis (diap1) gene. Accordingly, we found that multiple effects of emc mutations on cell growth and on eye development were all caused by activation of caspases. These effects included acceleration of the morphogenetic furrow, failure of R7 photoreceptor cell specification, and delayed differentiation of non-neuronal cone cells. Within emc mutant clones, Notch signaling was elevated in the morphogenetic furrow, increasing morphogenetic furrow speed. This was associated with a caspase-dependent increase in levels of Delta protein, the transmembrane ligand for Notch. Posterior to the morphogenetic furrow, elevated Delta cis-inhibited the Notch signaling that was required for R7 specification and cone cell differentiation. Growth inhibition of emc mutant clones in wing imaginal discs also depended on caspases. Thus, emc mutations reveal the importance of restraining caspase activity, even in non-apoptotic cells, to prevent abnormal development; in the Drosophila eye, this occurs through effects on Notch signaling.
Affiliation(s)
- Sudershana Nair
  - Department of Genetics, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461
  - Present address: Department of Neuroscience and Physiology, NYU School of Medicine, 435 East 30 St, New York, NY
- Nicholas E. Baker
  - Department of Genetics, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461
  - Department of Developmental and Molecular Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461
  - Department of Ophthalmology and Visual Sciences, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461
  - Present address: Department of Microbiology and Molecular Genetics, University of California, Irvine, 2011 Biological Sciences 3, Irvine, CA 92697-2300
2
Mu L, Song J, Akutsu T, Mori T. DiCleave: a deep learning model for predicting human Dicer cleavage sites. BMC Bioinformatics 2024; 25:13. [PMID: 38195423 PMCID: PMC10775615 DOI: 10.1186/s12859-024-05638-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Accepted: 01/03/2024] [Indexed: 01/11/2024] Open
Abstract
BACKGROUND: MicroRNAs (miRNAs) are a class of non-coding RNAs that play a pivotal role as gene expression regulators. These miRNAs are typically approximately 20 to 25 nucleotides long. The maturation of miRNAs requires Dicer cleavage at specific sites within the precursor miRNAs (pre-miRNAs). Recent advances in machine learning-based approaches for cleavage site prediction, such as PHDcleav and LBSizeCleav, have been reported. ReCGBM, a gradient boosting-based model, demonstrates superior performance compared with existing methods. Nonetheless, ReCGBM operates solely as a binary classifier despite the presence of two cleavage sites in a typical pre-miRNA. Previous approaches have focused on utilizing only a fraction of the structural information in pre-miRNAs, often overlooking comprehensive secondary structure information. There is a compelling need for the development of a novel model to address these limitations.
RESULTS: In this study, we developed a deep learning model for predicting the presence of a Dicer cleavage site within a pre-miRNA segment. This model was enhanced by an autoencoder that learned the secondary structure embeddings of pre-miRNA. Benchmarking experiments demonstrated that the performance of our model was comparable to that of ReCGBM in the binary classification tasks. In addition, our model excelled in multi-class classification tasks, making it a more versatile and practical solution than ReCGBM.
CONCLUSIONS: Our proposed model exhibited superior performance compared with the current state-of-the-art model, underscoring the effectiveness of a deep learning approach in predicting Dicer cleavage sites. Furthermore, our model could be trained using only sequence and secondary structure information. Its capacity to accommodate multi-class classification tasks has enhanced the practical utility of our model.
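As an illustration of what training on "only sequence and secondary structure information" might look like at the input level, the following sketch one-hot encodes a pre-miRNA segment together with its dot-bracket structure. The 7-dimensional layout, the function name, and the example strings are assumptions made for this sketch; the encoding actually used by DiCleave is not described in the abstract and may differ.

```python
from typing import List

NUCLEOTIDES = "ACGU"   # assumed RNA alphabet
STRUCTURE = "(.)"      # dot-bracket symbols: paired-open, unpaired, paired-close


def encode_segment(sequence: str, structure: str) -> List[List[int]]:
    """Return one 7-dimensional vector per position (hypothetical layout):
    the first 4 slots one-hot encode the nucleotide, the last 3 the structure symbol."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    encoded = []
    for nucleotide, symbol in zip(sequence, structure):
        vector = [0] * (len(NUCLEOTIDES) + len(STRUCTURE))
        # str.index raises ValueError on symbols outside the assumed alphabets.
        vector[NUCLEOTIDES.index(nucleotide)] = 1
        vector[len(NUCLEOTIDES) + STRUCTURE.index(symbol)] = 1
        encoded.append(vector)
    return encoded


# Made-up 6-nucleotide segment with a made-up structure annotation.
example = encode_segment("GAUCGC", "((..))")
```

A matrix of this kind could feed either a binary classifier (cleavage site present or absent) or a multi-class one; the choice of label set is independent of the encoding.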
Affiliation(s)
- Lixuan Mu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, 611-0011, Japan
- Jiangning Song
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC, 3800, Australia
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, 611-0011, Japan
- Tomoya Mori
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, 611-0011, Japan
3
Li X, Wang GA, Wei Z, Wang H, Zhu X. Protein-DNA interface hotspots prediction based on fusion features of embeddings of protein language model and handcrafted features. Comput Biol Chem 2023; 107:107970. [PMID: 37866116 DOI: 10.1016/j.compbiolchem.2023.107970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 10/06/2023] [Accepted: 10/07/2023] [Indexed: 10/24/2023]
Abstract
The identification of hotspot residues at protein-DNA binding interfaces plays a crucial role in areas such as drug discovery and disease treatment. Although experimental methods such as alanine scanning mutagenesis have been developed to determine the hotspot residues on protein-DNA interfaces, they are both inefficient and costly. Efficient and accurate computational methods for predicting hotspot residues are therefore needed. Several computational methods have been developed; however, they are mainly based on handcrafted features, which may not be able to represent all the information in proteins. In this regard, we propose a model called PDH-EH, which utilizes fused features of embeddings extracted from a protein language model (PLM) and handcrafted features. After extracting features with a total of 1141 dimensions, we used minimum redundancy maximum relevance (mRMR) to select the optimal feature subset. Based on the optimal feature subset, several learning algorithms, namely Random Forest, Support Vector Machine, and XGBoost, were used to build the models. The cross-validation results on the training dataset show that the model built with Random Forest achieves the highest AUROC. Further evaluation on the independent test set shows that our model outperforms the existing state-of-the-art models. Moreover, the effectiveness and interpretability of embeddings extracted from the PLM were demonstrated in our analysis. The code and datasets used in this study are available at: https://github.com/lixiangli01/PDH-EH.
Affiliation(s)
- Xiang Li
  - School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
- Gang-Ao Wang
  - School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
- Zhuoyu Wei
  - School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
- Hong Wang
  - School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
- Xiaolei Zhu
  - School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
4
Li F, Wang C, Guo X, Akutsu T, Webb GI, Coin LJM, Kurgan L, Song J. ProsperousPlus: a one-stop and comprehensive platform for accurate protease-specific substrate cleavage prediction and machine-learning model construction. Brief Bioinform 2023; 24:bbad372. [PMID: 37874948 DOI: 10.1093/bib/bbad372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Revised: 08/30/2023] [Accepted: 09/29/2023] [Indexed: 10/26/2023] Open
Abstract
Proteases contribute to a broad spectrum of cellular functions. Given a relatively limited amount of experimental data, developing accurate sequence-based predictors of substrate cleavage sites facilitates a better understanding of protease functions and substrate specificity. While many protease-specific predictors of substrate cleavage sites have been developed, these efforts are outpaced by the growth of the protease substrate cleavage data. In particular, since data for 100+ protease types are available and this number continues to grow, it becomes impractical to publish predictors for each new protease type; instead, it might be better to provide a computational platform that helps users quickly and efficiently build predictors that address their specific needs. To this end, we conceptualized, developed, tested and released a versatile bioinformatics platform, ProsperousPlus, that empowers users, even those with no programming experience or little bioinformatics background, to build fast and accurate predictors of substrate cleavage sites. ProsperousPlus facilitates the use of the rapidly accumulating substrate cleavage data to train, empirically assess and deploy predictive models for user-selected substrate types. Benchmarking tests on test datasets show that our platform produces predictors that on average exceed the predictive performance of current state-of-the-art approaches. ProsperousPlus is available as a webserver and a stand-alone software package at http://prosperousplus.unimelb-biotools.cloud.edu.au/.
Affiliation(s)
- Fuyi Li
  - College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
  - South Australian immunoGENomics Cancer Institute (SAiGENCI), Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA 5005, Australia
  - The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, VIC 3000, Australia
- Cong Wang
  - College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
- Xudong Guo
  - College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
- Geoffrey I Webb
  - Monash Data Futures Institute, Monash University, VIC 3800, Australia
- Lachlan J M Coin
  - The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, VIC 3000, Australia
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Jiangning Song
  - Monash Data Futures Institute, Monash University, VIC 3800, Australia
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, VIC 3800, Australia
5
Maasch JRMA, Torres MDT, Melo MCR, de la Fuente-Nunez C. Molecular de-extinction of ancient antimicrobial peptides enabled by machine learning. Cell Host Microbe 2023; 31:1260-1274.e6. [PMID: 37516110 DOI: 10.1016/j.chom.2023.07.001] [Citation(s) in RCA: 38] [Impact Index Per Article: 38.0] [Received: 11/16/2022] [Revised: 05/12/2023] [Accepted: 07/06/2023] [Indexed: 07/31/2023]
Abstract
Molecular de-extinction could offer avenues for drug discovery by reintroducing bioactive molecules that are no longer encoded by extant organisms. To prospect for antimicrobial peptides encrypted within extinct and extant human proteins, we introduce the panCleave random forest model for proteome-wide cleavage site prediction. Our model outperformed multiple protease-specific cleavage site classifiers for three modern human caspases, despite its pan-protease design. Antimicrobial activity was observed in vitro for modern and archaic protein fragments identified with panCleave. Lead peptides showed resistance to proteolysis and exhibited variable membrane permeabilization. Additionally, representative modern and archaic protein fragments showed anti-infective efficacy against A. baumannii in both a skin abscess infection model and a preclinical murine thigh infection model. These results suggest that machine-learning-based encrypted peptide prospection can identify stable, nontoxic peptide antibiotics. Moreover, we establish molecular de-extinction through paleoproteome mining as a framework for antibacterial drug discovery.
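The idea of scanning a protein for cleavage sites and collecting the resulting fragments can be sketched as a sliding-window pass. This is a minimal illustration only: the 8-residue (P4-P4') window, the function names, the example sequence, and the toy scoring rule (cut after aspartate) are all assumptions standing in for a trained classifier such as the random forest described above.

```python
from typing import Callable, Iterator, List, Tuple


def candidate_windows(protein: str, flank: int = 4) -> Iterator[Tuple[int, str]]:
    """Yield (cut, window) pairs; the putative cut lies between
    protein[cut - 1] (P1) and protein[cut] (P1')."""
    for cut in range(flank, len(protein) - flank + 1):
        yield cut, protein[cut - flank:cut + flank]


def toy_score(window: str) -> float:
    """Hypothetical stand-in for a trained model: 1.0 if P1 is Asp, else 0.0."""
    p1 = window[len(window) // 2 - 1]
    return 1.0 if p1 == "D" else 0.0


def digest(protein: str, score: Callable[[str], float],
           threshold: float = 0.5, flank: int = 4) -> List[str]:
    """Split the protein at every window whose score reaches the threshold."""
    cuts = [cut for cut, window in candidate_windows(protein, flank)
            if score(window) >= threshold]
    bounds = [0] + cuts + [len(protein)]
    return [protein[start:end] for start, end in zip(bounds, bounds[1:])]


# Made-up sequence; the fragments would then be screened for activity.
fragments = digest("MKTAYDEVDGSLLKRAAIETDSGVW", toy_score)
```

Swapping `toy_score` for a model's predicted probability keeps the rest of the scan unchanged, which is the main reason for passing the scorer in as a function.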
Affiliation(s)
- Jacqueline R M A Maasch
  - Department of Computer and Information Science, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Bioengineering, Department of Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA 19104, USA
- Marcelo D T Torres
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Bioengineering, Department of Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA 19104, USA
- Marcelo C R Melo
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Bioengineering, Department of Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA 19104, USA
- Cesar de la Fuente-Nunez
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Bioengineering, Department of Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA 19104, USA
6
Jarrin M, Kalligeraki AA, Uwineza A, Cawood CS, Brown AP, Ward EN, Le K, Freitag-Pohl S, Pohl E, Kiss B, Tapodi A, Quinlan RA. Independent Membrane Binding Properties of the Caspase Generated Fragments of the Beaded Filament Structural Protein 1 (BFSP1) Involves an Amphipathic Helix. Cells 2023; 12:1580. [PMID: 37371051 PMCID: PMC10297038 DOI: 10.3390/cells12121580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 06/04/2023] [Accepted: 06/05/2023] [Indexed: 06/29/2023] Open
Abstract
BACKGROUND: BFSP1 (beaded filament structural protein 1) is a plasma membrane, Aquaporin 0 (AQP0/MIP)-associated intermediate filament protein expressed in the eye lens. BFSP1 is myristoylated, a post-translational modification that requires caspase cleavage at D433. Bioinformatic analyses suggested that the sequences 434-452 were α-helical and amphipathic.
METHODS AND RESULTS: By CD spectroscopy, we show that the addition of trifluoroethanol induced a switch from an intrinsically disordered to a more α-helical conformation for the residues 434-467. Recombinantly produced BFSP1 fragments containing this amphipathic helix bind to lens lipid bilayers as determined by surface plasmon resonance (SPR). Lastly, we demonstrate by transient transfection of non-lens MCF7 cells that these same BFSP1 C-terminal sequences localise to plasma membranes and to cytoplasmic vesicles. These can be co-labelled with the vital dye, lysotracker, but other cell compartments, such as the nuclear and mitochondrial membranes, were negative. The N-terminal myristoylation of the amphipathic helix appeared not to change either the lipid affinity or membrane localisation of the BFSP1 polypeptides or fragments we assessed by SPR and transient transfection, but it did appear to enhance its helical content.
CONCLUSIONS: These data support the conclusion that C-terminal sequences of human BFSP1 distal to the caspase site at D433 have independent membrane binding properties via an adjacent amphipathic helix.
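One common way to quantify whether a stretch of residues is amphipathic is the helical hydrophobic moment. The sketch below is only an illustration of that calculation: the Kyte-Doolittle scale, the 100 degrees per residue of an ideal α-helix, and the synthetic test peptides are assumptions of this example, not the bioinformatic analysis or the BFSP1 sequence used in the study.

```python
import math

# Kyte-Doolittle hydropathy values (an assumed choice of scale for this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(peptide: str, degrees_per_residue: float = 100.0) -> float:
    """Mean hydrophobic moment: length of the vector sum of per-residue
    hydropathies placed around the helix axis, divided by peptide length."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    step = math.radians(degrees_per_residue)
    sin_sum = sum(KYTE_DOOLITTLE[aa] * math.sin(i * step) for i, aa in enumerate(peptide))
    cos_sum = sum(KYTE_DOOLITTLE[aa] * math.cos(i * step) for i, aa in enumerate(peptide))
    return math.hypot(sin_sum, cos_sum) / len(peptide)


# Synthetic 18-mers: leucine on one helical face and lysine on the other,
# versus a uniform poly-leucine control with no sidedness.
amphipathic = "".join(
    "L" if math.cos(math.radians(100.0 * i)) > 0 else "K" for i in range(18)
)
uniform = "L" * 18
```

Under these assumptions the two-faced peptide gives a large moment while the uniform control gives a value near zero, because 18 residues at 100 degrees each complete exactly five turns and the equal contributions cancel.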
Affiliation(s)
- Miguel Jarrin
  - Department of Biosciences, Upper Mountjoy Science Site, The University of Durham, South Road, Durham DH1 3LE, UK
  - Biophysical Sciences Institute, Durham University, Upper Mountjoy, South Road, Durham DH1 3LE, UK
- Alexia A. Kalligeraki
  - Department of Biosciences, Upper Mountjoy Science Site, The University of Durham, South Road, Durham DH1 3LE, UK
  - Biophysical Sciences Institute, Durham University, Upper Mountjoy, South Road, Durham DH1 3LE, UK
- Alice Uwineza
  - Department of Biosciences, Upper Mountjoy Science Site, The University of Durham, South Road, Durham DH1 3LE, UK
  - Biophysical Sciences Institute, Durham University, Upper Mountjoy, South Road, Durham DH1 3LE, UK
- Chris S. Cawood
  - Department of Biosciences, Upper Mountjoy Science Site, The University of Durham, South Road, Durham DH1 3LE, UK
  - Biophysical Sciences Institute, Durham University, Upper Mountjoy, South Road, Durham DH1 3LE, UK
- Adrian P. Brown
  - Department of Biosciences, Upper Mountjoy Science Site, The University of Durham, South Road, Durham DH1 3LE, UK
- Edward N. Ward
  - Department of Biosciences, Upper Mountjoy Science Site, The University of Durham, South Road, Durham DH1 3LE, UK
  - Biophysical Sciences Institute, Durham University, Upper Mountjoy, South Road, Durham DH1 3LE, UK
- Khoa Le
  - Biophysical Sciences Institute, Durham University, Upper Mountjoy, South Road, Durham DH1 3LE, UK
  - Department of Biological Structure, University of Washington, Seattle, WA 98195, USA
- Stefanie Freitag-Pohl
  - Department of Chemistry, Durham University, Lower Mountjoy, South Road, Durham DH1 3LE, UK
- Ehmke Pohl
  - Biophysical Sciences Institute, Durham University, Upper Mountjoy, South Road, Durham DH1 3LE, UK
  - Department of Chemistry, Durham University, Lower Mountjoy, South Road, Durham DH1 3LE, UK
- Bence Kiss
  - Department of Biochemistry and Medical Chemistry, Medical School, University of Pécs, 7624 Pécs, Hungary
- Antal Tapodi
  - Department of Biosciences, Upper Mountjoy Science Site, The University of Durham, South Road, Durham DH1 3LE, UK
  - Biophysical Sciences Institute, Durham University, Upper Mountjoy, South Road, Durham DH1 3LE, UK
  - Department of Biochemistry and Medical Chemistry, Medical School, University of Pécs, 7624 Pécs, Hungary
- Roy A. Quinlan
  - Department of Biosciences, Upper Mountjoy Science Site, The University of Durham, South Road, Durham DH1 3LE, UK
  - Biophysical Sciences Institute, Durham University, Upper Mountjoy, South Road, Durham DH1 3LE, UK
  - Department of Biological Structure, University of Washington, Seattle, WA 98195, USA
7
Wang H, Julien O. CaspSites: A Database and Web Application for Experimentally Observed Human Caspase Substrates Using N-Terminomics. J Proteome Res 2023; 22:454-461. [PMID: 36696595 DOI: 10.1021/acs.jproteome.2c00620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
Abstract
CaspSites is a free-to-use database and web application for experimentally observed human caspase substrates using N-terminomics. It can be accessed and used by all users at the web URL www.caspsites.org. CaspSites stores cleavage site information identified for human caspases 1-9 in lysates and apoptotic cells, collected from their corresponding published studies. The database can be queried, viewed, and exported using the search page of the web application. The main parameters offered are protein substrate, cleavage site (P4-P4') residues, and individual caspase data sets, which can be connected using OR, AND, or NOT logical operators for custom user-built queries. CaspSites will be regularly updated with new experimental findings for understudied caspases, providing researchers insight into the distinctive roles human caspases play in cellular processes by identifying their target proteins in relation to each other.
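The query model described above (filters on substrate, P4-P4' residues, and caspase data set, combined with AND, OR, and NOT) can be pictured with a small in-memory sketch. The record fields, the example entries, and every helper name here are hypothetical; they do not reflect the actual CaspSites schema, data, or interface.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class CleavageRecord:
    substrate: str  # protein substrate name (made-up values below)
    window: str     # P4-P4' residues, 8 letters; P1 is at index 3
    caspase: int    # caspase data set the observation belongs to


Predicate = Callable[[CleavageRecord], bool]


def by_caspase(number: int) -> Predicate:
    return lambda record: record.caspase == number


def substrate_is(name: str) -> Predicate:
    return lambda record: record.substrate == name


def p1_is(residue: str) -> Predicate:
    return lambda record: record.window[3] == residue


def AND(*predicates: Predicate) -> Predicate:
    return lambda record: all(p(record) for p in predicates)


def OR(*predicates: Predicate) -> Predicate:
    return lambda record: any(p(record) for p in predicates)


def NOT(predicate: Predicate) -> Predicate:
    return lambda record: not predicate(record)


def query(records: Iterable[CleavageRecord], predicate: Predicate) -> List[CleavageRecord]:
    return [record for record in records if predicate(record)]


RECORDS = [
    CleavageRecord("PROT_A", "DEVDGSAA", 3),
    CleavageRecord("PROT_B", "LEHDSGKT", 9),
    CleavageRecord("PROT_C", "DEVDAKLM", 7),
    CleavageRecord("PROT_A", "IETDSGVD", 8),
]

# "P1 is Asp AND (caspase 3 OR caspase 7) AND NOT substrate PROT_C"
hits = query(RECORDS, AND(p1_is("D"), OR(by_caspase(3), by_caspase(7)), NOT(substrate_is("PROT_C"))))
```

Representing each filter as a function makes the three logical operators plain function combinators, so arbitrarily nested custom queries need no extra parsing.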
Affiliation(s)
- Henry Wang
  - Department of Biochemistry, University of Alberta, Edmonton, Alberta T6G2H7, Canada
- Olivier Julien
  - Department of Biochemistry, University of Alberta, Edmonton, Alberta T6G2H7, Canada
8
Henehan GT, Ryan BJ, Kinsella GK. Approaches to Avoid Proteolysis During Protein Expression and Purification. Methods Mol Biol 2023; 2699:77-95. [PMID: 37646995 DOI: 10.1007/978-1-0716-3362-5_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
All cells contain proteases, which hydrolyze the peptide bonds between amino acids of a protein backbone. Typically, proteases are prevented from nonspecific proteolysis by regulation and by their physical separation into different subcellular compartments; however, this segregation is not retained during cell lysis, which is the initial step in any protein isolation procedure. Prevention of proteolysis during protein purification often takes the form of a two-pronged approach: first, inhibition of proteolysis in situ, followed by the early separation of the protease from the protein of interest via chromatographic purification. Protease inhibitors are routinely used to limit the effect of the proteases before they are physically separated from the protein of interest via column chromatography. In this chapter, commonly used approaches to reducing or avoiding proteolysis during protein expression and purification are reviewed.
Affiliation(s)
- Gary T Henehan
  - School of Food Science and Environmental Health, Technological University Dublin, Grangegorman, Dublin, Ireland
- Barry J Ryan
  - School of Food Science and Environmental Health, Technological University Dublin, Grangegorman, Dublin, Ireland
- Gemma K Kinsella
  - School of Food Science and Environmental Health, Technological University Dublin, Grangegorman, Dublin, Ireland
9
Bell PA, Scheuermann S, Renner F, Pan CL, Lu HY, Turvey SE, Bornancin F, Régnier CH, Overall CM. Integrating knowledge of protein sequence with protein function for the prediction and validation of new MALT1 substrates. Comput Struct Biotechnol J 2022; 20:4717-4732. [PMID: 36147669 PMCID: PMC9463181 DOI: 10.1016/j.csbj.2022.08.021] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/01/2022] [Revised: 08/07/2022] [Accepted: 08/08/2022] [Indexed: 11/30/2022] Open
Affiliation(s)
- Peter A. Bell
  - Centre for Blood Research, Life Sciences Centre, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
  - Department of Oral Biological and Medical Sciences, Faculty of Dentistry, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
- Sophia Scheuermann
  - Centre for Blood Research, Life Sciences Centre, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
  - Department of Oral Biological and Medical Sciences, Faculty of Dentistry, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
  - Department of Immunology, Eberhard Karl University Tübingen, 72076 Tübingen, Germany
  - Department of Hematology and Oncology, University Hospital Tübingen, Children's Hospital, 72076 Tübingen, Germany
- Florian Renner
  - Novartis Institutes for BioMedical Research, Novartis Campus, CH-4056 Basel, Switzerland
  - Molecular Targeted Therapy - Discovery Oncology, Roche Pharma Research & Early Development, F. Hoffmann-La Roche Ltd, 4070 Basel, Switzerland
- Christina L. Pan
  - Centre for Blood Research, Life Sciences Centre, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
  - Department of Oral Biological and Medical Sciences, Faculty of Dentistry, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
- Henry Y. Lu
  - Department of Pediatrics, British Columbia Children's Hospital, The University of British Columbia, Vancouver, BC V5Z 4H4, Canada
  - Department of Experimental Medicine, Faculty of Medicine, The University of British Columbia, Vancouver, BC V5Z 1M9, Canada
- Stuart E. Turvey
  - Department of Pediatrics, British Columbia Children's Hospital, The University of British Columbia, Vancouver, BC V5Z 4H4, Canada
  - Department of Experimental Medicine, Faculty of Medicine, The University of British Columbia, Vancouver, BC V5Z 1M9, Canada
- Frédéric Bornancin
  - Novartis Institutes for BioMedical Research, Novartis Campus, CH-4056 Basel, Switzerland
- Catherine H. Régnier
  - Novartis Institutes for BioMedical Research, Novartis Campus, CH-4056 Basel, Switzerland
- Christopher M. Overall
  - Centre for Blood Research, Life Sciences Centre, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
  - Department of Oral Biological and Medical Sciences, Faculty of Dentistry, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
  - Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
  - Corresponding author at: Centre for Blood Research, Life Sciences Centre, University of British Columbia, Vancouver, BC V6T 1Z3, Canada.
10
Deep Learning-Based Advances In Protein Posttranslational Modification Site and Protein Cleavage Prediction. Methods in Molecular Biology (Clifton, N.J.) 2022; 2499:285-322. [PMID: 35696087 DOI: 10.1007/978-1-0716-2317-6_15] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 10/18/2022]
Abstract
Posttranslational modification (PTM) is a ubiquitous phenomenon in both eukaryotes and prokaryotes that gives rise to enormous proteomic diversity. PTM mostly comes in two flavors: covalent modification of the polypeptide chain and proteolytic cleavage. Understanding and characterizing PTMs is a fundamental step toward understanding the underpinnings of biology. Recent advances in experimental approaches, mainly mass-spectrometry-based approaches, have immensely helped in obtaining and characterizing PTMs. However, experimental approaches are not enough to understand and characterize more than 450 different types of PTMs, and complementary computational approaches are becoming popular. Recently, owing to advances in the field of deep learning (DL) and the explosion of DL applications in various fields, the computational prediction of PTMs has also witnessed the development of a plethora of DL-based approaches. In this book chapter, we first review some recent DL-based approaches in the field of PTM site prediction. In addition, we review recent advances in a less-studied PTM, namely proteolytic cleavage prediction. We describe advances in PTM prediction by highlighting the deep learning architecture, feature encoding, novelty of the approaches, and availability of the tools/approaches. Finally, we provide an outlook and possible future research directions for DL-based approaches for PTM prediction.
11
Zhang S, Zhao L, Zheng CH, Xia J. A feature-based approach to predict hot spots in protein-DNA binding interfaces. Brief Bioinform 2021; 21:1038-1046. [PMID: 30957840 DOI: 10.1093/bib/bbz037] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/14/2019] [Revised: 02/20/2019] [Accepted: 03/07/2019] [Indexed: 12/21/2022] Open
Abstract
DNA-binding hot spot residues of proteins are dominant and fundamental interface residues that contribute most of the binding free energy of protein-DNA interfaces. As experimental methods for identifying hot spots are expensive and time consuming, computational approaches are urgently required for predicting hot spots on a large scale. In this work, we systematically assessed 114 features derived from protein sequence, structure, network and solvent accessibility information, and their combinations, along with various feature selection strategies for hot spot prediction. We then trained and compared four commonly used machine learning models, namely, support vector machine (SVM), random forest, Naïve Bayes and k-nearest neighbor, for the identification of hot spots using 10-fold cross-validation and the independent test set. Our results show that (1) features based on the solvent accessible surface area have a significant effect on hot spot prediction; (2) different but complementary features generally enhance the prediction performance; and (3) SVM outperforms other machine learning methods on both training and independent test sets. In an effort to improve predictive performance, we developed a feature-based method, namely, PrPDH (Prediction of Protein-DNA binding Hot spots), for the prediction of hot spots in protein-DNA binding interfaces using SVM based on the selected 10 optimal features. Comparative results on benchmark data sets indicate that our predictor generally achieves better performance in predicting hot spots than the state-of-the-art predictors. A user-friendly web server for PrPDH is freely available at http://bioinfo.ahu.edu.cn:8080/PrPDH.
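The 10-fold cross-validation mentioned above partitions the training samples into ten folds and holds each out once for evaluation. The sketch below shows only that index bookkeeping; the function name, the fixed seed, and the sample count of 25 are illustrative assumptions and are not taken from the PrPDH study.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds.
    Every sample appears in exactly one test fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]     # fold sizes differ by at most 1
    for held_out in range(k):
        test = folds[held_out]
        train = [idx for j, fold in enumerate(folds) if j != held_out for idx in fold]
        yield train, test


# Illustrative use: a model would be fit on `train` and scored on `test` per fold.
splits = list(k_fold_indices(n_samples=25, k=10))
```

Averaging a metric over the ten held-out folds gives the cross-validation estimate; an independent test set, as used in the abstract, is kept out of this loop entirely.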
Affiliation(s)
- Sijia Zhang
  - Institutes of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
- Le Zhao
  - Institutes of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
- Chun-Hou Zheng
  - Institutes of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
- Junfeng Xia
  - Institutes of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
12
Liang X, Li F, Chen J, Li J, Wu H, Li S, Song J, Liu Q. Large-scale comparative review and assessment of computational methods for anti-cancer peptide identification. Brief Bioinform 2021; 22:bbaa312. [PMID: 33316035 PMCID: PMC8294543 DOI: 10.1093/bib/bbaa312] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Received: 08/18/2020] [Revised: 09/30/2020] [Accepted: 08/25/2020] [Indexed: 12/13/2022]
Abstract
Anti-cancer peptides (ACPs) are known as potential therapeutics for cancer. Due to their unique ability to target cancer cells without affecting healthy cells directly, they have been extensively studied. Many peptide-based drugs are currently being evaluated in preclinical and clinical trials. Accurate identification of ACPs has received considerable attention in recent years; as such, a number of machine learning-based methods for in silico identification of ACPs have been developed. These methods promote, to some extent, research on the mechanisms of ACP therapeutics against cancer. The methods differ vastly in their training/testing datasets, machine learning algorithms, feature encoding schemes, feature selection methods and evaluation strategies. Therefore, it is desirable to summarize the advantages and disadvantages of the existing methods and to provide useful insights and suggestions for the development and improvement of novel computational tools to characterize and identify ACPs. With this in mind, we first comprehensively investigate 16 state-of-the-art predictors for ACPs in terms of their core algorithms, feature encoding schemes, performance evaluation metrics and webserver/software usability. Then, a comprehensive performance assessment is conducted to evaluate the robustness and scalability of the existing predictors using a well-prepared benchmark dataset. We provide potential strategies for improving model performance. Moreover, we propose a novel ensemble learning framework, termed ACPredStackL, for the accurate identification of ACPs. ACPredStackL is developed based on the stacking ensemble strategy combined with SVM, Naïve Bayesian, lightGBM and KNN. Empirical benchmarking experiments against the state-of-the-art methods demonstrate that ACPredStackL achieves comparable performance for predicting ACPs. 
The webserver and source code of ACPredStackL are freely available at http://bigdata.biocie.cn/ACPredStackL/ and https://github.com/liangxiaoq/ACPredStackL, respectively.
Affiliation(s)
- Xiao Liang
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
  - Shaanxi Key Laboratory of Agricultural Information Perception and Intelligent Service, Yangling, Shaanxi 712100, China
- Fuyi Li
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Monash Centre for Data Science, Monash University, Melbourne, VIC 3800, Australia
  - Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Victoria, Australia
- Jinxiang Chen
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Junlong Li
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Hao Wu
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Shuqin Li
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
  - Shaanxi Key Laboratory of Agricultural Information Perception and Intelligent Service, Yangling, Shaanxi 712100, China
- Jiangning Song
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Monash Centre for Data Science, Monash University, Melbourne, VIC 3800, Australia
  - ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
- Quanzhong Liu
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
  - Shaanxi Key Laboratory of Agricultural Information Perception and Intelligent Service, Yangling, Shaanxi 712100, China
13
Zhang D, Xu ZC, Su W, Yang YH, Lv H, Yang H, Lin H. iCarPS: a computational tool for identifying protein carbonylation sites by novel encoded features. Bioinformatics 2021; 37:171-177. [PMID: 32766811 DOI: 10.1093/bioinformatics/btaa702] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 05/11/2020] [Revised: 07/12/2020] [Accepted: 07/28/2020] [Indexed: 12/13/2022]
Abstract
MOTIVATION: Protein carbonylation is one of the most important oxidative stress-induced post-translational modifications and is generally characterized by stability, irreversibility and relatively early formation. It plays a significant role in orchestrating various biological processes and has already been shown to be related to many diseases. However, the experimental technologies for carbonylation site identification are not only costly and time consuming, but also incapable of processing a large number of proteins at a time. Thus, rapidly and effectively identifying carbonylation sites by computational methods will provide key clues for the analysis of the occurrence and development of diseases. RESULTS: In this study, we developed a predictor called iCarPS to identify carbonylation sites based on sequence information. A novel feature encoding scheme called residues conical coordinates, combined with their physicochemical properties, was proposed to formulate carbonylated protein and non-carbonylated protein samples. To remove potentially redundant features and improve the prediction performance, a feature selection technique was used. The accuracy and robustness of iCarPS were demonstrated by experiments on training and independent datasets. Comparison with other published methods showed that the proposed method provides strong performance for carbonylation site identification. AVAILABILITY AND IMPLEMENTATION: Based on the proposed model, a user-friendly webserver and a software package were constructed, which can be freely accessed at http://lin-group.cn/server/iCarPS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dan Zhang
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zhao-Chun Xu
  - Computer Department, Jingdezhen Ceramic Institute, Jingdezhen 333403, China
- Wei Su
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yu-He Yang
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lv
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Yang
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lin
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
14
Liu P, Song J, Lin CY, Akutsu T. ReCGBM: a gradient boosting-based method for predicting human dicer cleavage sites. BMC Bioinformatics 2021; 22:63. [PMID: 33568063 PMCID: PMC7877110 DOI: 10.1186/s12859-021-03993-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/10/2020] [Accepted: 02/02/2021] [Indexed: 11/30/2022]
Abstract
Background: Human dicer is an enzyme that cleaves pre-miRNAs into miRNAs. Several models have been developed to predict human dicer cleavage sites, including PHDCleav and LBSizeCleav. Given an input sequence, these models can predict whether the sequence contains a cleavage site. However, these models only consider each sequence independently and lack interpretability. Therefore, it is necessary to develop an accurate and explainable predictor, which employs relations between different sequences, to enhance the understanding of the mechanism by which human dicer cleaves pre-miRNA. Results: In this study, we develop ReCGBM, an accurate and explainable predictor for human dicer cleavage sites. We design relational features and class features as inputs to a lightGBM model. Computational experiments show that ReCGBM achieves the best performance compared to the existing methods. Further, we find that features in close proximity to the center of the pre-miRNA are more important and make a significant contribution to the performance improvement of the developed method. Conclusions: The results of this study show that ReCGBM is an interpretable and accurate predictor. In addition, the analyses of feature importance suggest that it might be of particular interest to consider more informative features close to the center of the pre-miRNA in future predictors.
Affiliation(s)
- Pengyu Liu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, 611-0011, Japan
- Jiangning Song
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC, 3800, Australia
- Chun-Yu Lin
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, 300, Taiwan; Center for Intelligent Drug Systems and Smart Bio-devices, National Chiao Tung University, Hsinchu, 300, Taiwan
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, 611-0011, Japan
15
Kaushal P, Lee C. N-terminomics - its past and recent advancements. J Proteomics 2020; 233:104089. [PMID: 33359939 DOI: 10.1016/j.jprot.2020.104089] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 01/31/2020] [Revised: 07/22/2020] [Accepted: 12/20/2020] [Indexed: 02/06/2023]
Abstract
N-terminomics is a rapidly evolving branch of proteomics that encompasses the study of protein N-terminal sequences. A proteome-wide collection of such sequences has been widely used to understand proteolytic cascades and to annotate the genome. Over the last two decades, various N-terminomic strategies have been developed to achieve high sensitivity, greater depth of coverage, and high throughput. In this review, we cover how the field of N-terminomics has evolved to date, including a discussion of various sample preparation and N-terminal peptide enrichment strategies. We also compare different N-terminomic methods and highlight their relative benefits and shortcomings in implementation. In addition, an overview of the currently available bioinformatics tools and data analysis pipelines for the annotation of N-terminomic datasets is included. SIGNIFICANCE: It has been recognized that proteins undergo several post-translational modifications (PTMs), and a number of perturbed biological pathways are directly associated with modifications at the terminal sites of a protein. In this regard, N-terminomics can be applied to generate a proteome-wide landscape of mature N-terminal sequences, annotate their source of generation, and recognize their significance in biological pathways. In addition, a system-wide study can be used to investigate complicated proteolytic machinery and protease cleavage patterns for potential therapeutic targets. Moreover, due to unprecedented recent improvements in analytical methods and mass spectrometry instrumentation, N-terminomic methodologies now offer an unparalleled ability to study proteoforms and their implications in clinical conditions. Such approaches can further be applied to the detection of low-abundance proteoforms, annotation of non-canonical protein coding sites, identification of candidate disease biomarkers, and, last but not least, the discovery of novel drug targets.
Affiliation(s)
- Prashant Kaushal
  - Center for Theragnosis, Korea Institute of Science and Technology, Seoul 02792, Republic of Korea; Division of Bio-Medical Science & Technology, KIST School, Korea University of Science and Technology, Seoul 02792, Republic of Korea
- Cheolju Lee
  - Center for Theragnosis, Korea Institute of Science and Technology, Seoul 02792, Republic of Korea; Division of Bio-Medical Science & Technology, KIST School, Korea University of Science and Technology, Seoul 02792, Republic of Korea; KHU-KIST Department of Converging Science and Technology, Kyung Hee University, 26 Kyunghee-daero, Dongdaemun-gu, Seoul 02447, Republic of Korea
16
Post-translational Modification of OTULIN Regulates Ubiquitin Dynamics and Cell Death. Cell Rep 2020; 29:3652-3663.e5. [PMID: 31825842 DOI: 10.1016/j.celrep.2019.11.014] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 04/16/2019] [Revised: 09/24/2019] [Accepted: 11/04/2019] [Indexed: 11/23/2022]
Abstract
Linear ubiquitination has emerged as an important post-translational modification that regulates NF-κB activation, inflammation, and cell death in both immune and non-immune compartments, including the skin. The deubiquitinase OTULIN specifically disassembles linear ubiquitin chains generated by the linear ubiquitin assembly complex (LUBAC) and is necessary to prevent embryonic lethality and autoinflammatory disease. Here, we dissect the direct role of OTULIN in cell death and find that OTULIN limits apoptosis and necroptosis in keratinocytes. During apoptosis, OTULIN is cleaved by caspase-3 at Asp-31 into a C-terminal fragment that restricts caspase activation and cell death. During necroptosis, OTULIN is hyper-phosphorylated at Tyr-56, which modulates RIPK1 ubiquitin dynamics and promotes cell death. OTULIN Tyr-56 phosphorylation is counteracted by the activity of dual-specificity phosphatase 14 (DUSP14), which we identify as an OTULIN phosphatase that limits necroptosis. Our data provide evidence of dynamic post-translational modifications of OTULIN and highlight their importance in cell death outcome.
17
Douglas T, Saleh M. Cross-regulation between LUBAC and caspase-1 modulates cell death and inflammation. J Biol Chem 2020; 295:5216-5228. [PMID: 32122970 PMCID: PMC7170516 DOI: 10.1074/jbc.ra119.011622] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/24/2019] [Revised: 02/12/2020] [Indexed: 11/06/2022]
Abstract
The linear ubiquitin assembly complex (LUBAC) is an essential component of the innate and adaptive immune system. Modification of cellular substrates with linear polyubiquitin chains is a key regulatory step in signal transduction that impacts cell death and inflammatory signaling downstream of various innate immunity receptors. Loss-of-function mutations in the LUBAC components HOIP and HOIL-1 yield a systemic autoinflammatory disease in humans, whereas their genetic ablation is embryonically lethal in mice. Deficiency of the LUBAC adaptor protein Sharpin results in a multi-organ inflammatory disease in mice characterized by chronic proliferative dermatitis (cpdm), which is propagated by TNFR1-induced and RIPK1-mediated keratinocyte cell death. We have previously shown that caspase-1 and -11 promoted the dermatitis pathology of cpdm mice and mediated cell death in the skin. Here, we describe a reciprocal regulation of caspase-1 and LUBAC activities in keratinocytes. We show that LUBAC interacted with caspase-1 via HOIP and modified its CARD domain with linear polyubiquitin and that depletion of HOIP or Sharpin resulted in heightened caspase-1 activation and cell death in response to inflammasome activation, unlike what is observed in macrophages. Reciprocally, caspase-1, as well as caspase-8, regulated LUBAC activity by proteolytically processing HOIP at Asp-348 and Asp-387 during the execution of cell death. HOIP processing impeded substrate ubiquitination in the NF-κB pathway and resulted in enhanced apoptosis. These results highlight a regulatory mechanism underlying efficient apoptosis in keratinocytes and provide further evidence of a cross-talk between inflammatory and cell death pathways.
Affiliation(s)
- Todd Douglas
  - Department of Microbiology and Immunology, McGill University, Montréal, Québec H3G 0B1, Canada
- Maya Saleh
  - Department of Microbiology and Immunology, McGill University, Montréal, Québec H3G 0B1, Canada; Department of Medicine, McGill University, Montréal, Québec H3G 0B1, Canada
18
Wei L, Luan S, Nagai LAE, Su R, Zou Q. Exploring sequence-based features for the improved prediction of DNA N4-methylcytosine sites in multiple species. Bioinformatics 2020; 35:1326-1333. [PMID: 30239627 DOI: 10.1093/bioinformatics/bty824] [Citation(s) in RCA: 126] [Impact Index Per Article: 31.5] [Received: 08/13/2018] [Revised: 09/12/2018] [Accepted: 09/18/2018] [Indexed: 12/20/2022]
Abstract
MOTIVATION: As one of the important epigenetic modifications, DNA N4-methylcytosine (4mC) has recently been shown to play crucial roles in restriction-modification systems. For a better understanding of its functional mechanisms, it is fundamentally important to identify 4mC modifications. Machine learning methods have recently emerged as an effective and efficient approach for the high-throughput identification of 4mC sites, although high predictive error rates remain a challenge for existing methods. Therefore, it is highly desirable to develop a computational method that identifies 4mC sites more accurately. RESULTS: In this study, we propose a machine learning based predictor, namely 4mcPred-SVM, for the genome-wide detection of DNA 4mC sites. In this predictor, we present a new feature representation algorithm that sufficiently exploits sequence-based information. To improve the feature representation ability, we use a two-step feature optimization strategy, thereby obtaining the most representative features. Using the resulting features and a Support Vector Machine (SVM), we adaptively train the optimal models for different species. Comparative results on benchmark datasets from six species indicate that our predictor achieves generally better performance in predicting 4mC sites than the state-of-the-art predictors. Importantly, the sequence-based features can reliably and robustly predict 4mC sites, facilitating the discovery of potentially important sequence characteristics for the prediction of 4mC sites. AVAILABILITY AND IMPLEMENTATION: The user-friendly webserver that implements the proposed 4mcPred-SVM is freely accessible at http://server.malab.cn/4mcPred-SVM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Leyi Wei
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Shasha Luan
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Luis Augusto Eijy Nagai
  - Lab of Functional Analysis In Silico, Institute of Medical Science, University of Tokyo, Tokyo, Japan
- Ran Su
  - School of Computer Software, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Quan Zou
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
19
Song J, Wang Y, Li F, Akutsu T, Rawlings ND, Webb GI, Chou KC. iProt-Sub: a comprehensive package for accurately mapping and predicting protease-specific substrates and cleavage sites. Brief Bioinform 2020; 20:638-658. [PMID: 29897410 PMCID: PMC6556904 DOI: 10.1093/bib/bby028] [Citation(s) in RCA: 124] [Impact Index Per Article: 31.0] [Received: 01/02/2018] [Revised: 03/02/2018] [Indexed: 01/03/2023]
Abstract
Regulation of proteolysis plays a critical role in a myriad of important cellular processes. The key to better understanding the mechanisms that control this process is to identify the specific substrates that each protease targets. To address this, we have developed iProt-Sub, a powerful bioinformatics tool for the accurate prediction of protease-specific substrates and their cleavage sites. Importantly, iProt-Sub represents a significantly advanced version of its successful predecessor, PROSPER. It provides optimized cleavage site prediction models with better prediction performance and coverage for more species-specific proteases (4 major protease families and 38 different proteases). iProt-Sub integrates heterogeneous sequence and structural features and uses a two-step feature selection procedure to further remove redundant and irrelevant features in an effort to improve the cleavage site prediction accuracy. Features used by iProt-Sub are encoded by 11 different sequence encoding schemes, including local amino acid sequence profile, secondary structure, solvent accessibility and native disorder, which will allow a more accurate representation of the protease specificity of approximately 38 proteases and training of the prediction models. Benchmarking experiments using cross-validation and independent tests showed that iProt-Sub is able to achieve a better performance than several existing generic tools. We anticipate that iProt-Sub will be a powerful tool for proteome-wide prediction of protease-specific substrates and their cleavage sites, and will facilitate hypothesis-driven functional interrogation of protease-specific substrate cleavage and proteolytic events.
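As an illustration of what a local sequence encoding around a candidate cleavage site can look like, the following Python sketch extracts the P4 to P4' window around a scissile bond and one-hot encodes it. This is a generic scheme chosen for illustration, not one of the 11 encodings of iProt-Sub, which the abstract does not specify in detail. The function names, the padding symbol `X`, and the toy sequence are hypothetical; the only biological fact relied on is that caspases cleave after Asp, with DEVD as the canonical caspase-3 motif.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def extract_window(sequence, p1_index, flank=4, pad="X"):
    """Return residues P{flank}..P1 followed by P1'..P{flank}'.

    p1_index is the 0-based position of the P1 residue, i.e. the residue
    immediately N-terminal to the scissile bond. Windows that run past
    either end of the sequence are padded with `pad`.
    """
    left = sequence[max(0, p1_index - flank + 1): p1_index + 1]
    right = sequence[p1_index + 1: p1_index + 1 + flank]
    return left.rjust(flank, pad) + right.ljust(flank, pad)


def one_hot(window):
    """Encode each residue as a 20-dimensional indicator vector.

    Padding or non-standard residues become all-zero vectors.
    """
    vector = []
    for residue in window:
        vector.extend(1 if residue == ref else 0 for ref in AMINO_ACIDS)
    return vector


# Hypothetical toy substrate containing a DEVD motif.
toy_sequence = "MDEVDGAKLT"

# Candidate caspase sites: every Asp that has a residue after it.
candidate_sites = [i for i, aa in enumerate(toy_sequence[:-1]) if aa == "D"]

windows = [extract_window(toy_sequence, i) for i in candidate_sites]
features = [one_hot(w) for w in windows]

for site, window in zip(candidate_sites, windows):
    print(site, window)
```

Each feature vector has 8 × 20 = 160 entries; in a real predictor such vectors would be combined with structural descriptors (secondary structure, solvent accessibility, disorder) before feature selection and model training.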
Affiliation(s)
- Jiangning Song
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia; Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
- Yanan Wang
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
- Fuyi Li
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, 611-0011, Japan
- Neil D Rawlings
  - EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Geoffrey I Webb
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Kuo-Chen Chou
  - Gordon Life Science Institute, Boston, MA 02478, USA; Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
20
Li F, Leier A, Liu Q, Wang Y, Xiang D, Akutsu T, Webb GI, Smith AI, Marquez-Lago T, Li J, Song J. Procleave: Predicting Protease-specific Substrate Cleavage Sites by Combining Sequence and Structural Information. GENOMICS, PROTEOMICS & BIOINFORMATICS 2020; 18:52-64. [PMID: 32413515 PMCID: PMC7393547 DOI: 10.1016/j.gpb.2019.08.002] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 04/30/2019] [Revised: 08/08/2019] [Accepted: 10/23/2019] [Indexed: 10/29/2022]
Abstract
Proteases are enzymes that cleave and hydrolyse the peptide bonds between two specific amino acid residues of target substrate proteins. Protease-controlled proteolysis plays a key role in the degradation and recycling of proteins, which is essential for various physiological processes. Thus, solving the substrate identification problem will have important implications for the precise understanding of functions and physiological roles of proteases, as well as for therapeutic target identification and pharmaceutical applicability. Consequently, there is a great demand for bioinformatics methods that can predict novel substrate cleavage events with high accuracy by utilizing both sequence and structural information. In this study, we present Procleave, a novel bioinformatics approach for predicting protease-specific substrates and specific cleavage sites by taking into account both their sequence and 3D structural information. Structural features of known cleavage sites were represented by discrete values using a LOWESS data-smoothing optimization method, which turned out to be critical for the performance of Procleave. The optimal approximations of all structural parameter values were encoded in a conditional random field (CRF) computational framework, alongside sequence and chemical group-based features. Here, we demonstrate the outstanding performance of Procleave through extensive benchmarking and independent tests. Procleave is capable of correctly identifying most cleavage sites in the case study. Importantly, when applied to the human structural proteome encompassing 17,628 protein structures, Procleave suggests a number of potential novel target substrates and their corresponding cleavage sites of different proteases. Procleave is implemented as a webserver and is freely accessible at http://procleave.erc.monash.edu/.
Affiliation(s)
- Fuyi Li
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Andre Leier
  - School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35233, USA
- Quanzhong Liu
  - College of Information Engineering, Northwest A&F University, Yangling 712100, China
- Yanan Wang
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Dongxu Xiang
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; College of Information Engineering, Northwest A&F University, Yangling 712100, China
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
- Geoffrey I Webb
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- A Ian Smith
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
- Tatiana Marquez-Lago
  - School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35233, USA
- Jian Li
  - Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia
- Jiangning Song
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia; ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
21
Marini S, Vitali F, Rampazzi S, Demartini A, Akutsu T. Protease target prediction via matrix factorization. Bioinformatics 2019; 35:923-929. [PMID: 30169576 DOI: 10.1093/bioinformatics/bty746] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/05/2018] [Revised: 08/20/2018] [Accepted: 08/27/2018] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Protein cleavage is an important cellular event, involved in a myriad of processes, from apoptosis to immune response. Bioinformatics provides in silico tools, such as machine learning-based models, to guide the discovery of targets for the proteases responsible for protein cleavage. State-of-the-art models have a scope limited to specific protease families (such as caspases) and do not explicitly include biological or medical knowledge (such as hierarchical protein domain similarity or gene-gene interactions). To fill this gap, we present a novel approach for protease target prediction based on data integration. RESULTS: By representing protease-protein target information in the form of relational matrices, we design a model that (i) is general and not limited to a single protease family, and (ii) leverages the available knowledge, managing extremely sparse data from heterogeneous data sources, including primary sequence, pathways, domains and interactions. When compared with other algorithms on test data, our approach provides better performance, even against models that focus specifically on a single protease family. AVAILABILITY AND IMPLEMENTATION: https://gitlab.com/smarini/MaDDA/ (Matlab code and the data used). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
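The core idea of scoring unobserved protease-substrate pairs by factorizing a sparse relational matrix can be sketched in a few lines of Python. This is a deliberately simplified, single-matrix illustration trained by stochastic gradient descent, not the multi-source data-integration model of the paper (whose code is in Matlab at the URL above). The toy matrix `R`, the function names, and all hyperparameters are assumptions.

```python
import random


def factorize(R, rank=2, epochs=2000, lr=0.05, reg=0.01, seed=0):
    """Fit R ~ P Q^T on the observed (non-None) entries with SGD."""
    rng = random.Random(seed)
    n_rows, n_cols = len(R), len(R[0])
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    observed = [
        (i, j) for i in range(n_rows) for j in range(n_cols)
        if R[i][j] is not None
    ]
    for _ in range(epochs):
        for i, j in observed:
            err = R[i][j] - predict(P, Q, i, j)
            for f in range(rank):
                p, q = P[i][f], Q[j][f]
                # Gradient step on squared error with L2 regularization.
                P[i][f] += lr * (err * q - reg * p)
                Q[j][f] += lr * (err * p - reg * q)
    return P, Q


def predict(P, Q, i, j):
    """Score for protease i cleaving protein j: dot product of latent factors."""
    return sum(p * q for p, q in zip(P[i], Q[j]))


def observed_mse(R, P, Q):
    """Mean squared error over the observed entries only."""
    errors = [
        (R[i][j] - predict(P, Q, i, j)) ** 2
        for i in range(len(R)) for j in range(len(R[0]))
        if R[i][j] is not None
    ]
    return sum(errors) / len(errors)


# Hypothetical protease (rows) x protein (columns) matrix:
# 1 = known target, 0 = known non-target, None = unobserved pair.
R = [
    [1, 1, 0, 0, None],
    [1, None, 0, 0, 0],
    [0, 0, 1, 1, None],
    [0, 0, None, 1, 1],
]

P0, Q0 = factorize(R, epochs=0)   # untrained baseline (random initial factors)
P, Q = factorize(R)               # trained factors
initial_mse = observed_mse(R, P0, Q0)
final_mse = observed_mse(R, P, Q)
print(f"MSE on observed entries: {initial_mse:.3f} -> {final_mse:.3f}")
print(f"score for unobserved pair (row 0, column 4): {predict(P, Q, 0, 4):.2f}")
```

The scores for the `None` entries are the predictions of interest. In a data-integration setting, additional relational matrices (for example protein-domain or protein-pathway relations) would share latent factors with `R`, which is what lets such models cope with very sparse target data.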
Affiliation(s)
- Simone Marini
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Francesca Vitali
  - Department of Medicine, Center for Biomedical Informatics and Biostatistics, BIO5 Institute, University of Arizona, Tucson, AZ, USA
- Sara Rampazzi
  - Department of Computer Science and Engineering, University of Michigan, Ann Arbor, MI, USA
- Andrea Demartini
  - Department of Electrical Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, Japan
22
Li F, Wang Y, Li C, Marquez-Lago TT, Leier A, Rawlings ND, Haffari G, Revote J, Akutsu T, Chou KC, Purcell AW, Pike RN, Webb GI, Ian Smith A, Lithgow T, Daly RJ, Whisstock JC, Song J. Twenty years of bioinformatics research for protease-specific substrate and cleavage site prediction: a comprehensive revisit and benchmarking of existing methods. Brief Bioinform 2019; 20:2150-2166. [PMID: 30184176 PMCID: PMC6954447 DOI: 10.1093/bib/bby077] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 06/14/2018] [Revised: 07/26/2018] [Accepted: 08/01/2018] [Indexed: 01/06/2023]
Abstract
The roles of proteolytic cleavage have been intensively investigated and discussed during the past two decades. This irreversible chemical process has been frequently reported to influence a number of crucial biological processes (BPs), such as cell cycle, protein regulation and inflammation. A number of advanced studies have been published aiming at deciphering the mechanisms of proteolytic cleavage. Given its significance and the large number of functionally enriched substrates targeted by specific proteases, many computational approaches have been established for accurate prediction of protease-specific substrates and their cleavage sites. Consequently, there is an urgent need to systematically assess the state-of-the-art computational approaches for protease-specific cleavage site prediction to further advance the existing methodologies and to improve the prediction performance. With this goal in mind, in this article, we carefully evaluated a total of 19 computational methods (including 8 scoring function-based methods and 11 machine learning-based methods) in terms of their underlying algorithm, calculated features, performance evaluation and software usability. Then, extensive independent tests were performed to assess the robustness and scalability of the reviewed methods using our carefully prepared independent test data sets with 3641 cleavage sites (specific to 10 proteases). The comparative experimental results demonstrate that PROSPERous is the most accurate generic method for predicting eight protease-specific cleavage sites, while GPS-CCD and LabCaS outperformed other predictors for calpain-specific cleavage sites. Based on our review, we then outlined some potential ways to improve the prediction performance and ease the computational burden by applying ensemble learning, deep learning, positive unlabeled learning and parallel and distributed computing techniques. 
We anticipate that our study will serve as a practical and useful guide for interested readers to further advance next-generation bioinformatics tools for protease-specific cleavage site prediction.
Affiliation(s)
- Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
| | - Yanan Wang
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Chen Li
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zürich 8093, Switzerland
| | - Tatiana T Marquez-Lago
- Department of Genetics and Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
| | - André Leier
- Department of Genetics and Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
| | - Neil D Rawlings
- EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Gholamreza Haffari
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
| | - Jerico Revote
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
| | - Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
| | - Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, USA
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Anthony W Purcell
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
| | - Robert N Pike
- La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC 3086, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
| | - Geoffrey I Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
| | - A Ian Smith
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
| | - Trevor Lithgow
- Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, Victoria 3800, Australia
| | - Roger J Daly
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
| | - James C Whisstock
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
| | - Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
|
23
|
Radchenko T, Fontaine F, Morettoni L, Zamora I. WebMetabase: cleavage sites analysis tool for natural and unnatural substrates from diverse data source. Bioinformatics 2019; 35:650-655. [PMID: 30052776 DOI: 10.1093/bioinformatics/bty667] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/06/2018] [Revised: 06/29/2018] [Accepted: 07/24/2018] [Indexed: 11/14/2022] Open
Abstract
SUMMARY More than 150 peptide therapeutics are in clinical development worldwide. A successful drug must cross many enzymatic barriers during this process. New peptide drugs must therefore be designed to prevent potential protease cleavage, making the compound less susceptible to protease reaction. We present a new data analysis tool developed in WebMetabase, an approach that stores information from liquid chromatography mass spectrometry-based experimental data or from external sources such as the MEROPS database. The tool is a chemically aware system in which each peptide substrate is represented as a sequence of structural blocks (SBs) connected by amide bonds, not limited to the natural amino acids. Each SB is characterized by its pharmacophoric and physicochemical properties, including a similarity score that describes the likeness between an SB and each of the other SBs in the database. This methodology can be used to perform a frequency analysis to discover the most frequent cleavage sites for similar amide bonds, defined based on the similarity of the SBs that participate in such a bond within the experimentally derived and/or public database. AVAILABILITY AND IMPLEMENTATION http://webmetabase.com:8182/WebMetabaseBioinformatics/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tatiana Radchenko
- Pompeu Fabra University, Barcelona, Spain; Lead Molecular Design, S.L., Sant Cugat del Vallés, Spain
| | | | | | - Ismael Zamora
- Pompeu Fabra University, Barcelona, Spain; Lead Molecular Design, S.L., Sant Cugat del Vallés, Spain
|
24
|
Wang F, Guan ZX, Dao FY, Ding H. A Brief Review of the Computational Identification of Antifreeze Protein. Curr Org Chem 2019. [DOI: 10.2174/1385272823666190718145613] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/12/2022]
Abstract
Many cold-adapted organisms produce antifreeze proteins (AFPs) to counter the freezing of cell fluids by controlling the growth of ice crystals. AFPs have been found in various species, including vertebrates, invertebrates, plants, bacteria, and fungi. The AFPs from fish, insects and plants display high diversity. Thus, the identification of AFPs is a challenging task in computational proteomics. With the accumulation of AFPs and the development of machine learning methods, it is possible to construct a high-throughput tool to identify AFPs in a timely manner. In this review, we briefly survey the application of machine learning methods to antifreeze protein identification from different aspects, including published benchmark datasets, sequence descriptors, classification algorithms and published methods. We hope that this review will produce new ideas and directions for research on identifying antifreeze proteins.
Affiliation(s)
- Fang Wang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zheng-Xing Guan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Fu-Ying Dao
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
|
25
|
Li F, Chen J, Leier A, Marquez-Lago T, Liu Q, Wang Y, Revote J, Smith AI, Akutsu T, Webb GI, Kurgan L, Song J. DeepCleave: a deep learning predictor for caspase and matrix metalloprotease substrates and cleavage sites. Bioinformatics 2019; 36:1057-1065. [PMID: 31566664 PMCID: PMC8215920 DOI: 10.1093/bioinformatics/btz721] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Received: 06/07/2019] [Revised: 08/13/2019] [Accepted: 09/25/2019] [Indexed: 01/31/2023] Open
Abstract
MOTIVATION Proteases are enzymes that cleave target substrate proteins by catalyzing the hydrolysis of peptide bonds between specific amino acids. While the functional proteolysis regulated by proteases plays a central role in 'life and death' cellular processes, many of the corresponding substrates and their cleavage sites have not yet been found. The availability of accurate predictors of substrates and cleavage sites would facilitate understanding of proteases' functions and physiological roles. Deep learning is a promising approach for the development of accurate predictors of substrate cleavage events. RESULTS We propose DeepCleave, the first deep learning-based predictor of protease-specific substrates and cleavage sites. DeepCleave uses protein substrate sequence data as input and employs convolutional neural networks with transfer learning to train accurate predictive models. The high predictive performance of our models stems from the use of high-quality cleavage site features extracted from the substrate sequences through the deep learning process, and the application of transfer learning, multiple kernels and an attention layer in the design of the deep network. Empirical tests against several related state-of-the-art methods demonstrate that DeepCleave outperforms these methods in predicting caspase and matrix metalloprotease substrate-cleavage sites. AVAILABILITY AND IMPLEMENTATION The DeepCleave webserver and source code are freely available at http://deepcleave.erc.monash.edu/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | | | - André Leier
- Department of Genetics and Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Tatiana Marquez-Lago
- Department of Genetics and Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Quanzhong Liu
- College of Information Engineering, Northwest A&F University, Yangling 712100, China
| | - Yanze Wang
- College of Information Engineering, Northwest A&F University, Yangling 712100, China
| | - Jerico Revote
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
| | - A Ian Smith
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
| | - Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
| | - Geoffrey I Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
|
26
|
Bao Y, Marini S, Tamura T, Kamada M, Maegawa S, Hosokawa H, Song J, Akutsu T. Toward more accurate prediction of caspase cleavage sites: a comprehensive review of current methods, tools and features. Brief Bioinform 2019; 20:1669-1684. [PMID: 29860277 PMCID: PMC6917222 DOI: 10.1093/bib/bby041] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 02/26/2018] [Revised: 04/16/2018] [Indexed: 12/20/2022] Open
Abstract
As one of the few irreversible protein posttranslational modifications, proteolytic cleavage is involved in nearly all aspects of cellular activities, ranging from gene regulation to cell life-cycle regulation. Among the various protease-specific types of proteolytic cleavage, cleavages by caspases/granzyme B are considered essential in the initiation and execution of programmed cell death and inflammation processes. Although a number of substrates for both types of proteolytic cleavage have been experimentally identified, the complete repertoire of caspase and granzyme B substrates remains to be fully characterized. To tackle this issue and complement experimental efforts for substrate identification, systematic bioinformatics studies of known cleavage sites provide important insights into caspase/granzyme B substrate specificity, and facilitate the discovery of novel substrates. In this article, we review and benchmark 12 state-of-the-art sequence-based bioinformatics approaches and tools for caspases/granzyme B cleavage prediction. We evaluate and compare these methods in terms of their input/output, algorithms used, prediction performance, validation methods and software availability and utility. In addition, we construct independent data sets consisting of caspases/granzyme B substrates from different species and accordingly assess the predictive power of these different predictors for the identification of cleavage sites. We find that the prediction results are highly variable among different predictors. Furthermore, we experimentally validate the predictions of a case study by performing a caspase cleavage assay. We anticipate that this comprehensive review and survey analysis will provide an insightful resource for biologists and bioinformaticians who are interested in using and/or developing tools for caspase/granzyme B cleavage prediction.
Affiliation(s)
- Yu Bao
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
| | - Simone Marini
- Department of Computational Medicine and Bioinformatics, University of Michigan, 1241 E. Catherine St., 5940 Buhl, Ann Arbor 48109-5618, USA
| | - Takeyuki Tamura
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
| | - Mayumi Kamada
- Graduate School of Medicine, Kyoto University, Sakyo-ku, Kyoto 606-8507, Japan
| | - Shingo Maegawa
- Graduate School of Informatics, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan
| | - Hiroshi Hosokawa
- Graduate School of Informatics, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan
| | - Jiangning Song
- Monash Biomedicine Discovery Institute, Monash Centre for Data Science and ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
| | - Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
|
27
|
Lv H, Dao FY, Guan ZX, Zhang D, Tan JX, Zhang Y, Chen W, Lin H. iDNA6mA-Rice: A Computational Tool for Detecting N6-Methyladenine Sites in Rice. Front Genet 2019; 10:793. [PMID: 31552096 PMCID: PMC6746913 DOI: 10.3389/fgene.2019.00793] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 06/13/2019] [Accepted: 07/26/2019] [Indexed: 01/08/2023] Open
Abstract
DNA N6-methyladenine (6mA) is a dominant form of DNA modification and is involved in many biological functions. The accurate genome-wide identification of 6mA sites may increase understanding of its biological functions. Experimental methods for 6mA detection in eukaryotic genomes are laborious and expensive. Therefore, it is necessary to develop computational methods to identify 6mA sites on a genomic scale, especially for plant genomes. Based on this consideration, the study aims to develop a machine learning-based method of predicting 6mA sites in the rice genome. We initially used mono-nucleotide binary encoding to formulate positive and negative samples. Subsequently, the Random Forest machine learning algorithm was used to perform the classification for identifying 6mA sites. Our proposed method produced an area under the receiver operating characteristic curve of 0.964 with an overall accuracy of 0.917, as indicated by the fivefold cross-validation test. Furthermore, an independent dataset was established to assess the generalization ability of our method. On this dataset, an area under the receiver operating characteristic curve of 0.981 was obtained, suggesting that the proposed method performs well in predicting 6mA sites in the rice genome. For the convenience of retrieving 6mA sites, on the basis of the computational method, we built a freely accessible web server named iDNA6mA-Rice at http://lin-group.cn/server/iDNA6mA-Rice.
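As a rough illustration of the mono-nucleotide binary encoding mentioned in the abstract above, the following minimal sketch maps each nucleotide to a 4-bit indicator vector and concatenates the vectors. The A, C, G, T bit order, the function name `binary_encode`, and the choice to reject ambiguous bases are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
# Hypothetical bit order; the abstract does not state which order was used.
NUCLEOTIDES = "ACGT"


def binary_encode(sequence: str) -> list[int]:
    """Encode each nucleotide as a 4-bit indicator vector and concatenate."""
    encoded: list[int] = []
    for base in sequence.upper():
        if base not in NUCLEOTIDES:
            # Assumption: ambiguous bases such as N are rejected, not imputed.
            raise ValueError(f"unexpected nucleotide: {base!r}")
        # Exactly one of the four positions is set to 1 for each base.
        encoded.extend(1 if base == ref else 0 for ref in NUCLEOTIDES)
    return encoded


if __name__ == "__main__":
    # A sequence of length n yields a feature vector of length 4 * n.
    print(binary_encode("GATC"))
```

A fixed-length vector of this kind can then be passed to any standard classifier, such as the Random Forest named in the abstract.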
Affiliation(s)
- Hao Lv
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Fu-Ying Dao
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Zheng-Xing Guan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Dan Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Jiu-Xin Tan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Yong Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Wei Chen
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
28
|
Zhang M, Li F, Marquez-Lago TT, Leier A, Fan C, Kwoh CK, Chou KC, Song J, Jia C. MULTiPly: a novel multi-layer predictor for discovering general and specific types of promoters. Bioinformatics 2019; 35:2957-2965. [PMID: 30649179 PMCID: PMC6736106 DOI: 10.1093/bioinformatics/btz016] [Citation(s) in RCA: 75] [Impact Index Per Article: 15.0] [Received: 10/30/2018] [Revised: 12/09/2018] [Accepted: 01/05/2019] [Indexed: 12/22/2022] Open
Abstract
MOTIVATION Promoters are short DNA consensus sequences that are localized proximal to the transcription start sites of genes, allowing transcription initiation of particular genes. However, the precise prediction of promoters remains a challenging task because individual promoters often differ from the consensus at one or more positions. RESULTS In this study, we present a new multi-layer computational approach, called MULTiPly, for recognizing promoters and their specific types. MULTiPly took into account the sequences themselves, including both local information such as k-tuple nucleotide composition, dinucleotide-based auto covariance and global information of the entire samples based on bi-profile Bayes and k-nearest neighbour feature encodings. Specifically, the F-score feature selection method was applied to identify the best unique type of feature prediction results, in combination with other types of features that were subsequently added to further improve the prediction performance of MULTiPly. Benchmarking experiments on the benchmark dataset and comparisons with five state-of-the-art tools show that MULTiPly can achieve a better prediction performance on 5-fold cross-validation and jackknife tests. Moreover, the superiority of MULTiPly was also validated on a newly constructed independent test dataset. MULTiPly is expected to be used as a useful tool that will facilitate the discovery of both general and specific types of promoters in the post-genomic era. AVAILABILITY AND IMPLEMENTATION The MULTiPly webserver and curated datasets are freely available at http://flagshipnt.erc.monash.edu/MULTiPly/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Meng Zhang
- School of Science, Dalian Maritime University, Dalian, China
| | - Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
| | - Tatiana T Marquez-Lago
- Department of Genetics, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
| | - André Leier
- Department of Genetics, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Cunshuo Fan
- College of Information Engineering, Northwest A&F University, Yangling, China
| | - Chee Keong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
| | | | - Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
| | - Cangzhi Jia
- School of Science, Dalian Maritime University, Dalian, China
- College of Information Engineering, Northwest A&F University, Yangling, China
|
29
|
Wang J, Yang B, An Y, Marquez-Lago T, Leier A, Wilksch J, Hong Q, Zhang Y, Hayashida M, Akutsu T, Webb GI, Strugnell RA, Song J, Lithgow T. Systematic analysis and prediction of type IV secreted effector proteins by machine learning approaches. Brief Bioinform 2019; 20:931-951. [PMID: 29186295 PMCID: PMC6585386 DOI: 10.1093/bib/bbx164] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 08/20/2017] [Revised: 11/08/2017] [Indexed: 12/13/2022] Open
Abstract
In the course of infecting their hosts, pathogenic bacteria secrete numerous effectors, namely, bacterial proteins that pervert host cell biology. Many Gram-negative bacteria, including context-dependent human pathogens, use a type IV secretion system (T4SS) to translocate effectors directly into the cytosol of host cells. Various type IV secreted effectors (T4SEs) have been experimentally validated to play crucial roles in virulence by manipulating host cell gene expression and other processes. Consequently, the identification of novel effector proteins is an important step in increasing our understanding of host-pathogen interactions and bacterial pathogenesis. Here, we train and compare six machine learning models, namely, Naïve Bayes (NB), K-nearest neighbor (KNN), logistic regression (LR), random forest (RF), support vector machines (SVMs) and multilayer perceptron (MLP), for the identification of T4SEs using 10 types of selected features and 5-fold cross-validation. Our study shows that: (1) including different but complementary features generally enhances the predictive performance for T4SEs; (2) ensemble models, obtained by integrating individual single-feature models, exhibit a significantly improved predictive performance and (3) the 'majority voting strategy' led to a more stable and accurate classification performance when applied to an ensemble learning model with distinct single features. We further developed a new method to effectively predict T4SEs, Bastion4 (Bacterial secretion effector predictor for T4SS), and we show that our ensemble classifier clearly outperforms two recent prediction tools. In summary, we developed a state-of-the-art T4SE predictor by conducting a comprehensive performance evaluation of different machine learning algorithms along with a detailed analysis of single- and multi-feature selections.
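The 'majority voting strategy' named in the abstract above can be sketched in a few lines. This is a generic illustration, not the Bastion4 code: the function name `majority_vote`, the 0/1 label convention, and the rule that ties resolve to the negative class are all assumptions.

```python
def majority_vote(predictions: list[int]) -> int:
    """Combine binary predictions (1 = effector, 0 = non-effector) by majority.

    Each element is assumed to be the output of one single-feature model.
    Assumption: a tie resolves to the negative class (0).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    positive_votes = sum(predictions)
    # Strict majority: more than half of the models must vote positive.
    return 1 if 2 * positive_votes > len(predictions) else 0


if __name__ == "__main__":
    # Hypothetical outputs of five single-feature models for one protein.
    print(majority_vote([1, 1, 0, 1, 0]))
```

Using an odd number of voters avoids ties altogether, which is one reason a tie-breaking rule has to be stated explicitly when the number of models is even.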
Affiliation(s)
- Jiawei Wang
- Biomedicine Discovery Institute and the Department of Microbiology at Monash University, Australia
| | - Bingjiao Yang
- National Engineering Research Center for Equipment and Technology of Cold Strip Rolling, College of Mechanical Engineering from Yanshan University, China
| | - Yi An
- College of Information Engineering, Northwest A&F University, China
| | - Tatiana Marquez-Lago
- Department of Genetics, University of Alabama at Birmingham (UAB) School of Medicine, USA
| | - André Leier
- Department of Genetics and the Informatics Institute, University of Alabama at Birmingham (UAB) School of Medicine, USA
| | - Jonathan Wilksch
- Department of Microbiology and Immunology at the University of Melbourne, Australia
| | | | - Yang Zhang
- Computer Science and Engineering in 2015 from Northwestern Polytechnical University, China
| | | | - Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Japan
| | - Geoffrey I Webb
- Faculty of Information Technology, Monash Centre for Data Science, Monash University
| | - Richard A Strugnell
- Department of Microbiology and Immunology, Faculty of Medicine Dentistry and Health Sciences, University of Melbourne
| | - Jiangning Song
- Department of Biochemistry and Molecular Biology, Monash University, Australia
| | - Trevor Lithgow
- Department of Microbiology at Monash University, Australia
|
30
|
Zhu YH, Hu J, Song XN, Yu DJ. DNAPred: Accurate Identification of DNA-Binding Sites from Protein Sequence by Ensembled Hyperplane-Distance-Based Support Vector Machines. J Chem Inf Model 2019; 59:3057-3071. [PMID: 30943723 DOI: 10.1021/acs.jcim.8b00749] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Indexed: 12/16/2022]
Abstract
Accurate identification of protein-DNA binding sites is significant for both understanding protein function and drug design. Machine-learning-based methods have been extensively used for the prediction of protein-DNA binding sites. However, the data imbalance problem, in which the number of nonbinding residues (negative-class samples) is far larger than that of binding residues (positive-class samples), seriously restricts the performance improvements of machine-learning-based predictors. In this work, we designed a two-stage imbalanced learning algorithm, called ensembled hyperplane-distance-based support vector machines (E-HDSVM), to improve the prediction performance of protein-DNA binding sites. The first stage of E-HDSVM designs a new iterative sampling algorithm, called hyperplane-distance-based under-sampling (HD-US), to extract multiple subsets from the original imbalanced data set, each of which is used to train a support vector machine (SVM). Unlike traditional sampling algorithms, HD-US selects samples by calculating the distances between the samples and the separating hyperplane of the SVM. The second stage of E-HDSVM proposes an enhanced AdaBoost (EAdaBoost) algorithm to ensemble multiple trained SVMs. As an enhanced version of the original AdaBoost algorithm, EAdaBoost overcomes the overfitting problem. Stringent cross-validation and independent tests on benchmark data sets demonstrated the superiority of E-HDSVM over several popular imbalanced learning algorithms. Based on the proposed E-HDSVM algorithm, we further implemented a sequence-based protein-DNA binding site predictor, called DNAPred, which is freely available at http://csbio.njust.edu.cn/bioinf/dnapred/ for academic use. 
The computational experimental results showed that our predictor achieved an average overall accuracy of 91.7% and a Matthews correlation coefficient of 0.395 on five benchmark data sets and outperformed several state-of-the-art sequence-based protein-DNA binding site predictors.
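The abstract above reports overall accuracy together with the Matthews correlation coefficient (MCC), which is the more informative of the two on imbalanced data. The sketch below computes both from confusion-matrix counts using the standard MCC definition. The counts in the example are hypothetical and are not taken from the paper; returning 0.0 when the denominator is zero is a common convention adopted here as an assumption.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: MCC is reported as 0 when it is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Hypothetical imbalanced data set: 100 binding vs 900 nonbinding residues.
    counts = dict(tp=50, tn=850, fp=50, fn=50)
    print(accuracy(**counts), matthews_corrcoef(**counts))
```

With these made-up counts the accuracy is 0.9 while the MCC is only about 0.44, which shows how a large negative class can produce high accuracy alongside a moderate MCC.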
Affiliation(s)
- Yi-Heng Zhu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
| | - Jun Hu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, P. R. China
| | - Xiao-Ning Song
- School of Internet of Things, Jiangnan University, 1800 Lihu Road, Wuxi 214122, P. R. China
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
|
31
|
Song J, Li F, Leier A, Marquez-Lago TT, Akutsu T, Haffari G, Chou KC, Webb GI, Pike RN, Hancock J. PROSPERous: high-throughput prediction of substrate cleavage sites for 90 proteases with improved accuracy. Bioinformatics 2019; 34:684-687. [PMID: 29069280 DOI: 10.1093/bioinformatics/btx670] [Citation(s) in RCA: 114] [Impact Index Per Article: 22.8] [Received: 08/23/2017] [Accepted: 10/18/2017] [Indexed: 11/13/2022] Open
Abstract
Summary: Proteases are enzymes that specifically cleave the peptide backbone of their target proteins. As an important type of irreversible post-translational modification, protein cleavage underlies many key physiological processes. When dysregulated, proteases' actions are associated with numerous diseases. Many proteases are highly specific, cleaving only those target substrates that present certain particular amino acid sequence patterns. Therefore, tools that successfully identify potential target substrates for proteases may also identify previously unknown, physiologically relevant cleavage sites, thus providing insights into biological processes and guiding hypothesis-driven experiments aimed at verifying protease-substrate interaction. In this work, we present PROSPERous, a tool for rapid in silico prediction of protease-specific cleavage sites in substrate sequences. Our tool is based on logistic regression models and uses different scoring functions and their pairwise combinations to subsequently predict potential cleavage sites. PROSPERous represents a state-of-the-art tool that enables fast, accurate and high-throughput prediction of substrate cleavage sites for 90 proteases. Availability and implementation: http://prosperous.erc.monash.edu/. Contact: jiangning.song@monash.edu or geoff.webb@monash.edu or r.pike@latrobe.edu.au. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jiangning Song
  - Monash Centre for Data Science, Faculty of Information Technology
  - Department of Biochemistry and Molecular Biology and Biomedicine Discovery Institute
  - ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Clayton, VIC 3800, Australia
- Fuyi Li
  - Department of Biochemistry and Molecular Biology and Biomedicine Discovery Institute
- André Leier
  - Department of Genetics, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
  - Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Tatiana T Marquez-Lago
  - Department of Genetics, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
  - Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
- Kuo-Chen Chou
  - Gordon Life Science Institute, Boston, MA 02478, USA
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Geoffrey I Webb
  - Monash Centre for Data Science, Faculty of Information Technology
- Robert N Pike
  - ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Clayton, VIC 3800, Australia
  - La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC 3086, Australia
32
Tapodi A, Clemens DM, Uwineza A, Jarrin M, Goldberg MW, Thinon E, Heal WP, Tate EW, Nemeth-Cahalan K, Vorontsova I, Hall JE, Quinlan RA. BFSP1 C-terminal domains released by post-translational processing events can alter significantly the calcium regulation of AQP0 water permeability. Exp Eye Res 2019; 185:107585. [PMID: 30790544 PMCID: PMC6713518 DOI: 10.1016/j.exer.2019.02.001] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 11/08/2018] [Revised: 01/26/2019] [Accepted: 02/03/2019] [Indexed: 01/20/2023]
Abstract
BFSP1 (beaded filament structural protein 1, filensin) is a cytoskeletal protein expressed in the eye lens. It binds AQP0 in vitro and its C-terminal sequences have been suggested to regulate the water channel activity of AQP0. A myristoylated fragment from the C-terminus of BFSP1 was found in AQP0 enriched fractions. Here we identify BFSP1 as a substrate for caspase-mediated cleavage at several C-terminal sites including D433. Cleavage at D433 exposes a cryptic myristoylation sequence (434–440). We confirm that this sequence is an excellent substrate for both NMT1 and 2 (N-myristoyl transferase). Thus caspase cleavage may promote formation of myristoylated fragments derived from the BFSP1 C-terminus (G434-S665). Myristoylation at G434 is not required for membrane association. Biochemical fractionation and immunogold labeling confirmed that C-terminal BFSP1 fragments containing the myristoylation sequence colocalized with AQP0 in the same plasma membrane compartments of lens fibre cells. To determine the functional significance of the association of BFSP1 G434-S665 sequences with AQP0, we measured AQP0 water permeability in Xenopus oocytes co-transfected with transcripts expressing both AQP0 and various C-terminal domain fragments of BFSP1 generated by caspase cleavage. We found that different fragments dramatically alter the response of AQP0 to different concentrations of Ca2+. The complete C-terminal fragment (G434-S665) eliminates calcium regulation altogether. Shorter fragments can enhance regulation by elevated calcium or reverse the response, indicative of the regulatory potential of BFSP1 with respect to AQP0. In particular, elimination of the myristoylation site by the mutation G434A reverses the order of water permeability sensitivity to different Ca2+ concentrations.
Affiliation(s)
- Antal Tapodi
  - Department of Biosciences, The University of Durham, South Road, Durham, DH1 3LE, UK
- Alice Uwineza
  - Department of Biosciences, The University of Durham, South Road, Durham, DH1 3LE, UK
- Miguel Jarrin
  - Department of Biosciences, The University of Durham, South Road, Durham, DH1 3LE, UK
- Martin W Goldberg
  - Department of Biosciences, The University of Durham, South Road, Durham, DH1 3LE, UK
- Emmanuelle Thinon
  - Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, Wood Lane, London, W12 0BZ, UK
  - Institute of Chemical Biology, Molecular Sciences Research Hub, Imperial College London, Wood Lane, London, W12 0BZ, UK
- William P Heal
  - Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, Wood Lane, London, W12 0BZ, UK
  - Institute of Chemical Biology, Molecular Sciences Research Hub, Imperial College London, Wood Lane, London, W12 0BZ, UK
- Edward W Tate
  - Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, Wood Lane, London, W12 0BZ, UK
  - Institute of Chemical Biology, Molecular Sciences Research Hub, Imperial College London, Wood Lane, London, W12 0BZ, UK
- James E Hall
  - Physiology and Biophysics, UC Irvine, Irvine, CA, USA
- Roy A Quinlan
  - Department of Biosciences, The University of Durham, South Road, Durham, DH1 3LE, UK
  - Biophysical Sciences Institute, The University of Durham, South Road, Durham, DH1 3LE, UK
33
Gehlhausen JR, Hawley E, Wahle BM, He Y, Edwards D, Rhodes SD, Lajiness JD, Staser K, Chen S, Yang X, Yuan J, Li X, Jiang L, Smith A, Bessler W, Sandusky G, Stemmer-Rachamimov A, Stuhlmiller TJ, Angus SP, Johnson GL, Nalepa G, Yates CW, Wade Clapp D, Park SJ. A proteasome-resistant fragment of NIK mediates oncogenic NF-κB signaling in schwannomas. Hum Mol Genet 2019; 28:572-583. [PMID: 30335132 PMCID: PMC6489415 DOI: 10.1093/hmg/ddy361] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/05/2018] [Revised: 09/29/2018] [Accepted: 10/05/2018] [Indexed: 12/29/2022] Open
Abstract
Schwannomas are common, highly morbid and medically untreatable tumors that can arise in patients with germ line as well as somatic mutations in neurofibromatosis type 2 (NF2). These mutations most commonly result in the loss of function of the NF2-encoded protein, Merlin. Little is known about how Merlin functions endogenously as a tumor suppressor and how its loss leads to oncogenic transformation in Schwann cells (SCs). Here, we identify nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κB)-inducing kinase (NIK) as a potential drug target driving NF-κB signaling and Merlin-deficient schwannoma genesis. Using a genomic approach to profile aberrant tumor signaling pathways, we describe multiple upregulated NF-κB signaling elements in human and murine schwannomas, leading us to identify a caspase-cleaved, proteasome-resistant NIK kinase domain fragment that amplifies pathogenic NF-κB signaling. Lentiviral-mediated transduction of this NIK fragment into normal SCs promotes proliferation, survival, and adhesion while inducing schwannoma formation in a novel in vivo orthotopic transplant model. Furthermore, we describe an NF-κB-potentiated hepatocyte growth factor (HGF) to MET proto-oncogene receptor tyrosine kinase (c-Met) autocrine feed-forward loop promoting SC proliferation. These innovative studies identify a novel signaling axis underlying schwannoma formation, revealing new and potentially druggable schwannoma vulnerabilities with future therapeutic potential.
Affiliation(s)
- Jeffrey R Gehlhausen
  - Herman B Wells Center for Pediatric Research, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
  - Department of Biochemistry, Indiana University School of Medicine, Indianapolis, IN, USA
- Eric Hawley
  - Herman B Wells Center for Pediatric Research, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
  - Department of Biochemistry, Indiana University School of Medicine, Indianapolis, IN, USA
- Benjamin Mark Wahle
  - Herman B Wells Center for Pediatric Research, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
- Yongzheng He
  - Herman B Wells Center for Pediatric Research, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
- Donna Edwards
  - Herman B Wells Center for Pediatric Research, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
  - Department of Biochemistry, Indiana University School of Medicine, Indianapolis, IN, USA
- Steven D Rhodes
  - Herman B Wells Center for Pediatric Research, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
  - Department of Anatomy and Cell Biology, Indiana University School of Medicine, Indianapolis, IN, USA
- Jacquelyn D Lajiness
  - Herman B Wells Center for Pediatric Research, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
  - Department of Biochemistry, Indiana University School of Medicine, Indianapolis, IN, USA
- Karl Staser
  - Department of Medicine, Division of Dermatology, Washington University in Saint Louis, St. Louis, MO, USA
- Shi Chen
  - Herman B Wells Center for Pediatric Research, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
- Xianlin Yang
  - Herman B Wells Center for Pediatric Research, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
- Jin Yuan
  - Herman B Wells Center for Pediatric Research, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
- Xiaohong Li
  - Herman B Wells Center for Pediatric Research, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
- Li Jiang
  - Herman B Wells Center for Pediatric Research, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
- Abbi Smith
  - Herman B Wells Center for Pediatric Research, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
- Waylan Bessler
  - Herman B Wells Center for Pediatric Research, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
- George Sandusky
  - Department of Pathology, Indiana University School of Medicine, Indianapolis, IN, USA
- Steven P Angus
  - Department of Pharmacology, University of North Carolina, Chapel Hill, NC
- Gary L Johnson
  - Department of Pharmacology, University of North Carolina, Chapel Hill, NC
- Grzegorz Nalepa
  - Herman B Wells Center for Pediatric Research, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
  - Department of Biochemistry, Indiana University School of Medicine, Indianapolis, IN, USA
- Charles W Yates
  - Department of Otolaryngology, Indiana University School of Medicine, Indianapolis, IN, USA
- D Wade Clapp
  - Herman B Wells Center for Pediatric Research, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
  - Department of Biochemistry, Indiana University School of Medicine, Indianapolis, IN, USA
  - Department of Microbiology and Immunology, Indiana University School of Medicine, Indianapolis, IN, USA
- Su-Jung Park
  - Herman B Wells Center for Pediatric Research, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
  - Department of Biochemistry, Indiana University School of Medicine, Indianapolis, IN, USA
34
Radchenko T, Fontaine F, Morettoni L, Zamora I. Software-aided workflow for predicting protease-specific cleavage sites using physicochemical properties of the natural and unnatural amino acids in peptide-based drug discovery. PLoS One 2019; 14:e0199270. [PMID: 30620739 PMCID: PMC6324806 DOI: 10.1371/journal.pone.0199270] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/25/2018] [Accepted: 12/18/2018] [Indexed: 12/03/2022] Open
Abstract
Peptide drugs have been used in the treatment of multiple pathologies. During peptide discovery, it is crucially important to be able to map the potential protease cleavage sites. This knowledge is later used to chemically modify the peptide drug to adapt it for therapeutic use, making the peptide stable against individual proteases or in complex media. In other cases, the peptide needs to be made specifically unstable to certain proteases, as peptides can be used as a system for targeted drug delivery to specific tissues or cells. Information about proteases, their cleavage sites and their substrates is widely spread across publications and collected in databases such as MEROPS. Therefore, it is possible to develop models to improve the understanding of potential peptide drug proteolysis. We propose a new workflow to derive protease specificity rules and predict the potential scissile bonds in peptides for individual proteases. WebMetabase stores the information from experimental or external sources in a chemically aware database where each peptide and site of cleavage is represented as a sequence of structural blocks connected by amide bonds and characterized by its physicochemical properties described by Volsurf descriptors. Thus, this methodology could be applied in the case of non-standard amino acids. A frequency analysis can be performed in WebMetabase to discover the most frequent cleavage sites. These results were used to train several models using logistic regression, support vector machine and ensemble tree classifiers to map cleavage sites for several human proteases from four different families (serine, cysteine, aspartic and matrix metalloproteases). Finally, we compared the predictive performance of the developed models with that of other available public tools, PROSPERous and SitePrediction.
Affiliation(s)
- Tatiana Radchenko
  - Pompeu Fabra University, Barcelona, Spain
  - Lead Molecular Design, S. L, Sant Cugat del Vallés, Spain
  - * E-mail: (TR); (IZ)
- Ismael Zamora
  - Pompeu Fabra University, Barcelona, Spain
  - Lead Molecular Design, S. L, Sant Cugat del Vallés, Spain
  - * E-mail: (TR); (IZ)
35
Zhu XJ, Feng CQ, Lai HY, Chen W, Hao L. Predicting protein structural classes for low-similarity sequences by evaluating different features. Knowl Based Syst 2019. [DOI: 10.1016/j.knosys.2018.10.007] [Citation(s) in RCA: 69] [Impact Index Per Article: 13.8] [Indexed: 12/15/2022]
36
Wei L, Hu J, Li F, Song J, Su R, Zou Q. Comparative analysis and prediction of quorum-sensing peptides using feature representation learning and machine learning algorithms. Brief Bioinform 2018; 21:106-119. [PMID: 30383239 DOI: 10.1093/bib/bby107] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Received: 08/21/2018] [Revised: 09/18/2018] [Accepted: 10/05/2018] [Indexed: 12/11/2022] Open
Abstract
Quorum-sensing peptides (QSPs) are the signal molecules that are closely associated with diverse cellular processes, such as cell-cell communication, and gene expression regulation in Gram-positive bacteria. It is therefore of great importance to identify QSPs for better understanding and in-depth revealing of their functional mechanisms in physiological processes. Machine learning algorithms have been developed for this purpose, showing the great potential for the reliable prediction of QSPs. In this study, several sequence-based feature descriptors for peptide representation and machine learning algorithms are comprehensively reviewed, evaluated and compared. To effectively use existing feature descriptors, we used a feature representation learning strategy that automatically learns the most discriminative features from existing feature descriptors in a supervised way. Our results demonstrate that this strategy is capable of effectively capturing the sequence determinants to represent the characteristics of QSPs, thereby contributing to the improved predictive performance. Furthermore, wrapping this feature representation learning strategy, we developed a powerful predictor named QSPred-FL for the detection of QSPs in large-scale proteomic data. Benchmarking results with 10-fold cross validation showed that QSPred-FL is able to achieve better performance as compared to the state-of-the-art predictors. In addition, we have established a user-friendly webserver that implements QSPred-FL, which is currently available at http://server.malab.cn/QSPred-FL. We expect that this tool will be useful for the high-throughput prediction of QSPs and the discovery of important functional mechanisms of QSPs.
Affiliation(s)
- Leyi Wei
  - School of Computer Science and Technology, Tianjin University, Tianjin, China
- Jie Hu
  - School of Computer Science and Technology, Tianjin University, Tianjin, China
- Fuyi Li
  - Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC, Australia
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Clayton, VIC, Australia
- Jiangning Song
  - Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC, Australia
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Clayton, VIC, Australia
- Ran Su
  - School of Computer Software, Tianjin University, Tianjin, China
- Quan Zou
  - School of Computer Science and Technology, Tianjin University, Tianjin, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
37
Qiang X, Chen H, Ye X, Su R, Wei L. M6AMRFS: Robust Prediction of N6-Methyladenosine Sites With Sequence-Based Features in Multiple Species. Front Genet 2018; 9:495. [PMID: 30410501 PMCID: PMC6209681 DOI: 10.3389/fgene.2018.00495] [Citation(s) in RCA: 65] [Impact Index Per Article: 10.8] [Received: 07/18/2018] [Accepted: 10/04/2018] [Indexed: 12/23/2022] Open
Abstract
As one of the well-studied RNA methylation modifications, N6-methyladenosine (m6A) plays important roles in various biological processes, such as RNA splicing and degradation. Identification of m6A sites is fundamentally important for better understanding of their functional mechanisms. Recently, machine learning based prediction methods have emerged as an effective approach for fast and accurate identification of m6A sites. In this paper, we proposed "M6AMRFS", a new machine learning based predictor for the identification of m6A sites. In this predictor, we exploited a new feature representation algorithm to encode RNA sequences with two feature descriptors (dinucleotide binary encoding and local position-specific dinucleotide frequency), and used the F-score algorithm combined with SFS (Sequential Forward Search) to enhance the feature representation ability. To predict m6A sites, we employed the eXtreme Gradient Boosting (XGBoost) algorithm to build a predictive model. Benchmarking results showed that the proposed predictor is competitive with the state-of-the-art predictors. Importantly, robust predictions for multiple species by our predictor demonstrate that our predictive models have strong generalization ability. To the best of our knowledge, M6AMRFS is the first tool that can be used for the identification of m6A sites in multiple species. To facilitate the use of our predictor, we have established a user-friendly webserver with the implementation of M6AMRFS, which is currently available at http://server.malab.cn/M6AMRFS/. We anticipate that it will be a useful tool for the relevant research of m6A sites.
Affiliation(s)
- Xiaoli Qiang
  - Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
- Huangrong Chen
  - School of Computer Science and Technology, Tianjin University, Tianjin, China
- Xiucai Ye
  - Department of Computer Science, University of Tsukuba, Tsukuba, Japan
- Ran Su
  - School of Software, Tianjin University, Tianjin, China
- Leyi Wei
  - School of Computer Science and Technology, Tianjin University, Tianjin, China
38
Lu B, Li C, Chen Q, Song J. ProBAPred: Inferring protein–protein binding affinity by incorporating protein sequence and structural features. J Bioinform Comput Biol 2018; 16:1850011. [PMID: 29954286 DOI: 10.1142/s0219720018500117] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/24/2022]
Abstract
Protein–protein binding interaction is the most prevalent biological activity that mediates a great variety of biological processes. The increasing availability of experimental data of protein–protein interaction allows a systematic construction of protein–protein interaction networks, significantly contributing to a better understanding of protein functions and their roles in cellular pathways and human diseases. Compared to well-established classification for protein–protein interactions (PPIs), limited work has been conducted for estimating protein–protein binding free energy, which can provide informative real-value regression models for characterizing the protein–protein binding affinity. In this study, we propose a novel ensemble computational framework, termed ProBAPred (Protein–protein Binding Affinity Predictor), for quantitative estimation of protein–protein binding affinity. A large number of sequence and structural features, including physical–chemical properties, binding energy and conformation annotations, were collected and calculated from currently available protein binding complex datasets and the literature. Feature selection based on the WEKA package was performed to identify and characterize the most informative and contributing feature subsets. Experiments on the independent test showed that our ensemble method achieved the lowest Mean Absolute Error (MAE; 1.657 kcal/mol) and the second highest correlation coefficient ([Formula: see text]), compared with the existing methods. The datasets and source codes of ProBAPred, and the supplementary materials in this study can be downloaded at http://lightning.med.monash.edu/probapred/ for academic use. We anticipate that the developed ProBAPred regression models can facilitate computational characterization and experimental studies of protein–protein binding affinity.
Affiliation(s)
- Bangli Lu
  - School of Computer, Electronic and Information, and State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi University, 100 Daxue Road, 530004 Nanning, P. R. China
- Chen Li
  - Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, VIC 3800, Australia
- Qingfeng Chen
  - School of Computer, Electronic and Information, and State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi University, 100 Daxue Road, 530004 Nanning, P. R. China
- Jiangning Song
  - Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, VIC 3800, Australia
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, VIC 3800, Australia
  - ARC Centre of Excellence for Advanced Molecular Imaging, Monash University, VIC 3800, Australia
39
Wang H, Feng L, Webb GI, Kurgan L, Song J, Lin D. Critical evaluation of bioinformatics tools for the prediction of protein crystallization propensity. Brief Bioinform 2018; 19:838-852. [PMID: 28334201 PMCID: PMC6171492 DOI: 10.1093/bib/bbx018] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 10/19/2016] [Revised: 01/19/2017] [Indexed: 12/11/2022] Open
Abstract
X-ray crystallography is the main tool for structural determination of proteins. Yet, the underlying crystallization process is costly, has a high attrition rate and involves a series of trial-and-error attempts to obtain diffraction-quality crystals. The Structural Genomics Consortium aims to systematically solve representative structures of major protein-fold classes using primarily high-throughput X-ray crystallography. The attrition rate of these efforts can be improved by selection of proteins that are potentially easier to be crystallized. In this context, bioinformatics approaches have been developed to predict crystallization propensities based on protein sequences. These approaches are used to facilitate prioritization of the most promising target proteins, search for alternative structural orthologues of the target proteins and suggest designs of constructs capable of potentially enhancing the likelihood of successful crystallization. We reviewed and compared nine predictors of protein crystallization propensity. Moreover, we demonstrated that integrating selected outputs from multiple predictors as candidate input features to build the predictive model results in a significantly higher predictive performance when compared to using these predictors individually. Furthermore, we also introduced a new and accurate predictor of protein crystallization propensity, Crysf, which uses functional features extracted from UniProt as inputs. This comprehensive review will assist structural biologists in selecting the most appropriate predictor, and is also beneficial for bioinformaticians to develop a new generation of predictive algorithms.
Affiliation(s)
- Huilin Wang
  - Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, China
- Geoffrey I Webb
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Australia
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, USA
- Jiangning Song
  - Department of Biochemistry and Molecular Biology, Monash University, Australia
- Donghai Lin
  - Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, China
40
Zhang K, Lv DW, Li R. B Cell Receptor Activation and Chemical Induction Trigger Caspase-Mediated Cleavage of PIAS1 to Facilitate Epstein-Barr Virus Reactivation. Cell Rep 2018; 21:3445-3457. [PMID: 29262325 DOI: 10.1016/j.celrep.2017.11.071] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 06/22/2017] [Revised: 10/21/2017] [Accepted: 11/17/2017] [Indexed: 12/16/2022] Open
Abstract
Epstein-Barr virus (EBV) in tumor cells is predominately in the latent phase, but the virus can undergo lytic reactivation in response to various stimuli. However, the cellular factors that control latency and lytic replication are poorly defined. In this study, we demonstrated that a cellular factor, PIAS1, restricts EBV lytic replication. PIAS1 depletion significantly facilitated EBV reactivation, while PIAS1 reconstitution had the opposite effect. Remarkably, we found that various lytic triggers promote caspase-dependent cleavage of PIAS1 to antagonize PIAS1-mediated restriction and that caspase inhibition suppresses EBV replication through blocking PIAS1 cleavage. We further demonstrated that a cleavage-resistant PIAS1 mutant suppresses EBV replication upon B cell receptor activation. Mechanistically, we demonstrated that PIAS1 acts as an inhibitor for transcription factors involved in lytic gene expression. Collectively, these results establish PIAS1 as a key regulator of EBV lytic replication and uncover a mechanism by which EBV exploits apoptotic caspases to antagonize PIAS1-mediated restriction.
Affiliation(s)
- Kun Zhang
  - Philips Institute for Oral Health Research, School of Dentistry, Virginia Commonwealth University, Richmond, VA 23298, USA
- Dong-Wen Lv
  - Philips Institute for Oral Health Research, School of Dentistry, Virginia Commonwealth University, Richmond, VA 23298, USA
- Renfeng Li
  - Philips Institute for Oral Health Research, School of Dentistry, Virginia Commonwealth University, Richmond, VA 23298, USA
  - Department of Microbiology and Immunology, School of Medicine, Virginia Commonwealth University, Richmond, VA 23298, USA
  - Massey Cancer Center, Virginia Commonwealth University, Richmond, VA 23298, USA
41
Bhagwat SR, Hajela K, Kumar A. Proteolysis to Identify Protease Substrates: Cleave to Decipher. Proteomics 2018; 18:e1800011. [DOI: 10.1002/pmic.201800011] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 01/11/2018] [Revised: 04/03/2018] [Indexed: 02/06/2023]
Affiliation(s)
- Sonali R. Bhagwat
  - Discipline of Biosciences and Biomedical Engineering; Indian Institute of Technology; Indore 453552 Simrol India
- Krishnan Hajela
  - School of Life Sciences; Devi Ahilya Vishwavidyalaya; Indore 452001 India
- Amit Kumar
  - Discipline of Biosciences and Biomedical Engineering; Indian Institute of Technology; Indore 453552 Simrol India
42
PhosContext2vec: a distributed representation of residue-level sequence contexts and its application to general and kinase-specific phosphorylation site prediction. Sci Rep 2018; 8:8240. [PMID: 29844483 PMCID: PMC5974293 DOI: 10.1038/s41598-018-26392-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 11/14/2017] [Accepted: 05/10/2018] [Indexed: 11/28/2022] Open
Abstract
Phosphorylation is the most important type of protein post-translational modification. Accordingly, reliable identification of kinase-mediated phosphorylation has important implications for functional annotation of phosphorylated substrates and characterization of cellular signalling pathways. The local sequence context surrounding potential phosphorylation sites is considered to harbour the most relevant information for phosphorylation site prediction models. However, currently there is a lack of condensed vector representation for this important contextual information, despite the presence of varying residue-level features that can be constructed from sequence homology profiles, structural information, and physicochemical properties. To address this issue, we present PhosContext2vec which is a distributed representation of residue-level sequence contexts for potential phosphorylation sites and demonstrate its application in both general and kinase-specific phosphorylation site predictions. Benchmarking experiments indicate that PhosContext2vec could achieve promising predictive performance compared with several other existing methods for phosphorylation site prediction. We envisage that PhosContext2vec, as a new sequence context representation, can be used in combination with other informative residue-level features to improve the classification performance in a number of related bioinformatics tasks that require appropriate residue-level feature vector representation and extraction. The web server of PhosContext2vec is publicly available at http://phoscontext2vec.erc.monash.edu/.
|
43
|
Song J, Wang Y, Li F, Akutsu T, Rawlings ND, Webb GI, Chou KC. iProt-Sub: a comprehensive package for accurately mapping and predicting protease-specific substrates and cleavage sites. Brief Bioinform 2018 [Epub ahead of print]. [DOI: 10.1093/bib/bby028] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Affiliation(s)
- Jiangning Song
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia and ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
| | - Yanan Wang
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
| | - Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
| | - Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, 611-0011, Japan
| | - Neil D Rawlings
- EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Geoffrey I Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
| | - Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, USA and Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
|
44
|
Song J, Li F, Takemoto K, Haffari G, Akutsu T, Chou KC, Webb GI. PREvaIL, an integrative approach for inferring catalytic residues using sequence, structural, and network features in a machine-learning framework. J Theor Biol 2018; 443:125-137. [DOI: 10.1016/j.jtbi.2018.01.023] [Citation(s) in RCA: 95] [Impact Index Per Article: 15.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2017] [Revised: 01/17/2018] [Accepted: 01/18/2018] [Indexed: 10/18/2022]
|
45
|
PhosphoPredict: A bioinformatics tool for prediction of human kinase-specific phosphorylation substrates and sites by integrating heterogeneous feature selection. Sci Rep 2017; 7:6862. [PMID: 28761071 PMCID: PMC5537252 DOI: 10.1038/s41598-017-07199-4] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2016] [Accepted: 06/27/2017] [Indexed: 12/31/2022] Open
Abstract
Protein phosphorylation is a major form of post-translational modification (PTM) that regulates diverse cellular processes. In silico methods for phosphorylation site prediction can provide a useful and complementary strategy for complete phosphoproteome annotation. Here, we present a novel bioinformatics tool, PhosphoPredict, that combines protein sequence and functional features to predict kinase-specific substrates and their associated phosphorylation sites for 12 human kinases and kinase families, including ATM, CDKs, GSK-3, MAPKs, PKA, PKB, PKC, and SRC. To elucidate critical determinants, we identified feature subsets that were most informative and relevant for predicting substrate specificity for each individual kinase family. Extensive benchmarking experiments based on both five-fold cross-validation and independent tests indicated that the performance of PhosphoPredict is competitive with that of several other popular prediction tools, including KinasePhos, PPSP, GPS, and Musite. We found that combining protein functional and sequence features significantly improves phosphorylation site prediction performance across all kinases. Application of PhosphoPredict to the entire human proteome identified 150 to 800 potential phosphorylation substrates for each of the 12 kinases or kinase families. PhosphoPredict significantly extends the bioinformatics portfolio for kinase function analysis and will facilitate high-throughput identification of kinase-specific phosphorylation sites, thereby contributing to both basic and translational research programs.
|
46
|
Wang Y, Song J, Marquez-Lago TT, Leier A, Li C, Lithgow T, Webb GI, Shen HB. Knowledge-transfer learning for prediction of matrix metalloprotease substrate-cleavage sites. Sci Rep 2017; 7:5755. [PMID: 28720874 PMCID: PMC5515926 DOI: 10.1038/s41598-017-06219-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2017] [Accepted: 06/08/2017] [Indexed: 11/24/2022] Open
Abstract
Matrix Metalloproteases (MMPs) are an important family of proteases that play crucial roles in key cellular and disease processes. Therefore, MMPs constitute important targets for drug design, development and delivery. Advanced proteomic technologies have identified type-specific target substrates; however, the complete repertoire of MMP substrates remains uncharacterized. Indeed, computational prediction of substrate-cleavage sites associated with MMPs is a challenging problem. This holds especially true when considering MMPs with few experimentally verified cleavage sites, such as for MMP-2, -3, -7, and -8. To fill this gap, we propose a new knowledge-transfer computational framework which effectively utilizes the hidden shared knowledge from some MMP types to enhance predictions of other, distinct target substrate-cleavage sites. Our computational framework uses support vector machines combined with transfer machine learning and feature selection. To demonstrate the value of the model, we extracted a variety of substrate sequence-derived features and compared the performance of our method using both 5-fold cross-validation and independent tests. The results show that our transfer-learning-based method provides a robust performance, which is at least comparable to traditional feature-selection methods for prediction of MMP-2, -3, -7, -8, -9 and -12 substrate-cleavage sites on independent tests. The results also demonstrate that our proposed computational framework provides a useful alternative for the characterization of sequence-level determinants of MMP-substrate specificity.
Affiliation(s)
- Yanan Wang
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC, 3800, Australia
| | - Jiangning Song
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, 3800, Australia
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC, 3800, Australia
- ARC Centre of Excellence for Advanced Molecular Imaging, Monash University, Melbourne, VIC, 3800, Australia
| | - Tatiana T Marquez-Lago
- Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, 35294, USA
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, 35294, USA
| | - André Leier
- Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, 35294, USA
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, 35294, USA
| | - Chen Li
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC, 3800, Australia
| | - Trevor Lithgow
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC, 3800, Australia.
| | - Geoffrey I Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, 3800, Australia.
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China.
|
47
|
Nicholson J, Jevons SJ, Groselj B, Ellermann S, Konietzny R, Kerr M, Kessler BM, Kiltie AE. E3 Ligase cIAP2 Mediates Downregulation of MRE11 and Radiosensitization in Response to HDAC Inhibition in Bladder Cancer. Cancer Res 2017; 77:3027-3039. [PMID: 28363998 DOI: 10.1158/0008-5472.can-16-3232] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2016] [Revised: 01/10/2017] [Accepted: 03/27/2017] [Indexed: 11/16/2022]
Abstract
The MRE11/RAD50/NBS1 (MRN) complex mediates DNA repair pathways, including double-strand breaks induced by radiotherapy. Meiotic recombination 11 homolog (MRE11) is downregulated by histone deacetylase inhibition (HDACi), resulting in reduced levels of DNA repair in bladder cancer cells and radiosensitization. In this study, we show that the mechanism of this downregulation is posttranslational and identify a C-terminally truncated MRE11, which is formed after HDAC inhibition as full-length MRE11 is downregulated. Truncated MRE11 was stabilized by proteasome inhibition, exhibited a decreased half-life after treatment with panobinostat, and therefore represents a newly identified intermediate induced and degraded in response to HDAC inhibition. The E3 ligase cellular inhibitor of apoptosis protein 2 (cIAP2) was upregulated in response to HDAC inhibition and was validated as a new MRE11 binding partner whose upregulation had similar effects to HDAC inhibition. cIAP2 overexpression resulted in downregulation and altered ubiquitination patterns of MRE11 and mediated radiosensitization in response to HDAC inhibition. These results highlight cIAP2 as a player in the DNA damage response as a posttranscriptional regulator of MRE11 and identify cIAP2 as a potential target for biomarker discovery or chemoradiation strategies in bladder cancer. Cancer Res; 77(11); 3027-39. ©2017 AACR.
Affiliation(s)
- Judith Nicholson
- CRUK/MRC Oxford Institute for Radiation Oncology, University of Oxford, Oxford, United Kingdom.
| | - Sarah J Jevons
- CRUK/MRC Oxford Institute for Radiation Oncology, University of Oxford, Oxford, United Kingdom
| | - Blaz Groselj
- CRUK/MRC Oxford Institute for Radiation Oncology, University of Oxford, Oxford, United Kingdom
| | - Sophie Ellermann
- CRUK/MRC Oxford Institute for Radiation Oncology, University of Oxford, Oxford, United Kingdom
| | - Rebecca Konietzny
- TDI Mass Spectrometry Laboratory, Target Discovery Institute, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
| | - Martin Kerr
- CRUK/MRC Oxford Institute for Radiation Oncology, University of Oxford, Oxford, United Kingdom
| | - Benedikt M Kessler
- TDI Mass Spectrometry Laboratory, Target Discovery Institute, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
| | - Anne E Kiltie
- CRUK/MRC Oxford Institute for Radiation Oncology, University of Oxford, Oxford, United Kingdom.
|
48
|
Minina EA, Coll NS, Tuominen H, Bozhkov PV. Metacaspases versus caspases in development and cell fate regulation. Cell Death Differ 2017; 24:1314-1325. [PMID: 28234356 DOI: 10.1038/cdd.2017.18] [Citation(s) in RCA: 56] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2016] [Revised: 01/11/2017] [Accepted: 01/19/2017] [Indexed: 12/18/2022] Open
Abstract
Initially found to be critically involved in inflammation and apoptosis, caspases have since then been implicated in the regulation of various signaling pathways in animals. How caspases and caspase-mediated processes evolved is a topic of great interest and hot debate. In fact, caspases are just the tip of the iceberg, representing a relatively small group of mostly animal-specific enzymes within a broad family of structurally related cysteine proteases (family C14 of CD clan) found in all kingdoms of life. Apart from caspases, this family encompasses para- and metacaspases, and all three groups of proteases exhibit significant variation in biochemistry and function in vivo. Notably, metacaspases are present in all eukaryotic lineages with a remarkable absence in animals. Thus, metacaspases and caspases must have adapted to operate under distinct cellular and physiological settings. Here we discuss biochemical properties and biological functions of metacaspases in comparison to caspases, with a major focus on the regulation of developmental aspects in plants versus animals.
Affiliation(s)
- E A Minina
- Department of Molecular Sciences, Uppsala BioCenter, Swedish University of Agricultural Sciences and Linnean Center for Plant Biology, Uppsala, Sweden
| | - N S Coll
- Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus UAB Bellaterra, Barcelona, Spain
| | - H Tuominen
- Umeå Plant Science Centre, Department of Plant Physiology, Umeå University, Umeå, Sweden
| | - P V Bozhkov
- Department of Molecular Sciences, Uppsala BioCenter, Swedish University of Agricultural Sciences and Linnean Center for Plant Biology, Uppsala, Sweden
|
49
|
Li YF, Nanayakkara G, Sun Y, Li X, Wang L, Cueto R, Shao Y, Fu H, Johnson C, Cheng J, Chen X, Hu W, Yu J, Choi ET, Wang H, Yang XF. Analyses of caspase-1-regulated transcriptomes in various tissues lead to identification of novel IL-1β-, IL-18- and sirtuin-1-independent pathways. J Hematol Oncol 2017; 10:40. [PMID: 28153032 PMCID: PMC5290602 DOI: 10.1186/s13045-017-0406-2] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2016] [Accepted: 01/25/2017] [Indexed: 01/06/2023] Open
Abstract
BACKGROUND: It is well established that caspase-1 exerts its biological activities through its downstream targets such as IL-1β, IL-18, and Sirt-1. The microarray datasets derived from various caspase-1 knockout tissues indicated that caspase-1 can significantly impact the transcriptome. However, it is not known whether all the effects exerted by caspase-1 on the transcriptome are mediated only by its well-known substrates. Therefore, we hypothesized that the effects of caspase-1 on the transcriptome may be partially independent of IL-1β, IL-18, and Sirt-1. METHODS: To determine new global and tissue-specific gene regulatory effects of caspase-1, we took novel microarray data analysis approaches including Venn analysis, cooperation analysis, and meta-analysis methods. We used these statistical methods to integrate different microarray datasets conducted on different caspase-1 knockout tissues and datasets where caspase-1 downstream targets were manipulated. RESULTS: We made the following important findings: (1) Caspase-1 exerts its regulatory effects on the majority of genes in a tissue-specific manner; (2) Caspase-1-regulated genes partially cooperate with genes regulated by sirtuin-1 during organ injury and inflammation in adipose tissue but not in the liver; (3) Caspase-1 cooperates with IL-1β in regulating less than half of the genes involved in cardiovascular disease, organismal injury, and cancer in mouse liver; (4) The meta-analysis identified 40 caspase-1 globally regulated genes across tissues, suggesting that caspase-1 globally regulates many novel pathways; and (5) The meta-analysis identified new cooperatively and non-cooperatively regulated genes in caspase-1, IL-1β, IL-18, and Sirt-1 pathways. CONCLUSIONS: Our findings suggest that caspase-1 regulates many new signaling pathways potentially via its known substrates and also via transcription factors and other proteins that are yet to be identified.
Affiliation(s)
- Ya-Feng Li
- Centers for Metabolic Disease Research and Cardiovascular Research, Lewis Katz School of Medicine at Temple University, 3500 North Broad Street, MERB-1059, Philadelphia, PA, 19140, USA; Cardiovascular Research, & Thrombosis Research, Departments of Pharmacology, Lewis Katz School of Medicine at Temple University, Philadelphia, PA, 19140, USA; The Shanxi Provincial People's Hospital, an Affiliate Hospital of Shanxi Medical University, Taiyuan, Shanxi, 030001, China
| | - Gayani Nanayakkara
- Centers for Metabolic Disease Research and Cardiovascular Research, Lewis Katz School of Medicine at Temple University, 3500 North Broad Street, MERB-1059, Philadelphia, PA, 19140, USA
| | - Yu Sun
- Centers for Metabolic Disease Research and Cardiovascular Research, Lewis Katz School of Medicine at Temple University, 3500 North Broad Street, MERB-1059, Philadelphia, PA, 19140, USA
| | - Xinyuan Li
- Centers for Metabolic Disease Research and Cardiovascular Research, Lewis Katz School of Medicine at Temple University, 3500 North Broad Street, MERB-1059, Philadelphia, PA, 19140, USA
| | - Luqiao Wang
- Centers for Metabolic Disease Research and Cardiovascular Research, Lewis Katz School of Medicine at Temple University, 3500 North Broad Street, MERB-1059, Philadelphia, PA, 19140, USA
| | - Ramon Cueto
- Centers for Metabolic Disease Research and Cardiovascular Research, Lewis Katz School of Medicine at Temple University, 3500 North Broad Street, MERB-1059, Philadelphia, PA, 19140, USA
| | - Ying Shao
- Centers for Metabolic Disease Research and Cardiovascular Research, Lewis Katz School of Medicine at Temple University, 3500 North Broad Street, MERB-1059, Philadelphia, PA, 19140, USA
| | - Hangfei Fu
- Centers for Metabolic Disease Research and Cardiovascular Research, Lewis Katz School of Medicine at Temple University, 3500 North Broad Street, MERB-1059, Philadelphia, PA, 19140, USA
| | - Candice Johnson
- Centers for Metabolic Disease Research and Cardiovascular Research, Lewis Katz School of Medicine at Temple University, 3500 North Broad Street, MERB-1059, Philadelphia, PA, 19140, USA
| | - Jiali Cheng
- Centers for Metabolic Disease Research and Cardiovascular Research, Lewis Katz School of Medicine at Temple University, 3500 North Broad Street, MERB-1059, Philadelphia, PA, 19140, USA
| | - Xiongwen Chen
- Cardiovascular Research, & Thrombosis Research, Departments of Pharmacology, Lewis Katz School of Medicine at Temple University, Philadelphia, PA, 19140, USA; Department of Immunology, Lewis Katz School of Medicine at Temple University, Philadelphia, PA, 19140, USA
| | - Wenhui Hu
- Centers for Metabolic Disease Research and Cardiovascular Research, Lewis Katz School of Medicine at Temple University, 3500 North Broad Street, MERB-1059, Philadelphia, PA, 19140, USA
| | - Jun Yu
- Centers for Metabolic Disease Research and Cardiovascular Research, Lewis Katz School of Medicine at Temple University, 3500 North Broad Street, MERB-1059, Philadelphia, PA, 19140, USA; Department of Immunology, Lewis Katz School of Medicine at Temple University, Philadelphia, PA, 19140, USA
| | - Eric T Choi
- Centers for Metabolic Disease Research and Cardiovascular Research, Lewis Katz School of Medicine at Temple University, 3500 North Broad Street, MERB-1059, Philadelphia, PA, 19140, USA; Department of Surgery, Lewis Katz School of Medicine at Temple University, Philadelphia, PA, 19140, USA
| | - Hong Wang
- Centers for Metabolic Disease Research and Cardiovascular Research, Lewis Katz School of Medicine at Temple University, 3500 North Broad Street, MERB-1059, Philadelphia, PA, 19140, USA; Department of Physiology, Lewis Katz School of Medicine at Temple University, Philadelphia, PA, 19140, USA
| | - Xiao-Feng Yang
- Centers for Metabolic Disease Research and Cardiovascular Research, Lewis Katz School of Medicine at Temple University, 3500 North Broad Street, MERB-1059, Philadelphia, PA, 19140, USA; Cardiovascular Research, & Thrombosis Research, Departments of Pharmacology, Lewis Katz School of Medicine at Temple University, Philadelphia, PA, 19140, USA; Department of Physiology, Lewis Katz School of Medicine at Temple University, Philadelphia, PA, 19140, USA; Department of Immunology, Lewis Katz School of Medicine at Temple University, Philadelphia, PA, 19140, USA.
|
50
|
Lateef Z, Gimenez G, Baker ES, Ward VK. Transcriptomic analysis of human norovirus NS1-2 protein highlights a multifunctional role in murine monocytes. BMC Genomics 2017; 18:39. [PMID: 28056773 PMCID: PMC5217272 DOI: 10.1186/s12864-016-3417-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2016] [Accepted: 12/12/2016] [Indexed: 12/22/2022] Open
Abstract
Background: The GII.4 Sydney 2012 strain of human norovirus (HuNoV) is a pandemic strain that is responsible for the majority of norovirus outbreaks in healthcare settings. The function of the non-structural (NS)1-2 protein from HuNoV is unknown. Results: In silico analysis of human norovirus NS1-2 protein showed that it shares features with the murine NS1-2 protein, including a disordered region, a transmembrane domain and H-box and NC sequence motifs. The proteins also contain caspase cleavage and phosphorylation sites, indicating that processing and phosphorylation may be a conserved feature of norovirus NS1-2 proteins. In this study, RNA transcripts of human and murine norovirus full-length and the disordered region of NS1-2 were transfected into monocytes, and next generation sequencing was used to analyse the transcriptomic profile of cells expressing virus proteins. The profiles were then compared to the transcriptomic profile of MNV-infected cells. Conclusions: RNAseq analysis showed that NS1-2 proteins from human and murine noroviruses affect multiple immune systems (chemokine, cytokine, and Toll-like receptor signaling) and intracellular pathways (NFκB, MAPK, PI3K-Akt signaling) in murine monocytes. Comparison to the transcriptomic profile of MNV-infected cells indicated the pathways that NS1-2 may affect during norovirus infection. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-3417-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Zabeen Lateef
- Department of Microbiology and Immunology, Otago School of Medical Sciences, University of Otago, 720 Cumberland St, Dunedin, 9054, New Zealand.
| | - Gregory Gimenez
- Otago Genomics and Bioinformatics Facility, University of Otago, Dunedin, 9054, New Zealand
| | - Estelle S Baker
- Department of Microbiology and Immunology, Otago School of Medical Sciences, University of Otago, 720 Cumberland St, Dunedin, 9054, New Zealand
| | - Vernon K Ward
- Department of Microbiology and Immunology, Otago School of Medical Sciences, University of Otago, 720 Cumberland St, Dunedin, 9054, New Zealand
|