1
Bucataru C, Ciobanasu C. Antimicrobial peptides: Opportunities and challenges in overcoming resistance. Microbiol Res 2024; 286:127822. [PMID: 38986182 DOI: 10.1016/j.micres.2024.127822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Revised: 06/20/2024] [Accepted: 06/25/2024] [Indexed: 07/12/2024]
Abstract
Antibiotic resistance represents a global health threat, challenging the efficacy of traditional antimicrobial agents and necessitating innovative approaches to combat infectious diseases. Among these alternatives, antimicrobial peptides have emerged as promising candidates against resistant pathogens. Unlike traditional antibiotics with only one target, these peptides can use different mechanisms to destroy bacteria, with low toxicity to mammalian cells compared to many conventional antibiotics. Antimicrobial peptides (AMPs) have encouraging antibacterial properties and are currently employed in the clinical treatment of pathogen infection, cancer, wound healing, cosmetics, or biotechnology. This review summarizes the mechanisms of antimicrobial peptides against bacteria, discusses the mechanisms of drug resistance, the limitations and challenges of AMPs in peptide drug applications for combating drug-resistant bacterial infections, and strategies to enhance their capabilities.
Affiliation(s)
- Cezara Bucataru
- Alexandru I. Cuza University, Institute of Interdisciplinary Research, Department of Exact and Natural Sciences, Bulevardul Carol I, Nr.11, Iasi 700506, Romania
- Corina Ciobanasu
- Alexandru I. Cuza University, Institute of Interdisciplinary Research, Department of Exact and Natural Sciences, Bulevardul Carol I, Nr.11, Iasi 700506, Romania.
2
Sánchez-Arroyo A, Plaza-Vinuesa L, de las Rivas B, Mancheño JM, Muñoz R. Aspergillus niger Ochratoxinase Is a Highly Specific, Metal-Dependent Amidohydrolase Suitable for OTA Biodetoxification in Food and Feed. J Agric Food Chem 2024; 72:18658-18669. [PMID: 39110482 PMCID: PMC11342369 DOI: 10.1021/acs.jafc.4c02944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2024] [Revised: 07/10/2024] [Accepted: 07/29/2024] [Indexed: 08/22/2024]
Abstract
Microbial enzymes can be used as processing aids or additives in food and feed industries. Enzymatic detoxification of ochratoxin A (OTA) is a promising method to reduce OTA content. Here, we characterize the full-length enzyme ochratoxinase (AnOTA), an amidohydrolase from Aspergillus niger. AnOTA hydrolyzes OTA and ochratoxin B (OTB) mycotoxins efficiently and also other substrates containing phenylalanine, alanine, or leucine residues at their C-terminal position, revealing a narrow specificity profile. AnOTA lacks endopeptidase or aminoacylase activities. The structural basis of the molecular recognition by AnOTA of OTA, OTB, and a wide array of model substrates has been investigated by molecular docking simulation. AnOTA shows maximal hydrolytic activity at neutral pH and high temperature (65 °C) and retained high activity after prolonged incubation at 45 °C. The reduction of OTA levels in food products by AnOTA has been investigated using several commercial plant-based beverages. The results showed complete degradation of OTA with no detectable modification of beverage proteins. Therefore, the addition of AnOTA seems to be a useful procedure to eliminate OTA in plant-based beverages. Moreover, computational predictions of in vivo characteristics indicated that AnOTA is neither an allergenic nor antigenic protein. All characteristics found for AnOTA supported the suitability of its use for OTA detoxification in food and feed.
Affiliation(s)
- Ana Sánchez-Arroyo
- Bacterial Biotechnology, Institute of Food Science, Technology and Nutrition (ICTAN), CSIC, José Antonio Novais 6, 28040 Madrid, Spain
- Laura Plaza-Vinuesa
- Bacterial Biotechnology, Institute of Food Science, Technology and Nutrition (ICTAN), CSIC, José Antonio Novais 6, 28040 Madrid, Spain
- Blanca de las Rivas
- Bacterial Biotechnology, Institute of Food Science, Technology and Nutrition (ICTAN), CSIC, José Antonio Novais 6, 28040 Madrid, Spain
- José Miguel Mancheño
- Department of Crystallography and Structural Biology, Institute of Physical Chemistry Blas Cabrera (IQF), CSIC, Serrano 119, 28006 Madrid, Spain
- Rosario Muñoz
- Bacterial Biotechnology, Institute of Food Science, Technology and Nutrition (ICTAN), CSIC, José Antonio Novais 6, 28040 Madrid, Spain
3
González-Esparragoza D, Carrasco-Carballo A, Rosas-Murrieta NH, Millán-Pérez Peña L, Luna F, Herrera-Camacho I. In Silico Analysis of Protein-Protein Interactions of Putative Endoplasmic Reticulum Metallopeptidase 1 in Schizosaccharomyces pombe. Curr Issues Mol Biol 2024; 46:4609-4629. [PMID: 38785548 PMCID: PMC11120530 DOI: 10.3390/cimb46050280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Revised: 04/26/2024] [Accepted: 05/07/2024] [Indexed: 05/25/2024] Open
Abstract
Ermp1 is a putative metalloprotease from Schizosaccharomyces pombe and a member of the Fxna peptidases. Although their function is unknown, orthologous proteins from rats and humans have been associated with the maturation of ovarian follicles and increased ER stress. This study proposes the first prediction of protein-protein interactions (PPI) by comparison of the interologues between humans and yeasts, as well as the molecular docking and dynamics of the M28 domain of Ermp1 with possible target proteins. As a result, 45 proteins are proposed that could interact with the metalloprotease. Most of these proteins are related to the transport of Ca2+ and the metabolism of amino acids and proteins. Docking and molecular dynamics suggest that the M28 domain of Ermp1 could hydrolyze leucine and methionine residues of Amk2, Ypt5 and Pex12. These results could support future experimental investigations of other Fxna peptidases, such as human ERMP1.
Affiliation(s)
- Dalia González-Esparragoza
- Laboratorio de Bioquímica y Biología Molecular, Centro de Química del Instituto de Ciencias (ICUAP), Benemérita Universidad Autónoma de Puebla, Puebla 72570, Mexico
- Laboratorio de Elucidación y Síntesis en Química Orgánica, Instituto de Ciencias de la Universidad Autónoma de Puebla (ICUAP), Benemérita Universidad Autónoma de Puebla, Puebla 72570, Mexico
- Alan Carrasco-Carballo
- Laboratorio de Elucidación y Síntesis en Química Orgánica, Instituto de Ciencias de la Universidad Autónoma de Puebla (ICUAP), Benemérita Universidad Autónoma de Puebla, Puebla 72570, Mexico
- Consejo Nacional de Humanidades Ciencia y Tecnología, Instituto de Ciencias de la Universidad Autónoma de Puebla (ICUAP), Benemérita Universidad Autónoma de Puebla, Puebla 72570, Mexico
- Nora H. Rosas-Murrieta
- Laboratorio de Bioquímica y Biología Molecular, Centro de Química del Instituto de Ciencias (ICUAP), Benemérita Universidad Autónoma de Puebla, Puebla 72570, Mexico
- Lourdes Millán-Pérez Peña
- Laboratorio de Bioquímica y Biología Molecular, Centro de Química del Instituto de Ciencias (ICUAP), Benemérita Universidad Autónoma de Puebla, Puebla 72570, Mexico
- Felix Luna
- Laboratorio de Neuroendocrinología, Facultad de Ciencias Químicas, Benemérita Universidad Autónoma de Puebla, Puebla 72570, Mexico
- Irma Herrera-Camacho
- Laboratorio de Bioquímica y Biología Molecular, Centro de Química del Instituto de Ciencias (ICUAP), Benemérita Universidad Autónoma de Puebla, Puebla 72570, Mexico
4
Shen L, Sun X, Chen Z, Guo Y, Shen Z, Song Y, Xin W, Ding H, Ma X, Xu W, Zhou W, Che J, Tan L, Chen L, Chen S, Dong X, Fang L, Zhu F. ADCdb: the database of antibody-drug conjugates. Nucleic Acids Res 2024; 52:D1097-D1109. [PMID: 37831118 PMCID: PMC10768060 DOI: 10.1093/nar/gkad831] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/14/2023] [Revised: 09/07/2023] [Accepted: 09/28/2023] [Indexed: 10/14/2023] Open
Abstract
Antibody-drug conjugates (ADCs) are a class of innovative biopharmaceutical drugs, which, via their antibody (mAb) component, deliver and release their potent warhead (a.k.a. payload) at the disease site, thereby simultaneously improving the efficacy of delivered therapy and reducing its off-target toxicity. To design ADCs of promising efficacy, it is crucial to have the critical data of pharma-information and biological activities for each ADC. However, no such database has been constructed yet. In this study, a database named ADCdb focusing on providing ADC information (especially its pharma-information and biological activities) from multiple perspectives was thus developed. Particularly, a total of 6572 ADCs (359 approved by FDA or in clinical trial pipeline, 501 in preclinical test, 819 with in-vivo testing data, 1868 with cell line/target testing data, 3025 without in-vivo/cell line/target testing data) together with their explicit pharma-information was collected and provided. Moreover, a total of 9171 literature-reported activities were discovered, which were identified from diverse clinical trial pipelines, model organisms, patient/cell-derived xenograft models, etc. Due to the significance of ADCs and their relevant data, this new database was expected to attract broad interests from diverse research fields of current biopharmaceutical drug discovery. The ADCdb is now publicly accessible at: https://idrblab.org/adcdb/.
Affiliation(s)
- Liteng Shen
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Department of Pharmacy, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310005, China
- Postgraduate Training Base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), Hangzhou 310022, China
- College of Pharmaceutical Science, Zhejiang University of Technology, Hangzhou 310014, China
- Xiuna Sun
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Zhen Chen
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Yu Guo
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Zheyuan Shen
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Yi Song
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Wenxiu Xin
- Department of Pharmacy, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310005, China
- Haiying Ding
- Department of Pharmacy, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310005, China
- Xinyue Ma
- Department of Pharmacy, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310005, China
- Postgraduate Training Base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), Hangzhou 310022, China
- Weiben Xu
- Department of Pharmacy, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310005, China
- College of Pharmaceutical Science, Zhejiang University of Technology, Hangzhou 310014, China
- Wanying Zhou
- Department of Pharmacy, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310005, China
- Postgraduate Training Base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), Hangzhou 310022, China
- Jinxin Che
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Lili Tan
- Department of Pharmacy, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310005, China
- Postgraduate Training Base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), Hangzhou 310022, China
- Liangsheng Chen
- Department of Pharmacy, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310005, China
- Postgraduate Training Base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), Hangzhou 310022, China
- Siqi Chen
- School of Pharmaceutical Science, Zhejiang Chinese Medical University, Hangzhou 310053, China
- Xiaowu Dong
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- College of Pharmaceutical Science, Zhejiang University of Technology, Hangzhou 310014, China
- Luo Fang
- Department of Pharmacy, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310005, China
- Postgraduate Training Base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), Hangzhou 310022, China
- College of Pharmaceutical Science, Zhejiang University of Technology, Hangzhou 310014, China
- School of Pharmaceutical Science, Zhejiang Chinese Medical University, Hangzhou 310053, China
- Feng Zhu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
5
Lu C, Lubin JH, Sarma VV, Stentz SZ, Wang G, Wang S, Khare SD. Prediction and design of protease enzyme specificity using a structure-aware graph convolutional network. Proc Natl Acad Sci U S A 2023; 120:e2303590120. [PMID: 37729196 PMCID: PMC10523478 DOI: 10.1073/pnas.2303590120] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/05/2023] [Accepted: 08/14/2023] [Indexed: 09/22/2023] Open
Abstract
Site-specific proteolysis by the enzymatic cleavage of small linear sequence motifs is a key posttranslational modification involved in physiology and disease. The ability to robustly and rapidly predict protease-substrate specificity would also enable targeted proteolytic cleavage by designed proteases. Current methods for predicting protease specificity are limited to sequence pattern recognition in experimentally derived cleavage data obtained for libraries of potential substrates and generated separately for each protease variant. We reasoned that a more semantically rich and robust model of protease specificity could be developed by incorporating the energetics of molecular interactions between protease and substrates into machine learning workflows. We present Protein Graph Convolutional Network (PGCN), which develops a physically grounded, structure-based molecular interaction graph representation that describes molecular topology and interaction energetics to predict enzyme specificity. We show that PGCN accurately predicts the specificity landscapes of several variants of two model proteases. Node and edge ablation tests identified key graph elements for specificity prediction, some of which are consistent with known biochemical constraints for protease:substrate recognition. We used a pretrained PGCN model to guide the design of protease libraries for cleaving two noncanonical substrates, and found good agreement with experimental cleavage results. Importantly, the model can accurately assess designs featuring diversity at positions not present in the training data. The described methodology should enable the structure-based prediction of specificity landscapes of a wide variety of proteases and the construction of tailor-made protease editors for site-selectively and irreversibly modifying chosen target proteins.
Affiliation(s)
- Changpeng Lu
- Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Joseph H. Lubin
- Department of Chemistry and Chemical Biology, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Vidur V. Sarma
- Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Guanyang Wang
- Department of Statistics, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Sijian Wang
- Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Department of Statistics, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Sagar D. Khare
- Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Department of Chemistry and Chemical Biology, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
6
Li F, Wang C, Guo X, Akutsu T, Webb GI, Coin LJM, Kurgan L, Song J. ProsperousPlus: a one-stop and comprehensive platform for accurate protease-specific substrate cleavage prediction and machine-learning model construction. Brief Bioinform 2023; 24:bbad372. [PMID: 37874948 DOI: 10.1093/bib/bbad372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Revised: 08/30/2023] [Accepted: 09/29/2023] [Indexed: 10/26/2023] Open
Abstract
Proteases contribute to a broad spectrum of cellular functions. Given a relatively limited amount of experimental data, developing accurate sequence-based predictors of substrate cleavage sites facilitates a better understanding of protease functions and substrate specificity. While many protease-specific predictors of substrate cleavage sites were developed, these efforts are outpaced by the growth of the protease substrate cleavage data. In particular, since data for 100+ protease types are available and this number continues to grow, it becomes impractical to publish predictors for new protease types, and instead it might be better to provide a computational platform that helps users to quickly and efficiently build predictors that address their specific needs. To this end, we conceptualized, developed, tested and released a versatile bioinformatics platform, ProsperousPlus, that empowers users, even those with no programming or little bioinformatics background, to build fast and accurate predictors of substrate cleavage sites. ProsperousPlus facilitates the use of the rapidly accumulating substrate cleavage data to train, empirically assess and deploy predictive models for user-selected substrate types. Benchmarking tests on test datasets show that our platform produces predictors that on average exceed the predictive performance of current state-of-the-art approaches. ProsperousPlus is available as a webserver and a stand-alone software package at http://prosperousplus.unimelb-biotools.cloud.edu.au/.
Affiliation(s)
- Fuyi Li
- College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
- South Australian immunoGENomics Cancer Institute (SAiGENCI), Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA 5005, Australia
- The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, VIC 3000, Australia
- Cong Wang
- College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
- Xudong Guo
- College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
- Geoffrey I Webb
- Monash Data Futures Institute, Monash University, VIC 3800, Australia
- Lachlan J M Coin
- The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, VIC 3000, Australia
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Jiangning Song
- Monash Data Futures Institute, Monash University, VIC 3800, Australia
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, VIC 3800, Australia
7
Maasch JRMA, Torres MDT, Melo MCR, de la Fuente-Nunez C. Molecular de-extinction of ancient antimicrobial peptides enabled by machine learning. Cell Host Microbe 2023; 31:1260-1274.e6. [PMID: 37516110 DOI: 10.1016/j.chom.2023.07.001] [Citation(s) in RCA: 38] [Impact Index Per Article: 38.0] [Received: 11/16/2022] [Revised: 05/12/2023] [Accepted: 07/06/2023] [Indexed: 07/31/2023]
Abstract
Molecular de-extinction could offer avenues for drug discovery by reintroducing bioactive molecules that are no longer encoded by extant organisms. To prospect for antimicrobial peptides encrypted within extinct and extant human proteins, we introduce the panCleave random forest model for proteome-wide cleavage site prediction. Our model outperformed multiple protease-specific cleavage site classifiers for three modern human caspases, despite its pan-protease design. Antimicrobial activity was observed in vitro for modern and archaic protein fragments identified with panCleave. Lead peptides showed resistance to proteolysis and exhibited variable membrane permeabilization. Additionally, representative modern and archaic protein fragments showed anti-infective efficacy against A. baumannii in both a skin abscess infection model and a preclinical murine thigh infection model. These results suggest that machine-learning-based encrypted peptide prospection can identify stable, nontoxic peptide antibiotics. Moreover, we establish molecular de-extinction through paleoproteome mining as a framework for antibacterial drug discovery.
Affiliation(s)
- Jacqueline R M A Maasch
- Department of Computer and Information Science, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104, USA; Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Bioengineering, Department of Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104, USA; Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA 19104, USA
- Marcelo D T Torres
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Bioengineering, Department of Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104, USA; Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA 19104, USA
- Marcelo C R Melo
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Bioengineering, Department of Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104, USA; Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA 19104, USA
- Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Bioengineering, Department of Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104, USA; Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA 19104, USA.
8
Matveev EV, Safronov VV, Ponomarev GV, Kazanov MD. Predicting Structural Susceptibility of Proteins to Proteolytic Processing. Int J Mol Sci 2023; 24:10761. [PMID: 37445939 DOI: 10.3390/ijms241310761] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/29/2023] [Revised: 06/16/2023] [Accepted: 06/26/2023] [Indexed: 07/15/2023] Open
Abstract
The importance of 3D protein structure in proteolytic processing is well known. However, despite the plethora of existing methods for predicting proteolytic sites, only a few of them utilize the structural features of potential substrates as predictors. Moreover, to our knowledge, there is currently no method available for predicting the structural susceptibility of protein regions to proteolysis. We developed such a method using data from CutDB, a database that contains experimentally verified proteolytic events. For prediction, we utilized structural features that have been shown to influence proteolysis in earlier studies, such as solvent accessibility, secondary structure, and temperature factor. Additionally, we introduced new structural features, including length of protruded loops and flexibility of protein termini. To maximize the prediction quality of the method, we carefully curated the training set, selected an appropriate machine learning method, and sampled negative examples to determine the optimal positive-to-negative class size ratio. We demonstrated that combining our method with models of protease primary specificity can outperform existing bioinformatics methods for the prediction of proteolytic sites. We also discussed the possibility of utilizing this method for bioinformatics prediction of other post-translational modifications.
Affiliation(s)
- Evgenii V Matveev
- Skolkovo Institute of Science and Technology, Moscow 121205, Russia
- A.A. Kharkevich Institute for Information Transmission Problems, Moscow 127051, Russia
- Dmitry Rogachev National Medical Research Center of Pediatric Hematology, Oncology and Immunology, Moscow 117998, Russia
- Vyacheslav V Safronov
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow 119991, Russia
- Gennady V Ponomarev
- Skolkovo Institute of Science and Technology, Moscow 121205, Russia
- A.A. Kharkevich Institute for Information Transmission Problems, Moscow 127051, Russia
- Marat D Kazanov
- Skolkovo Institute of Science and Technology, Moscow 121205, Russia
- A.A. Kharkevich Institute for Information Transmission Problems, Moscow 127051, Russia
- Dmitry Rogachev National Medical Research Center of Pediatric Hematology, Oncology and Immunology, Moscow 117998, Russia
- Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
9
Lu C, Lubin JH, Sarma VV, Stentz SZ, Wang G, Wang S, Khare SD. Prediction and Design of Protease Enzyme Specificity Using a Structure-Aware Graph Convolutional Network. bioRxiv 2023:2023.02.16.528728. [PMID: 36824945 PMCID: PMC9949123 DOI: 10.1101/2023.02.16.528728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
Abstract
Site-specific proteolysis by the enzymatic cleavage of small linear sequence motifs is a key post-translational modification involved in physiology and disease. The ability to robustly and rapidly predict protease substrate specificity would also enable targeted proteolytic cleavage - editing - of a target protein by designed proteases. Current methods for predicting protease specificity are limited to sequence pattern recognition in experimentally-derived cleavage data obtained for libraries of potential substrates and generated separately for each protease variant. We reasoned that a more semantically rich and robust model of protease specificity could be developed by incorporating the three-dimensional structure and energetics of molecular interactions between protease and substrates into machine learning workflows. We present Protein Graph Convolutional Network (PGCN), which develops a physically-grounded, structure-based molecular interaction graph representation that describes molecular topology and interaction energetics to predict enzyme specificity. We show that PGCN accurately predicts the specificity landscapes of several variants of two model proteases: the NS3/4 protease from the Hepatitis C virus (HCV) and the Tobacco Etch Virus (TEV) proteases. Node and edge ablation tests identified key graph elements for specificity prediction, some of which are consistent with known biochemical constraints for protease:substrate recognition. We used a pre-trained PGCN model to guide the design of TEV protease libraries for cleaving two non-canonical substrates, and found good agreement with experimental cleavage results. Importantly, the model can accurately assess designs featuring diversity at positions not present in the training data. 
The described methodology should enable the structure-based prediction of specificity landscapes of a wide variety of proteases and the construction of tailor-made protease editors for site-selectively and irreversibly modifying chosen target proteins.
Affiliation(s)
- Changpeng Lu
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
- Joseph H. Lubin
- Department of Chemistry & Chemical Biology, Rutgers - The State University of New Jersey, Piscataway, NJ
- Vidur V. Sarma
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
- Guanyang Wang
- Department of Statistics, Rutgers - The State University of New Jersey, Piscataway, NJ
- Sijian Wang
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
- Department of Statistics, Rutgers - The State University of New Jersey, Piscataway, NJ
| | - Sagar D. Khare
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
- Department of Chemistry & Chemical Biology, Rutgers - The State University of New Jersey, Piscataway, NJ
| |
|
10
|
Grolmusz VK, Nagy P, Likó I, Butz H, Pócza T, Bozsik A, Papp J, Oláh E, Patócs A. A common genetic variation in GZMB may associate with cancer risk in patients with Lynch syndrome. Front Oncol 2023; 13:1005066. [PMID: 36890824 PMCID: PMC9986427 DOI: 10.3389/fonc.2023.1005066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Accepted: 02/10/2023] [Indexed: 02/22/2023] Open
Abstract
Lynch syndrome (LS), also known as hereditary nonpolyposis colorectal cancer syndrome (HNPCC), is a common genetic predisposition to cancer due to germline mutations in genes affecting DNA mismatch repair. Due to mismatch repair deficiency, developing tumors are characterized by microsatellite instability (MSI-H), high frequency of expressed neoantigens and good clinical response to immune checkpoint inhibitors. Granzyme B (GrB) is the most abundant serine protease in the granules of cytotoxic T-cells and natural killer cells, mediating anti-tumor immunity. However, recent results confirm a diverse range of physiological functions of GrB, including roles in extracellular matrix remodelling, inflammation and fibrosis. In the present study, our aim was to investigate whether a frequent genetic variation of GZMB, the gene encoding GrB, constituted by three missense single nucleotide polymorphisms (rs2236338, rs11539752 and rs8192917), has any association with cancer risk in individuals with LS. In silico analysis and genotype calls from whole exome sequencing data in the Hungarian population confirmed that these SNPs are closely linked. Genotyping results of rs8192917 on a cohort of 145 individuals with LS demonstrated an association of the CC genotype with lower cancer risk. In silico prediction proposed likely GrB cleavage sites in a high proportion of shared neoantigens in MSI-H tumors. Our results propose the CC genotype of rs8192917 as a potential disease-modifying genetic factor in LS.
Affiliation(s)
- Vince Kornél Grolmusz
- Department of Molecular Genetics, National Institute of Oncology, Budapest, Hungary.,Hereditary Cancers Research Group, Eötvös Loránd Research Network - Semmelweis University, Budapest, Hungary.,Department of Laboratory Medicine, Semmelweis University, Budapest, Hungary.,National Tumorbiology Laboratory, National Institute of Oncology, Budapest, Hungary
| | - Petra Nagy
- Department of Molecular Genetics, National Institute of Oncology, Budapest, Hungary
| | - István Likó
- Hereditary Cancers Research Group, Eötvös Loránd Research Network - Semmelweis University, Budapest, Hungary.,National Tumorbiology Laboratory, National Institute of Oncology, Budapest, Hungary
| | - Henriett Butz
- Department of Molecular Genetics, National Institute of Oncology, Budapest, Hungary.,Hereditary Cancers Research Group, Eötvös Loránd Research Network - Semmelweis University, Budapest, Hungary.,Department of Laboratory Medicine, Semmelweis University, Budapest, Hungary.,National Tumorbiology Laboratory, National Institute of Oncology, Budapest, Hungary.,National Oncology Biobank Center, National Institute of Oncology, Budapest, Hungary
| | - Tímea Pócza
- Department of Molecular Genetics, National Institute of Oncology, Budapest, Hungary
| | - Anikó Bozsik
- Department of Molecular Genetics, National Institute of Oncology, Budapest, Hungary.,Hereditary Cancers Research Group, Eötvös Loránd Research Network - Semmelweis University, Budapest, Hungary.,National Tumorbiology Laboratory, National Institute of Oncology, Budapest, Hungary
| | - János Papp
- Department of Molecular Genetics, National Institute of Oncology, Budapest, Hungary.,Hereditary Cancers Research Group, Eötvös Loránd Research Network - Semmelweis University, Budapest, Hungary.,National Tumorbiology Laboratory, National Institute of Oncology, Budapest, Hungary
| | - Edit Oláh
- Department of Molecular Genetics, National Institute of Oncology, Budapest, Hungary
| | - Attila Patócs
- Department of Molecular Genetics, National Institute of Oncology, Budapest, Hungary.,Hereditary Cancers Research Group, Eötvös Loránd Research Network - Semmelweis University, Budapest, Hungary.,Department of Laboratory Medicine, Semmelweis University, Budapest, Hungary.,National Tumorbiology Laboratory, National Institute of Oncology, Budapest, Hungary
| |
|
11
|
Henehan GT, Ryan BJ, Kinsella GK. Approaches to Avoid Proteolysis During Protein Expression and Purification. Methods Mol Biol 2023; 2699:77-95. [PMID: 37646995 DOI: 10.1007/978-1-0716-3362-5_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
All cells contain proteases, which hydrolyze the peptide bonds between amino acids of a protein backbone. Typically, proteases are prevented from nonspecific proteolysis by regulation and by their physical separation into different subcellular compartments; however, this segregation is not retained during cell lysis, which is the initial step in any protein isolation procedure. Prevention of proteolysis during protein purification often takes the form of a two-pronged approach: first, inhibition of proteolysis in situ, followed by the early separation of the protease from the protein of interest via chromatographic purification. Protease inhibitors are routinely used to limit the effect of the proteases before they are physically separated from the protein of interest via column chromatography. In this chapter, commonly used approaches to reducing or avoiding proteolysis during protein expression and purification are reviewed.
Affiliation(s)
- Gary T Henehan
- School of Food Science and Environmental Health, Technological University Dublin, Grangegorman, Dublin, Ireland
| | - Barry J Ryan
- School of Food Science and Environmental Health, Technological University Dublin, Grangegorman, Dublin, Ireland
| | - Gemma K Kinsella
- School of Food Science and Environmental Health, Technological University Dublin, Grangegorman, Dublin, Ireland.
| |
|
12
|
Onah E, Uzor PF, Ugwoke IC, Eze JU, Ugwuanyi ST, Chukwudi IR, Ibezim A. Prediction of HIV-1 protease cleavage site from octapeptide sequence information using selected classifiers and hybrid descriptors. BMC Bioinformatics 2022; 23:466. [DOI: 10.1186/s12859-022-05017-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Accepted: 10/11/2022] [Indexed: 11/10/2022] Open
Abstract
Background
In most parts of the world, especially in underdeveloped countries, acquired immunodeficiency syndrome (AIDS) still remains a major cause of death, disability, and unfavorable economic outcomes. This has necessitated intensive research to develop effective therapeutic agents for the treatment of human immunodeficiency virus (HIV) infection, which is responsible for AIDS. Peptide cleavage by HIV-1 protease is an essential step in the replication of HIV-1. Thus, correct and timely prediction of the cleavage site of HIV-1 protease can significantly speed up and optimize the drug discovery process of novel HIV-1 protease inhibitors. In this work, we built and compared the performance of selected machine learning models for the prediction of HIV-1 protease cleavage site utilizing a hybrid of octapeptide sequence information comprising bond composition, amino acid binary profile (AABP), and physicochemical properties as numerical descriptors serving as input variables for some selected machine learning algorithms. Our work differs from antecedent studies exploring the same subject in the combination of octapeptide descriptors and method used. Instead of using various subsets of the dataset for training and testing the models, we combined the dataset, applied a 3-way data split, and then used a "stratified" 10-fold cross-validation technique alongside the testing set to evaluate the models.
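The amino acid binary profile mentioned above is commonly understood as a one-hot encoding of each residue. The following is a minimal sketch under that assumption, not the authors' code; the function name `binary_profile`, the residue ordering, and the example peptide are illustrative choices.

```python
# Illustrative sketch only: a one-hot "binary profile" for an octapeptide.
# The residue order below is an assumption; any fixed order works.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def binary_profile(octapeptide: str) -> list:
    """One-hot encode an 8-residue peptide into a flat 8 x 20 = 160-element vector."""
    if len(octapeptide) != 8:
        raise ValueError("expected exactly 8 residues")
    vector = []
    for residue in octapeptide.upper():
        if residue not in AMINO_ACIDS:
            raise ValueError(f"non-standard residue: {residue!r}")
        # Exactly one position per residue is set to 1.
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


features = binary_profile("SQNYPIVQ")  # arbitrary example octapeptide
print(len(features), sum(features))
```

In a hybrid scheme such as the one described, a vector like this would be concatenated with the other descriptor groups before being passed to a classifier.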
Results
Among the 8 models evaluated in the “stratified” 10-fold CV experiment, six (logistic regression, multi-layer perceptron classifier, linear discriminant analysis, gradient boosting classifier, Naive Bayes classifier, and decision tree classifier), with AUC, F-score, and B. Acc. scores in the ranges of 0.91–0.96, 0.81–0.88, and 80.1–86.4%, respectively, had the closest predictive performance to the state-of-the-art model (AUC 0.96, F-score 0.80 and B. Acc. ~ 80.0%). By contrast, the perceptron classifier and K-nearest neighbors had statistically lower performance (AUC 0.77–0.82, F-score 0.53–0.69, and B. Acc. 60.0–68.5%) at p < 0.05. On further evaluation on the testing set, logistic regression and the multi-layer perceptron classifier performed best (AUC of 0.97, F-score > 0.89, and B. Acc. > 90.0%), though linear discriminant analysis, gradient boosting classifier, and Naive Bayes classifier also performed well (AUC > 0.94, F-score > 0.87, and B. Acc. > 86.0%).
Conclusions
Logistic regression and multi-layer perceptron classifiers have comparable predictive performances to the state-of-the-art model when octapeptide sequence descriptors consisting of AABP, bond composition and standard physicochemical properties are used as input variables. In our future work, we hope to develop a standalone software for HIV-1 protease cleavage site prediction utilizing the logistic regression algorithm and the aforementioned octapeptide sequence descriptors.
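The stratified 10-fold cross-validation referred to in this abstract can be sketched as follows. This is a generic standard-library illustration, not the authors' pipeline; the function names and the toy label vector are assumptions, and a separate held-out testing set would be set aside before this step.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample index to one of k folds, keeping class ratios similar per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def cross_validation_splits(labels, k=10, seed=0):
    """Yield (train_indices, validation_indices) pairs, one per fold."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, folds[held_out]


# Hypothetical imbalanced labels: 20 cleaved (1) and 80 uncleaved (0) octapeptides.
labels = [1] * 20 + [0] * 80
for train, validation in cross_validation_splits(labels, k=10):
    positives = sum(labels[i] for i in validation)
    print(len(train), len(validation), positives)
```

Stratifying matters here because cleavage-site datasets are typically imbalanced; without it, some folds could contain very few positive examples and give unstable scores.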
|
13
|
Hu L, Li Z, Tang Z, Zhao C, Zhou X, Hu P. Effectively predicting HIV-1 protease cleavage sites by using an ensemble learning approach. BMC Bioinformatics 2022; 23:447. [PMID: 36303135 PMCID: PMC9608884 DOI: 10.1186/s12859-022-04999-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Accepted: 10/13/2022] [Indexed: 11/10/2022] Open
Abstract
Background
The site information of substrates that can be cleaved by human immunodeficiency virus 1 proteases (HIV-1 PRs) is of great significance for designing effective inhibitors against HIV-1. A variety of machine learning-based algorithms have been developed to predict HIV-1 PR cleavage sites by extracting relevant features from substrate sequences. However, relying only on sequence information is not sufficient to ensure good performance, due to the uncertainty in the way the datasets used for training and testing are separated. Moreover, the existence of noisy data, i.e., false positive and false negative cleavage sites, could negatively influence prediction accuracy.
Results
In this work, an ensemble learning algorithm for predicting HIV-1 PR cleavage sites, namely EM-HIV, is proposed by training a set of weak learners, i.e., biased support vector machine classifiers, with the asymmetric bagging strategy. By doing so, the impact of data imbalance and noisy data can be alleviated. In addition, to make full use of substrate sequences, the features used by EM-HIV are collected from three different coding schemes, including amino acid identities, chemical properties and variable-length coevolutionary patterns, for the purpose of constructing more relevant feature vectors of octamers. Experimental results on three independent benchmark datasets demonstrate that EM-HIV outperforms the state-of-the-art prediction algorithm in terms of several evaluation metrics. Hence, EM-HIV can be regarded as a useful tool to accurately predict HIV-1 PR cleavage sites.
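Asymmetric bagging, as it is usually described, keeps every positive example in each bag and resamples only the negatives so that each weak learner sees a balanced set. The sketch below illustrates just that sampling step and a simple vote; it is not the EM-HIV implementation, the function names are hypothetical, and negatives are drawn without replacement here for simplicity.

```python
import random


def asymmetric_bags(positives, negatives, n_bags=5, seed=0):
    """Build bags that each keep every positive and a same-sized random subset of negatives."""
    rng = random.Random(seed)
    size = min(len(positives), len(negatives))
    bags = []
    for _ in range(n_bags):
        sampled_negatives = rng.sample(negatives, size)
        bags.append((list(positives), sampled_negatives))
    return bags


def majority_vote(predictions):
    """Combine binary predictions (0 or 1) from the weak learners; ties count as 0."""
    return 1 if sum(predictions) * 2 > len(predictions) else 0


# Hypothetical toy data: 3 cleaved octamers and 20 non-cleaved ones, given as indices.
bags = asymmetric_bags([0, 1, 2], list(range(100, 120)), n_bags=4)
for bag_positives, bag_negatives in bags:
    print(bag_positives, bag_negatives)
```

Each bag would then train one weak learner (biased support vector machines in the abstract), and the learners' outputs would be aggregated, for example with a vote like `majority_vote`.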
Affiliation(s)
- Lun Hu
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
| | - Zhenfeng Li
- School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China
| | - Zehai Tang
- School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China
| | - Cheng Zhao
- School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China
| | - Xi Zhou
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
| | - Pengwei Hu
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
| |
|
14
|
Fan Y, Peng B. StackEPI: identification of cell line-specific enhancer-promoter interactions based on stacking ensemble learning. BMC Bioinformatics 2022; 23:272. [PMID: 35820811 PMCID: PMC9277947 DOI: 10.1186/s12859-022-04821-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Accepted: 07/01/2022] [Indexed: 11/10/2022] Open
Abstract
Background
Understanding the regulatory role of enhancer–promoter interactions (EPIs) in specific gene expression in cells contributes to the understanding of gene regulation, cell differentiation, etc., and identifying EPIs has been a challenging task. On the one hand, using traditional wet-lab experimental methods to identify EPIs often incurs substantial labor and time costs. On the other hand, although the currently proposed computational methods have good recognition performance, they generally require a long training time.
Results
In this study, we examined the EPIs of six human cell lines and designed a cell line-specific EPI prediction method based on a stacking ensemble learning strategy, called StackEPI, which has better prediction performance and faster training speed. Specifically, by combining different encoding schemes and machine learning methods, our prediction method can extract the cell line-specific effective information of enhancer and promoter gene sequences comprehensively and from multiple perspectives, and accurately recognize cell line-specific EPIs. The source code to implement StackEPI and the experimental data involved in the experiment are available at https://github.com/20032303092/StackEPI.git.
Conclusions
The comparison results show that our model can deliver better performance on the problem of identifying cell line-specific EPIs and outperform other state-of-the-art models. In addition, our model also has a more efficient computation speed.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12859-022-04821-9.
Affiliation(s)
- Yongxian Fan
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, 541004, China.
| | - Binchao Peng
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, 541004, China
| |
|
15
|
Deep Learning-Based Advances In Protein Posttranslational Modification Site and Protein Cleavage Prediction. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2022; 2499:285-322. [PMID: 35696087 DOI: 10.1007/978-1-0716-2317-6_15] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 10/18/2022]
Abstract
Posttranslational modification (PTM) is a ubiquitous phenomenon in both eukaryotes and prokaryotes which gives rise to enormous proteomic diversity. PTM mostly comes in two flavors: covalent modification to polypeptide chain and proteolytic cleavage. Understanding and characterization of PTM is a fundamental step toward understanding the underpinning of biology. Recent advances in experimental approaches, mainly mass-spectrometry-based approaches, have immensely helped in obtaining and characterizing PTMs. However, experimental approaches are not enough to understand and characterize more than 450 different types of PTMs and complementary computational approaches are becoming popular. Recently, due to the various advancements in the field of Deep Learning (DL), along with the explosion of applications of DL to various fields, the field of computational prediction of PTM has also witnessed the development of a plethora of deep learning (DL)-based approaches. In this book chapter, we first review some recent DL-based approaches in the field of PTM site prediction. In addition, we also review the recent advances in the not-so-studied PTM, that is, proteolytic cleavage predictions. We describe advances in PTM prediction by highlighting the Deep learning architecture, feature encoding, novelty of the approaches, and availability of the tools/approaches. Finally, we provide an outlook and possible future research directions for DL-based approaches for PTM prediction.
|
16
|
Mirabelli C, Jones MK, Young VL, Kolawole AO, Owusu I, Shan M, Abuaita B, Turula H, Trevino JG, Grigorova I, Lundy SK, Lyssiotis CA, Ward VK, Karst SM, Wobus CE. Human Norovirus Triggers Primary B Cell Immune Activation In Vitro. mBio 2022; 13:e0017522. [PMID: 35404121 PMCID: PMC9040803 DOI: 10.1128/mbio.00175-22] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/25/2022] [Accepted: 03/04/2022] [Indexed: 12/15/2022] Open
Abstract
Human norovirus (HNoV) is a global health and socioeconomic burden, estimated to infect every individual at least five times during their lifetime. The underlying mechanism for the potential lack of long-term immune protection from HNoV infections is not understood and prompted us to investigate HNoV susceptibility of primary human B cells and its functional impact. Primary B cells isolated from whole blood were infected with HNoV-positive stool samples and harvested at 3 days postinfection (dpi) to assess the viral RNA yield by reverse transcriptase quantitative PCR (RT-qPCR). A 3- to 18-fold increase in the HNoV RNA yield was observed in 50 to 60% of donors. Infection was further confirmed in B cells derived from splenic and lymph node biopsy specimens. Next, we characterized infection of whole-blood-derived B cells by flow cytometry in specific functional B cell subsets (naive CD27- IgD+, memory-switched CD27+ IgD-, memory-unswitched CD27+ IgD+, and double-negative CD27- IgD- cells). While the susceptibilities of the subsets were similar, changes in the B cell subset distribution upon infection were observed, which were also noted after treatment with HNoV virus-like particles and the predicted recombinant NS1 protein. Importantly, primary B cell stimulation with the predicted recombinant NS1 protein triggered B cell activation and induced metabolic changes. These data demonstrate that primary B cells are susceptible to HNoV infection and suggest that the NS1 protein can alter B cell activation and metabolism in vitro, which could have implications for viral pathogenesis and immune responses in vivo.
IMPORTANCE Human norovirus (HNoV) is the most prevalent causative agent of gastroenteritis worldwide. Infection results in a self-limiting disease that can become chronic and severe in the immunocompromised, the elderly, and infants. There are currently no approved therapeutic and preventative strategies to limit the health and socioeconomic burdens associated with HNoV infections. Moreover, HNoV does not elicit lifelong immunity as repeat infections are common, presenting a challenge for vaccine development. Given the importance of B cells for humoral immunity, we investigated the susceptibility and impact of HNoV infection on human B cells. We found that HNoV replicates in human primary B cells derived from blood, spleen, and lymph node specimens, while the nonstructural protein NS1 can activate B cells. Because of the secreted nature of NS1, we put forward the hypothesis that HNoV infection can modulate bystander B cell function with potential impacts on systemic immune responses.
Affiliation(s)
- Carmen Mirabelli
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, USA
| | - Melissa K. Jones
- Department of Molecular Genetics and Microbiology, College of Medicine, University of Florida, Gainesville, Florida, USA
- Department of Microbiology and Cell Science, IFAS, University of Florida, Gainesville, Florida, USA
| | - Vivienne L. Young
- Department of Microbiology and Immunology, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
| | - Abimbola O. Kolawole
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, USA
| | - Irene Owusu
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, USA
- West African Center for Cell Biology of Infectious Pathogens, Department of Biochemistry, Cell and Molecular Biology, University of Ghana, Legon, Accra, Ghana
| | - Mengrou Shan
- Department of Molecular and Integrative Physiology, University of Michigan, Ann Arbor, Michigan, USA
| | - Basel Abuaita
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, USA
| | - Holly Turula
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, USA
- Graduate Program in Immunology, University of Michigan, Ann Arbor, Michigan, USA
| | - Jose G. Trevino
- Division of Surgical Oncology, Department of Surgery, Virginia Commonwealth University, Richmond, Virginia, USA
| | - Irina Grigorova
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, USA
| | - Steven K. Lundy
- Division of Rheumatology, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, Michigan, USA
| | - Costas A. Lyssiotis
- Department of Molecular and Integrative Physiology, University of Michigan, Ann Arbor, Michigan, USA
| | - Vernon K. Ward
- Department of Microbiology and Immunology, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
| | - Stephanie M. Karst
- Department of Molecular Genetics and Microbiology, College of Medicine, University of Florida, Gainesville, Florida, USA
| | - Christiane E. Wobus
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, USA
| |
|
17
|
Tibbs E, Cao X. Emerging Canonical and Non-Canonical Roles of Granzyme B in Health and Disease. Cancers (Basel) 2022; 14:1436. [PMID: 35326588 PMCID: PMC8946077 DOI: 10.3390/cancers14061436] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 02/11/2022] [Revised: 03/05/2022] [Accepted: 03/08/2022] [Indexed: 12/23/2022] Open
Abstract
The Granzyme (Gzm) family has classically been recognized as a cytotoxic tool utilized by cytotoxic T lymphocytes (CTL) and natural killer (NK) cells to elicit cell death in infected and cancerous cells. Their importance is established based on evidence showing that deficiencies in these cell death executors result in defective immune responses. Recent findings have shown the importance of Granzyme B (GzmB) in regulatory immune cells, which may contribute to tumor growth and immune evasion during cancer development. Other studies have shown that members of the Gzm family are important for biological processes such as extracellular matrix remodeling, angiogenesis and organized vascular degradation. With this growing body of evidence, it is becoming more important to understand the broader functions of Gzms, rather than viewing them only as specific executors of cell death, and to be aware of the many alternative roles that Gzms play in physiological and pathological conditions. Therefore, we review the classical as well as novel non-canonical functions of GzmB and discuss approaches to utilize these new findings to address current gaps in our understanding of the immune system and tissue development.
Affiliation(s)
- Ellis Tibbs
- Department of Microbiology and Immunology, School of Medicine, University of Maryland Baltimore, Baltimore, MD 21201, USA
| | - Xuefang Cao
- Department of Microbiology and Immunology, School of Medicine, University of Maryland Baltimore, Baltimore, MD 21201, USA
- Marlene and Stewart Greenebaum Comprehensive Cancer Center, University of Maryland Baltimore, Baltimore, MD 21201, USA
| |
|
18
|
Uzozie AC, Smith TG, Chen S, Lange PF. Sensitive Identification of Known and Unknown Protease Activities by Unsupervised Linear Motif Deconvolution. Anal Chem 2022; 94:2244-2254. [PMID: 35029975 DOI: 10.1021/acs.analchem.1c04937] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/26/2023]
Abstract
The cleavage-site specificities for many proteases are not well understood, restricting the utility of supervised classification methods. We present an algorithm and web interface to overcome this limitation through the unsupervised detection of overrepresented patterns in protein sequence data, providing insight into the mixture of protease activities contributing to a complex system. Here, we apply the RObust LInear Motif Deconvolution (RoLiM) algorithm to confidently detect substrate cleavage patterns for SARS-CoV-2 MPro protease in the N-terminome data of an infected human cell line. Using mass spectrometry-based peptide data from a case-control comparison of 341 primary urothelial bladder cancer cases and 110 controls, we identified distinct sequence motifs indicative of increased matrix metallopeptidase activity in urine from cancer patients. The evaluation of N-terminal peptides from patient plasma post-chemotherapy detected novel granzyme B/corin activity. RoLiM will enhance the unbiased investigation of peptide sequences to establish the composition of known and uncharacterized protease activities in biological systems. RoLiM is available at http://langelab.org/rolim/.
Affiliation(s)
- Anuli C Uzozie
- Department of Pathology, University of British Columbia, Vancouver, British Columbia V6T 1Z7, Canada.,Michael Cuccione Childhood Cancer Research Program, BC Children's Hospital Research Institute, Vancouver, British Columbia V5Z 4H4, Canada
| | - Theodore G Smith
- Department of Pathology, University of British Columbia, Vancouver, British Columbia V6T 1Z7, Canada.,Michael Cuccione Childhood Cancer Research Program, BC Children's Hospital Research Institute, Vancouver, British Columbia V5Z 4H4, Canada
| | - Siyuan Chen
- Department of Pathology, University of British Columbia, Vancouver, British Columbia V6T 1Z7, Canada.,Michael Cuccione Childhood Cancer Research Program, BC Children's Hospital Research Institute, Vancouver, British Columbia V5Z 4H4, Canada
| | - Philipp F Lange
- Department of Pathology, University of British Columbia, Vancouver, British Columbia V6T 1Z7, Canada.,Michael Cuccione Childhood Cancer Research Program, BC Children's Hospital Research Institute, Vancouver, British Columbia V5Z 4H4, Canada.,Department of Molecular Oncology, BC Cancer, Vancouver, British Columbia V5Z 1L3, Canada
| |
|
19
|
Behzadipour Y, Hemmati S. Viral Prefusion Targeting Using Entry Inhibitor Peptides: The Case of SARS-CoV-2 and Influenza A virus. Int J Pept Res Ther 2022; 28:42. [PMID: 35002586 PMCID: PMC8722418 DOI: 10.1007/s10989-021-10357-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Accepted: 12/22/2021] [Indexed: 12/11/2022]
Abstract
In this study, peptide entry inhibitors against the fusion processes of severe acute respiratory syndrome coronavirus-2 (SCV2) and influenza A virus (IAV) were designed and evaluated. Fusion inhibitor peptides targeting the conformational shift of the viral fusion protein were designed based on the relatively conserved sequence of HR2 from SCV2 spike protein and the conserved fusion peptide from hemagglutinin (HA) of IAV. Helical HR2 peptides bind more efficiently to the HR1 trimer, while helical amphipathic anti-IAV peptides have higher cell penetration and endosomal uptake. The initial sequences were mutated by increasing the amphipathicity and by using helix-favoring residues and residues likely to form salt and disulfide bridges. After docking against their targets, all designed anti-SCV2 peptides bound to the hydrophobic crevice of the HR1 3-helical bundle, while AntiSCV2P1, AntiSCV2P3, AntiSCV2P7, and AntiSCV2P8 were expected to form coiled coils with at least one of the HR1 strands. Four of the designed anti-IAV peptides were cell-penetrating (AntiIAVP2, AntiIAVP3, AntiIAVP4, AntiIAVP7). All of them interacted with the fusion peptide of HA and some of the residues in the conserved hydrophobic pocket of HA2 in H1N1, H3N1, and H5N1 subtypes of IAV. AntiIAVP3 and AntiIAVP4 peptides had the best binding to the conserved hydrophobic pocket of HA2, while AntiIAVP2 and AntiIAVP6 showed the best binding to the fusion peptide region. According to analyses for in-vivo administration, AntiSCV2P1, AntiSCV2P7, AntiIAVP2, and AntiIAVP7 were the best candidates. AntiSCV2 and AntiIAV peptides were also conjugated using an in vivo cleavable linker sensitive to TMPRSS2, making the conjugate applicable as a single therapeutic in coinfections or cases of uncertain diagnosis.
Affiliation(s)
- Yasaman Behzadipour
- Department of Pharmaceutical Biotechnology, School of Pharmacy, Shiraz University of Medical Sciences, P.O. Box 71345-1583, Shiraz, Iran
| | - Shiva Hemmati
- Department of Pharmaceutical Biotechnology, School of Pharmacy, Shiraz University of Medical Sciences, P.O. Box 71345-1583, Shiraz, Iran.,Pharmaceutical Sciences Research Center, Shiraz University of Medical Sciences, Shiraz, Iran.,Biotechnology Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
| |
|
20
|
Li F, Dong S, Leier A, Han M, Guo X, Xu J, Wang X, Pan S, Jia C, Zhang Y, Webb GI, Coin LJM, Li C, Song J. Positive-unlabeled learning in bioinformatics and computational biology: a brief review. Brief Bioinform 2021; 23:6415313. [PMID: 34729589 DOI: 10.1093/bib/bbab461] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 08/30/2021] [Revised: 09/27/2021] [Accepted: 10/07/2021] [Indexed: 12/14/2022] Open
Abstract
Conventional supervised binary classification algorithms have been widely applied to address significant research questions using biological and biomedical data. This classification scheme requires two fully labeled classes of data (e.g. positive and negative samples) to train a classification model. However, in many bioinformatics applications, labeling data is laborious, and the negative samples might be mislabeled due to the limited sensitivity of the experimental equipment. The positive unlabeled (PU) learning scheme was therefore proposed to enable the classifier to learn directly from limited positive samples and a large number of unlabeled samples (i.e. a mixture of positive and negative samples). To date, several PU learning algorithms have been developed to address various biological questions, such as sequence identification, functional site characterization and interaction prediction. In this paper, we revisit a collection of 29 state-of-the-art PU learning bioinformatic applications to address various biological questions. Various important aspects are extensively discussed, including PU learning methodology, biological application, classifier design and evaluation strategy. We also comment on the existing issues of PU learning and offer our perspectives for the future development of PU learning applications. We anticipate that our work serves as an instrumental guideline for a better understanding of the PU learning framework in bioinformatics and further developing next-generation PU learning frameworks for critical biological applications.
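As an illustration of how a classifier can learn from positive and unlabeled data, the sketch below shows one well-known approach, the label-frequency correction of Elkan and Noto, which this abstract does not itself describe. It assumes labeled positives are selected completely at random from all positives, and that some classifier (not shown) has already been trained to separate labeled from unlabeled samples; the function names and numbers are hypothetical.

```python
def estimate_label_frequency(scores_on_known_positives):
    """Estimate c = P(labeled | positive) as the mean score on held-out labeled positives."""
    return sum(scores_on_known_positives) / len(scores_on_known_positives)


def corrected_probability(score, c):
    """Convert an estimate of P(labeled | x) into an estimate of P(positive | x), capped at 1."""
    return min(1.0, score / c)


# Hypothetical outputs of a labeled-vs-unlabeled classifier on held-out labeled positives.
held_out_scores = [0.4, 0.5, 0.6]
c = estimate_label_frequency(held_out_scores)

# Rescale the scores of some unlabeled samples (hypothetical values).
for score in [0.1, 0.25, 0.7]:
    print(score, corrected_probability(score, c))
```

The rescaling reflects the idea that an unlabeled sample scoring close to the typical score of known positives is probably a positive that simply was not labeled.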
Affiliation(s)
- Fuyi Li
- Monash University, Australia
| | | | - André Leier
- Department of Genetics, UAB School of Medicine, USA
| | - Meiya Han
- Department of Biochemistry and Molecular Biology, Monash University, Australia
| | | | - Jing Xu
- Computer Science and Technology from Nankai University, China
| | - Xiaoyu Wang
- Department of Biochemistry and Molecular Biology and Biomedicine Discovery Institute, Monash University, Australia
| | - Shirui Pan
- University of Technology Sydney (UTS), Ultimo, NSW, Australia
| | - Cangzhi Jia
- College of Science, Dalian Maritime University, China
| | - Yang Zhang
- Northwestern Polytechnical University, China
| | - Geoffrey I Webb
- Faculty of Information Technology at Monash University, Australia
| | - Lachlan J M Coin
- Department of Clinical Pathology, University of Melbourne, Australia
| | - Chen Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
| | - Jiangning Song
- Monash Biomedicine Discovery Institute, Monash University, Melbourne, Australia
| |
|
21
|
Pereiro P, Lama R, Figueras A, Novoa B. Characterization of the turbot (Scophthalmus maximus) interleukin-18: Identification of splicing variants, phylogeny, synteny and expression analysis. DEVELOPMENTAL AND COMPARATIVE IMMUNOLOGY 2021; 124:104199. [PMID: 34228995 DOI: 10.1016/j.dci.2021.104199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2021] [Revised: 07/02/2021] [Accepted: 07/02/2021] [Indexed: 06/13/2023]
Abstract
Interleukin-18 (IL-18) is a pro-inflammatory cytokine that belongs to the interleukin-1 (IL-1) family of cytokines. Like IL-1β, it is synthesized as an inactive precursor peptide that is mainly processed by the cysteine protease caspase-1 in the inflammasome complex. In mammals, in collaboration with IL-12, it has been described as an important cytokine controlling Th1-mediated immune responses through the induction of IFN-γ. Although its function in mammals is well established, the activity of this cytokine in teleosts remains to be elucidated. This could be due, among other things, to the absence of this gene in the model fish species zebrafish, but also to its complex regulation. As observed in rainbow trout and humans, il18 splicing variants were also found in turbot, which could represent a regulatory mechanism of its bioactivity. In turbot, three splicing variants were observed (SV1-3), and one of them showed an insertion of 10 amino acids in the middle of the potential caspase-1 cleavage position, suggesting that this form is probably resistant to processing by the inflammasome. Phylogenetic and three-dimensional analyses of turbot Il18 revealed that it is relatively well conserved in vertebrates, although only partial conservation of gene synteny was observed between fish and mammals. As expected, turbot il18 splicing variants were mainly expressed in immune tissues under healthy conditions, and their expression was induced by a bacterial challenge, although certain inhibitions were observed after viral and parasitic infections. In the case of the viral challenge, il18 downregulation did not seem to be due to the effect of type I IFNs.
Affiliation(s)
- Patricia Pereiro
- Instituto de Investigaciones Marinas (IIM), Consejo Superior de Investigaciones Científicas (CSIC), C/ Eduardo Cabello 6, 36208, Vigo, Spain
| | - Raquel Lama
- Instituto de Investigaciones Marinas (IIM), Consejo Superior de Investigaciones Científicas (CSIC), C/ Eduardo Cabello 6, 36208, Vigo, Spain
| | - Antonio Figueras
- Instituto de Investigaciones Marinas (IIM), Consejo Superior de Investigaciones Científicas (CSIC), C/ Eduardo Cabello 6, 36208, Vigo, Spain
| | - Beatriz Novoa
- Instituto de Investigaciones Marinas (IIM), Consejo Superior de Investigaciones Científicas (CSIC), C/ Eduardo Cabello 6, 36208, Vigo, Spain.
| |
|
22
|
Feng J, Lee T, Schiessl K, Oldroyd GED. Processing of NODULE INCEPTION controls the transition to nitrogen fixation in root nodules. Science 2021; 374:629-632. [PMID: 34709900 DOI: 10.1126/science.abg2804] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Indexed: 12/13/2022]
Abstract
[Figure: see text].
Affiliation(s)
- Jian Feng
- Sainsbury Laboratory, University of Cambridge, 47 Bateman Street, Cambridge CB2 1LR, UK
| | - Tak Lee
- Sainsbury Laboratory, University of Cambridge, 47 Bateman Street, Cambridge CB2 1LR, UK; Crop Science Centre, University of Cambridge, 93 Lawrence Weaver Road, Cambridge CB3 0LE, UK
| | - Katharina Schiessl
- Sainsbury Laboratory, University of Cambridge, 47 Bateman Street, Cambridge CB2 1LR, UK
| | - Giles E D Oldroyd
- Sainsbury Laboratory, University of Cambridge, 47 Bateman Street, Cambridge CB2 1LR, UK; Crop Science Centre, University of Cambridge, 93 Lawrence Weaver Road, Cambridge CB3 0LE, UK
| |
|
23
|
Fan Y, Wang W. Using multi-layer perceptron to identify origins of replication in eukaryotes via informative features. BMC Bioinformatics 2021; 22:516. [PMID: 34688247 PMCID: PMC8542328 DOI: 10.1186/s12859-021-04431-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2021] [Accepted: 10/04/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The origin is the starting site of DNA replication, an extremely vital part of the informational inheritance between parents and children. More importantly, accurately identifying the origin of replication has great application value in the diagnosis and treatment of diseases related to genetic information errors, while traditional biological experimental methods are time-consuming and laborious. RESULTS We studied the origin of replication in a variety of eukaryotes and proposed a unique prediction method for each species. We collected data from 7 species: Homo sapiens, Mus musculus, Drosophila melanogaster, Arabidopsis thaliana, Kluyveromyces lactis, Pichia pastoris and Schizosaccharomyces pombe. In addition to the commonly used sequence feature extraction methods PseKNC-II and Base-content, we designed a feature extraction method based on TF-IDF. The two-step method was then used for feature selection. After comparing a variety of traditional machine learning classification models, the multi-layer perceptron was employed as the classification algorithm. The data and code involved in the experiment are available at https://github.com/Sarahyouzi/EukOriginPredict. CONCLUSIONS The prediction accuracies on the training sets of the above-mentioned seven species after 100 rounds of fivefold cross-validation reached 92.60%, 90.80%, 91.22%, 96.15%, 96.72%, 99.86% and 96.72%, respectively. This indicates that, compared with other methods, the methods we designed achieve superior performance. In addition, our experiments reveal that the models of multiple species could predict each other with high accuracy, and the STREME results show that they share a certain common motif.
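As a rough illustration of how TF-IDF can be turned into sequence features, the sketch below treats each overlapping k-mer as a "word" and each DNA sequence as a "document". The k-mer length, the toy sequences, the function names and the unsmoothed log(N/df) weighting are all assumptions made for illustration; the abstract does not specify how the authors defined their TF-IDF features.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Return the overlapping k-mers of a DNA sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf_features(sequences, k=3):
    """Compute one {k-mer: TF-IDF weight} dict per sequence.

    TF is the relative frequency of a k-mer within a sequence; IDF is
    log(N / df), where df counts the sequences containing that k-mer.
    """
    docs = [Counter(kmers(s, k)) for s in sequences]
    n_docs = len(docs)
    # Document frequency: number of sequences containing each k-mer.
    df = Counter()
    for doc in docs:
        df.update(doc.keys())
    features = []
    for doc in docs:
        total = sum(doc.values())
        features.append({
            kmer: (count / total) * math.log(n_docs / df[kmer])
            for kmer, count in doc.items()
        })
    return features


# Hypothetical toy sequences, not data from the study.
seqs = ["ATGATGCC", "ATGTTTAA", "GGGATGCA"]
feats = tfidf_features(seqs, k=3)
```

With this weighting, a k-mer shared by every sequence (here "ATG") receives a weight of zero, while k-mers unique to one sequence are emphasized, which is the property that makes TF-IDF useful for picking out discriminative sequence patterns.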
Affiliation(s)
- Yongxian Fan
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, 541004, China.
| | - Wanru Wang
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, 541004, China
| |
|
24
|
Zhang S, Zhao L, Zheng CH, Xia J. A feature-based approach to predict hot spots in protein-DNA binding interfaces. Brief Bioinform 2021; 21:1038-1046. [PMID: 30957840 DOI: 10.1093/bib/bbz037] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/14/2019] [Revised: 02/20/2019] [Accepted: 03/07/2019] [Indexed: 12/21/2022] Open
Abstract
DNA-binding hot spot residues of proteins are dominant and fundamental interface residues that contribute most of the binding free energy of protein-DNA interfaces. As experimental methods for identifying hot spots are expensive and time-consuming, computational approaches are urgently required for predicting hot spots on a large scale. In this work, we systematically assessed a wide variety of 114 features from a combination of the protein sequence, structure, network and solvent accessible information and their combinations along with various feature selection strategies for hot spot prediction. We then trained and compared four commonly used machine learning models, namely, support vector machine (SVM), random forest, Naïve Bayes and k-nearest neighbor, for the identification of hot spots using 10-fold cross-validation and the independent test set. Our results show that (1) features based on the solvent accessible surface area have a significant effect on hot spot prediction; (2) different but complementary features generally enhance the prediction performance; and (3) SVM outperforms other machine learning methods on both training and independent test sets. In an effort to improve predictive performance, we developed a feature-based method, namely, PrPDH (Prediction of Protein-DNA binding Hot spots), for the prediction of hot spots in protein-DNA binding interfaces using SVM based on the selected 10 optimal features. Comparative results on benchmark data sets indicate that our predictor is able to achieve generally better performance in predicting hot spots compared to the state-of-the-art predictors. A user-friendly web server for PrPDH has been established and is freely available at http://bioinfo.ahu.edu.cn:8080/PrPDH.
Affiliation(s)
- Sijia Zhang
- Institutes of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
| | - Le Zhao
- Institutes of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
| | - Chun-Hou Zheng
- Institutes of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
| | - Junfeng Xia
- Institutes of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
| |
|
25
|
Melo MCR, Maasch JRMA, de la Fuente-Nunez C. Accelerating antibiotic discovery through artificial intelligence. Commun Biol 2021; 4:1050. [PMID: 34504303 PMCID: PMC8429579 DOI: 10.1038/s42003-021-02586-0] [Citation(s) in RCA: 66] [Impact Index Per Article: 22.0] [Received: 02/18/2021] [Accepted: 07/16/2021] [Indexed: 02/07/2023] Open
Abstract
By targeting invasive organisms, antibiotics insert themselves into the ancient struggle of the host-pathogen evolutionary arms race. As pathogens evolve tactics for evading antibiotics, therapies decline in efficacy and must be replaced, distinguishing antibiotics from most other forms of drug development. Together with a slow and expensive antibiotic development pipeline, the proliferation of drug-resistant pathogens drives urgent interest in computational methods that promise to expedite candidate discovery. Strides in artificial intelligence (AI) have encouraged its application to multiple dimensions of computer-aided drug design, with increasing application to antibiotic discovery. This review describes AI-facilitated advances in the discovery of both small molecule antibiotics and antimicrobial peptides. Beyond the essential prediction of antimicrobial activity, emphasis is also given to antimicrobial compound representation, determination of drug-likeness traits, antimicrobial resistance, and de novo molecular design. Given the urgency of the antimicrobial resistance crisis, we analyze uptake of open science best practices in AI-driven antibiotic discovery and argue for openness and reproducibility as a means of accelerating preclinical research. Finally, trends in the literature and areas for future inquiry are discussed, as artificially intelligent enhancements to drug discovery at large offer many opportunities for future applications in antibiotic development.
Affiliation(s)
- Marcelo C R Melo
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
| | - Jacqueline R M A Maasch
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- Department of Computer and Information Science, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
| | - Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA.
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA.
| |
|
26
|
Hernández-Cuevas NA, Marín-Cervera A, Garcia-Polanco S, Martínez-Vega P, Rosado-Vallado M, Dumonteil E. Fibronectin degradation as biomarker for Trypanosoma cruzi infection and treatment monitoring in mice. Parasitology 2021; 148:1067-1073. [PMID: 34024298 PMCID: PMC11010125 DOI: 10.1017/s0031182021000809] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/24/2021] [Revised: 05/13/2021] [Accepted: 05/19/2021] [Indexed: 11/06/2022]
Abstract
Biomarkers (from host or parasite) to monitor Chagas disease (CD) progression as well as the therapeutic response in chronic CD are critically needed, since seronegativization, which may be considered the best indicator of therapeutic cure, takes several years to be observed in adults. Several molecules have been suggested as biomarkers for CD; however, they remain to be validated. Taking advantage of mouse models of Trypanosoma cruzi infection, we investigated changes in the degradation profile of fibronectin in plasma. The degradation profile of fibronectin was different in the acute phase compared to the chronic phase of the infection. Fibronectin fragments of approximately 150, 100, 40 and 30 kDa were identified. Furthermore, those degradation profiles correlated with acute parasitaemia as well as with cardiac parasite burden and tissue damage during the infection. The usefulness of fibronectin degradation as a biomarker for therapeutic response following drug treatment and immunotherapeutic vaccination was also evaluated, and a decreased fibronectin degradation profile was observed upon treatment with benznidazole or a vaccine candidate.
Affiliation(s)
- Nora Adriana Hernández-Cuevas
- Laboratorio de Parasitología, Centro de Investigaciones Regionales ‘Dr. Hideyo Noguchi’, Universidad Autónoma de Yucatán, Mérida, México
| | - Andrea Marín-Cervera
- Laboratorio de Parasitología, Centro de Investigaciones Regionales ‘Dr. Hideyo Noguchi’, Universidad Autónoma de Yucatán, Mérida, México
| | - Shineily Garcia-Polanco
- Laboratorio de Parasitología, Centro de Investigaciones Regionales ‘Dr. Hideyo Noguchi’, Universidad Autónoma de Yucatán, Mérida, México
| | - Pedro Martínez-Vega
- Laboratorio de Parasitología, Centro de Investigaciones Regionales ‘Dr. Hideyo Noguchi’, Universidad Autónoma de Yucatán, Mérida, México
| | - Miguel Rosado-Vallado
- Laboratorio de Parasitología, Centro de Investigaciones Regionales ‘Dr. Hideyo Noguchi’, Universidad Autónoma de Yucatán, Mérida, México
| | - Eric Dumonteil
- Department of Tropical Medicine, School of Public Health and Tropical Medicine, and Vector-Borne and Infectious Disease Research Center, Tulane University, New Orleans, LA, USA
| |
|
27
|
He S, Kong L, Chen J. iDNA6mA-Rice-DL: A local web server for identifying DNA N6-methyladenine sites in rice genome by deep learning method. J Bioinform Comput Biol 2021; 19:2150019. [PMID: 34291710 DOI: 10.1142/s0219720021500190] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
Abstract
Accurate detection of N6-methyladenine (6mA) sites by biochemical experiments will help to reveal their biological functions; however, these wet-lab experiments are laborious and expensive. Therefore, it is necessary to introduce a powerful computational model to identify 6mA sites on a genomic scale, especially for plant genomes. In view of this, we proposed a model called iDNA6mA-Rice-DL for the effective identification of 6mA sites in the rice genome, which is an intelligent computing model based on deep learning. Traditional machine learning methods require features to be prepared in advance for analysis. In contrast, our proposed model automatically encodes and extracts key DNA features through an embedded layer and several groups of dense layers. We use an independent dataset to evaluate the generalization ability of our model. An area under the receiver operating characteristic curve (auROC) of 0.98 with an accuracy of 95.96% was obtained. The experimental results demonstrate that our model performs well in predicting 6mA sites in the rice genome. A user-friendly local web server has been established. The Docker image of the local web server can be freely downloaded at https://hub.docker.com/r/his1server/idna6ma-rice-dl.
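For readers unfamiliar with the auROC metric quoted above, the following minimal sketch computes it from its rank-based definition: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name and the toy labels and scores are hypothetical and are not taken from the study.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 1 (positive, e.g. a 6mA site) or 0 (negative).
    scores: predicted scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("auROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy predictions.
example_auc = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; a sort-based rank computation would be the usual choice for genome-scale datasets.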
Affiliation(s)
- Shiqian He
- School of Mathematics and Information Science & Technology, Hebei Normal University of Science & Technology, Qinhuangdao 066000, P. R. China
| | - Liang Kong
- School of Mathematics and Information Science & Technology, Hebei Normal University of Science & Technology, Qinhuangdao 066000, P. R. China
| | - Jing Chen
- School of Information Science and Engineering, Yanshan University, Qinhuangdao 066000, P. R. China
| |
|
28
|
Liang X, Li F, Chen J, Li J, Wu H, Li S, Song J, Liu Q. Large-scale comparative review and assessment of computational methods for anti-cancer peptide identification. Brief Bioinform 2021; 22:bbaa312. [PMID: 33316035 PMCID: PMC8294543 DOI: 10.1093/bib/bbaa312] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 08/18/2020] [Revised: 09/30/2020] [Accepted: 08/25/2020] [Indexed: 12/13/2022] Open
Abstract
Anti-cancer peptides (ACPs) are known as potential therapeutics for cancer. Due to their unique ability to target cancer cells without directly affecting healthy cells, they have been extensively studied. Many peptide-based drugs are currently being evaluated in preclinical and clinical trials. Accurate identification of ACPs has received considerable attention in recent years; as such, a number of machine learning-based methods for in silico identification of ACPs have been developed. These methods promote, to some extent, research on the mechanism of ACP therapeutics against cancer. The methods differ vastly in their training/testing datasets, machine learning algorithms, feature encoding schemes, feature selection methods and evaluation strategies. Therefore, it is desirable to summarize the advantages and disadvantages of the existing methods and to provide useful insights and suggestions for the development and improvement of novel computational tools to characterize and identify ACPs. With this in mind, we first comprehensively investigate 16 state-of-the-art predictors for ACPs in terms of their core algorithms, feature encoding schemes, performance evaluation metrics and webserver/software usability. Then, a comprehensive performance assessment is conducted to evaluate the robustness and scalability of the existing predictors using a well-prepared benchmark dataset. We provide potential strategies for improving model performance. Moreover, we propose a novel ensemble learning framework, termed ACPredStackL, for the accurate identification of ACPs. ACPredStackL is developed based on the stacking ensemble strategy combined with SVM, Naïve Bayesian, lightGBM and KNN. Empirical benchmarking experiments against the state-of-the-art methods demonstrate that ACPredStackL achieves comparable performance for predicting ACPs. The webserver and source code of ACPredStackL are freely available at http://bigdata.biocie.cn/ACPredStackL/ and https://github.com/liangxiaoq/ACPredStackL, respectively.
Affiliation(s)
- Xiao Liang
- College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Shaanxi Key Laboratory of Agricultural Information Perception and Intelligent Service, Yangling, Shaanxi 712100, China
| | - Fuyi Li
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Monash Centre for Data Science, Monash University, Melbourne, VIC 3800, Australia
- Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Victoria, Australia
| | - Jinxiang Chen
- College of Information Engineering, Northwest A&F University, Yangling, 712100, China
| | - Junlong Li
- College of Information Engineering, Northwest A&F University, Yangling, 712100, China
| | - Hao Wu
- College of Information Engineering, Northwest A&F University, Yangling, 712100, China
| | - Shuqin Li
- College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Shaanxi Key Laboratory of Agricultural Information Perception and Intelligent Service, Yangling, Shaanxi 712100, China
| | - Jiangning Song
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Monash Centre for Data Science, Monash University, Melbourne, VIC 3800, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
| | - Quanzhong Liu
- College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Shaanxi Key Laboratory of Agricultural Information Perception and Intelligent Service, Yangling, Shaanxi 712100, China
| |
|
29
|
Sadeghian I, Hemmati S. Characterization of a Stable Form of Carboxypeptidase G2 (Glucarpidase), a Potential Biobetter Variant, From Acinetobacter sp. 263903-1. Mol Biotechnol 2021; 63:1155-1168. [PMID: 34268672 DOI: 10.1007/s12033-021-00370-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/19/2021] [Accepted: 07/08/2021] [Indexed: 01/14/2023]
Abstract
Carboxypeptidase G2 (CPG2) is a bacterial enzyme widely used to detoxify methotrexate (MTX) and in enzyme/prodrug therapy for cancer treatment. However, several drawbacks, such as instability, have limited its efficiency. Herein, we have evaluated the properties of a putative CPG2 from Acinetobacter sp. 263903-1 (AcCPG2). AcCPG2 is compared with a CPG2 derived from Pseudomonas sp. strain RS-16 (PsCPG2), available as an FDA-approved medication called glucarpidase. After modeling AcCPG2 using the I-TASSER program, the refined model was validated using PROCHECK, VERIFY 3D and the Z score of the model. In computational analyses, AcCPG2 displayed higher thermodynamic stability and a lower aggregation propensity than PsCPG2. AcCPG2 showed an optimum pH of 7.5 against MTX and was stable over a pH range of 5-10. AcCPG2 exhibited optimum activity at 50 °C and higher thermal stability over a temperature range of 20-70 °C compared to PsCPG2. The Km values of the purified AcCPG2 toward folate and MTX were 31.36 µM and 44.99 µM, respectively. The Vmax values of AcCPG2 for folate and MTX were 125.80 µmol/min/mg and 48.90 µmol/min/mg, respectively. Accordingly, its thermostability and pH versatility make AcCPG2 a potential biobetter variant for therapeutic applications.
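To make the reported kinetic constants concrete, the sketch below plugs them into the Michaelis-Menten rate law, v = Vmax·[S] / (Km + [S]). That AcCPG2 follows simple Michaelis-Menten kinetics is an assumption here (the abstract reports Km and Vmax but does not state the fitted model), and the function and constant names are illustrative only.

```python
def michaelis_menten_rate(substrate_um, vmax, km_um):
    """Initial reaction rate under the Michaelis-Menten model.

    substrate_um: substrate concentration in µM.
    vmax: maximum rate (here µmol/min/mg).
    km_um: Michaelis constant in µM.
    """
    return vmax * substrate_um / (km_um + substrate_um)


# Values quoted in the abstract for AcCPG2.
KM_FOLATE_UM, VMAX_FOLATE = 31.36, 125.80
KM_MTX_UM, VMAX_MTX = 44.99, 48.90

# At [S] = Km the model predicts half of Vmax.
rate_at_km = michaelis_menten_rate(KM_MTX_UM, VMAX_MTX, KM_MTX_UM)

# Vmax/Km gives a rough low-substrate efficiency for comparing substrates.
efficiency_folate = VMAX_FOLATE / KM_FOLATE_UM
efficiency_mtx = VMAX_MTX / KM_MTX_UM
```

Under this model, the lower Km and higher Vmax for folate imply a higher low-substrate efficiency for folate than for MTX; this is a consequence of the reported numbers, not an additional finding from the paper.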
Affiliation(s)
- Issa Sadeghian
- Biotechnology Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
- Pharmaceutical Sciences Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
- Department of Pharmaceutical Biotechnology, School of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran
| | - Shiva Hemmati
- Biotechnology Research Center, Shiraz University of Medical Sciences, Shiraz, Iran.
- Pharmaceutical Sciences Research Center, Shiraz University of Medical Sciences, Shiraz, Iran.
- Department of Pharmaceutical Biotechnology, School of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran.
| |
|
30
|
Substrate-biased activity-based probes identify proteases that cleave receptor CDCP1. Nat Chem Biol 2021; 17:776-783. [PMID: 33859413 DOI: 10.1038/s41589-021-00783-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 10/14/2020] [Accepted: 03/04/2021] [Indexed: 02/02/2023]
Abstract
CUB domain-containing protein 1 (CDCP1) is an oncogenic orphan transmembrane receptor and a promising target for the detection and treatment of cancer. Extracellular proteolysis of CDCP1 by poorly defined mechanisms induces pro-metastatic signaling. We describe a new approach for the rapid identification of proteases responsible for key proteolytic events using a substrate-biased activity-based probe (sbABP) that incorporates a substrate cleavage motif grafted onto a peptidyl diphenyl phosphonate warhead for specific target protease capture, isolation and identification. Using a CDCP1-biased probe, we identify urokinase (uPA) as the master regulator of CDCP1 proteolysis, which acts both by directly cleaving CDCP1 and by activating CDCP1-cleaving plasmin. We show that coexpression of uPA and CDCP1 is strongly predictive of poor disease outcome across multiple cancers and demonstrate that uPA-mediated CDCP1 proteolysis promotes metastasis in disease-relevant preclinical in vivo models. These results highlight CDCP1 cleavage as a potential target to disrupt cancer and establish sbABP technology as a new approach to identify disease-relevant proteases.
|
31
|
Bhattacharyya C, Das C, Ghosh A, Singh AK, Mukherjee S, Majumder PP, Basu A, Biswas NK. SARS-CoV-2 mutation 614G creates an elastase cleavage site enhancing its spread in high AAT-deficient regions. INFECTION, GENETICS AND EVOLUTION : JOURNAL OF MOLECULAR EPIDEMIOLOGY AND EVOLUTIONARY GENETICS IN INFECTIOUS DISEASES 2021; 90:104760. [PMID: 33556558 PMCID: PMC7863758 DOI: 10.1016/j.meegid.2021.104760] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 12/10/2020] [Revised: 02/01/2021] [Accepted: 02/03/2021] [Indexed: 02/07/2023]
Abstract
SARS-CoV-2 was first reported from China. Within three months, it evolved into 10 additional subtypes. Two evolved subtypes (A2 and A2a) carry a non-synonymous Spike protein mutation (D614G). We conducted phylodynamic analysis of over 70,000 SARS-CoV-2 coronaviruses worldwide, sequenced until July 2020, and found that the mutant subtype (614G) outcompeted the pre-existing type (614D) significantly faster in Europe and North America than in East Asia. Using bioinformatic and computational analyses, we identified a novel neutrophil elastase (ELANE) cleavage site introduced in the G-mutant, near the S1-S2 junction of the Spike protein. We hypothesised that elevation of the neutrophil elastase level at the site of infection would enhance the activation of the Spike protein, thus facilitating host cell entry for the 614G, but not the 614D, subtype. The level of neutrophil elastase in the lung is modulated by its inhibitor α1-antitrypsin (AAT). AAT prevents lung tissue damage by elastase. However, many individuals exhibit genotype-dependent deficiency of AAT. AAT deficiency eases host-cell entry of the 614G virus by retarding inhibition of neutrophil elastase and consequently enhancing activation of the Spike protein. AAT deficiency is highly prevalent in European and North American populations, but much less so in East Asia. Therefore, the 614G subtype is able to infect and spread more easily in populations of the former regions than in the latter region. Our analyses provide a molecular biological and evolutionary model for the higher observed virulence of the 614G subtype, in terms of causing higher morbidity in the host (higher infectivity and higher viral load), than the non-mutant 614D subtype.
Affiliation(s)
| | - Chitrarpita Das
- National Institute of Biomedical Genomics, Kalyani 741251, India
| | - Arnab Ghosh
- National Institute of Biomedical Genomics, Kalyani 741251, India
| | - Animesh K. Singh
- National Institute of Biomedical Genomics, Kalyani 741251, India
| | - Souvik Mukherjee
- National Institute of Biomedical Genomics, Kalyani 741251, India
| | - Partha P. Majumder
- National Institute of Biomedical Genomics, Kalyani 741251, India; Indian Statistical Institute, Kolkata 700108, India
| | - Analabha Basu
- National Institute of Biomedical Genomics, Kalyani 741251, India
| | - Nidhan K. Biswas
- National Institute of Biomedical Genomics, Kalyani 741251, India. Corresponding author at: National Institute of Biomedical Genomics, P.O.: N.S.S., Kalyani 741251, West Bengal, India
| |
|
32
|
Li Z, Hu L, Tang Z, Zhao C. Predicting HIV-1 Protease Cleavage Sites With Positive-Unlabeled Learning. Front Genet 2021; 12:658078. [PMID: 33868387 PMCID: PMC8044780 DOI: 10.3389/fgene.2021.658078] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/25/2021] [Accepted: 03/08/2021] [Indexed: 11/13/2022] Open
Abstract
Understanding the substrate specificity of HIV-1 protease plays an essential role in the prevention of HIV infection. A variety of computational models have thus been developed to predict substrate sites that are cleaved by HIV-1 protease, but most of them follow a supervised learning scheme, building classifiers that treat experimentally verified cleavable sites as positive samples and unknown sites as negative samples. However, the negative set may contain noise, as false-negative samples possibly exist. Hence, the classifiers are not as accurate as they could be, owing to biased prediction results. In this work, unknown substrate sites are regarded as unlabeled samples instead of negative ones. We propose a novel positive-unlabeled learning algorithm, namely PU-HIV, for effective prediction of HIV-1 protease cleavage sites. Features used by PU-HIV are encoded from different perspectives of substrate sequences, including amino acid identities, coevolutionary patterns and chemical properties. By adjusting the weights of errors generated by positive and unlabeled samples, a biased support vector machine classifier can be built to complete the prediction task. In comparison with state-of-the-art prediction models, benchmarking experiments using cross-validation and independent tests demonstrated the superior performance of PU-HIV in terms of AUC, PR-AUC, and F-measure. Thus, with PU-HIV, it is possible to identify previously unknown but physiologically existing substrate sites that can be cleaved by HIV-1 protease, providing valuable insights into designing novel HIV-1 protease inhibitors for HIV treatment.
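The idea of weighting errors on positive and unlabeled samples differently can be sketched with the commonly used biased-SVM objective for PU learning: unlabeled samples are provisionally treated as negatives, but their hinge-loss errors are penalized less than errors on the known positives. Whether PU-HIV uses exactly this formulation is an assumption; the function names, the linear model and the toy data below are hypothetical.

```python
def hinge(margin):
    """Hinge loss for a signed margin y * f(x)."""
    return max(0.0, 1.0 - margin)


def biased_svm_objective(w, b, positives, unlabeled, c_pos, c_unl):
    """Objective of a linear biased SVM for positive-unlabeled data.

    0.5 * ||w||^2
      + c_pos * sum of hinge losses on positives (label +1)
      + c_unl * sum of hinge losses on unlabeled (treated as label -1)
    Choosing c_pos > c_unl makes misclassified positives cost more than
    unlabeled samples that land on the positive side.
    """
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    regularizer = 0.5 * sum(wi * wi for wi in w)
    pos_loss = sum(hinge(score(x)) for x in positives)
    unl_loss = sum(hinge(-score(x)) for x in unlabeled)
    return regularizer + c_pos * pos_loss + c_unl * unl_loss


# Hypothetical 2-D feature vectors, not data from the study.
positives = [[2.0, 0.0], [0.5, 0.0]]
unlabeled = [[-2.0, 0.0], [0.5, 0.0]]
objective = biased_svm_objective([1.0, 0.0], 0.0, positives, unlabeled,
                                 c_pos=10.0, c_unl=1.0)
```

In the toy data, the same point appears once as a positive and once as an unlabeled sample; the asymmetric weights mean the optimizer is pushed much harder to classify it as positive, which is how hidden positives in the unlabeled set are tolerated.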
Affiliation(s)
- Zhenfeng Li
- School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
| | - Lun Hu
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
| | - Zehai Tang
- School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
| | - Cheng Zhao
- School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
| |
|
33
|
Ozols M, Eckersley A, Platt CI, Stewart-McGuinness C, Hibbert SA, Revote J, Li F, Griffiths CEM, Watson REB, Song J, Bell M, Sherratt MJ. Predicting Proteolysis in Complex Proteomes Using Deep Learning. Int J Mol Sci 2021; 22:3071. [PMID: 33803033 PMCID: PMC8002881 DOI: 10.3390/ijms22063071] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/09/2021] [Revised: 03/10/2021] [Accepted: 03/12/2021] [Indexed: 12/27/2022] Open
Abstract
Both protease- and reactive oxygen species (ROS)-mediated proteolysis are thought to be key effectors of tissue remodeling. We have previously shown that comparison of amino acid composition can predict the differential susceptibilities of proteins to photo-oxidation. However, predicting protein susceptibility to endogenous proteases remains challenging. Here, we aim to develop bioinformatics tools to (i) predict cleavage site locations (and hence putative protein susceptibilities) and (ii) compare the predicted vulnerabilities of skin proteins to protease- and ROS-mediated proteolysis. The first goal of this study was to experimentally evaluate the ability of existing protease cleavage site prediction models (PROSPER and DeepCleave) to identify experimentally determined MMP9 cleavage sites in two purified proteins and in a complex human dermal fibroblast-derived extracellular matrix (ECM) proteome. We subsequently developed deep bidirectional recurrent neural network (BRNN) models to predict cleavage sites for 14 tissue proteases. The predictions of the new models were tested against experimental datasets and combined with amino acid composition analysis (to predict ultraviolet radiation (UVR)/ROS susceptibility) in a new web app: the Manchester proteome susceptibility calculator (MPSC). The BRNN models performed better in predicting cleavage sites in native dermal ECM proteins than existing models (DeepCleave and PROSPER), and application of MPSC to the skin proteome suggests that, compared with the elastic fiber network, fibrillar collagens may be susceptible primarily to protease-mediated proteolysis. We also identify additional putative targets of oxidative damage (dermatopontin, fibulins and defensins) and protease action (laminins and nidogen). MPSC has the potential to identify targets of proteolysis in disparate tissues and disease states.
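As a rough illustration of the amino acid composition analysis mentioned in this abstract, the sketch below computes residue fractions for two sequences and compares the share of a chosen residue set. The sequences are invented toy strings, and the choice of Cys, Trp and Tyr as the residues of interest is an assumption made for this example only; the abstract does not say which residues MPSC uses.

```python
from collections import Counter


def composition(sequence):
    """Return the fraction of each amino acid (one-letter code) in a sequence."""
    counts = Counter(sequence)
    return {aa: n / len(sequence) for aa, n in counts.items()}


def fraction_of(sequence, residues):
    """Fraction of positions occupied by any residue in `residues`."""
    return sum(1 for aa in sequence if aa in residues) / len(sequence)


# Hypothetical toy sequences, not real proteins.
seq_a = "GPAGPAGPKG"
seq_b = "CWYGCMYWCA"

# Assumed residue set for the comparison (illustrative choice).
residues_of_interest = "CWY"

for name, seq in (("seq_a", seq_a), ("seq_b", seq_b)):
    print(name, round(fraction_of(seq, residues_of_interest), 2))
```

A higher fraction for one sequence would then be read as a relative, not absolute, indication under whatever residue set is chosen.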
Affiliation(s)
- Matiss Ozols
- Division of Cell Matrix Biology & Regenerative Medicine, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, Manchester M13 9PT, UK
| | - Alexander Eckersley
- Division of Cell Matrix Biology & Regenerative Medicine, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, Manchester M13 9PT, UK
| | - Christopher I. Platt
- Division of Cell Matrix Biology & Regenerative Medicine, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, Manchester M13 9PT, UK
| | - Callum Stewart-McGuinness
- Division of Cell Matrix Biology & Regenerative Medicine, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, Manchester M13 9PT, UK
| | - Sarah A. Hibbert
- Division of Cell Matrix Biology & Regenerative Medicine, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, Manchester M13 9PT, UK
| | - Jerico Revote
- Monash Bioinformatics Platform, Monash University, Melbourne, VIC 3800, Australia
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
| | - Fuyi Li
- Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, VIC 3800, Australia
| | - Christopher E. M. Griffiths
- Centre for Dermatology Research, Faculty of Biology, Medicine and Health, and Salford Royal NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester M13 9PT, UK
- NIHR Manchester Biomedical Research Centre, Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester M13 9WL, UK
| | - Rachel E. B. Watson
- Centre for Dermatology Research, Faculty of Biology, Medicine and Health, and Salford Royal NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester M13 9PT, UK
- NIHR Manchester Biomedical Research Centre, Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester M13 9WL, UK
| | - Jiangning Song
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
| | - Mike Bell
- Research and Development, Walgreens Boots Alliance, Thane Road, Nottingham NG90 1BS, UK
| | - Michael J. Sherratt
- Division of Cell Matrix Biology & Regenerative Medicine, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, Manchester M13 9PT, UK
| |
|
34
|
Mei S, Li F, Xiang D, Ayala R, Faridi P, Webb GI, Illing PT, Rossjohn J, Akutsu T, Croft NP, Purcell AW, Song J. Anthem: a user customised tool for fast and accurate prediction of binding between peptides and HLA class I molecules. Brief Bioinform 2021; 22:6102669. [PMID: 33454737 DOI: 10.1093/bib/bbaa415] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 09/30/2020] [Revised: 11/29/2020] [Accepted: 12/16/2020] [Indexed: 12/17/2022] Open
Abstract
Neopeptide-based immunotherapy has been recognised as a promising approach for the treatment of cancers. For neopeptides to be recognised by CD8+ T cells and induce an immune response, their binding to human leukocyte antigen class I (HLA-I) molecules is a necessary first step. Most epitope prediction tools thus rely on the prediction of such binding. With the use of mass spectrometry, the scale of naturally presented HLA ligands that could be used to develop such predictors has been expanded. However, few efforts have focused on integrating these experimental data with computational algorithms to efficiently develop up-to-date predictors. Here, we present Anthem for accurate HLA-I binding prediction. In particular, we have developed a user-friendly framework to support the development of customisable HLA-I binding prediction models to meet challenges associated with the rapidly increasing availability of large amounts of immunopeptidomic data. Our extensive evaluation, using both independent and experimental datasets, shows that Anthem achieves an overall similar or higher area under the curve value compared with other contemporary tools. It is anticipated that Anthem will provide a unique opportunity for the non-expert user to analyse and interpret their own in-house or publicly deposited datasets.
Affiliation(s)
- Shutao Mei
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
| | - Fuyi Li
- Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Australia
| | - Dongxu Xiang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
| | - Rochelle Ayala
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
| | - Pouya Faridi
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
| | | | - Patricia T Illing
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
| | - Jamie Rossjohn
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
| | - Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Japan
| | - Nathan P Croft
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
| | - Anthony W Purcell
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
| | - Jiangning Song
- Monash Biomedicine Discovery Institute and Biochemistry and Molecular Biology, Monash University, Australia
| |
|
35
|
Campbell KL, Haspel N, Gath C, Kurniatash N, Nouduri Akkiraju I, Stuffers N, Vadher U. Protein hormone fragmentation in intercellular signaling: hormones as nested information systems. Biol Reprod 2021; 104:887-901. [PMID: 33403392 DOI: 10.1093/biolre/ioaa234] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/17/2020] [Revised: 12/21/2020] [Accepted: 01/04/2021] [Indexed: 11/14/2022] Open
Abstract
This study explores the hypothesis that protein hormones are nested information systems in which initial products of gene transcription, and their subsequent protein fragments, before and after secretion and initial target cell action, play additional physiological regulatory roles. The study produced four tools and key results: (1) a problem approach that proceeds, with examples and suggestions for in vivo organismal functional tests for peptide-protein interactions, from proteolytic breakdown prediction to models of hormone fragment modulation of protein-protein binding motifs in unrelated proteins; (2) a catalog of 461 known soluble human protein hormones and their predicted fragmentation patterns; (3) an analysis of the predicted proteolytic patterns of the canonical protein hormone transcripts demonstrating near-universal persistence of 9 ± 7 peptides of 8 ± 8 amino acids even after cleavage with 24 proteases from four protease classes; and (4) a coincidence analysis of the predicted proteolysis locations and the 1939 exon junctions within the transcripts that shows an excess (P < 0.001) of predicted proteolysis within 10 residues, especially at the exonal junction (P < 0.01). It appears all protein hormone transcripts generate multiple fragments the size of peptide hormones or protein-protein binding domains that may alter intracellular or extracellular functions by acting as modulators of metabolic enzymes, transduction factors, protein binding proteins, or hormone receptors. High proteolytic frequency at exonal junctions suggests proteolysis has evolved, as a complement to gene exon fusion, to extract structures or functions within single exons or protein segments to simplify the genome by discarding archaic one-exon genes.
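The kind of proteolytic breakdown prediction described in this abstract can be sketched with a single rule-based protease. The example assumes the commonly used trypsin rule (cleave after K or R unless the next residue is P). It is only an illustration: the study's own 24-protease pipeline is not described here, and the input is an invented toy sequence, not one of the catalogued hormones.

```python
def digest(sequence, cut_after="KR", blocked_by="P"):
    """Split a protein sequence after each residue in `cut_after`,
    unless the following residue is in `blocked_by`."""
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in cut_after and (is_last or sequence[i + 1] not in blocked_by):
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # trailing piece with no cut site
    return fragments


# Hypothetical toy sequence (not a real hormone).
toy_sequence = "AGKPLMWRTTEDSVKAAR"
fragments = digest(toy_sequence)

# Keep fragments of at least 8 residues, a threshold chosen for illustration.
persistent = [f for f in fragments if len(f) >= 8]

print(fragments)   # ['AGKPLMWR', 'TTEDSVK', 'AAR']
print(persistent)  # ['AGKPLMWR']
```

Running several such rules in sequence over each fragment would approximate a multi-protease digest, at the cost of ignoring accessibility and kinetics.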
Affiliation(s)
- Kenneth L Campbell
- Department of Biology, University of Massachusetts Boston, Boston, MA, USA
| | - Nurit Haspel
- Department of Computer Sciences, University of Massachusetts Boston, Boston, MA, USA
| | - Cassandra Gath
- Department of Biology, University of Massachusetts Boston, Boston, MA, USA
| | - Nuzulul Kurniatash
- Department of Computer Sciences, University of Massachusetts Boston, Boston, MA, USA
| | | | - Naomi Stuffers
- Department of Biology, University of Massachusetts Boston, Boston, MA, USA
| | - Uma Vadher
- Department of Biology, University of Massachusetts Boston, Boston, MA, USA
| |
|
36
|
Ochoa R, Magnitov M, Laskowski RA, Cossio P, Thornton JM. An automated protocol for modelling peptide substrates to proteases. BMC Bioinformatics 2020; 21:586. [PMID: 33375946 PMCID: PMC7771086 DOI: 10.1186/s12859-020-03931-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/07/2020] [Accepted: 12/09/2020] [Indexed: 11/21/2022] Open
Abstract
BACKGROUND: Proteases are key drivers in many biological processes, in part due to their specificity towards their substrates. However, depending on the family and molecular function, they can also display substrate promiscuity, which can likewise be essential. Databases compiling specificity matrices derived from experimental assays have provided valuable insights into protease substrate recognition. Despite this, there are still gaps in our knowledge of the structural determinants. Here, we compile a set of protease crystal structures with bound peptide-like ligands to create a protocol for modelling substrates bound to protease structures, and for studying observables associated with binding recognition. RESULTS: As an application, we modelled a subset of protease-peptide complexes for which experimental cleavage data are available to compare with informational entropies obtained from protease-specificity matrices. The modelled complexes were subjected to conformational sampling using the Backrub method in Rosetta, and multiple observables from the simulations were calculated and compared per peptide position. We found that some of the calculated structural observables, such as the relative accessible surface area and the interaction energy, can help characterize a protease's substrate recognition, giving insights for the potential prediction of novel substrates by combining additional approaches. CONCLUSION: Overall, our approach provides a repository of protease structures with annotated data, and an open source computational protocol to reproduce the modelling and dynamic analysis of the protease-peptide complexes.
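The per-position informational entropy of a specificity matrix, as referred to in this abstract, can be sketched as a Shannon entropy over residue counts. The matrix below is a hypothetical toy with three positions and invented counts; it is not taken from any protease database, and the paper's exact entropy definition may differ.

```python
import math


def position_entropy(counts):
    """Shannon entropy (in bits) of the residue distribution at one position."""
    total = sum(counts.values())
    entropy = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical specificity matrix: position label -> residue counts.
toy_matrix = {
    "P2": {"V": 16},                           # fully specific position
    "P1": {"L": 8, "F": 8},                    # two equally likely residues
    "P1'": {"A": 4, "G": 4, "S": 4, "T": 4},   # four equally likely residues
}

entropies = {pos: position_entropy(c) for pos, c in toy_matrix.items()}
for pos, h in entropies.items():
    print(pos, round(h, 3))
```

Low entropy marks a position where the protease is selective; high entropy marks a promiscuous one.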
Affiliation(s)
- Rodrigo Ochoa
- Biophysics of Tropical Diseases, Max Planck Tandem Group, University of Antioquia, 050010, Medellín, Colombia.
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
| | - Mikhail Magnitov
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Department of Biological and Medical Physics, Moscow Institute of Physics and Technology (National Research University), Dolgoprudny, Russia, 141701
| | - Roman A Laskowski
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Pilar Cossio
- Biophysics of Tropical Diseases, Max Planck Tandem Group, University of Antioquia, 050010, Medellín, Colombia
- Department of Theoretical Biophysics, Max Planck Institute of Biophysics, 60438, Frankfurt am Main, Germany
| | - Janet M Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| |
|
37
|
Hu L, Hu P, Luo X, Yuan X, You ZH. Incorporating the Coevolving Information of Substrates in Predicting HIV-1 Protease Cleavage Sites. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:2017-2028. [PMID: 31056514 DOI: 10.1109/tcbb.2019.2914208] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 06/09/2023]
Abstract
Human immunodeficiency virus 1 (HIV-1) protease (PR) plays a crucial role in the maturation of the virus. Studying the substrate specificity of HIV-1 PR aims to improve our understanding of how HIV-1 PR recognizes its various cleavage sites. To predict HIV-1 PR cleavage sites, most existing approaches have been developed solely on the basis of the homogeneity of substrate sequence information, using supervised classification techniques. Although efficient, these approaches are limited in their ability to explain their results and probably provide few insights into the mechanisms by which HIV-1 PR cleaves its substrates in a site-specific manner. In this work, a coevolutionary pattern-based prediction model for HIV-1 PR cleavage sites, namely EvoCleave, is proposed by integrating the coevolving information obtained from substrate sequences with a linear SVM classifier. The experimental results showed that EvoCleave yielded very promising performance in terms of ROC analysis and F-measure. We also prospectively assessed the biological significance of coevolutionary patterns by applying them to study three fundamental issues of HIV-1 PR cleavage sites. The analysis results demonstrated that the coevolutionary patterns offered valuable insights into the substrate specificity of HIV-1 PR.
|
38
|
Li K, Zhang S, Yan D, Bin Y, Xia J. Prediction of hot spots in protein-DNA binding interfaces based on supervised isometric feature mapping and extreme gradient boosting. BMC Bioinformatics 2020; 21:381. [PMID: 32938395 PMCID: PMC7495874 DOI: 10.1186/s12859-020-03683-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Identification of hot spots in protein-DNA interfaces provides crucial information for research on protein-DNA interactions and drug design. As experimental methods for determining hot spots are time-consuming, labor-intensive and expensive, there is a need for a reliable computational method to predict hot spots on a large scale. RESULTS: Here, we propose a new method named sxPDH, based on supervised isometric feature mapping (S-ISOMAP) and extreme gradient boosting (XGBoost), to predict hot spots in protein-DNA complexes. We obtained 114 features from a combination of protein sequence, structure, network and solvent accessibility information, and systematically assessed various feature selection methods and feature dimensionality reduction methods based on manifold learning. The results show that the S-ISOMAP method is superior to other feature selection or manifold learning methods. XGBoost was then used to develop the hot spot prediction model sxPDH based on the three dimensionality-reduced features obtained from S-ISOMAP. CONCLUSION: Our method sxPDH boosts prediction performance using S-ISOMAP and XGBoost. The AUC of the model is 0.773, and the F1 score is 0.713. Experimental results on a benchmark dataset indicate that sxPDH generally achieves better performance in predicting hot spots than state-of-the-art methods.
Affiliation(s)
- Ke Li
- School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China; Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
| | - Sijia Zhang
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
| | - Di Yan
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China; School of Life Sciences, Anhui University, Hefei, 230601, Anhui, China
| | - Yannan Bin
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
| | - Junfeng Xia
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China.
| |
|
39
|
Xu ZC, Feng PM, Yang H, Qiu WR, Chen W, Lin H. iRNAD: a computational tool for identifying D modification sites in RNA sequence. Bioinformatics 2020; 35:4922-4929. [PMID: 31077296 DOI: 10.1093/bioinformatics/btz358] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Received: 11/28/2018] [Revised: 03/01/2019] [Accepted: 04/27/2019] [Indexed: 12/19/2022] Open
Abstract
MOTIVATION: Dihydrouridine (D) is a common RNA post-transcriptional modification found in eukaryotes, bacteria and a few archaea. The modification can promote the conformational flexibility of individual nucleotide bases, and its levels are increased in cancerous tissues. Detecting D in RNA is therefore necessary for further understanding its functional roles. Since wet-lab techniques for this purpose are time-consuming and laborious, computational models for identifying D modification sites in RNA are urgently needed. RESULTS: We constructed a predictor, called iRNAD, for identifying D modification sites in RNA sequences. In this predictor, the RNA samples derived from five species were encoded by nucleotide chemical property and nucleotide density. A support vector machine was used to perform the classification. The final model achieved an overall accuracy of 96.18% with an area under the receiver operating characteristic curve of 0.9839 in the jackknife cross-validation test. Furthermore, we performed a series of validations from several aspects and demonstrated the robustness and reliability of the proposed model. AVAILABILITY AND IMPLEMENTATION: A user-friendly web server called iRNAD is freely accessible at http://lin-group.cn/server/iRNAD, providing convenience and guidance to users for further studying D modification.
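The two encodings named in this abstract can be sketched as follows. The sketch assumes the definitions commonly used in this line of work: each nucleotide is mapped to three binary chemical properties (ring structure, functional group, hydrogen bonding), and the density at position i is the fraction of the first i nucleotides that match the nucleotide at i. The iRNAD code itself may differ in detail, and the input string is a toy example.

```python
# Assumed chemical-property code: (purine ring, amino group, weak hydrogen bond)
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode(sequence):
    """Encode an RNA sequence as one 4-tuple per position:
    three chemical-property bits plus the running nucleotide density."""
    seen = {}
    features = []
    for i, nt in enumerate(sequence, start=1):
        seen[nt] = seen.get(nt, 0) + 1
        density = seen[nt] / i  # share of this nucleotide in the prefix 1..i
        features.append(NCP[nt] + (density,))
    return features


encoded = encode("AUAG")  # toy input
for row in encoded:
    print(row)
```

Flattening these tuples over a fixed-length window around a candidate site gives the kind of feature vector a support vector machine can consume.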
Affiliation(s)
- Zhao-Chun Xu
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China; Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Peng-Mian Feng
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - Hui Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Wang-Ren Qiu
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
| | - Wei Chen
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| |
|
40
|
Li H, Du H, Wang X, Gao P, Liu Y, Lin W. Remarks on Computational Method for Identifying Acid and Alkaline Enzymes. Curr Pharm Des 2020; 26:3105-3114. [PMID: 32552636 DOI: 10.2174/1381612826666200617170826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2020] [Accepted: 05/07/2020] [Indexed: 11/22/2022]
Abstract
Enzymes can be thousands of times more efficient than ordinary catalysts and are therefore widely used in industry and medicine. However, because enzymes are proteins, they can be denatured and inactivated by high temperature or by strongly acidic or alkaline conditions. Most enzymes work well at pH 6-8, while some special enzymes remain active only in an alkaline environment (pH > 8) or an acidic environment (pH < 6). Identifying acidic and alkaline enzymes has therefore become a key task for industrial production. Because of the wide variety of enzymes, determining their acidity or alkalinity experimentally is laborious and sometimes not feasible. Converting protein sequences into numerical features and building computational models can identify the acidity and alkalinity of enzymes efficiently and accurately. This review summarizes progress in the numerical features used to represent proteins and in computational methods for identifying acidic and alkaline enzymes. We hope that this paper will provide convenience, ideas, and guidance for computationally classifying acid and alkaline enzymes.
Affiliation(s)
- Hongfei Li
- School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
| | - Haoze Du
- Department of Computer Science, Wake Forest University, Winston-Salem, NC, 27109, United States
| | - Xianfang Wang
- School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
| | - Peng Gao
- School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
| | - Yifeng Liu
- School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
| | - Weizhong Lin
- Department of Computer Science, University of Missouri, Columbia, MO, 65211, United States
| |
|
41
|
Chou KC. An Insightful 10-year Recollection Since the Emergence of the 5-steps Rule. Curr Pharm Des 2020; 25:4223-4234. [PMID: 31782354 DOI: 10.2174/1381612825666191129164042] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/18/2019] [Accepted: 11/25/2019] [Indexed: 11/22/2022]
Abstract
OBJECTIVE: One of the most challenging problems is how to represent a biological sequence as a vector while retaining much of its sequence-order information. METHODS: To address this problem, the approach of Pseudo Amino Acid Components, or PseAAC, has been developed. RESULTS AND CONCLUSION: The 10-year recollection makes it increasingly clear that this proposal has indeed been very powerful.
Affiliation(s)
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, Massachusetts 02478, United States; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| |
|
42
|
Tan JX, Lv H, Wang F, Dao FY, Chen W, Ding H. A Survey for Predicting Enzyme Family Classes Using Machine Learning Methods. Curr Drug Targets 2020; 20:540-550. [PMID: 30277150 DOI: 10.2174/1389450119666181002143355] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 07/21/2018] [Revised: 08/17/2018] [Accepted: 09/04/2018] [Indexed: 12/13/2022]
Abstract
Enzymes are proteins that act as biological catalysts to speed up cellular biochemical processes. According to their main Enzyme Commission (EC) numbers, enzymes are divided into six categories: EC-1: oxidoreductase; EC-2: transferase; EC-3: hydrolase; EC-4: lyase; EC-5: isomerase and EC-6: synthetase. Different enzymes have different biological functions and act on different substrates. Therefore, knowing which family an enzyme belongs to can help infer its catalytic mechanism and provide information about the relevant biological function. With the large number of protein sequences flowing into databanks in the post-genomic age, annotating the family of an enzyme is very important. Since experimental methods are not cost-effective, bioinformatics tools can be of great help in accurately classifying enzyme families. In this review, we summarize the application of machine learning methods to the prediction of enzyme families from different aspects. We hope that this review will provide insights and inspiration for research on enzyme family classification.
Affiliation(s)
- Jiu-Xin Tan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hao Lv
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Fang Wang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Fu-Ying Dao
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Wei Chen
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063000, China; Gordon Life Science Institute, Boston, MA 02478, United States
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
|
43
|
Zhang J, Kurgan L. SCRIBER: accurate and partner type-specific prediction of protein-binding residues from proteins sequences. Bioinformatics 2020; 35:i343-i353. [PMID: 31510679 PMCID: PMC6612887 DOI: 10.1093/bioinformatics/btz324] [Citation(s) in RCA: 70] [Impact Index Per Article: 17.5] [Indexed: 01/11/2023] Open
Abstract
MOTIVATION: Accurate prediction of protein-binding residues (PBRs) enhances understanding of the molecular-level rules governing protein–protein interactions, helps protein–protein docking and facilitates annotation of protein functions. Recent studies show that current sequence-based predictors of PBRs severely cross-predict residues that interact with other types of partners (e.g. RNA and DNA) as PBRs. Moreover, these methods are relatively slow, prohibiting genome-scale use. RESULTS: We propose a novel, accurate and fast sequence-based predictor of PBRs that minimizes the cross-predictions. Our SCRIBER (SeleCtive pRoteIn-Binding rEsidue pRedictor) method takes advantage of three innovations: a comprehensive dataset that covers multiple types of binding residues, novel types of inputs that are relevant to the prediction of PBRs, and an architecture that is tailored to reduce the cross-predictions. The dataset includes complete protein chains and offers improved coverage of binding annotations that are transferred from multiple protein–protein complexes. We utilize an innovative two-layer architecture in which the first layer generates predictions of protein-binding, RNA-binding, DNA-binding and small ligand-binding residues. The second layer re-predicts PBRs by reducing the overlap between PBRs and the other types of binding residues produced in the first layer. Empirical tests on an independent test dataset reveal that SCRIBER significantly outperforms current predictors and that all three innovations contribute to its high predictive performance. SCRIBER reduces cross-predictions by between 41% and 69%, and our conservative estimates show that it is at least 3 times faster. We provide putative PBRs produced by SCRIBER for the entire human proteome and use these results to hypothesize that about 14% of currently known human protein domains bind proteins. AVAILABILITY AND IMPLEMENTATION: The SCRIBER webserver is available at http://biomine.cs.vcu.edu/servers/SCRIBER/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jian Zhang: School of Computer and Information Technology, Xinyang Normal University, Xinyang, China; Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Lukasz Kurgan: Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
|
44
|
|
45
|
Chen H, Li F, Wang L, Jin Y, Chi CH, Kurgan L, Song J, Shen J. Systematic evaluation of machine learning methods for identifying human-pathogen protein-protein interactions. Brief Bioinform 2020; 22:5847611. [PMID: 32459334 DOI: 10.1093/bib/bbaa068] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 01/31/2020] [Revised: 03/31/2020] [Accepted: 04/01/2020] [Indexed: 12/11/2022]
Abstract
In recent years, high-throughput experimental techniques have significantly enhanced the accuracy and coverage of protein-protein interaction identification, including human-pathogen protein-protein interactions (HP-PPIs). Despite this progress, experimental methods are, in general, expensive in terms of both time and labour costs, especially considering the enormous number of potential protein-interacting partners. Developing computational methods to predict interactions between humans and bacterial pathogens has thus become critical and meaningful, both for facilitating the detection of interactions and for mining incomplete interaction maps. In this paper, we present a systematic evaluation of machine learning-based computational methods for human-bacterium protein-protein interactions (HB-PPIs). We first reviewed a vast number of publicly available databases of HP-PPIs and then critically evaluated the availability of these databases. Benefitting from their well-structured nature, we subsequently preprocessed the data and identified six bacterial pathogens that could be used to study bacterial subjects in which a human was the host. Additionally, we thoroughly reviewed the literature on 'host-pathogen interactions' and summarized existing models, which we used to jointly study the impact of different feature representation algorithms and to evaluate the performance of existing machine learning computational models. Owing to the abundance of sequence information and the limited scale of other protein-related information, we adopted the primary protocol from the literature and dedicated our analysis to a comprehensive assessment of sequence information and machine learning models. A systematic evaluation of machine learning models and a wide range of feature representation algorithms based on sequence information is presented as a comparative survey of prediction performance for HB-PPIs.
|
46
|
Zhu YH, Hu J, Ge F, Li F, Song J, Zhang Y, Yu DJ. Accurate multistage prediction of protein crystallization propensity using deep-cascade forest with sequence-based features. Brief Bioinform 2020; 22:5839971. [PMID: 32436937 DOI: 10.1093/bib/bbaa076] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 02/03/2020] [Revised: 04/09/2020] [Accepted: 04/13/2020] [Indexed: 11/13/2022]
Abstract
X-ray crystallography is the major approach for determining atomic-level protein structures. Because not all proteins can be easily crystallized, accurate prediction of protein crystallization propensity provides critical help in guiding experimental design and improving the success rate of X-ray crystallography experiments. This study has developed a new machine-learning-based pipeline that uses a newly developed deep-cascade forest (DCF) model with multiple types of sequence-based features to predict protein crystallization propensity. Based on the developed pipeline, two new protein crystallization propensity predictors, denoted as DCFCrystal and MDCFCrystal, have been implemented. DCFCrystal is a multistage predictor that can estimate the success propensities of the three individual steps (production of protein material, purification and production of crystals) in the protein crystallization process. MDCFCrystal is a single-stage predictor that aims to estimate the probability that a protein will pass through the entire crystallization process. Moreover, DCFCrystal is designed for general proteins, whereas MDCFCrystal is specially designed for membrane proteins, which are notoriously difficult to crystallize. DCFCrystal and MDCFCrystal were separately tested on two benchmark datasets consisting of 12 289 and 950 proteins, respectively, with known crystallization results from various experimental records. The experimental results demonstrated that DCFCrystal and MDCFCrystal increased the value of the Matthews correlation coefficient by 199.7% and 77.8%, respectively, compared to the best of the other state-of-the-art protein crystallization propensity predictors.
Detailed analyses show that the major advantages of DCFCrystal and MDCFCrystal lie in the efficiency of the DCF model and the sensitivity of the sequence-based features used, especially the newly designed pseudo-predicted hybrid solvent accessibility (PsePHSA) feature, which improves crystallization recognition by incorporating sequence-order information with solvent accessibility of residues. Meanwhile, the new crystal-dataset constructions help to train the models with more comprehensive crystallization knowledge.
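The reported gains are relative increases in the Matthews correlation coefficient (MCC). As a rough illustration of what such a figure means, the sketch below computes the MCC from confusion-matrix counts and the percentage increase between two predictors. The counts are hypothetical and are not taken from the paper; the function names are likewise illustrative.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when a whole row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def relative_increase_percent(baseline, improved):
    """Percentage by which `improved` exceeds `baseline`."""
    return 100.0 * (improved - baseline) / baseline


# Hypothetical counts for a baseline predictor and an improved predictor.
baseline_mcc = matthews_corrcoef(tp=60, tn=60, fp=40, fn=40)
improved_mcc = matthews_corrcoef(tp=80, tn=80, fp=20, fn=20)
gain = relative_increase_percent(baseline_mcc, improved_mcc)
print(f"baseline={baseline_mcc:.2f} improved={improved_mcc:.2f} gain={gain:.1f}%")
```

With these made-up counts the MCC rises from 0.2 to 0.6, a relative increase of about 200%, which shows that a large percentage gain can arise from a low baseline value.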
|
47
|
Feng CQ, Zhang ZY, Zhu XJ, Lin Y, Chen W, Tang H, Lin H. iTerm-PseKNC: a sequence-based tool for predicting bacterial transcriptional terminators. Bioinformatics 2020; 35:1469-1477. [PMID: 30247625 DOI: 10.1093/bioinformatics/bty827] [Citation(s) in RCA: 142] [Impact Index Per Article: 35.5] [Received: 08/12/2018] [Revised: 09/13/2018] [Accepted: 09/20/2018] [Indexed: 12/31/2022]
Abstract
MOTIVATION: Transcription termination is an important regulatory step of gene expression. If a gene has no terminator, transcription cannot stop, which results in abnormal gene expression. Detecting such terminators can determine the operon structure in bacterial organisms and improve genome annotation. Thus, accurate identification of transcriptional terminators is essential to research on transcription regulation.
RESULTS: In this study, we developed a new predictor called 'iTerm-PseKNC', based on a support vector machine, to identify transcription terminators. The binomial distribution approach was used to pick out the optimal feature subset derived from pseudo k-tuple nucleotide composition (PseKNC). The 5-fold cross-validation test results showed that our proposed method achieved an accuracy of 95%. To further evaluate the generalization ability of 'iTerm-PseKNC', the model was examined on independent datasets of experimentally confirmed Rho-independent terminators in the Escherichia coli and Bacillus subtilis genomes. As a result, all the terminators in E. coli and 87.5% of the terminators in B. subtilis were correctly identified, suggesting that the proposed model could become a powerful tool for bacterial terminator recognition.
AVAILABILITY AND IMPLEMENTATION: For the convenience of wet-lab researchers, the web-server for 'iTerm-PseKNC' was established at http://lin-group.cn/server/iTerm-PseKNC/, by which users can easily obtain their desired result without the need to go through the detailed mathematical equations involved.
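To give a feel for the sequence features mentioned above, the following minimal sketch computes plain k-tuple nucleotide composition, that is, the normalized frequency of every k-mer in a DNA sequence. This covers only the composition part; the pseudo components of PseKNC, which add correlation terms based on physicochemical properties, are deliberately left out, and the function name and example sequence are illustrative assumptions rather than anything taken from iTerm-PseKNC.

```python
from itertools import product


def kmer_composition(seq, k=2):
    """Normalized k-mer frequencies of a DNA sequence, in lexicographic
    k-mer order over the alphabet A, C, G, T (4**k values)."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


# Hypothetical toy sequence; real inputs would be candidate terminator regions.
features = kmer_composition("ACGTACGT", k=2)
print(len(features), features[1])  # 16 dinucleotide features; index 1 is "AC"
```

A feature vector like this could then be passed to any classifier; the abstract states that a support vector machine was used after feature selection.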
Affiliation(s)
- Chao-Qin Feng: Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Zhao-Yue Zhang: Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Xiao-Juan Zhu: Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Yan Lin: Key Laboratory for Animal Disease Resistance Nutrition of the Ministry of Education, Animal Nutrition Institute, Sichuan Agricultural University, Chengdu, China
- Wei Chen: Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; Center for Genomics and Computational Biology, School of Life Sciences, North China University of Science and Technology, Tangshan, China
- Hua Tang: Department of Pathophysiology, Southwest Medical University, Luzhou, China
- Hao Lin: Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
48
|
iterb-PPse: Identification of transcriptional terminators in bacterial by incorporating nucleotide properties into PseKNC. PLoS One 2020; 15:e0228479. [PMID: 32413030 PMCID: PMC7228126 DOI: 10.1371/journal.pone.0228479] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/15/2020] [Accepted: 05/01/2020] [Indexed: 11/19/2022]
Abstract
A terminator is a DNA sequence that gives the RNA polymerase the transcriptional termination signal. Identifying terminators correctly can improve genome annotation and, more importantly, has considerable application value in disease diagnosis and therapy. However, accurate prediction methods are lacking and urgently needed. Therefore, we proposed a prediction method, "iterb-PPse", for terminators by incorporating 47 nucleotide properties into PseKNC-Ⅰ and PseKNC-Ⅱ and utilizing Extreme Gradient Boosting to predict terminators based on Escherichia coli and Bacillus subtilis. Combining these with the preceding methods, we employed three new feature extraction methods (K-pwm, Base-content and Nucleotidepro) to formulate raw samples. A two-step method was applied to select features. When identifying terminators based on the optimized features, we compared five single models as well as 16 ensemble models. As a result, the accuracy of our method on the benchmark dataset reached 99.88%, higher than that of the existing state-of-the-art predictor iTerm-PseKNC, in a 100-times-repeated five-fold cross-validation test. Its prediction accuracy for two independent datasets reached 94.24% and 99.45%, respectively. For the convenience of users, we developed software of the same name on the basis of "iterb-PPse". The open software and source code of "iterb-PPse" are available at https://github.com/Sarahyouzi/iterb-PPse.
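The evaluation protocol named in the abstract, five-fold cross-validation repeated 100 times, can be sketched generically as follows. The abstract does not say how the splits were drawn, so the shuffling scheme, the seed and the function name here are assumptions made purely for illustration; model training and scoring are omitted.

```python
import random


def repeated_kfold(n_samples, n_folds=5, n_repeats=100, seed=0):
    """Yield (train_indices, test_indices) for repeated k-fold cross-validation.

    Each repetition reshuffles the sample indices and partitions them
    into n_folds disjoint folds; every fold is held out exactly once.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        folds = [indices[i::n_folds] for i in range(n_folds)]
        for held_out in range(n_folds):
            test_idx = folds[held_out]
            train_idx = [
                i for j, fold in enumerate(folds) if j != held_out for i in fold
            ]
            yield train_idx, test_idx


# Small hypothetical run: 20 samples, 5 folds, 3 repetitions.
splits = list(repeated_kfold(n_samples=20, n_folds=5, n_repeats=3))
print(len(splits))  # 15 train/test splits in total
```

Averaging accuracy over all repetitions reduces the dependence of the estimate on any single random partition of the benchmark data.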
|
49
|
Hu G, Wu Z, Oldfield CJ, Wang C, Kurgan L. Quality assessment for the putative intrinsic disorder in proteins. Bioinformatics 2020; 35:1692-1700. [PMID: 30329008 DOI: 10.1093/bioinformatics/bty881] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 06/07/2018] [Revised: 09/19/2018] [Accepted: 10/15/2018] [Indexed: 11/15/2022]
Abstract
MOTIVATION: While putative intrinsic disorder is widely used, none of the predictors provides quality assessment (QA) scores. QA scores estimate the likelihood that predictions are correct at a residue level and have been applied in other bioinformatics areas. We recently reported that QA scores derived from putative disorder propensities perform relatively poorly for native disordered residues. Here we design and validate a general approach to construct QA predictors for disorder predictions.
RESULTS: The QUARTER (QUality Assessment for pRotein inTrinsic disordEr pRedictions) toolbox of methods accommodates a diverse set of ten disorder predictors. It builds upon several innovative design elements, including the use and scaling of selected physicochemical properties of the input sequence, post-processing of disorder propensity scores, and a feature selection that optimizes the predictive models for a specific disorder predictor. We empirically establish that each of these elements contributes to the overall predictive performance of our tool and that QUARTER's outputs significantly outperform QA scores derived from the outputs generated by the disorder predictors. The best performing QA scores for a single disorder predictor identify 13% of residues that are predicted with 98% precision. QA scores computed by combining results of the ten disorder predictors cover 40% of residues with 95% precision. Case studies are used to show how to interpret the QA scores. QA scores based on the high-precision combined predictions are applied to analyze disorder in the human proteome.
AVAILABILITY AND IMPLEMENTATION: http://biomine.cs.vcu.edu/servers/QUARTER/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
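Statements such as "cover 40% of residues with 95% precision" can be read as a coverage-at-precision measure: rank residues by QA score and find the largest top-ranked fraction whose predictions are still correct at the target rate. The sketch below illustrates that reading only; it is not QUARTER's implementation, the scores and correctness flags are made up, and tied scores are not given any special treatment.

```python
def coverage_at_precision(qa_scores, is_correct, target_precision):
    """Largest fraction of residues, taken in decreasing QA-score order,
    whose predictions are correct at a rate >= target_precision."""
    ranked = sorted(
        zip(qa_scores, is_correct), key=lambda pair: pair[0], reverse=True
    )
    best = 0.0
    correct = 0
    for n, (_, ok) in enumerate(ranked, start=1):
        correct += int(ok)
        if correct / n >= target_precision:
            best = n / len(ranked)
    return best


# Hypothetical QA scores and whether each residue's disorder prediction was right.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
correct = [True, True, True, False, True, True, False, False, True, False]
print(coverage_at_precision(scores, correct, 0.8))
print(coverage_at_precision(scores, correct, 1.0))
```

In this toy example, a higher precision target shrinks the covered fraction, which mirrors the trade-off between the 98% and 95% precision figures quoted in the abstract.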
Affiliation(s)
- Gang Hu: School of Mathematical Sciences and LPMC, Nankai University, Tianjin, People's Republic of China
- Zhonghua Wu: School of Mathematical Sciences and LPMC, Nankai University, Tianjin, People's Republic of China
- Chen Wang: Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Lukasz Kurgan: Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
|
50
|
Li P, Zhang H, Zhao X, Jia C, Li F, Song J. Pippin: A random forest-based method for identifying presynaptic and postsynaptic neurotoxins. J Bioinform Comput Biol 2020; 18:2050008. [PMID: 32372714 DOI: 10.1142/s0219720020500080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Presynaptic and postsynaptic neurotoxins are two types of neurotoxins from venomous animals and functionally important molecules in the neurosciences; however, their experimental characterization is difficult, time-consuming, and costly. Therefore, bioinformatics tools that can identify presynaptic and postsynaptic neurotoxins would be very useful for understanding their functions and mechanisms. In this study, we propose Pippin, a novel machine learning-based method that allows users to rapidly and accurately identify these two types of neurotoxins. Pippin was developed using the random forest (RF) algorithm and evaluated based on an up-to-date dataset. A variety of sequence and motif features were combined, and a two-step feature-selection algorithm was employed to characterize the optimal feature subset for presynaptic and postsynaptic neurotoxin prediction. Extensive benchmark tests illustrate that Pippin significantly improved predictive performance as compared with six other commonly used machine-learning algorithms, including the naïve Bayes classifier, Multinomial Naïve Bayes classifier (MNBC), AdaBoost, Bagging, K-nearest neighbors, and XGBoost. Additionally, we developed an online webserver for Pippin to facilitate public use. To the best of our knowledge, this is the first webserver for presynaptic and postsynaptic neurotoxin prediction.
Affiliation(s)
- Pengyu Li: Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- He Zhang: Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Xuyang Zhao: College of Information Engineering, Northwest A&F University, Yangling, 712100, P. R. China
- Cangzhi Jia: School of Science, Dalian Maritime University, Dalian 116026, P. R. China
- Fuyi Li: Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Jiangning Song: Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
|