1
Das S, Das A, Das N, Nath T, Langthasa M, Pandey P, Kumar V, Choure K, Kumar S, Pandey P. Harnessing the potential of microbial keratinases for bioconversion of keratin waste. Environmental Science and Pollution Research International 2024:10.1007/s11356-024-34233-6. [PMID: 38985428 DOI: 10.1007/s11356-024-34233-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2023] [Accepted: 06/30/2024] [Indexed: 07/11/2024]
Abstract
The increasing global consumption of poultry meat has led to the generation of a vast quantity of feather keratin waste daily, posing significant environmental challenges due to improper disposal methods. A growing focus is on utilizing keratinous polymeric waste, amounting to millions of tons annually. Keratins are biochemically rigid, fibrous, recalcitrant, physiologically insoluble, and resistant to most common proteolytic enzymes. Microbial biodegradation of feather keratin provides a viable solution for augmenting feather waste's nutritional value while mitigating environmental contamination. This approach offers an alternative to traditional physical and chemical treatments. This review focuses on the recent findings and work trends in the field of keratin degradation by microorganisms (bacteria, actinomycetes, and fungi) via keratinolytic and proteolytic enzymes, as well as the limitations and challenges encountered due to the low thermal stability of keratinase, and degradation in the complex environmental conditions. Therefore, recent biotechnological interventions such as designing novel keratinase with high keratinolytic activity, thermostability, and binding affinity have been elaborated here. Enhancing protein structural rigidity through critical engineering approaches, such as rational design, has shown promise in improving the thermal stability of proteins. Concurrently, metagenomic annotation offers insights into the genetic foundations of keratin breakdown, primarily predicting metabolic potential and identifying probable keratinases. This may extend the understanding of microbial keratinolytic mechanisms in a complex community, recognizing the significance of synergistic interactions, which could be further utilized in optimizing industrial keratin degradation processes.
Affiliation(s)
- Sandeep Das
  - Department of Microbiology, Assam University, Silchar, 788011, Assam, India
- Ankita Das
  - Department of Microbiology, Assam University, Silchar, 788011, Assam, India
- Nandita Das
  - Department of Microbiology, Assam University, Silchar, 788011, Assam, India
- Tamanna Nath
  - Department of Microbiology, Assam University, Silchar, 788011, Assam, India
- Prisha Pandey
  - Department of Biotechnology, Royal Global University, Guwahati, 781035, Assam, India
- Vijay Kumar
  - Himalayan School of Biosciences, Swami Rama Himalayan University, Dehradun, India, 248016
- Kamlesh Choure
  - Department of Biotechnology, AKS University, Satna, 485001, Madhya Pradesh, India
- Sanjeev Kumar
  - Department of Life Sciences and Bioinformatics, Assam University, Silchar, 788011, Assam, India
- Piyush Pandey
  - Department of Microbiology, Assam University, Silchar, 788011, Assam, India
2
Murter BM, Robinson SC, Banerjee H, Lau L, Uche U, Szymczak-Workman AL, Kane LP. Downregulation of PIK3IP1/TrIP on T cells is controlled by TCR signal strength, PKC, and metalloprotease-mediated cleavage. bioRxiv: The Preprint Server for Biology 2024:2024.04.29.591680. [PMID: 38746242 PMCID: PMC11092459 DOI: 10.1101/2024.04.29.591680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
The protein known as PI3K-interacting protein (PIK3IP1), or transmembrane inhibitor of PI3K (TrIP), is highly expressed by T cells and can modulate PI3K activity in these cells. Several studies have also revealed that TrIP is rapidly downregulated following T cell activation. However, it is unclear as to how this downregulation is controlled. Using a novel monoclonal antibody that robustly stains cell-surface TrIP, we demonstrate that TrIP is lost from the surface of activated T cells in a manner dependent on the strength of signaling through the T cell receptor (TCR) and specific downstream signaling pathways. In addition, TrIP expression returns after 24 hours, suggesting that it may play a role in resetting TCR signaling at later time points. Finally, by expressing truncated forms of TrIP in cells, we identify the region in the extracellular stalk domain of TrIP that is targeted for proteolytic cleavage by metalloprotease ADAM17.
3
Shen L, Sun X, Chen Z, Guo Y, Shen Z, Song Y, Xin W, Ding H, Ma X, Xu W, Zhou W, Che J, Tan L, Chen L, Chen S, Dong X, Fang L, Zhu F. ADCdb: the database of antibody-drug conjugates. Nucleic Acids Res 2024; 52:D1097-D1109. [PMID: 37831118 PMCID: PMC10768060 DOI: 10.1093/nar/gkad831] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/14/2023] [Revised: 09/07/2023] [Accepted: 09/28/2023] [Indexed: 10/14/2023] Open
Abstract
Antibody-drug conjugates (ADCs) are a class of innovative biopharmaceutical drugs, which, via their antibody (mAb) component, deliver and release their potent warhead (a.k.a. payload) at the disease site, thereby simultaneously improving the efficacy of delivered therapy and reducing its off-target toxicity. To design ADCs of promising efficacy, it is crucial to have the critical data of pharma-information and biological activities for each ADC. However, no such database has been constructed yet. In this study, a database named ADCdb focusing on providing ADC information (especially its pharma-information and biological activities) from multiple perspectives was thus developed. Particularly, a total of 6572 ADCs (359 approved by FDA or in clinical trial pipeline, 501 in preclinical test, 819 with in-vivo testing data, 1868 with cell line/target testing data, 3025 without in-vivo/cell line/target testing data) together with their explicit pharma-information was collected and provided. Moreover, a total of 9171 literature-reported activities were discovered, which were identified from diverse clinical trial pipelines, model organisms, patient/cell-derived xenograft models, etc. Due to the significance of ADCs and their relevant data, this new database was expected to attract broad interests from diverse research fields of current biopharmaceutical drug discovery. The ADCdb is now publicly accessible at: https://idrblab.org/adcdb/.
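As a quick arithmetic check on the breakdown quoted in the abstract above, the five category counts can be summed and compared with the stated total of 6572 ADCs. This is only a sketch: the dictionary keys are shortened paraphrases of the abstract's wording, and treating the categories as mutually exclusive is an assumption suggested by the sum, not something the abstract states.

```python
# Category counts as quoted in the ADCdb abstract (labels paraphrased).
adc_counts = {
    "approved by FDA or in clinical trial pipeline": 359,
    "in preclinical test": 501,
    "with in-vivo testing data": 819,
    "with cell line/target testing data": 1868,
    "without in-vivo/cell line/target testing data": 3025,
}

# Sum of the five categories; the abstract reports 6572 ADCs in total.
total = sum(adc_counts.values())

# Share of each category in percent, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in adc_counts.items()}

for label, n in adc_counts.items():
    print(f"{n:5d}  {shares[label]:5.1f}%  {label}")
print(f"{total:5d}  total")
```

The categories add up exactly to the reported total, and the largest group (entries without in-vivo, cell line, or target testing data) accounts for a little under half of the database.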
Affiliation(s)
- Liteng Shen
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
  - Department of Pharmacy, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310005, China
  - Postgraduate Training Base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), Hangzhou 310022, China
  - College of Pharmaceutical Science, Zhejiang University of Technology, Hangzhou 310014, China
- Xiuna Sun
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Zhen Chen
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Yu Guo
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Zheyuan Shen
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Yi Song
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Wenxiu Xin
  - Department of Pharmacy, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310005, China
- Haiying Ding
  - Department of Pharmacy, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310005, China
- Xinyue Ma
  - Department of Pharmacy, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310005, China
  - Postgraduate Training Base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), Hangzhou 310022, China
- Weiben Xu
  - Department of Pharmacy, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310005, China
  - College of Pharmaceutical Science, Zhejiang University of Technology, Hangzhou 310014, China
- Wanying Zhou
  - Department of Pharmacy, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310005, China
  - Postgraduate Training Base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), Hangzhou 310022, China
- Jinxin Che
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Lili Tan
  - Department of Pharmacy, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310005, China
  - Postgraduate Training Base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), Hangzhou 310022, China
- Liangsheng Chen
  - Department of Pharmacy, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310005, China
  - Postgraduate Training Base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), Hangzhou 310022, China
- Siqi Chen
  - School of Pharmaceutical Science, Zhejiang Chinese Medical University, Hangzhou 310053, China
- Xiaowu Dong
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
  - College of Pharmaceutical Science, Zhejiang University of Technology, Hangzhou 310014, China
- Luo Fang
  - Department of Pharmacy, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310005, China
  - Postgraduate Training Base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), Hangzhou 310022, China
  - College of Pharmaceutical Science, Zhejiang University of Technology, Hangzhou 310014, China
  - School of Pharmaceutical Science, Zhejiang Chinese Medical University, Hangzhou 310053, China
- Feng Zhu
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
4
Ma X, Liang Y, Zhang S. iAVPs-ResBi: Identifying antiviral peptides by using deep residual network and bidirectional gated recurrent unit. Mathematical Biosciences and Engineering: MBE 2023; 20:21563-21587. [PMID: 38124610 DOI: 10.3934/mbe.2023954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Human history is also the history of the fight against viral diseases. From the eradication of viruses to coexistence, advances in biomedicine have led to a more objective understanding of viruses and a corresponding increase in the tools and methods to combat them. More recently, antiviral peptides (AVPs) have been discovered, which due to their superior advantages, have achieved great impact as antiviral drugs. Therefore, it is very necessary to develop a prediction model to accurately identify AVPs. In this paper, we develop the iAVPs-ResBi model using k-spaced amino acid pairs (KSAAP), encoding based on grouped weight (EBGW), enhanced grouped amino acid composition (EGAAC) based on the N5C5 sequence, composition, transition and distribution (CTD) based on physicochemical properties for multi-feature extraction. Then we adopt bidirectional long short-term memory (BiLSTM) to fuse features for obtaining the most differentiated information from multiple original feature sets. Finally, the deep model is built by combining improved residual network and bidirectional gated recurrent unit (BiGRU) to perform classification. The results obtained are better than those of the existing methods, and the accuracies are 95.07, 98.07, 94.29 and 97.50% on the four datasets, which show that iAVPs-ResBi can be used as an effective tool for the identification of antiviral peptides. The datasets and codes are freely available at https://github.com/yunyunliang88/iAVPs-ResBi.
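To make one of the feature encodings named in the abstract above concrete, here is a minimal sketch of a k-spaced amino acid pair (KSAAP) composition. It follows the commonly used definition (the frequency of each residue pair separated by k intervening positions) and is not taken from the iAVPs-ResBi code; the function name `ksaap_features`, the `k_max` default, and the choice to skip pairs containing non-standard residues are all assumptions made for illustration.

```python
from itertools import product

# The 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ksaap_features(sequence, k_max=2):
    """Return k-spaced pair frequencies for k = 0..k_max (400 values per k)."""
    seq = sequence.upper()
    features = []
    for k in range(k_max + 1):
        counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
        n_pairs = 0
        # Residues at positions i and i + k + 1 have k residues between them.
        for i in range(len(seq) - k - 1):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # pairs with non-standard residues are skipped
                counts[pair] += 1
                n_pairs += 1
        denom = n_pairs or 1  # avoid division by zero for very short peptides
        features.extend(counts[p] / denom for p in sorted(counts))
    return features


# Hypothetical toy peptide, used only to show the output shape.
vector = ksaap_features("ACDA", k_max=1)
print(len(vector))
```

Each value of k contributes a fixed block of 400 frequencies, so peptides of different lengths map to vectors of the same size, which is what lets such encodings feed a fixed-input classifier.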
Affiliation(s)
- Xinyan Ma
  - School of Science, Xi'an Polytechnic University, Xi'an 710048, China
- Yunyun Liang
  - School of Science, Xi'an Polytechnic University, Xi'an 710048, China
- Shengli Zhang
  - School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
5
Lu C, Lubin JH, Sarma VV, Stentz SZ, Wang G, Wang S, Khare SD. Prediction and design of protease enzyme specificity using a structure-aware graph convolutional network. Proc Natl Acad Sci U S A 2023; 120:e2303590120. [PMID: 37729196 PMCID: PMC10523478 DOI: 10.1073/pnas.2303590120] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/05/2023] [Accepted: 08/14/2023] [Indexed: 09/22/2023] Open
Abstract
Site-specific proteolysis by the enzymatic cleavage of small linear sequence motifs is a key posttranslational modification involved in physiology and disease. The ability to robustly and rapidly predict protease-substrate specificity would also enable targeted proteolytic cleavage by designed proteases. Current methods for predicting protease specificity are limited to sequence pattern recognition in experimentally derived cleavage data obtained for libraries of potential substrates and generated separately for each protease variant. We reasoned that a more semantically rich and robust model of protease specificity could be developed by incorporating the energetics of molecular interactions between protease and substrates into machine learning workflows. We present Protein Graph Convolutional Network (PGCN), which develops a physically grounded, structure-based molecular interaction graph representation that describes molecular topology and interaction energetics to predict enzyme specificity. We show that PGCN accurately predicts the specificity landscapes of several variants of two model proteases. Node and edge ablation tests identified key graph elements for specificity prediction, some of which are consistent with known biochemical constraints for protease:substrate recognition. We used a pretrained PGCN model to guide the design of protease libraries for cleaving two noncanonical substrates, and found good agreement with experimental cleavage results. Importantly, the model can accurately assess designs featuring diversity at positions not present in the training data. The described methodology should enable the structure-based prediction of specificity landscapes of a wide variety of proteases and the construction of tailor-made protease editors for site-selectively and irreversibly modifying chosen target proteins.
Affiliation(s)
- Changpeng Lu
  - Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Joseph H. Lubin
  - Department of Chemistry and Chemical Biology, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Vidur V. Sarma
  - Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Guanyang Wang
  - Department of Statistics, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Sijian Wang
  - Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
  - Department of Statistics, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Sagar D. Khare
  - Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
  - Department of Chemistry and Chemical Biology, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
6
Li F, Wang C, Guo X, Akutsu T, Webb GI, Coin LJM, Kurgan L, Song J. ProsperousPlus: a one-stop and comprehensive platform for accurate protease-specific substrate cleavage prediction and machine-learning model construction. Brief Bioinform 2023; 24:bbad372. [PMID: 37874948 DOI: 10.1093/bib/bbad372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Revised: 08/30/2023] [Accepted: 09/29/2023] [Indexed: 10/26/2023] Open
Abstract
Proteases contribute to a broad spectrum of cellular functions. Given a relatively limited amount of experimental data, developing accurate sequence-based predictors of substrate cleavage sites facilitates a better understanding of protease functions and substrate specificity. While many protease-specific predictors of substrate cleavage sites were developed, these efforts are outpaced by the growth of the protease substrate cleavage data. In particular, since data for 100+ protease types are available and this number continues to grow, it becomes impractical to publish predictors for new protease types, and instead it might be better to provide a computational platform that helps users to quickly and efficiently build predictors that address their specific needs. To this end, we conceptualized, developed, tested and released a versatile bioinformatics platform, ProsperousPlus, that empowers users, even those with no programming or little bioinformatics background, to build fast and accurate predictors of substrate cleavage sites. ProsperousPlus facilitates the use of the rapidly accumulating substrate cleavage data to train, empirically assess and deploy predictive models for user-selected substrate types. Benchmarking tests on test datasets show that our platform produces predictors that on average exceed the predictive performance of current state-of-the-art approaches. ProsperousPlus is available as a webserver and a stand-alone software package at http://prosperousplus.unimelb-biotools.cloud.edu.au/.
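Sequence-based cleavage-site predictors of the kind described above typically take a fixed-length window of residues around a candidate scissile bond as input. The sketch below shows one common way to build such a window in Schechter-Berger notation (P4..P1 on the N-terminal side, P1'..P4' on the C-terminal side). It is not ProsperousPlus code: the function name `cleavage_window`, the `flank` and `pad` parameters, and the use of "X" for padding are assumptions made for illustration.

```python
def cleavage_window(sequence, p1_index, flank=4, pad="X"):
    """Return residues P{flank}..P1 followed by P1'..P{flank}'.

    p1_index is the 0-based position of the P1 residue; the candidate
    scissile bond lies between p1_index and p1_index + 1.
    """
    if not 0 <= p1_index < len(sequence) - 1:
        raise ValueError("p1_index must leave at least one residue after P1")
    # N-terminal side: up to `flank` residues ending at P1.
    left = sequence[max(0, p1_index - flank + 1): p1_index + 1]
    # C-terminal side: up to `flank` residues starting at P1'.
    right = sequence[p1_index + 1: p1_index + 1 + flank]
    # Pad with a placeholder when the bond is close to either terminus.
    return left.rjust(flank, pad) + right.ljust(flank, pad)


def all_windows(sequence, flank=4):
    """Yield (p1_index, window) for every peptide bond in the sequence."""
    for i in range(len(sequence) - 1):
        yield i, cleavage_window(sequence, i, flank)


# Hypothetical example: a short peptide with the bond after position 5.
print(cleavage_window("ENLYFQSGT", 5))
```

Enumerating every bond with `all_windows` gives one candidate window per position, which a trained model would then score; the labels and the model itself are outside the scope of this sketch.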
Affiliation(s)
- Fuyi Li
  - College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
  - South Australian immunoGENomics Cancer Institute (SAiGENCI), Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA 5005, Australia
  - The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, VIC 3000, Australia
- Cong Wang
  - College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
- Xudong Guo
  - College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
- Geoffrey I Webb
  - Monash Data Futures Institute, Monash University, VIC 3800, Australia
- Lachlan J M Coin
  - The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, VIC 3000, Australia
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Jiangning Song
  - Monash Data Futures Institute, Monash University, VIC 3800, Australia
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, VIC 3800, Australia
7
Ameen SS, Griem-Krey N, Dufour A, Hossain MI, Hoque A, Sturgeon S, Nandurkar H, Draxler DF, Medcalf RL, Kamaruddin MA, Lucet IS, Leeming MG, Liu D, Dhillon A, Lim JP, Basheer F, Zhu HJ, Bokhari L, Roulston CL, Paradkar PN, Kleifeld O, Clarkson AN, Wellendorph P, Ciccotosto GD, Williamson NA, Ang CS, Cheng HC. N-Terminomic Changes in Neurons During Excitotoxicity Reveal Proteolytic Events Associated With Synaptic Dysfunctions and Potential Targets for Neuroprotection. Mol Cell Proteomics 2023; 22:100543. [PMID: 37030595 PMCID: PMC10199228 DOI: 10.1016/j.mcpro.2023.100543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2022] [Revised: 02/23/2023] [Accepted: 04/04/2023] [Indexed: 04/10/2023] Open
Abstract
Excitotoxicity, a neuronal death process in neurological disorders such as stroke, is initiated by the overstimulation of ionotropic glutamate receptors. Although dysregulation of proteolytic signaling networks is critical for excitotoxicity, the identity of affected proteins and mechanisms by which they induce neuronal cell death remain unclear. To address this, we used quantitative N-terminomics to identify proteins modified by proteolysis in neurons undergoing excitotoxic cell death. We found that most proteolytically processed proteins in excitotoxic neurons are likely substrates of calpains, including key synaptic regulatory proteins such as CRMP2, doublecortin-like kinase I, Src tyrosine kinase and calmodulin-dependent protein kinase IIβ (CaMKIIβ). Critically, calpain-catalyzed proteolytic processing of these proteins generates stable truncated fragments with altered activities that potentially contribute to neuronal death by perturbing synaptic organization and function. Blocking calpain-mediated proteolysis of one of these proteins, Src, protected against neuronal loss in a rat model of neurotoxicity. Extrapolation of our N-terminomic results led to the discovery that CaMKIIα, an isoform of CaMKIIβ, undergoes differential processing in mouse brains under physiological conditions and during ischemic stroke. In summary, by identifying the neuronal proteins undergoing proteolysis during excitotoxicity, our findings offer new insights into excitotoxic neuronal death mechanisms and reveal potential neuroprotective targets for neurological disorders.
Affiliation(s)
- S Sadia Ameen
  - Department of Biochemistry and Pharmacology, University of Melbourne, Parkville, Victoria, Australia; Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, Victoria, Australia
- Nane Griem-Krey
  - Department of Drug Design and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Antoine Dufour
  - Department of Physiology and Pharmacology, University of Calgary, Calgary, Alberta, Canada
- M Iqbal Hossain
  - Department of Biochemistry and Pharmacology, University of Melbourne, Parkville, Victoria, Australia; Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, Victoria, Australia; Department of Pharmacology and Toxicology, University of Alabama, Birmingham, Alabama, USA
- Ashfaqul Hoque
  - St Vincent's Institute of Medical Research, Fitzroy, Victoria, Australia
- Sharelle Sturgeon
  - Australian Centre for Blood Diseases, Monash University, Melbourne, Victoria, Australia
- Harshal Nandurkar
  - Australian Centre for Blood Diseases, Monash University, Melbourne, Victoria, Australia
- Dominik F Draxler
  - Australian Centre for Blood Diseases, Monash University, Melbourne, Victoria, Australia
- Robert L Medcalf
  - Australian Centre for Blood Diseases, Monash University, Melbourne, Victoria, Australia
- Mohd Aizuddin Kamaruddin
  - Department of Biochemistry and Pharmacology, University of Melbourne, Parkville, Victoria, Australia; Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, Victoria, Australia
- Isabelle S Lucet
  - Chemical Biology Division, The Walter and Eliza Hall Institute for Medical Research, Parkville, Victoria, Australia; Department of Medical Biology, University of Melbourne, Parkville, Victoria, Australia
- Michael G Leeming
  - Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, Victoria, Australia
- Dazhi Liu
  - Department of Neurology, School of Medicine, University of California, Davis, California, USA
- Amardeep Dhillon
  - Faculty of Health, Institute for Mental and Physical Health and Clinical Translation, School of Medicine, Deakin University, Waurn Ponds, Victoria, Australia
- Jet Phey Lim
  - Faculty of Health, Institute for Mental and Physical Health and Clinical Translation, School of Medicine, Deakin University, Waurn Ponds, Victoria, Australia
- Faiza Basheer
  - Faculty of Health, Institute for Mental and Physical Health and Clinical Translation, School of Medicine, Deakin University, Waurn Ponds, Victoria, Australia
- Hong-Jian Zhu
  - Department of Surgery (Royal Melbourne Hospital), University of Melbourne, Parkville, Victoria, Australia
- Laita Bokhari
  - Department of Biochemistry and Pharmacology, University of Melbourne, Parkville, Victoria, Australia; Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, Victoria, Australia
- Carli L Roulston
  - Florey Institute of Neuroscience and Mental Health, Parkville, Victoria, Australia
- Prasad N Paradkar
  - CSIRO Health & Biosecurity, Australian Centre for Disease Preparedness, East Geelong, Victoria, Australia
- Oded Kleifeld
  - Faculty of Biology, Technion-Israel Institute of Technology, Technion City, Haifa, Israel
- Andrew N Clarkson
  - Department of Anatomy, Brain Health Research Centre and Brain Research New Zealand, University of Otago, Dunedin, New Zealand
- Petrine Wellendorph
  - Department of Drug Design and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Giuseppe D Ciccotosto
  - Department of Biochemistry and Pharmacology, University of Melbourne, Parkville, Victoria, Australia; Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, Victoria, Australia
- Nicholas A Williamson
  - Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, Victoria, Australia
- Ching-Seng Ang
  - Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, Victoria, Australia
- Heung-Chin Cheng
  - Department of Biochemistry and Pharmacology, University of Melbourne, Parkville, Victoria, Australia; Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, Victoria, Australia
8
Tušar L, Loboda J, Impens F, Sosnowski P, Van Quickelberghe E, Vidmar R, Demol H, Sedeyn K, Saelens X, Vizovišek M, Mihelič M, Fonović M, Horvat J, Kosec G, Turk B, Gevaert K, Turk D. Proteomic data and structure analysis combined reveal interplay of structural rigidity and flexibility on selectivity of cysteine cathepsins. Commun Biol 2023; 6:450. [PMID: 37095140 PMCID: PMC10124925 DOI: 10.1038/s42003-023-04772-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 03/28/2023] [Indexed: 04/26/2023] Open
Abstract
Addressing the elusive specificity of cysteine cathepsins, which in contrast to caspases and trypsin-like proteases lack strict specificity determining P1 pocket, calls for innovative approaches. Proteomic analysis of cell lysates with human cathepsins K, V, B, L, S, and F identified 30,000 cleavage sites, which we analyzed by software platform SAPS-ESI (Statistical Approach to Peptidyl Substrate-Enzyme Specific Interactions). SAPS-ESI is used to generate clusters and training sets for support vector machine learning. Cleavage site predictions on the SARS-CoV-2 S protein, confirmed experimentally, expose the most probable first cut under physiological conditions and suggested furin-like behavior of cathepsins. Crystal structure analysis of representative peptides in complex with cathepsin V reveals rigid and flexible sites consistent with analysis of proteomics data by SAPS-ESI that correspond to positions with heterogeneous and homogeneous distribution of residues. Thereby support for design of selective cleavable linkers of drug conjugates and drug discovery studies is provided.
Affiliation(s)
- Livija Tušar
  - Jožef Stefan Institute, Department of Biochemistry and Molecular and Structural Biology, Jamova cesta 39, 1000, Ljubljana, Slovenia
  - Centre of Excellence for Integrated Approaches in Chemistry and Biology of Proteins (CIPKeBiP), Jamova cesta 39, 1000, Ljubljana, Slovenia
- Jure Loboda
  - Jožef Stefan Institute, Department of Biochemistry and Molecular and Structural Biology, Jamova cesta 39, 1000, Ljubljana, Slovenia
  - The Jožef Stefan International Postgraduate School, Jamova cesta 39, 1000, Ljubljana, Slovenia
- Francis Impens
  - VIB-UGent Center for Medical Biotechnology and UGent Department of Biomolecular Medicine, Technologiepark-Zwijnaarde 75, 9052, Ghent, Belgium
- Piotr Sosnowski
  - Centre of Excellence for Integrated Approaches in Chemistry and Biology of Proteins (CIPKeBiP), Jamova cesta 39, 1000, Ljubljana, Slovenia
- Emmy Van Quickelberghe
  - VIB-UGent Center for Medical Biotechnology and UGent Department of Biomolecular Medicine, Technologiepark-Zwijnaarde 75, 9052, Ghent, Belgium
- Robert Vidmar
  - Jožef Stefan Institute, Department of Biochemistry and Molecular and Structural Biology, Jamova cesta 39, 1000, Ljubljana, Slovenia
- Hans Demol
  - VIB-UGent Center for Medical Biotechnology and UGent Department of Biomolecular Medicine, Technologiepark-Zwijnaarde 75, 9052, Ghent, Belgium
- Koen Sedeyn
  - VIB-UGent Center for Medical Biotechnology and, Department for Biochemistry and Microbiology, Ghent University, 9052, Ghent, Belgium
- Xavier Saelens
  - VIB-UGent Center for Medical Biotechnology and, Department for Biochemistry and Microbiology, Ghent University, 9052, Ghent, Belgium
- Matej Vizovišek
  - Jožef Stefan Institute, Department of Biochemistry and Molecular and Structural Biology, Jamova cesta 39, 1000, Ljubljana, Slovenia
- Marko Mihelič
  - Jožef Stefan Institute, Department of Biochemistry and Molecular and Structural Biology, Jamova cesta 39, 1000, Ljubljana, Slovenia
- Marko Fonović
  - Jožef Stefan Institute, Department of Biochemistry and Molecular and Structural Biology, Jamova cesta 39, 1000, Ljubljana, Slovenia
- Jaka Horvat
  - Acies Bio d.o.o., Tehnološki park 21, 1000, Ljubljana, Slovenia
- Gregor Kosec
  - Acies Bio d.o.o., Tehnološki park 21, 1000, Ljubljana, Slovenia
- Boris Turk
  - Jožef Stefan Institute, Department of Biochemistry and Molecular and Structural Biology, Jamova cesta 39, 1000, Ljubljana, Slovenia
  - Faculty of Chemistry, University of Ljubljana, Večna pot 113, SI-1000, Ljubljana, Slovenia
- Kris Gevaert
  - VIB-UGent Center for Medical Biotechnology and UGent Department of Biomolecular Medicine, Technologiepark-Zwijnaarde 75, 9052, Ghent, Belgium
- Dušan Turk
  - Jožef Stefan Institute, Department of Biochemistry and Molecular and Structural Biology, Jamova cesta 39, 1000, Ljubljana, Slovenia
  - Centre of Excellence for Integrated Approaches in Chemistry and Biology of Proteins (CIPKeBiP), Jamova cesta 39, 1000, Ljubljana, Slovenia
|
9
|
Lu C, Lubin JH, Sarma VV, Stentz SZ, Wang G, Wang S, Khare SD. Prediction and Design of Protease Enzyme Specificity Using a Structure-Aware Graph Convolutional Network. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.16.528728. [PMID: 36824945 PMCID: PMC9949123 DOI: 10.1101/2023.02.16.528728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
Abstract
Site-specific proteolysis by the enzymatic cleavage of small linear sequence motifs is a key post-translational modification involved in physiology and disease. The ability to robustly and rapidly predict protease substrate specificity would also enable targeted proteolytic cleavage - editing - of a target protein by designed proteases. Current methods for predicting protease specificity are limited to sequence pattern recognition in experimentally-derived cleavage data obtained for libraries of potential substrates and generated separately for each protease variant. We reasoned that a more semantically rich and robust model of protease specificity could be developed by incorporating the three-dimensional structure and energetics of molecular interactions between protease and substrates into machine learning workflows. We present Protein Graph Convolutional Network (PGCN), which develops a physically-grounded, structure-based molecular interaction graph representation that describes molecular topology and interaction energetics to predict enzyme specificity. We show that PGCN accurately predicts the specificity landscapes of several variants of two model proteases: the NS3/4 protease from the Hepatitis C virus (HCV) and the Tobacco Etch Virus (TEV) proteases. Node and edge ablation tests identified key graph elements for specificity prediction, some of which are consistent with known biochemical constraints for protease:substrate recognition. We used a pre-trained PGCN model to guide the design of TEV protease libraries for cleaving two non-canonical substrates, and found good agreement with experimental cleavage results. Importantly, the model can accurately assess designs featuring diversity at positions not present in the training data. 
The described methodology should enable the structure-based prediction of specificity landscapes of a wide variety of proteases and the construction of tailor-made protease editors for site-selectively and irreversibly modifying chosen target proteins.
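As a rough illustration of the graph-convolution idea behind a model such as PGCN, the sketch below applies one generic propagation step, ReLU(Â H W), to a small weighted graph. It is not the authors' architecture: the symmetric normalization, the function name `gcn_layer`, and the use of non-negative edge weights standing in for interaction terms are all assumptions made for the example.

```python
import math


def gcn_layer(adjacency, features, weights):
    """One generic graph-convolution step: ReLU(A_hat @ H @ W).

    adjacency: n x n list of non-negative edge weights (assumed symmetric).
    features:  n x f list of node feature vectors (H).
    weights:   f x o list of layer weights (W).
    """
    n = len(adjacency)
    # Add self-loops so every node also keeps its own features.
    a_tilde = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    degree = [sum(row) for row in a_tilde]
    # Symmetric normalization: a_ij / sqrt(d_i * d_j).
    a_hat = [[a_tilde[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
             for i in range(n)]
    n_feat = len(features[0])
    # Aggregate neighbor features (A_hat @ H).
    aggregated = [[sum(a_hat[i][k] * features[k][f] for k in range(n))
                   for f in range(n_feat)] for i in range(n)]
    n_out = len(weights[0])
    # Linear transform followed by ReLU.
    return [[max(0.0, sum(row[f] * weights[f][o] for f in range(n_feat)))
             for o in range(n_out)] for row in aggregated]
```

Stacking several such layers and pooling the node outputs into one graph-level score is a common way to turn a protease:substrate interaction graph into a cleaved or uncleaved prediction, although the layer count and pooling used in the preprint are not stated in this abstract.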
Affiliation(s)
- Changpeng Lu
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
| | - Joseph H. Lubin
- Department of Chemistry & Chemical Biology, Rutgers - The State University of New Jersey, Piscataway, NJ
| | - Vidur V. Sarma
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
| | | | - Guanyang Wang
- Department of Statistics, Rutgers - The State University of New Jersey, Piscataway, NJ
| | - Sijian Wang
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
- Department of Statistics, Rutgers - The State University of New Jersey, Piscataway, NJ
| | - Sagar D. Khare
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
- Department of Chemistry & Chemical Biology, Rutgers - The State University of New Jersey, Piscataway, NJ
| |
|
10
|
Ding Y, He W, Tang J, Zou Q, Guo F. Laplacian Regularized Sparse Representation Based Classifier for Identifying DNA N4-Methylcytosine Sites via L2,1/2-Matrix Norm. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:500-511. [PMID: 34882559 DOI: 10.1109/tcbb.2021.3133309] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 05/04/2023]
Abstract
N4-methylcytosine (4mC) is one of the important epigenetic modifications in DNA sequences. Detecting 4mC sites is time-consuming. Computational methods based on machine learning have provided effective help for identifying 4mC sites. To further improve prediction performance, we propose a Laplacian Regularized Sparse Representation based Classifier with L2,1/2-matrix norm (LapRSRC). We also use the kernel trick to derive the kernel LapRSRC for nonlinear modeling. Matrix factorization is employed to solve the sparse representation coefficients of all test samples in the training set, and an efficient iterative algorithm is proposed to solve the objective function. We evaluate our model on six benchmark datasets of 4mC and eight UCI datasets. The results show that the performance of our method is better or comparable.
|
11
|
Stanovova MV, Gazizova GR, Gorbushin AM. Transcriptomic profiling of immune-associated molecules in the coelomocytes of lugworm Arenicola marina (Linnaeus, 1758). JOURNAL OF EXPERIMENTAL ZOOLOGY. PART B, MOLECULAR AND DEVELOPMENTAL EVOLUTION 2023; 340:34-55. [PMID: 35438249 DOI: 10.1002/jez.b.23135] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/28/2021] [Revised: 02/04/2022] [Accepted: 03/11/2022] [Indexed: 12/16/2022]
Abstract
Organization and functioning of the immune system remain unevenly studied in different taxa of lophotrochozoan animals. We analyzed transcriptomic data on coelomocytes of the lugworm Arenicola marina (Linnaeus, 1758; Annelida, Polychaeta) to gain insights into the molecular mechanisms involved in polychaete immunity. Coelomocytes are specialized motile cells populating the coelomic fluid of annelids, responsible for cellular defense reactions and providing humoral immune factors. The transcriptome was enriched with immune-related transcripts by challenging the cells in vitro with lipopolysaccharides of Escherichia coli and Zymosan from Saccharomyces cerevisiae. Our analysis revealed a multifaceted and complex internal defense system of the lugworm. A. marina possesses orthologs of proto-complement-like factors: six thioester-containing proteins, a complement-like receptor, and a MASP-related serine protease (MReM2). A. marina coelomocytes employ pattern-recognition receptors to detect pathogens and regulate immune responses. Among them, there are 18 Toll-like receptors and various putative lectin-like proteins with evolutionarily conserved and taxa-specific domains. C-type lectins and a novel family of receptors containing Gal-binding and CUB domains were the most abundant in the transcriptome. The array of pore-forming proteins in the coelomocytes was surprisingly reduced compared to that of other invertebrate species. We characterized a set of conserved proteins metabolizing reactive oxygen species and nitric oxide and expanded the arsenal of potential antimicrobial peptides. Phenoloxidase activity in immune cells of the lugworm is mediated only by a laccase enzyme. The described repertoire of immune-associated molecules provides valuable candidates for further functional and comparative research on the immunity of annelids.
Affiliation(s)
- Maria V Stanovova
- Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia
| | - Guzel R Gazizova
- Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan, Russia
| | - Alexander M Gorbushin
- Sechenov Institute of Evolutionary Physiology and Biochemistry (IEPhB RAS), St. Petersburg, Russia
| |
|
12
|
Henehan GT, Ryan BJ, Kinsella GK. Approaches to Avoid Proteolysis During Protein Expression and Purification. Methods Mol Biol 2023; 2699:77-95. [PMID: 37646995 DOI: 10.1007/978-1-0716-3362-5_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
All cells contain proteases, which hydrolyze the peptide bonds between amino acids of a protein backbone. Typically, proteases are prevented from nonspecific proteolysis by regulation and by their physical separation into different subcellular compartments; however, this segregation is not retained during cell lysis, which is the initial step in any protein isolation procedure. Prevention of proteolysis during protein purification often takes the form of a two-pronged approach: first, inhibition of proteolysis in situ, followed by the early separation of the protease from the protein of interest via chromatographic purification. Protease inhibitors are routinely used to limit the effect of the proteases before they are physically separated from the protein of interest via column chromatography. In this chapter, commonly used approaches to reducing or avoiding proteolysis during protein expression and purification are reviewed.
Affiliation(s)
- Gary T Henehan
- School of Food Science and Environmental Health, Technological University Dublin, Grangegorman, Dublin, Ireland
| | - Barry J Ryan
- School of Food Science and Environmental Health, Technological University Dublin, Grangegorman, Dublin, Ireland
| | - Gemma K Kinsella
- School of Food Science and Environmental Health, Technological University Dublin, Grangegorman, Dublin, Ireland.
| |
|
13
|
Bono N, Saroglia G, Marcuzzo S, Giagnorio E, Lauria G, Rosini E, De Nardo L, Athanassiou A, Candiani G, Perotto G. Silk fibroin microgels as a platform for cell microencapsulation. JOURNAL OF MATERIALS SCIENCE. MATERIALS IN MEDICINE 2022; 34:3. [PMID: 36586059 PMCID: PMC9805413 DOI: 10.1007/s10856-022-06706-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Accepted: 11/27/2022] [Indexed: 06/17/2023]
Abstract
Cell microencapsulation has been used for years as a means of shielding cells from the external environment while still allowing the transport of gases, general metabolites, and secretory bioactive molecules. In this light, hydrogels may support the structural integrity and functionality of encapsulated biologics while ensuring cell viability and function and releasing potential therapeutic factors once in situ. In this work, we describe a straightforward strategy to fabricate silk fibroin (SF) microgels (µgels) and encapsulate cells into them. SF µgels (size ≈ 200 µm) were obtained through ultrasonication-induced gelation of SF in a water-oil emulsion phase. A thorough physicochemical (SEM analysis and FT-IR) and mechanical (microindentation tests) characterization of SF µgels was carried out to assess their nanostructure, porosity, and stiffness. SF µgels were used to encapsulate and culture L929 cells and primary myoblasts. Interestingly, SF µgels showed a selective release of relatively small proteins (e.g., VEGF, molecular weight, MW = 40 kDa) by the encapsulated primary myoblasts, while bigger (macro)molecules (MW = 160 kDa) were hindered from diffusing through the µgels. This article provides the groundwork to expand the use of SF hydrogels into a versatile platform for encapsulating relevant cells able to release paracrine factors potentially regulating tissue and/or organ functions, thus promoting their regeneration.
Affiliation(s)
- Nina Bono
- Department of Chemistry, Materials and Chemical Engineering "Giulio Natta", Politecnico di Milano, Via Mancinelli 7, 20131, Milan, Italy.
| | - Giulio Saroglia
- Department of Chemistry, Materials and Chemical Engineering "Giulio Natta", Politecnico di Milano, Via Mancinelli 7, 20131, Milan, Italy
- Smart Materials, Istituto Italiano di Tecnologia, Via Morego 30, 16163, Genova, Italy
| | - Stefania Marcuzzo
- Neurology IV-Neuroimmunology and Neuromuscular Diseases Unit, Fondazione IRCCS Istituto Neurologico Carlo Besta, Via Celoria 11, 20133, Milan, Italy
| | - Eleonora Giagnorio
- Neurology IV-Neuroimmunology and Neuromuscular Diseases Unit, Fondazione IRCCS Istituto Neurologico Carlo Besta, Via Celoria 11, 20133, Milan, Italy
| | - Giuseppe Lauria
- Department of Clinical Neurosciences, Fondazione IRCCS Istituto Neurologico Carlo Besta, Via Celoria 11, 20133, Milan, Italy
- Department of Medical Biotechnology and Translational Medicine, University of Milan, Via Vanvitelli 32, 20133, Milan, Italy
| | - Elena Rosini
- The Protein Factory 2.0, Department of Biotechnology and Life Sciences, University of Insubria, Via J.H. Dunant 3, 21100, Varese, Italy
| | - Luigi De Nardo
- Department of Chemistry, Materials and Chemical Engineering "Giulio Natta", Politecnico di Milano, Via Mancinelli 7, 20131, Milan, Italy
| | | | - Gabriele Candiani
- Department of Chemistry, Materials and Chemical Engineering "Giulio Natta", Politecnico di Milano, Via Mancinelli 7, 20131, Milan, Italy
| | - Giovanni Perotto
- Smart Materials, Istituto Italiano di Tecnologia, Via Morego 30, 16163, Genova, Italy.
| |
|
14
|
Onah E, Uzor PF, Ugwoke IC, Eze JU, Ugwuanyi ST, Chukwudi IR, Ibezim A. Prediction of HIV-1 protease cleavage site from octapeptide sequence information using selected classifiers and hybrid descriptors. BMC Bioinformatics 2022; 23:466. [DOI: 10.1186/s12859-022-05017-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Accepted: 10/11/2022] [Indexed: 11/10/2022] Open
Abstract
Background
In most parts of the world, especially in underdeveloped countries, acquired immunodeficiency syndrome (AIDS) still remains a major cause of death, disability, and unfavorable economic outcomes. This has necessitated intensive research to develop effective therapeutic agents for the treatment of human immunodeficiency virus (HIV) infection, which is responsible for AIDS. Peptide cleavage by HIV-1 protease is an essential step in the replication of HIV-1. Thus, correct and timely prediction of the cleavage site of HIV-1 protease can significantly speed up and optimize the drug discovery process of novel HIV-1 protease inhibitors. In this work, we built and compared the performance of selected machine learning models for the prediction of HIV-1 protease cleavage site utilizing a hybrid of octapeptide sequence information comprising bond composition, amino acid binary profile (AABP), and physicochemical properties as numerical descriptors serving as input variables for some selected machine learning algorithms. Our work differs from antecedent studies exploring the same subject in the combination of octapeptide descriptors and method used. Instead of using various subsets of the dataset for training and testing the models, we combined the dataset, applied a 3-way data split, and then used a "stratified" 10-fold cross-validation technique alongside the testing set to evaluate the models.
Results
Among the 8 models evaluated in the “stratified” 10-fold CV experiment, logistic regression, multi-layer perceptron classifier, linear discriminant analysis, gradient boosting classifier, Naive Bayes classifier, and decision tree classifier, with AUC, F-score, and B. Acc. scores in the ranges of 0.91–0.96, 0.81–0.88, and 80.1–86.4%, respectively, have the closest predictive performance to the state-of-the-art model (AUC 0.96, F-score 0.80 and B. Acc. ~ 80.0%). In contrast, the perceptron classifier and the K-nearest neighbors had statistically lower performance (AUC 0.77–0.82, F-score 0.53–0.69, and B. Acc. 60.0–68.5%) at p < 0.05. On further evaluation on the testing set, logistic regression and multi-layer perceptron classifier (AUC of 0.97, F-score > 0.89, and B. Acc. > 90.0%) had the best performance, though linear discriminant analysis, gradient boosting classifier, and Naive Bayes classifier also performed well (AUC > 0.94, F-score > 0.87, and B. Acc. > 86.0%).
Conclusions
Logistic regression and multi-layer perceptron classifiers have comparable predictive performances to the state-of-the-art model when octapeptide sequence descriptors consisting of AABP, bond composition and standard physicochemical properties are used as input variables. In our future work, we hope to develop a standalone software for HIV-1 protease cleavage site prediction utilizing the logistic regression algorithm and the aforementioned octapeptide sequence descriptors.
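Two ingredients named in this abstract, the amino acid binary profile (AABP) and stratified k-fold splitting, can be sketched in a few lines. This is a minimal illustration, not the authors' code: the 20-letter alphabet ordering, the function names, and the round-robin fold assignment are assumptions, and the bond-composition and physicochemical descriptors are omitted.

```python
import random
from collections import defaultdict

# Assumed ordering of the 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binary_profile(octapeptide):
    """One-hot encode an 8-residue peptide into a flat 160-element vector."""
    if len(octapeptide) != 8:
        raise ValueError("expected an octapeptide (8 residues)")
    vector = []
    for residue in octapeptide:
        row = [0] * len(AMINO_ACIDS)
        # index() raises ValueError for non-standard residues.
        row[AMINO_ACIDS.index(residue)] = 1
        vector.extend(row)
    return vector


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds that preserve the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets its share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds
```

In a cross-validation loop, each fold serves once as the validation set while the remaining folds are used for training; stratification matters here because cleaved octapeptides are typically the minority class.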
|
15
|
Yan K, Lv H, Guo Y, Peng W, Liu B. sAMPpred-GAT: prediction of antimicrobial peptide by graph attention network and predicted peptide structure. Bioinformatics 2022; 39:6808615. [PMID: 36342186 PMCID: PMC9805557 DOI: 10.1093/bioinformatics/btac715] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 05/05/2022] [Revised: 10/24/2022] [Accepted: 11/04/2022] [Indexed: 11/09/2022] Open
Abstract
MOTIVATION Antimicrobial peptides (AMPs) are essential components of therapeutic peptides for innate immunity. Researchers have developed several computational methods to predict potential AMPs from many candidate peptides. With the development of artificial intelligence techniques, protein structures can be accurately predicted, which is useful for protein sequence and function analysis. Unfortunately, predicted peptide structure information has not yet been applied to AMP prediction to improve predictive performance. RESULTS In this study, we proposed a computational predictor called sAMPpred-GAT for AMP identification. To the best of our knowledge, sAMPpred-GAT is the first approach based on predicted peptide structures for AMP prediction. The sAMPpred-GAT predictor constructs graphs based on the predicted peptide structures, sequence information and evolutionary information. The Graph Attention Network (GAT) is then applied to the graphs to learn discriminative features. Finally, fully connected networks are used as the output module to predict whether the peptides are AMPs or not. Experimental results show that sAMPpred-GAT outperforms the other state-of-the-art methods in terms of AUC, and achieves better or highly comparable performance in terms of the other metrics on the eight independent test datasets, demonstrating that the predicted peptide structure information is important for AMP prediction. AVAILABILITY AND IMPLEMENTATION A user-friendly webserver of sAMPpred-GAT can be accessed at http://bliulab.net/sAMPpred-GAT and the source code is available at https://github.com/HongWuL/sAMPpred-GAT/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
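One common way to turn a predicted peptide structure into a graph is to connect residues whose representative atoms lie within a distance cutoff. The sketch below shows that step only; the 8 Å cutoff, the use of one coordinate per residue, and the function name `contact_graph` are assumptions for illustration and are not taken from the sAMPpred-GAT paper.

```python
import math


def contact_graph(coords, threshold=8.0):
    """Build an undirected residue contact graph as an adjacency list.

    coords: one (x, y, z) tuple per residue, e.g. predicted C-alpha positions.
    threshold: distance cutoff in the same units as coords (assumed value).
    """
    n = len(coords)
    neighbors = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors
```

A graph attention layer would then learn a weight for each listed neighbor when aggregating node features, instead of treating all neighbors equally.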
Affiliation(s)
- Ke Yan
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Hongwu Lv
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Yichen Guo
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Wei Peng
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Bin Liu
- To whom correspondence should be addressed.
| |
|
16
|
Hu L, Li Z, Tang Z, Zhao C, Zhou X, Hu P. Effectively predicting HIV-1 protease cleavage sites by using an ensemble learning approach. BMC Bioinformatics 2022; 23:447. [PMID: 36303135 PMCID: PMC9608884 DOI: 10.1186/s12859-022-04999-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Accepted: 10/13/2022] [Indexed: 11/10/2022] Open
Abstract
Background The site information of substrates that can be cleaved by human immunodeficiency virus 1 proteases (HIV-1 PRs) is of great significance for designing effective inhibitors against HIV-1 viruses. A variety of machine learning-based algorithms have been developed to predict HIV-1 PR cleavage sites by extracting relevant features from substrate sequences. However, relying only on sequence information is not sufficient to ensure good performance, owing to the uncertainty in how the datasets used for training and testing are separated. Moreover, the existence of noisy data, i.e., false positive and false negative cleavage sites, could negatively influence prediction accuracy. Results In this work, an ensemble learning algorithm for predicting HIV-1 PR cleavage sites, namely EM-HIV, is proposed by training a set of weak learners, i.e., biased support vector machine classifiers, with the asymmetric bagging strategy. By doing so, the impact of data imbalance and noisy data can be alleviated. In addition, in order to make full use of substrate sequences, the features used by EM-HIV are collected from three different coding schemes, including amino acid identities, chemical properties and variable-length coevolutionary patterns, for the purpose of constructing more relevant feature vectors of octamers. Experimental results on three independent benchmark datasets demonstrate that EM-HIV outperforms state-of-the-art prediction algorithms in terms of several evaluation metrics. Hence, EM-HIV can be regarded as a useful tool to accurately predict HIV-1 PR cleavage sites.
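The asymmetric bagging strategy mentioned here is usually understood as keeping every minority-class (positive) sample in each bag and bootstrapping only the majority class down to the same size. The sketch below follows that reading. It is not the EM-HIV implementation: the nearest-centroid weak learner is a deliberately simple stand-in for the biased support vector machines of the paper, and all function names are hypothetical.

```python
import random


def asymmetric_bags(positives, negatives, n_bags, seed=0):
    """Each bag keeps all positives and a bootstrap sample of negatives."""
    rng = random.Random(seed)
    size = len(positives)
    return [(list(positives), [rng.choice(negatives) for _ in range(size)])
            for _ in range(n_bags)]


def train_centroid(pos, neg):
    """Stand-in weak learner (assumption): nearest class centroid."""
    dim = len(pos[0])
    c_pos = [sum(v[d] for v in pos) / len(pos) for d in range(dim)]
    c_neg = [sum(v[d] for v in neg) / len(neg) for d in range(dim)]

    def predict(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
        return 1 if d_pos <= d_neg else 0

    return predict


def ensemble_predict(models, x):
    """Majority vote over the weak learners (ties count as positive)."""
    votes = sum(model(x) for model in models)
    return 1 if 2 * votes >= len(models) else 0
```

Because every bag is balanced, no single learner is dominated by the negative class, and voting across bags dampens the effect of any mislabeled negatives that happen to be drawn.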
Affiliation(s)
- Lun Hu
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
| | - Zhenfeng Li
- School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China
| | - Zehai Tang
- School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China
| | - Cheng Zhao
- School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China
| | - Xi Zhou
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
| | - Pengwei Hu
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
| |
|
17
|
FRTpred: A novel approach for accurate prediction of protein folding rate and type. Comput Biol Med 2022; 149:105911. [DOI: 10.1016/j.compbiomed.2022.105911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Revised: 07/08/2022] [Accepted: 07/23/2022] [Indexed: 11/20/2022]
|
18
|
Hu J, Wang J, Li J, Hu H, Wu B, Ren H, Wang J. AHLS-pred: a novel sequence-based predictor of acyl-homoserine-lactone synthases using machine learning algorithms. ENVIRONMENTAL MICROBIOLOGY REPORTS 2022; 14:616-631. [PMID: 35403334 DOI: 10.1111/1758-2229.13068] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/15/2021] [Revised: 03/28/2022] [Accepted: 03/30/2022] [Indexed: 06/14/2023]
Abstract
Acyl-homoserine-lactones (AHLs), as the major quorum sensing (QS) signalling molecules in Gram-negative bacteria, have shown great application potential in regulating biological nutrient removal processes. The identification of AHL synthases plays an essential role in in-depth research on QS mechanisms and applications of biological wastewater treatment processes. This work proposed the first prediction model for AHL synthases based on machine learning algorithms, namely, AHLS-pred. The training dataset AHLS1400 and the independent testing dataset AHLS132 for AHLS prediction were first established. Three sequence-based feature extraction methods were used to generate feature descriptors, namely, amino acid composition, dipeptide composition and G-gap dipeptide composition. Subsequently, the optimal features were obtained based on the sorted feature descriptors (in F-score order) and the sequential forward search strategy. After comparing five different machine learning algorithms, the final prediction model was trained with a support vector machine classifier on AHLS1400 in fivefold cross-validation, which gave the best performance (ACC = 99.43%, MCC = 0.989, AUC = 0.997). The results show that AHLS-pred achieves an ACC of 94.70%, MCC of 0.894 and AUC of 0.995 on the independent testing dataset AHLS132. This demonstrates that AHLS-pred is a promising and powerful prediction method for accelerating the computational identification of AHLSs.
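The three sequence descriptors named in this abstract are simple frequency counts. A minimal sketch follows, assuming the 20 standard amino acids and the usual definitions: amino acid composition is the fraction of each residue, and the G-gap dipeptide composition counts residue pairs separated by `gap` intervening positions, so that `gap=0` gives the ordinary dipeptide composition. Function names and alphabet ordering are illustrative choices, not taken from AHLS-pred.

```python
from itertools import product

# Assumed ordering of the 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """20-dimensional vector: fraction of each residue in the sequence."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def g_gap_dipeptide_composition(seq, gap=0):
    """400-dimensional vector of pair frequencies with `gap` residues between.

    gap=0 corresponds to the ordinary (adjacent) dipeptide composition.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = len(seq) - gap - 1
    if total <= 0:
        raise ValueError("sequence too short for the requested gap")
    for i in range(total):
        pair = seq[i] + seq[i + gap + 1]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    return [counts[p] / total for p in pairs]
```

Concatenating these vectors gives one fixed-length feature vector per protein, which can then be ranked and pruned by a feature-selection step before training a classifier.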
Affiliation(s)
- Jie Hu
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing, Jiangsu, 210023, China
| | - Jin Wang
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing, Jiangsu, 210023, China
| | - Jiahao Li
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing, Jiangsu, 210023, China
| | - Haidong Hu
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing, Jiangsu, 210023, China
| | - Bin Wu
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing, Jiangsu, 210023, China
| | - Hongqiang Ren
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing, Jiangsu, 210023, China
| | - Jinfeng Wang
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing, Jiangsu, 210023, China
| |
|
19
|
Bell PA, Scheuermann S, Renner F, Pan CL, Lu HY, Turvey SE, Bornancin F, Régnier CH, Overall CM. Integrating knowledge of protein sequence with protein function for the prediction and validation of new MALT1 substrates. Comput Struct Biotechnol J 2022; 20:4717-4732. [PMID: 36147669 PMCID: PMC9463181 DOI: 10.1016/j.csbj.2022.08.021] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/01/2022] [Revised: 08/07/2022] [Accepted: 08/08/2022] [Indexed: 11/30/2022] Open
Affiliation(s)
- Peter A. Bell
- Centre for Blood Research, Life Sciences Centre, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
- Department of Oral Biological and Medical Sciences, Faculty of Dentistry, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
| | - Sophia Scheuermann
- Centre for Blood Research, Life Sciences Centre, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
- Department of Oral Biological and Medical Sciences, Faculty of Dentistry, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
- Department of Immunology, Eberhard Karl University Tübingen, 72076 Tübingen, Germany
- Department of Hematology and Oncology, University Hospital Tübingen, Children's Hospital, 72076 Tübingen, Germany
| | - Florian Renner
- Novartis Institutes for BioMedical Research, Novartis Campus, CH-4056 Basel, Switzerland
- Molecular Targeted Therapy - Discovery Oncology, Roche Pharma Research & Early Development, F. Hoffmann-La Roche Ltd, 4070 Basel, Switzerland
| | - Christina L. Pan
- Centre for Blood Research, Life Sciences Centre, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
- Department of Oral Biological and Medical Sciences, Faculty of Dentistry, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
| | - Henry Y. Lu
- Department of Pediatrics, British Columbia Children's Hospital, The University of British Columbia, Vancouver, BC V5Z 4H4, Canada
- Department of Experimental Medicine, Faculty of Medicine, The University of British Columbia, Vancouver, BC V5Z 1M9, Canada
| | - Stuart E. Turvey
- Department of Pediatrics, British Columbia Children's Hospital, The University of British Columbia, Vancouver, BC V5Z 4H4, Canada
- Department of Experimental Medicine, Faculty of Medicine, The University of British Columbia, Vancouver, BC V5Z 1M9, Canada
| | - Frédéric Bornancin
- Novartis Institutes for BioMedical Research, Novartis Campus, CH-4056 Basel, Switzerland
| | - Catherine H. Régnier
- Novartis Institutes for BioMedical Research, Novartis Campus, CH-4056 Basel, Switzerland
| | - Christopher M. Overall
- Centre for Blood Research, Life Sciences Centre, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
- Department of Oral Biological and Medical Sciences, Faculty of Dentistry, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
- Corresponding author at: Centre for Blood Research, Life Sciences Centre, University of British Columbia, Vancouver, BC V6T 1Z3, Canada.
| |
|
20
|
Asim MN, Ibrahim MA, Imran Malik M, Dengel A, Ahmed S. Circ-LocNet: A Computational Framework for Circular RNA Sub-Cellular Localization Prediction. Int J Mol Sci 2022; 23:ijms23158221. [PMID: 35897818 PMCID: PMC9329987 DOI: 10.3390/ijms23158221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Revised: 07/15/2022] [Accepted: 07/20/2022] [Indexed: 02/04/2023] Open
Abstract
Circular ribonucleic acids (circRNAs) are novel non-coding RNAs that emanate from alternative splicing of precursor mRNA in reversed order across exons. Despite the abundant presence of circRNAs in human genes and their involvement in diverse physiological processes, the functionality of most circRNAs remains a mystery. As with other non-coding RNAs, knowledge of the sub-cellular localization of circRNAs can help demystify their influence on protein synthesis, degradation, and destination, their association with different diseases, and their potential for drug development. To date, wet-lab experimental approaches are used to detect sub-cellular locations of circular RNAs. These approaches help to elucidate the role of circRNAs as protein scaffolds, RNA-binding protein (RBP) sponges, micro-RNA (miRNA) sponges, parental gene expression modifiers, alternative splicing regulators, and transcription regulators. To complement wet-lab experiments, and considering the progress made by machine learning approaches in determining the sub-cellular localization of other non-coding RNAs, this paper develops a computational framework, Circ-LocNet, to precisely detect circRNA sub-cellular localization. Circ-LocNet performs a comprehensive extrinsic evaluation of 7 residue frequency-based, residue order and frequency-based, and physio-chemical property-based sequence descriptors using the five most widely used machine learning classifiers. Further, it explores the performance impact of K-order sequence descriptor fusion, where it ensembles similar as well as dissimilar genres of statistical representation learning approaches to reap the combined benefits. Considering the diversity of statistical representation learning schemes, it assesses the performance of second-order, third-order, and up to seventh-order sequence descriptor fusion.
A comprehensive empirical evaluation of Circ-LocNet over a newly developed benchmark dataset using different settings reveals that standalone residue frequency-based sequence descriptors and tree-based classifiers are more suitable to predict sub-cellular localization of circular RNAs. Further, K-order heterogeneous sequence descriptors fusion in combination with tree-based classifiers most accurately predict sub-cellular localization of circular RNAs. We anticipate this study will act as a rich baseline and push the development of robust computational methodologies for the accurate sub-cellular localization determination of novel circRNAs.
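The abstract's notion of a residue frequency-based descriptor, and of fusing several descriptors, can be illustrated with a minimal sketch. This is not Circ-LocNet's implementation: the function names `kmer_frequencies` and `fuse`, the `ACGU` alphabet, and fusion by plain concatenation are all assumptions made for illustration.

```python
from itertools import product
from typing import Dict, List

ALPHABET = "ACGU"  # RNA alphabet; an assumption for this sketch


def kmer_frequencies(seq: str, k: int) -> List[float]:
    """Return normalised k-mer frequencies in lexicographic k-mer order."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts: Dict[str, int] = {km: 0 for km in kmers}
    for i in range(max(len(seq) - k + 1, 0)):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
    total = sum(counts.values())
    return [counts[km] / total if total else 0.0 for km in kmers]


def fuse(*descriptors: List[float]) -> List[float]:
    """Fuse descriptors by concatenation (one possible fusion scheme)."""
    fused: List[float] = []
    for d in descriptors:
        fused.extend(d)
    return fused


seq = "AUGGCUAUGG"                 # toy sequence, not from the benchmark dataset
mono = kmer_frequencies(seq, 1)    # 4 values, order A, C, G, U
di = kmer_frequencies(seq, 2)      # 16 values
features = fuse(mono, di)          # second-order fusion: 20 values
```

The fused vector could then be passed to any classifier; a tree-based model would be the natural choice given the abstract's findings, but that step is left out here.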
Affiliation(s)
- Muhammad Nabeel Asim
  - German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
  - Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Muhammad Ali Ibrahim
  - German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
  - Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Muhammad Imran Malik
  - School of Computer Science & Electrical Engineering, National University of Sciences and Technology, Islamabad 44000, Pakistan
- Andreas Dengel
  - German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
  - Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Sheraz Ahmed
  - German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
  - DeepReader GmbH, Trippstadter Str. 122, 67663 Kaiserslautern, Germany
|
21
|
Soleimany AP, Martin-Alonso C, Anahtar M, Wang CS, Bhatia SN. Protease Activity Analysis: A Toolkit for Analyzing Enzyme Activity Data. ACS OMEGA 2022; 7:24292-24301. [PMID: 35874224 PMCID: PMC9301967 DOI: 10.1021/acsomega.2c01559] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
Analyzing the activity of proteases and their substrates is critical to defining the biological functions of these enzymes and to designing new diagnostics and therapeutics that target protease dysregulation in disease. While a wide range of databases and algorithms have been created to better predict protease cleavage sites, there is a dearth of computational tools to automate analysis of in vitro and in vivo protease assays. This necessitates individual researchers to develop their own analytical pipelines, resulting in a lack of standardization across the field. To facilitate protease research, here we present Protease Activity Analysis (PAA), a toolkit for the preprocessing, visualization, machine learning analysis, and querying of protease activity data sets. PAA leverages a Python-based object-oriented implementation that provides a modular framework for streamlined analysis across three major components. First, PAA provides a facile framework to query data sets of synthetic peptide substrates and their cleavage susceptibilities across a diverse set of proteases. To complement the database functionality, PAA also includes tools for the automated analysis and visualization of user-input enzyme-substrate activity measurements generated through in vitro screens against synthetic peptide substrates. Finally, PAA supports a set of modular machine learning functions to analyze in vivo protease activity signatures that are generated by activity-based sensors. Overall, PAA offers the protease community a breadth of computational tools to streamline research, taking a step toward standardizing data analysis across the field and in chemical biology and biochemistry at large.
Affiliation(s)
- Ava P. Soleimany
  - Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, Massachusetts 02139, United States
  - Program in Biophysics, Harvard University, Boston, Massachusetts 02115, United States
  - Microsoft Research New England, Cambridge, Massachusetts 02142, United States
- Carmen Martin-Alonso
  - Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, Massachusetts 02139, United States
- Melodi Anahtar
  - Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, Massachusetts 02139, United States
- Cathy S. Wang
  - Department of Biological Engineering, MIT, Cambridge, Massachusetts 02139, United States
- Sangeeta N. Bhatia
  - Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, Massachusetts 02139, United States
  - Department of Electrical Engineering and Computer Science, MIT, Cambridge, Massachusetts 02139, United States
  - Howard Hughes Medical Institute, Cambridge, Massachusetts 02139, United States
|
22
|
Rezende PM, Xavier JS, Ascher DB, Fernandes GR, Pires DEV. Evaluating hierarchical machine learning approaches to classify biological databases. Brief Bioinform 2022; 23:6611916. [PMID: 35724625 PMCID: PMC9310517 DOI: 10.1093/bib/bbac216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2021] [Revised: 04/29/2022] [Accepted: 05/09/2022] [Indexed: 12/04/2022]
Abstract
The rate of biological data generation has increased dramatically in recent years, which has driven the importance of databases as a resource to guide innovation and the generation of biological insights. Given the complexity and scale of these databases, automatic data classification is often required. Biological data sets are often hierarchical in nature, with varying degrees of complexity, imposing different challenges to train, test and validate accurate and generalizable classification models. While some approaches to classify hierarchical data have been proposed, no guidelines regarding their utility, applicability and limitations have been explored or implemented. These include ‘Local’ approaches considering the hierarchy, building models per level or node, and ‘Global’ hierarchical classification, using a flat classification approach. To fill this gap, here we have systematically contrasted the performance of ‘Local per Level’ and ‘Local per Node’ approaches with a ‘Global’ approach applied to two different hierarchical datasets: BioLip and CATH. The results show how different components of hierarchical data sets, such as variation coefficient and prediction by depth, can guide the choice of appropriate classification schemes. Finally, we provide guidelines to support this process when embarking on a hierarchical classification task, which will help optimize computational resources and predictive performance.
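To make the three schemes named in the abstract concrete, the sketch below shows how training targets might be derived from hierarchical labels under each one. It is an illustration under stated assumptions, not the authors' code: the function names, the toy labels, and the reading of 'Global' as a single model over full label paths are all hypothetical.

```python
from typing import Dict, List, Tuple

LabelPath = Tuple[str, ...]  # e.g. ("A", "A1") means class A, subclass A1


def per_level_targets(paths: List[LabelPath]) -> Dict[int, List[LabelPath]]:
    """'Local per Level': one multi-class target vector per depth."""
    depth = max(len(p) for p in paths)
    return {d: [p[:d] for p in paths] for d in range(1, depth + 1)}


def per_node_targets(paths: List[LabelPath]) -> Dict[LabelPath, List[int]]:
    """'Local per Node': one binary target vector per node of the hierarchy."""
    nodes = sorted({p[:d] for p in paths for d in range(1, len(p) + 1)})
    return {n: [int(p[:len(n)] == n) for p in paths] for n in nodes}


def global_targets(paths: List[LabelPath]) -> List[str]:
    """'Global': a single target vector over full label paths."""
    return [".".join(p) for p in paths]


# Toy hierarchy, not taken from BioLip or CATH.
paths = [("A", "A1"), ("A", "A2"), ("B", "B1")]
levels = per_level_targets(paths)   # 2 target vectors, one per depth
nodes = per_node_targets(paths)     # 5 binary target vectors, one per node
flat = global_targets(paths)        # 1 target vector
```

The number of models to train grows from one (global) to one per depth to one per node, which is the resource trade-off the abstract's guidelines are meant to inform.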
Affiliation(s)
- Pâmela M Rezende
  - Universidade Federal de Minas Gerais
  - Instituto René Rachou, Fundação Oswaldo Cruz
  - Stilingue Inteligência Artificial
- Joicymara S Xavier
  - Universidade Federal de Minas Gerais
  - Instituto René Rachou, Fundação Oswaldo Cruz
  - Institute of Agricultural Sciences, Universidade Federal dos Vales do Jequitinhonha e Mucuri
- David B Ascher
  - School of Chemistry and Molecular Biosciences, University of Queensland
  - Systems and Computational Biology, Bio 21 Institute, University of Melbourne
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute
- Douglas E V Pires
  - Systems and Computational Biology, Bio 21 Institute, University of Melbourne
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute
  - School of Computing and Information Systems, University of Melbourne
|
23
|
Deep Learning-Based Advances In Protein Posttranslational Modification Site and Protein Cleavage Prediction. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2022; 2499:285-322. [PMID: 35696087 DOI: 10.1007/978-1-0716-2317-6_15] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 10/18/2022]
Abstract
Posttranslational modification (PTM) is a ubiquitous phenomenon in both eukaryotes and prokaryotes which gives rise to enormous proteomic diversity. PTM mostly comes in two flavors: covalent modification to the polypeptide chain and proteolytic cleavage. Understanding and characterization of PTM is a fundamental step toward understanding the underpinning of biology. Recent advances in experimental approaches, mainly mass-spectrometry-based approaches, have immensely helped in obtaining and characterizing PTMs. However, experimental approaches are not enough to understand and characterize more than 450 different types of PTMs, and complementary computational approaches are becoming popular. Recently, due to the various advancements in the field of Deep Learning (DL), along with the explosion of applications of DL to various fields, the field of computational prediction of PTM has also witnessed the development of a plethora of DL-based approaches. In this book chapter, we first review some recent DL-based approaches in the field of PTM site prediction. In addition, we also review the recent advances in the not-so-studied PTM, that is, proteolytic cleavage predictions. We describe advances in PTM prediction by highlighting the deep learning architecture, feature encoding, novelty of the approaches, and availability of the tools/approaches. Finally, we provide an outlook and possible future research directions for DL-based approaches for PTM prediction.
|
24
|
Matrikines as mediators of tissue remodelling. Adv Drug Deliv Rev 2022; 185:114240. [PMID: 35378216 DOI: 10.1016/j.addr.2022.114240] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 12/08/2021] [Revised: 02/21/2022] [Accepted: 03/26/2022] [Indexed: 11/21/2022]
Abstract
Extracellular matrix (ECM) proteins confer biomechanical properties, maintain cell phenotype and mediate tissue repair (via release of sequestered cytokines and proteases). In contrast to intracellular proteomes, where proteins are monitored and replaced over short time periods, many ECM proteins function for years (decades in humans) without replacement. The longevity of abundant ECM proteins, such as collagen I and elastin, leaves them vulnerable to damage accumulation and their host organs prone to chronic, age-related diseases. However, ECM protein fragmentation can potentially produce peptide cytokines (matrikines) which may exacerbate and/or ameliorate age- and disease-related ECM remodelling. In this review, we discuss ECM composition, function and degradation and highlight examples of endogenous matrikines. We then critically and comprehensively analyse published studies of matrix-derived peptides used as topical skin treatments, before considering the potential for improvements in the discovery and delivery of novel matrix-derived peptides to skin and internal organs. From this, we conclude that while the translational impact of matrix-derived peptide therapeutics is evident, the mechanisms of action of these peptides are poorly defined. Further, well-designed, multimodal studies are required.
|
25
|
Naseer S, Hussain W, Khan YD, Rasool N. iPhosS(Deep)-PseAAC: Identification of Phosphoserine Sites in Proteins Using Deep Learning on General Pseudo Amino Acid Compositions. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1703-1714. [PMID: 33242308 DOI: 10.1109/tcbb.2020.3040747] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Indexed: 06/11/2023]
Abstract
Among all the PTMs, protein phosphorylation is pivotal for various pathological and physiological processes. About 30 percent of eukaryotic proteins undergo phosphorylation, leading to various changes in conformation, function, stability, localization, and so forth. In eukaryotic proteins, phosphorylation occurs on serine (S), threonine (T) and tyrosine (Y) residues. Among these, serine phosphorylation has its own importance as it is associated with various important biological processes, including energy metabolism, signal transduction pathways, cell cycling, and apoptosis. Thus, its identification is important; however, in vitro, ex vivo and in vivo identification can be laborious, time-consuming and costly. There is a dire need for an efficient and accurate computational model to help researchers and biologists identify these sites easily. Herein, we propose a novel predictor for identification of Phosphoserine sites (PhosS) in proteins, by integrating Chou's Pseudo Amino Acid Composition (PseAAC) with deep features. We used well-known DNNs for both the tasks of learning a feature representation of peptide sequences and performing classifications. Among the different DNNs, the best score is shown by the Convolutional Neural Network (CNN)-based model, which makes the CNN-based prediction model the best for Phosphoserine prediction. Based on these results, it is concluded that the proposed model can help to identify PhosS sites in a very efficient and accurate manner, which can help scientists understand the mechanism of this modification in proteins.
|
26
|
Identification of D Modification Sites Using a Random Forest Model Based on Nucleotide Chemical Properties. Int J Mol Sci 2022; 23:ijms23063044. [PMID: 35328461 PMCID: PMC8950657 DOI: 10.3390/ijms23063044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2022] [Revised: 02/25/2022] [Accepted: 03/09/2022] [Indexed: 12/03/2022]
Abstract
Dihydrouridine (D) is an abundant post-transcriptional modification present in transfer RNA from eukaryotes, bacteria, and archaea. D has contributed to treatments for cancerous diseases. Therefore, the precise detection of D modification sites can enable further understanding of its functional roles. Traditional experimental techniques to identify D are laborious and time-consuming. In addition, there are few computational tools for such analysis. In this study, we utilized eleven sequence-derived feature extraction methods and implemented five popular machine learning algorithms to identify an optimal model. During data preprocessing, data were partitioned for training and testing. Oversampling was also adopted to reduce the effect of the imbalance between positive and negative samples. The best-performing model was obtained through a combination of random forest and nucleotide chemical property modeling. The optimized model presented high sensitivity and specificity values of 0.9688 and 0.9706 in independent tests, respectively. Our proposed model surpassed published tools in independent tests. Furthermore, a series of validations across several aspects was conducted in order to demonstrate the robustness and reliability of our model.
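Two steps mentioned in the abstract, oversampling an imbalanced training set and reporting sensitivity and specificity, can be sketched as follows. The abstract does not say which oversampling method was used; random duplication of minority-class samples is assumed here as the simplest variant, and the function names and toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Assumes both classes occur in y_true, otherwise a division by zero occurs.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def random_oversample(X: List, y: List[int], seed: int = 0) -> Tuple[List, List[int]]:
    """Duplicate randomly chosen minority-class samples until classes balance.

    Assumes binary labels 0/1 with at least one sample of each class.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


# Toy labels and predictions, not data from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)

X = [[float(i)] for i in range(8)]
X_bal, y_bal = random_oversample(X, y_true)
```

Oversampling would normally be applied to the training partition only, so that duplicated samples cannot leak into the independent test set.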
|
27
|
Gupta Y, Maciorowski D, Medernach B, Becker DP, Durvasula R, Libertin CR, Kempaiah P. Iron dysregulation in COVID-19 and reciprocal evolution of SARS-CoV-2: Natura nihil frustra facit. J Cell Biochem 2022; 123:601-619. [PMID: 34997606 PMCID: PMC9015563 DOI: 10.1002/jcb.30207] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/13/2021] [Accepted: 12/16/2021] [Indexed: 12/12/2022]
Abstract
After more than a year of the COVID-19 pandemic, SARS-CoV-2 infection rates with newer variants continue to devastate much of the world. Global healthcare systems are overwhelmed with high positive patient numbers. Silent hypoxia accompanied by rapid deterioration, and in some cases septic shock, is responsible for COVID-19 mortality in many hospitalized patients. There is an urgent need to further understand the relationships and interplay with human host components during pathogenesis and immune evasion strategies. Currently, acquired immunity through vaccination or prior infection usually provides sufficient protection against the emerging variants of SARS-CoV-2, except for the Omicron variant, which requires a recent booster. New strains have shown higher viral loads and greater transmissibility with more severe disease presentations. Notably, COVID-19 has a peculiar prognosis in severe patients with iron dysregulation and hypoxia, which is still poorly understood. Studies have shown abnormally low serum iron levels in severe infection but a high iron overload in lung fibrotic tissue. Data from our in-silico structural analysis of the spike protein sequence along with host proteolysis processing suggest that the viral spike protein fragment mimics Hepcidin and is resistant to the major human proteases. This functional spike-derived peptide, dubbed "Covidin", thus may be intricately involved with host ferroportin binding and internalization, leading to dysregulated host iron metabolism. Here, we propose the possible role of this potentially allogenic mimetic hormone corresponding to severe COVID-19 immunopathology and illustrate that this molecular mimicry is responsible for a major pathway associated with severe disease status. Furthermore, through 3D molecular modeling and docking followed by MD simulation validation, we have unraveled the likely role of Covidin in iron dysregulation in COVID-19 patients.
Our meta-analysis suggests the Hepcidin mimetic mechanism is highly conserved among its host range as well as among all new variants to date including Omicron. Extensive analysis of current mutations revealed that new variants are becoming alarmingly more resistant to selective human proteases associated with host defense.
Affiliation(s)
- Yash Gupta
  - Infectious Diseases, Mayo Clinic, Jacksonville, Florida, USA
- Dawid Maciorowski
  - School of Medicine and Public Health, University of Wisconsin‐Madison, Madison, Wisconsin, USA
- Brian Medernach
  - Department of Medicine, Loyola University Medical Center, Chicago, Illinois, USA
- Daniel P. Becker
  - Department of Chemistry and Biochemistry, Loyola University Chicago, Chicago, Illinois, USA
|
28
|
Shahid M, Ilyas M, Hussain W, Khan YD. ORI-Deep: improving the accuracy for predicting origin of replication sites by using a blend of features and long short-term memory network. Brief Bioinform 2022; 23:6511972. [PMID: 35048955 DOI: 10.1093/bib/bbac001] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/04/2021] [Revised: 12/30/2021] [Accepted: 01/02/2022] [Indexed: 11/14/2022]
Abstract
Replication of DNA is an important process for the cell division cycle, gene expression regulation and other biological evolution processes. It also has a crucial role in a living organism's physical growth and structure. Replication of DNA comprises three stages known as initiation, elongation and termination, and the origin of replication site (ORI) is the location where the DNA replication process is initiated. Various methodologies exist to identify ORIs in genomic sequences; however, these methods either require extensive computation or have limited optimization for large datasets. Herein, a model called ORI-Deep is proposed to identify ORIs from the multiple cell type genomic sequence benchmark data. An efficient method is proposed using a deep neural network to identify ORIs for four different eukaryotic species. For better representation of data, a feature vector is constructed using statistical moments for the training and testing of data and is further fed to a long short-term memory (LSTM) network. To prove the effectiveness of the proposed model, we applied several validation techniques at different levels to obtain seven accuracy metrics, and the accuracy scores for self-consistency, 10-fold cross-validation, jackknife and the independent set test are observed to be 0.977, 0.948, 0.976 and 0.977, respectively. Based on the results, it can be concluded that ORI-Deep can efficiently predict origin of replication sites in DNA sequences with high accuracy. A webserver for ORI-Deep is available at (https://share.streamlit.io/waqarhusain/orideep/main/app.py), whereas source code is available at (https://github.com/WaqarHusain/OriDeep).
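As a rough illustration of building a feature vector from statistical moments, the sketch below computes raw and central moments of a numerically encoded DNA sequence. The abstract does not specify the encoding or which moments ORI-Deep uses, so the `ENCODING` table, the function names, and the choice of orders are assumptions made purely for illustration.

```python
from typing import List

# Hypothetical numeric encoding of nucleotides; not taken from the paper.
ENCODING = {"A": 1, "C": 2, "G": 3, "T": 4}


def raw_moment(values: List[float], order: int) -> float:
    """Raw moment of the given order: the mean of v ** order."""
    return sum(v ** order for v in values) / len(values)


def central_moment(values: List[float], order: int) -> float:
    """Central moment of the given order: the mean of (v - mean) ** order."""
    mean = raw_moment(values, 1)
    return sum((v - mean) ** order for v in values) / len(values)


def moment_features(seq: str, max_order: int = 3) -> List[float]:
    """Concatenate raw moments (orders 1..max) and central moments (2..max)."""
    values = [float(ENCODING[base]) for base in seq]
    raw = [raw_moment(values, k) for k in range(1, max_order + 1)]
    central = [central_moment(values, k) for k in range(2, max_order + 1)]
    return raw + central


features = moment_features("ACGT")  # toy sequence
```

A fixed-length vector of this kind is what allows sequences of different lengths to be fed to the same downstream network.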
Affiliation(s)
- Mahwish Shahid
  - School of Systems and Technologies, University of Management and Technology, Lahore, Pakistan
- Maham Ilyas
  - University of Management and Technology, Lahore, Pakistan
- Waqar Hussain
  - University of Management and Technology, Lahore, Pakistan
- Yaser Daanial Khan
  - Department of Computer Science, University of Management and Technology, Lahore, Pakistan
|
29
|
Dyer RP, Weiss GA. Making the cut with protease engineering. Cell Chem Biol 2021; 29:177-190. [PMID: 34921772 PMCID: PMC9127713 DOI: 10.1016/j.chembiol.2021.12.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/02/2020] [Revised: 07/30/2021] [Accepted: 11/29/2021] [Indexed: 12/30/2022]
Abstract
Proteases cut with enviable precision and regulate diverse molecular events in biology. Such qualities drive a seemingly inexhaustible appetite for proteases with new activities and capabilities. Comprising 25% of the total industrial enzyme market, proteases appear in consumer goods, such as detergents, textile processing, and numerous foods; additionally, proteases include 25 US Food and Drug Administration-approved medicines and various research tools. Recent advances in protease engineering strategies address target specificity, catalytic efficiency, and stability. This guide to protease engineering surveys best practices and emerging strategies. We further highlight gaps and flexibilities inherent to each system that suggest opportunities for new technology development along with engineered proteases to solve challenges in proteomics, protein sequencing, and synthetic gene circuits.
Affiliation(s)
- Rebekah P Dyer
  - Department of Molecular Biology and Biochemistry, University of California, Irvine, 1102 NS-2, Irvine, CA 92697-2025, USA
- Gregory A Weiss
  - Department of Chemistry, University of California, Irvine, 1102 NS-2, Irvine, CA 92697-2025, USA
  - Department of Molecular Biology and Biochemistry, University of California, Irvine, 1102 NS-2, Irvine, CA 92697-2025, USA
  - Department of Pharmaceutical Sciences, University of California, Irvine, 1102 NS-2, Irvine, CA 92697-2025, USA
|
30
|
Fu T, Li F, Zhang Y, Yin J, Qiu W, Li X, Liu X, Xin W, Wang C, Yu L, Gao J, Zheng Q, Zeng S, Zhu F. VARIDT 2.0: structural variability of drug transporter. Nucleic Acids Res 2021; 50:D1417-D1431. [PMID: 34747471 PMCID: PMC8728241 DOI: 10.1093/nar/gkab1013] [Citation(s) in RCA: 67] [Impact Index Per Article: 22.3] [Received: 09/08/2021] [Revised: 10/08/2021] [Accepted: 11/04/2021] [Indexed: 12/20/2022]
Abstract
The structural variability data of drug transporter (DT) are key for research on precision medicine and rational drug use. However, these valuable data are not sufficiently covered by the available databases. In this study, a major update of VARIDT (a database previously constructed to provide DTs' variability data) was thus described. First, the experimentally resolved structures of all DTs reported in the original VARIDT were discovered from PubMed and Protein Data Bank. Second, the structural variability data of each DT were collected by literature review, which included: (a) mutation-induced spatial variations in folded state, (b) difference among DT structures of human and model organisms, (c) outward/inward-facing DT conformations and (d) xenobiotics-driven alterations in the 3D complexes. Third, for those DTs without experimentally resolved structural variabilities, homology modeling was further applied as well-established protocol to enrich such valuable data. As a result, 145 mutation-induced spatial variations of 42 DTs, 1622 inter-species structures originating from 292 DTs, 118 outward/inward-facing conformations belonging to 59 DTs, and 822 xenobiotics-regulated structures in complex with 57 DTs were updated to VARIDT (https://idrblab.org/varidt/ and http://varidt.idrblab.net/). All in all, the newly collected structural variabilities will be indispensable for explaining drug sensitivity/selectivity, bridging preclinical research with clinical trial, revealing the mechanism underlying drug-drug interaction, and so on.
Affiliation(s)
- Tingting Fu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Institute of Theoretical Chemistry, College of Chemistry, Jilin University, Changchun 130023, China
- Fengcheng Li
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yang Zhang
  - Department of Pharmacology, Hebei Medical University, Shijiazhuang 050017, China
- Jiayi Yin
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Wenqi Qiu
  - Department of Surgery, HKU-SZH & Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Xuedong Li
  - Department of Pharmacology, Hebei Medical University, Shijiazhuang 050017, China
- Xingang Liu
  - Department of Pharmacology, Hebei Medical University, Shijiazhuang 050017, China
- Wenwen Xin
  - Department of Pharmacology, Hebei Medical University, Shijiazhuang 050017, China
- Chengzhao Wang
  - Department of Pharmacology, Hebei Medical University, Shijiazhuang 050017, China
- Lushan Yu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Jianqing Gao
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang, China
- Qingchuan Zheng
  - Institute of Theoretical Chemistry, College of Chemistry, Jilin University, Changchun 130023, China
- Su Zeng
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Feng Zhu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
|
31
|
Zhao YW, Zhang S, Ding H. Recent development of machine learning methods in sumoylation sites prediction. Curr Med Chem 2021; 29:894-907. [PMID: 34525906 DOI: 10.2174/0929867328666210915112030] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/10/2021] [Revised: 07/24/2021] [Accepted: 08/07/2021] [Indexed: 11/22/2022]
Abstract
Sumoylation is an important reversible post-translational modification of proteins and mediates a variety of cellular processes. Sumo-modified proteins can change their subcellular localization, activity and stability. In addition, sumoylation plays an important role in various cellular processes such as transcriptional regulation and signal transduction. Abnormal sumoylation is involved in many diseases, including neurodegeneration and immune-related diseases, as well as the development of cancer. Therefore, identification of the sumoylation site (SUMO site) is fundamental to understanding their molecular mechanisms and regulatory roles. In contrast to labor-intensive and costly experimental approaches, computational prediction of sumoylation sites in silico has also attracted much attention for its accuracy, convenience and speed. At present, many computational prediction models have been used to identify SUMO sites, but these models have not been comprehensively summarized and reviewed. Therefore, the research progress of relevant models is summarized and discussed in this paper. We briefly summarize the development of bioinformatics methods for sumoylation site prediction, focusing mainly on benchmark dataset construction, feature extraction, machine learning methods, published results and online tools. We hope the review will provide more help for wet-experimental scholars.
Affiliation(s)
- Yi-Wei Zhao
  - School of Medicine, University of Electronic Science and Technology of China, Chengdu 610054, China
- Shihua Zhang
  - College of Life Science and Health, Wuhan University of Science and Technology, Wuhan 430065, China
- Hui Ding
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
|
32
|
Melo MCR, Maasch JRMA, de la Fuente-Nunez C. Accelerating antibiotic discovery through artificial intelligence. Commun Biol 2021; 4:1050. [PMID: 34504303 PMCID: PMC8429579 DOI: 10.1038/s42003-021-02586-0] [Citation(s) in RCA: 59] [Impact Index Per Article: 19.7] [Received: 02/18/2021] [Accepted: 07/16/2021] [Indexed: 02/07/2023]
Abstract
By targeting invasive organisms, antibiotics insert themselves into the ancient struggle of the host-pathogen evolutionary arms race. As pathogens evolve tactics for evading antibiotics, therapies decline in efficacy and must be replaced, distinguishing antibiotics from most other forms of drug development. Together with a slow and expensive antibiotic development pipeline, the proliferation of drug-resistant pathogens drives urgent interest in computational methods that promise to expedite candidate discovery. Strides in artificial intelligence (AI) have encouraged its application to multiple dimensions of computer-aided drug design, with increasing application to antibiotic discovery. This review describes AI-facilitated advances in the discovery of both small molecule antibiotics and antimicrobial peptides. Beyond the essential prediction of antimicrobial activity, emphasis is also given to antimicrobial compound representation, determination of drug-likeness traits, antimicrobial resistance, and de novo molecular design. Given the urgency of the antimicrobial resistance crisis, we analyze uptake of open science best practices in AI-driven antibiotic discovery and argue for openness and reproducibility as a means of accelerating preclinical research. Finally, trends in the literature and areas for future inquiry are discussed, as artificially intelligent enhancements to drug discovery at large offer many opportunities for future applications in antibiotic development.
Affiliation(s)
- Marcelo C R Melo
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- Jacqueline R M A Maasch
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Computer and Information Science, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Cesar de la Fuente-Nunez
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
|
33
|
Akmal MA, Hussain W, Rasool N, Khan YD, Khan SA, Chou KC. Using CHOU'S 5-Steps Rule to Predict O-Linked Serine Glycosylation Sites by Blending Position Relative Features and Statistical Moment. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2045-2056. [PMID: 31985438 DOI: 10.1109/tcbb.2020.2968441] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 06/10/2023]
Abstract
Glycosylation of proteins in eukaryotic cells is an important and complicated post-translational modification, owing to its pivotal role in, and association with, crucial physiological functions of most proteins. Identifying glycosylation sites in a polypeptide chain is not an easy task due to multiple impediments, and analytical identification of these sites is expensive and laborious. There is a pressing need for a reliable computational method that determines such sites precisely and saves researchers time and effort. Herein, we propose a novel predictor, iGlycoS-PseAAC, which integrates Chou's Pseudo Amino Acid Composition (PseAAC) with relative/absolute position-based features. In the self-consistency test on the benchmark dataset, the model predicts O-linked glycosylation at serine sites with an accuracy of 98.8 percent. The overall accuracy under 10-fold cross-validation, combining the positive and negative results, is 97.2 percent, and the overall accuracy under the jackknife test, aggregating all prediction results, is 96.195 percent. The proposed predictor can thus help predict O-linked glycosylated serine sites efficiently and accurately. Overall, the results show that the accuracy of iGlycoS-PseAAC is higher than that of existing tools.
|
34
|
Yan K, Wen J, Liu JX, Xu Y, Liu B. Protein Fold Recognition by Combining Support Vector Machines and Pairwise Sequence Similarity Scores. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2008-2016. [PMID: 31940548 DOI: 10.1109/tcbb.2020.2966450] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 06/10/2023]
Abstract
Protein fold recognition is one of the most essential steps for protein structure prediction, aiming to classify proteins into known protein folds. There are two main computational approaches: one is the template-based method based on the alignment scores between query-template protein pairs and the other is the machine learning method based on the feature representation and classifier. These two approaches have their own advantages and disadvantages. Can we combine these methods to establish more accurate predictors for protein fold recognition? In this study, we made an initial attempt and proposed two novel algorithms: TSVM-fold and ESVM-fold. TSVM-fold is based on Support Vector Machines (SVMs) and uses a set of pairwise sequence similarity scores generated by three complementary template-based methods: HHblits, SPARKS-X, and DeepFR. These scores measure the global relationships between query sequences and templates. The comprehensive features of the attributes of the sequences are fed into the SVMs for prediction. TSVM-fold was then further combined with the HHblits algorithm to improve its generalization ability; the combined method is called ESVM-fold. Experimental results on two rigorous benchmark datasets (the LE and YK datasets) showed that the proposed methods outperform some state-of-the-art methods, indicating that TSVM-fold and ESVM-fold are efficient predictors for protein fold recognition.
|
35
|
Jia C, Zhang M, Fan C, Li F, Song J. Formator: Predicting Lysine Formylation Sites Based on the Most Distant Undersampling and Safe-Level Synthetic Minority Oversampling. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1937-1945. [PMID: 31804942 DOI: 10.1109/tcbb.2019.2957758] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 06/10/2023]
Abstract
Lysine formylation is a reversible type of protein post-translational modification and has been found to be involved in a myriad of biological processes, including modulation of chromatin conformation and gene expression in histones and other nuclear proteins. Accurate identification of lysine formylation sites is essential for elucidating the underlying molecular mechanisms of formylation. Traditional experimental methods are time-consuming and expensive. As such, it is desirable and necessary to develop computational methods for accurate prediction of formylation sites. In this study, we propose a novel predictor, termed Formator, for identifying lysine formylation sites from sequence information. Formator is developed using an ensemble learning (EL) strategy that combines four individual support vector machine classifiers via a voting system. Moreover, the most distant undersampling and Safe-Level-SMOTE oversampling techniques were integrated to deal with the data imbalance problem of the training dataset. Four effective feature extraction methods, namely bi-profile Bayes (BPB), k-nearest neighbor (KNN), amino acid physicochemical properties (AAindex), and composition and transition (CTD), were employed to encode the surrounding sequence features of potential formylation sites. Extensive empirical studies show that Formator achieved accuracies of 87.24 and 74.96 percent on the jackknife test and the independent test, respectively. Performance comparison results on the independent test indicate that Formator outperforms the existing prediction tool LFPred, suggesting that it has great potential to serve as a useful tool in identifying novel lysine formylation sites and facilitating hypothesis-driven experimental efforts.
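The voting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the voting rule, so simple majority voting with a fixed tie-break is an assumption here, and the function name and the example votes are hypothetical rather than taken from Formator.

```python
from collections import Counter

def majority_vote(votes, tie_label=0):
    """Return the label chosen by most classifiers; ties fall back to tie_label (assumed rule)."""
    counts = Counter(votes).most_common()
    # A tie exists when the two most frequent labels have the same count.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return tie_label
    return counts[0][0]

# Hypothetical outputs of four classifiers (1 = formylation site, 0 = not a site)
# for three candidate lysines.
per_site_votes = [
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 0, 0],  # tie, resolved to tie_label
]
predictions = [majority_vote(v) for v in per_site_votes]
```

With an even number of voters, ties are possible, which is why the sketch makes the tie-break explicit; a weighted or probability-averaging scheme would be an equally plausible reading of "voting system".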
|
36
|
Perpetuo L, Klein J, Ferreira R, Guedes S, Amado F, Leite-Moreira A, Silva AMS, Thongboonkerd V, Vitorino R. How can artificial intelligence be used for peptidomics? Expert Rev Proteomics 2021; 18:527-556. [PMID: 34343059 DOI: 10.1080/14789450.2021.1962303] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/01/2023]
Abstract
Introduction: Peptidomics is an emerging field of omics sciences using advanced isolation, analysis, and computational techniques that enable qualitative and quantitative analyses of various peptides in biological samples. Peptides can act as useful biomarkers and as therapeutic molecules for diseases. Areas covered: The use of therapeutic peptides can be predicted quickly and efficiently using data-driven computational methods, particularly artificial intelligence (AI) approaches. Various AI approaches are useful for peptide-based drug discovery, such as support vector machines, random forests, extremely randomized trees, and other more recently developed deep learning methods. AI methods are relatively new to the development of peptide-based therapies, but these techniques have already become essential tools in protein science by dissecting novel therapeutic peptides and their functions (Figure 1). Expert opinion: Researchers have shown that AI models can facilitate the development of peptidomics and selective peptide therapies in the field of peptide science. Biopeptide prediction is important for the discovery and development of successful peptide-based drugs. Due to their ability to predict therapeutic roles based on sequence details, many AI-dependent prediction tools have been developed (Figure 1).
Affiliation(s)
- Luís Perpetuo
  - iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro
- Julie Klein
  - Institut National de la Santé et de la Recherche Médicale (INSERM), U1297, Institute of Cardiovascular and Metabolic Disease, Université Toulouse III, Toulouse, France
- Rita Ferreira
  - LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro
- Sofia Guedes
  - LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro
- Francisco Amado
  - LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro
- Adelino Leite-Moreira
  - UnIC, Departamento de Cirurgia e Fisiologia, Faculdade de Medicina da Universidade do Porto, Porto
- Artur M S Silva
  - LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro
- Visith Thongboonkerd
  - Medical Proteomics Unit, Office for Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
- Rui Vitorino
  - iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro
  - LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro
  - UnIC, Departamento de Cirurgia e Fisiologia, Faculdade de Medicina da Universidade do Porto, Porto
|
37
|
He S, Kong L, Chen J. iDNA6mA-Rice-DL: A local web server for identifying DNA N6-methyladenine sites in rice genome by deep learning method. J Bioinform Comput Biol 2021; 19:2150019. [PMID: 34291710 DOI: 10.1142/s0219720021500190] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
Abstract
Accurate detection of N6-methyladenine (6mA) sites by biochemical experiments helps to reveal their biological functions; however, these wet-lab experiments are laborious and expensive. Therefore, it is necessary to introduce a powerful computational model to identify 6mA sites on a genomic scale, especially for plant genomes. In view of this, we propose iDNA6mA-Rice-DL, a deep learning model for the effective identification of 6mA sites in the rice genome. Traditional machine learning methods require features to be prepared in advance. In contrast, our proposed model automatically encodes and extracts key DNA features through an embedding layer and several groups of dense layers. We use an independent dataset to evaluate the generalization ability of our model. An area under the receiver operating characteristic curve (auROC) of 0.98 with an accuracy of 95.96% was obtained. The experimental results demonstrate that our model performs well in predicting 6mA sites in the rice genome. A user-friendly local web server has been established. The Docker image of the local web server can be freely downloaded at https://hub.docker.com/r/his1server/idna6ma-rice-dl.
Affiliation(s)
- Shiqian He
  - School of Mathematics and Information Science & Technology, Hebei Normal University of Science & Technology, Qinhuangdao 066000, P. R. China
- Liang Kong
  - School of Mathematics and Information Science & Technology, Hebei Normal University of Science & Technology, Qinhuangdao 066000, P. R. China
- Jing Chen
  - School of Information Science and Engineering, Yanshan University, Qinhuangdao 066000, P. R. China
|
38
|
Vandooren J, Pereira RVS, Ugarte-Berzal E, Rybakin V, Noppen S, Stas MR, Bernaerts E, Ganseman E, Metzemaekers M, Schols D, Proost P, Opdenakker G. Internal Disulfide Bonding and Glycosylation of Interleukin-7 Protect Against Proteolytic Inactivation by Neutrophil Metalloproteinases and Serine Proteases. Front Immunol 2021; 12:701739. [PMID: 34276694 PMCID: PMC8278288 DOI: 10.3389/fimmu.2021.701739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2021] [Accepted: 06/14/2021] [Indexed: 11/13/2022] Open
Abstract
Interleukin 7 (IL-7) is a cell growth factor with a central role in normal T cell development, survival and differentiation. The lack of IL-7–IL-7 receptor(R)-mediated signaling compromises lymphoid development, whereas increased signaling activity contributes to the development of chronic inflammation, cancer and autoimmunity. Gain-of-function alterations of the IL-7R and the signaling through Janus kinases (JAKs) and signal transducers and activators of transcription (STATs) are enriched in T cell acute lymphoblastic leukemia (T-ALL) and autocrine production of IL-7 by T-ALL cells is involved in the phenotypes of leukemic initiation and oncogenic spreading. Several IL-7-associated pathologies are also characterized by increased presence of matrix metalloproteinase-9 (MMP-9), due to neutrophil degranulation and its regulated production by other cell types. Since proteases secreted by neutrophils are known to modulate the activity of many cytokines, we investigated the interactions between IL-7, MMP-9 and several other neutrophil-derived proteases. We demonstrated that MMP-9 efficiently cleaved human IL-7 in the exposed loop between the α-helices C and D and that this process is delayed by IL-7 N-linked glycosylation. Functionally, the proteolytic cleavage of IL-7 did not influence IL-7Rα binding and internalization nor the direct pro-proliferative effects of IL-7 on a T-ALL cell line (HPB-ALL) or in primary CD8+ human peripheral blood mononuclear cells. A comparable effect was observed for the neutrophil serine proteases neutrophil elastase, proteinase 3 and combinations of neutrophil proteases. Hence, glycosylation and disulfide bonding as two posttranslational modifications influence IL-7 bioavailability in the human species: glycosylation protects against proteolysis, whereas internal cysteine bridging under physiological redox state keeps the IL-7 conformations as active proteoforms. Finally, we showed that mouse IL-7 does not contain the protease-sensitive loop and, consequently, was not cleaved by MMP-9. With the latter finding we discovered differences in IL-7 biology between the human and mouse species.
Affiliation(s)
- Jennifer Vandooren
  - Laboratory of Immunobiology, Rega Institute for Medical Research/KU Leuven, Department of Microbiology, Immunology and Transplantation, Leuven, Belgium
- Rafaela Vaz Sousa Pereira
  - Laboratory of Immunobiology, Rega Institute for Medical Research/KU Leuven, Department of Microbiology, Immunology and Transplantation, Leuven, Belgium
- Estefania Ugarte-Berzal
  - Laboratory of Immunobiology, Rega Institute for Medical Research/KU Leuven, Department of Microbiology, Immunology and Transplantation, Leuven, Belgium
- Vasily Rybakin
  - Laboratory of Immunobiology, Rega Institute for Medical Research/KU Leuven, Department of Microbiology, Immunology and Transplantation, Leuven, Belgium
- Sam Noppen
  - Laboratory of Virology and Chemotherapy, Rega Institute for Medical Research/KU Leuven, Department of Microbiology, Immunology and Transplantation, Leuven, Belgium
- Melissa R Stas
  - Laboratory of Immunobiology, Rega Institute for Medical Research/KU Leuven, Department of Microbiology, Immunology and Transplantation, Leuven, Belgium
- Eline Bernaerts
  - Laboratory of Immunobiology, Rega Institute for Medical Research/KU Leuven, Department of Microbiology, Immunology and Transplantation, Leuven, Belgium
- Eva Ganseman
  - Laboratory of Molecular Immunology, Rega Institute for Medical Research/KU Leuven, Department of Microbiology, Immunology and Transplantation, Leuven, Belgium
- Mieke Metzemaekers
  - Laboratory of Molecular Immunology, Rega Institute for Medical Research/KU Leuven, Department of Microbiology, Immunology and Transplantation, Leuven, Belgium
- Dominique Schols
  - Laboratory of Virology and Chemotherapy, Rega Institute for Medical Research/KU Leuven, Department of Microbiology, Immunology and Transplantation, Leuven, Belgium
- Paul Proost
  - Laboratory of Molecular Immunology, Rega Institute for Medical Research/KU Leuven, Department of Microbiology, Immunology and Transplantation, Leuven, Belgium
- Ghislain Opdenakker
  - Laboratory of Immunobiology, Rega Institute for Medical Research/KU Leuven, Department of Microbiology, Immunology and Transplantation, Leuven, Belgium
|
39
|
Lai X, Tang J, ElSayed MEH. Recent advances in proteolytic stability for peptide, protein, and antibody drug discovery. Expert Opin Drug Discov 2021; 16:1467-1482. [PMID: 34187273 DOI: 10.1080/17460441.2021.1942837] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 10/21/2022]
Abstract
Introduction: To discover and develop a peptide, protein, or antibody into a drug requires overcoming multiple challenges to obtain desired properties. Proteolytic stability is one of the challenges and deserves a focused investigation. Areas covered: This review concentrates on improving proteolytic stability by engineering the amino acids around the cleavage sites of a liable peptide, protein, or antibody. Peptidases are discussed on three levels including all peptidases in databases, mixtures based on organ and tissue types, and individual peptidases. The technique to identify cleavage sites is spotlighted on mass spectrometry-based approaches such as MALDI-TOF and LC-MS. For sequence engineering, the replacements that have been commonly applied with a higher chance of success are highlighted at the beginning, while the rarely used and more complicated replacements are discussed later. Although a one-size-fits-all approach does not exist to apply to different projects, this review provides a 3-step strategy for effectively and efficiently conducting the proteolytic stability experiments to achieve the eventual goal of improving the stability by engineering the molecule itself. Expert opinion: Improving the proteolytic stability is a spiraling up process sequenced by testing and engineering. There are many ways to engineer amino acids, but the choice must consider the cost and properties affected by the changes of the amino acids.
Affiliation(s)
- Xianyin Lai
  - Biotechnology Discovery Research, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, IN, USA
- Jason Tang
  - Biotechnology Discovery Research, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, IN, USA
- Mohamed E H ElSayed
  - Biotechnology Discovery Research, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, IN, USA
|
40
|
Li Q. Structure, Application, and Biochemistry of Microbial Keratinases. Front Microbiol 2021; 12:674345. [PMID: 34248885 PMCID: PMC8260994 DOI: 10.3389/fmicb.2021.674345] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 03/01/2021] [Accepted: 05/17/2021] [Indexed: 12/17/2022] Open
Abstract
Keratinases belong to a class of proteases that are able to degrade keratins into amino acids. Microbial keratinases play important roles in turning keratin-containing wastes into value-added products by participating in the degradation of keratin. Keratin is found in human and animal hard tissues, and its complicated structures make it resistant to degradation by common proteases. Although the breaking of disulfide bonds is involved in keratin degradation, keratinase is responsible for the cleavage of peptides, making it attractive in the pharmaceutical and feather industries. Keratinase can serve as an important tool to convert keratin-rich wastes, such as feathers from the poultry industry, into diverse products applicable to many fields. Despite some progress in isolating keratinase-producing microorganisms, in structural studies of keratinases, and in the biochemical characterization of these enzymes, effort is still required to expand the biotechnological application of keratinase in diverse fields by identifying more keratinases, understanding their mechanism of action, and constructing more active enzymes through molecular biology and protein engineering. This review covers the structures, applications, and biochemistry of microbial keratinases, and strategies to improve their efficiency in keratin degradation.
Affiliation(s)
- Qingxin Li
  - Guangdong Provincial Engineering Laboratory of Biomass High Value Utilization, Institute of Bioengineering, Guangdong Academy of Sciences, Guangzhou, China
|
41
|
Malebary SJ, Khan YD. Evaluating machine learning methodologies for identification of cancer driver genes. Sci Rep 2021; 11:12281. [PMID: 34112883 PMCID: PMC8192921 DOI: 10.1038/s41598-021-91656-8] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 01/06/2021] [Accepted: 05/19/2021] [Indexed: 02/06/2023] Open
Abstract
Cancer is driven by distinctive sorts of changes and basic variations in genes. Recognizing cancer driver genes is basic for accurate oncological analysis. Numerous methodologies to distinguish and identify drivers presently exist, but efficient tools to combine and optimize them on huge datasets are few. Most strategies for prioritizing transformations depend basically on frequency-based criteria. Strategies are required to dependably prioritize organically dynamic driver changes over inert passengers in high-throughput sequencing cancer information sets. This study proposes a model, PCDG-Pred, which works as a utility capable of distinguishing cancer driver and passenger attributes of genes based on sequencing data. Keeping in view the significance of the cancer driver genes, an efficient method is proposed to identify them. Further, various validation techniques are applied at different levels to establish the effectiveness of the model and to obtain metrics such as accuracy, Matthews correlation coefficient, sensitivity, and specificity. The results of the study strongly indicate that the proposed strategy provides a fundamental functional advantage over other existing strategies for cancer driver gene identification. Careful experiments show that the accuracy obtained for the self-consistency, independent set, and cross-validation tests is 91.08%, 87.26%, and 92.48%, respectively.
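The four metrics named in the abstract can all be derived from a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name and the example counts are hypothetical and are not taken from the PCDG-Pred paper.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity and Matthews correlation coefficient."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}

# Hypothetical counts for a driver-vs-passenger classifier on 200 genes.
metrics = binary_metrics(tp=90, fp=10, tn=85, fn=15)
```

Unlike accuracy, the Matthews correlation coefficient accounts for all four cells of the confusion matrix, which makes it more informative when driver and passenger classes are imbalanced.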
Affiliation(s)
- Sharaf J Malebary
  - Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, P.O. Box 344, Rabigh, 21911, Saudi Arabia
- Yaser Daanial Khan
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
|
42
|
Naseer S, Hussain W, Khan YD, Rasool N. NPalmitoylDeep-PseAAC: A Predictor of N-Palmitoylation Sites in Proteins Using Deep Representations of Proteins and PseAAC via Modified 5-Steps Rule. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200605142828] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Indexed: 11/22/2022]
Abstract
Background: Among all the major post-translational modifications, lipid modifications possess special significance due to their widespread functional importance in eukaryotic cells. There exist multiple types of lipid modifications, and Palmitoylation is one of the broader types, with three different forms. N-Palmitoylation is carried out by attachment of palmitic acid to an N-terminal cysteine. Because N-Palmitoylation is associated with various biological functions and with diseases such as Alzheimer’s and other neurodegenerative diseases, its identification is very important.
Objective: The in vitro, ex vivo and in vivo identification of Palmitoylation is laborious, time-consuming and costly. There is a pressing need for an efficient and accurate computational model to help researchers and biologists identify these sites easily. Herein, we propose a novel prediction model for the identification of N-Palmitoylation sites in proteins.
Method: The proposed prediction model is developed by combining Chou’s Pseudo Amino Acid Composition (PseAAC) with deep neural networks. We used well-known deep neural networks (DNNs) both for learning a feature representation of peptide sequences and for developing a prediction model to perform classification.
Results: Among the different DNNs, the Gated Recurrent Unit (GRU) based RNN model showed the highest scores in terms of accuracy and all other computed measures, and outperformed all previously reported predictors.
Conclusion: The proposed GRU-based RNN model can help to identify N-Palmitoylation efficiently and accurately, which can help scientists understand the mechanism of this modification in proteins.
Affiliation(s)
- Sheraz Naseer
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, P.O. Box 10033, C-II, Johar Town, Lahore 54770, Pakistan
- Waqar Hussain
  - National Center of Artificial Intelligence, Punjab University College of Information Technology, University of the Punjab, Lahore, Pakistan
- Yaser Daanial Khan
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, P.O. Box 10033, C-II, Johar Town, Lahore 54770, Pakistan
- Nouman Rasool
  - Dr Panjwani Center for Molecular Medicine and Drug Research, International Center for Chemical and Biological Sciences, University of Karachi, Karachi, 75270, Pakistan
|
43
|
Liu S, Tang H, Liu H, Wang J. Multi-label Learning for the Diagnosis of Cancer and Identification of Novel Biomarkers with High-throughput Omics. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200623130416] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/22/2022]
Abstract
Background: The advancement of bioinformatics and machine learning has facilitated the diagnosis of cancer and the discovery of omics-based biomarkers.
Objective: Our study employed a novel data-driven approach to classifying the normal samples and different types of gastrointestinal cancer samples, to find potential biomarkers for effective diagnosis and prognosis assessment of gastrointestinal cancer patients.
Methods: Different feature selection methods were used, and the diagnostic performance of the proposed biosignatures was benchmarked using support vector machine (SVM) and random forest (RF) models.
Results: All models showed satisfactory performance in which Multilabel-RF appeared to be the best. The accuracy of the Multilabel-RF based model was 83.12%, with precision, recall, F1, and Hamming loss of 79.70%, 68.31%, 0.7357 and 0.1688, respectively. Moreover, proposed biomarker signatures were highly associated with multifaceted hallmarks in cancer. Functional enrichment analysis and impact of the biomarker candidates in the prognosis of the patients were also examined.
Conclusion: We successfully introduced a solid workflow based on multi-label learning with high-throughput omics for diagnosis of cancer and identification of novel biomarkers. Novel transcriptome biosignatures that may improve the diagnostic accuracy in gastrointestinal cancer are introduced for further validations in various clinical settings.
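The figures reported in the Results can be cross-checked with a few lines of arithmetic. The sketch below recomputes F1 as the harmonic mean of the reported precision and recall, and notes that the reported Hamming loss equals one minus the reported accuracy. Reading the accuracy as a per-label accuracy is an assumption; the abstract does not say how it was computed.

```python
# Values as reported in the abstract (percentages converted to fractions).
precision = 0.7970
recall = 0.6831
accuracy = 0.8312

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# If accuracy is measured per label, Hamming loss is its complement (assumption).
hamming_loss = 1 - accuracy

f1_rounded = round(f1, 4)
hamming_rounded = round(hamming_loss, 4)
```

Both rounded values agree with the reported F1 of 0.7357 and Hamming loss of 0.1688, so the four numbers in the abstract are internally consistent under that reading.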
Affiliation(s)
- Shicai Liu
  - State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
- Hailin Tang
  - State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
- Hongde Liu
  - State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
- Jinke Wang
  - State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
|
44
|
iAmideV-Deep: Valine Amidation Site Prediction in Proteins Using Deep Learning and Pseudo Amino Acid Compositions. Symmetry (Basel) 2021. [DOI: 10.3390/sym13040560] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 02/01/2023] Open
Abstract
Amidation is an important post-translational modification in which a peptide ends with an amide group (–NH2) rather than a carboxyl group (–COOH). These amidated peptides are less sensitive to proteolytic degradation and have an extended half-life in the bloodstream. Amides are used in different industries like pharmaceuticals, natural products, and biologically active compounds. The in-vivo, ex-vivo, and in-vitro identification of amidation sites is a costly and time-consuming but important task to study the physicochemical properties of amidated peptides. A less costly and efficient alternative is to supplement wet lab experiments with accurate computational models. Hence, an urgent need exists for efficient and accurate computational models to easily identify amidated sites in peptides. In this study, we present a new predictor, based on deep neural networks (DNN) and Pseudo Amino Acid Compositions (PseAAC), to learn efficient, task-specific, and effective representations for valine amidation site identification. Well-known DNN architectures are used in this contribution to learn peptide sequence representations and classify peptide chains. Of all the DNN-based predictors developed in this study, the convolutional neural network-based model showed the best performance, surpassing all other DNN-based models and previously reported predictors. The proposed model will supplement in-vivo methods and help scientists to determine valine amidation very efficiently and accurately, which in turn will enhance understanding of valine amidation in different biological processes.
|
45
|
Li Z, Hu L, Tang Z, Zhao C. Predicting HIV-1 Protease Cleavage Sites With Positive-Unlabeled Learning. Front Genet 2021; 12:658078. [PMID: 33868387 PMCID: PMC8044780 DOI: 10.3389/fgene.2021.658078] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/25/2021] [Accepted: 03/08/2021] [Indexed: 11/13/2022] Open
Abstract
Understanding the substrate specificity of HIV-1 protease plays an essential role in the prevention of HIV infection. A variety of computational models have thus been developed to predict substrate sites that are cleaved by HIV-1 protease, but most of them follow a supervised learning scheme that builds classifiers by treating experimentally verified cleavable sites as positive samples and unknown sites as negative samples. However, the negative set can contain noise, because some of the unknown sites may in fact be cleavable (false negatives). The resulting predictions are biased, and the classifiers are less accurate than they could be. In this work, unknown substrate sites are regarded as unlabeled samples instead of negative ones. We propose a novel positive-unlabeled learning algorithm, namely PU-HIV, for effective prediction of HIV-1 protease cleavage sites. Features used by PU-HIV are encoded from different perspectives of substrate sequences, including amino acid identities, coevolutionary patterns and chemical properties. By adjusting the weights of errors generated by positive and unlabeled samples, a biased support vector machine classifier can be built to complete the prediction task. In comparison with state-of-the-art prediction models, benchmarking experiments using cross-validation and independent tests demonstrated the superior performance of PU-HIV in terms of AUC, PR-AUC, and F-measure. Thus, with PU-HIV, it is possible to identify previously unknown but physiologically existing substrate sites that can be cleaved by HIV-1 protease, providing valuable insights into designing novel HIV-1 protease inhibitors for HIV treatment.
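The error-weighting idea in this abstract can be illustrated with a small sketch. It is not the PU-HIV implementation: it assumes a linear SVM trained by plain subgradient descent, and the names `train_biased_svm`, `c_pos`, and `c_unl`, the cost values, and the two-dimensional toy data are all hypothetical. The only point it shows is that a margin violation on a verified positive is penalized more heavily than one on an unlabeled sample.

```python
import random


def train_biased_svm(X, y, c_pos=10.0, c_unl=1.0, lam=0.01,
                     lr=0.05, epochs=200, seed=0):
    """Train a linear SVM by subgradient descent with class-dependent costs.

    y[i] is +1 for a verified positive and -1 for an unlabeled sample.
    c_pos and c_unl are illustrative error costs, not values from the paper.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # L2 regularisation: shrink the weights slightly on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if y[i] * score < 1.0:  # hinge loss is active for this sample
                cost = c_pos if y[i] > 0 else c_unl
                w = [wj + lr * cost * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * cost * y[i]
    return w, b


def decision(w, b, x):
    """Signed score; positive means 'predicted cleavable' in this toy setup."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Toy data: four verified positives, four unlabeled samples far from them,
# and one unlabeled sample (the last row) that sits among the positives.
X = [[2.0, 2.0], [2.5, 1.5], [1.5, 2.5], [2.0, 3.0],
     [-2.0, -2.0], [-1.5, -2.5], [-2.5, -1.5], [-3.0, -2.0],
     [2.2, 2.2]]
y = [1, 1, 1, 1, -1, -1, -1, -1, -1]

w, b = train_biased_svm(X, y)
print(decision(w, b, [2.2, 2.2]) > 0)
```

Because errors on positives cost more than errors on unlabeled samples, the unlabeled point that lies among the positives is scored as a likely positive instead of pulling the boundary toward it.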
Affiliation(s)
- Zhenfeng Li
- School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
- Lun Hu
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Zehai Tang
- School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
- Cheng Zhao
- School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
|
46
|
Ozols M, Eckersley A, Platt CI, Stewart-McGuinness C, Hibbert SA, Revote J, Li F, Griffiths CEM, Watson REB, Song J, Bell M, Sherratt MJ. Predicting Proteolysis in Complex Proteomes Using Deep Learning. Int J Mol Sci 2021; 22:3071. [PMID: 33803033 PMCID: PMC8002881 DOI: 10.3390/ijms22063071] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/09/2021] [Revised: 03/10/2021] [Accepted: 03/12/2021] [Indexed: 12/27/2022]
Abstract
Both protease- and reactive oxygen species (ROS)-mediated proteolysis are thought to be key effectors of tissue remodeling. We have previously shown that comparison of amino acid composition can predict the differential susceptibilities of proteins to photo-oxidation. However, predicting protein susceptibility to endogenous proteases remains challenging. Here, we aim to develop bioinformatics tools to (i) predict cleavage site locations (and hence putative protein susceptibilities) and (ii) compare the predicted vulnerabilities of skin proteins to protease- and ROS-mediated proteolysis. The first goal of this study was to experimentally evaluate the ability of existing protease cleavage site prediction models (PROSPER and DeepCleave) to identify experimentally determined MMP9 cleavage sites in two purified proteins and in a complex human dermal fibroblast-derived extracellular matrix (ECM) proteome. We subsequently developed deep bidirectional recurrent neural network (BRNN) models to predict cleavage sites for 14 tissue proteases. The predictions of the new models were tested against experimental datasets and combined with amino acid composition analysis (to predict ultraviolet radiation (UVR)/ROS susceptibility) in a new web app: the Manchester proteome susceptibility calculator (MPSC). The BRNN models performed better in predicting cleavage sites in native dermal ECM proteins than existing models (DeepCleave and PROSPER), and application of MPSC to the skin proteome suggests that, compared with the elastic fiber network, fibrillar collagens may be susceptible primarily to protease-mediated proteolysis. We also identify additional putative targets of oxidative damage (dermatopontin, fibulins and defensins) and protease action (laminins and nidogen). MPSC has the potential to identify targets of proteolysis in disparate tissues and disease states.
Affiliation(s)
- Matiss Ozols
- Division of Cell Matrix Biology & Regenerative Medicine, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, Manchester M13 9PT, UK
- Alexander Eckersley
- Division of Cell Matrix Biology & Regenerative Medicine, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, Manchester M13 9PT, UK
- Christopher I. Platt
- Division of Cell Matrix Biology & Regenerative Medicine, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, Manchester M13 9PT, UK
- Callum Stewart-McGuinness
- Division of Cell Matrix Biology & Regenerative Medicine, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, Manchester M13 9PT, UK
- Sarah A. Hibbert
- Division of Cell Matrix Biology & Regenerative Medicine, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, Manchester M13 9PT, UK
- Jerico Revote
- Monash Bioinformatics Platform, Monash University, Melbourne, VIC 3800, Australia
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Fuyi Li
- Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, VIC 3800, Australia
- Christopher E. M. Griffiths
- Centre for Dermatology Research, Faculty of Biology, Medicine and Health, and Salford Royal NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester M13 9PT, UK
- NIHR Manchester Biomedical Research Centre, Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester M13 9WL, UK
- Rachel E. B. Watson
- Centre for Dermatology Research, Faculty of Biology, Medicine and Health, and Salford Royal NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester M13 9PT, UK
- NIHR Manchester Biomedical Research Centre, Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester M13 9WL, UK
- Jiangning Song
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Mike Bell
- Research and Development, Walgreens Boots Alliance, Thane Road, Nottingham NG90 1BS, UK
- Michael J. Sherratt
- Division of Cell Matrix Biology & Regenerative Medicine, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, Manchester M13 9PT, UK
|
47
|
Mapping specificity, cleavage entropy, allosteric changes and substrates of blood proteases in a high-throughput screen. Nat Commun 2021; 12:1693. [PMID: 33727531 PMCID: PMC7966775 DOI: 10.1038/s41467-021-21754-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/16/2020] [Accepted: 02/10/2021] [Indexed: 02/06/2023]
Abstract
Proteases are among the largest protein families and critical regulators of biochemical processes like apoptosis and blood coagulation. Knowledge of proteases has been expanded by the development of proteomic approaches; however, technology for multiplexed screening of proteases within native environments is currently lagging behind. Here we introduce a simple method to profile protease activity based on isolation of protease products from native lysates using a 96FASP filter, their analysis in a mass spectrometer and a custom data analysis pipeline. The method is significantly faster, cheaper and technically less demanding, is easy to multiplex, and produces accurate protease fingerprints. Using the blood cascade proteases as a case study, we obtain protease substrate profiles that can be used to map specificity, cleavage entropy and allosteric effects and to design protease probes. The data further show that protease substrate predictions enable the selection of potential physiological substrates for targeted validation in biochemical assays. Characterizing proteases in their native environment is still challenging. Here, the authors develop a proteomics workflow for analyzing protease-specific peptides from cell lysates in 96-well format, providing mechanistic insights into blood proteases and enabling the prediction of protease substrates.
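The abstract does not say how cleavage entropy is computed. One common way to quantify positional specificity is the Shannon entropy of the residue distribution at each aligned substrate position, normalized by the logarithm of the alphabet size. The sketch below assumes that definition; the function name `positional_entropy` and the four example peptides are hypothetical and are not data from the study.

```python
import math
from collections import Counter


def positional_entropy(peptides, alphabet_size=20):
    """Normalized Shannon entropy of residues at each aligned position.

    0.0 means a position always holds the same residue (highly specific);
    1.0 means all `alphabet_size` residues are equally frequent.
    """
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    entropies = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        total = sum(counts.values())
        h = -sum((c / total) * math.log(c / total) for c in counts.values())
        entropies.append(h / math.log(alphabet_size))
    return entropies


# Hypothetical aligned substrate windows (not real cleavage data).
windows = ["ARGK", "ARGL", "ARDK", "ARDL"]
for pos, value in enumerate(positional_entropy(windows)):
    print(pos, round(value, 3))
```

Positions with low entropy are the ones where the protease tolerates few residues, so a per-position profile of this kind gives a compact view of specificity along the substrate.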
|
48
|
Awais M, Hussain W, Khan YD, Rasool N, Khan SA, Chou KC. iPhosH-PseAAC: Identify Phosphohistidine Sites in Proteins by Blending Statistical Moments and Position Relative Features According to the Chou's 5-Step Rule and General Pseudo Amino Acid Composition. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:596-610. [PMID: 31144645 DOI: 10.1109/tcbb.2019.2919025] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Indexed: 06/09/2023]
Abstract
Protein phosphorylation is one of the key mechanisms in prokaryotes and eukaryotes and is responsible for various biological functions such as protein degradation, intracellular localization, a multitude of cellular processes, molecular association, cytoskeletal dynamics, and enzymatic inhibition/activation. Phosphohistidine (PhosH) has a key role in a number of biological processes, from central metabolism to signalling, in eukaryotes and bacteria. Thus, identification of phosphohistidine sites in a protein sequence is crucial, but experimental identification can be expensive, time-consuming, and laborious. To address this problem, we propose a novel computational model, namely iPhosH-PseAAC, for prediction of phosphohistidine sites in a given protein sequence using pseudo amino acid composition (PseAAC), statistical moments, and position relative features. The results of the proposed predictor are validated through self-consistency testing, 10-fold cross-validation, and jackknife testing. Self-consistency validation gave 100 percent accuracy, whereas 10-fold cross-validation gave 94.26 percent accuracy. Moreover, jackknife testing gave 97.07 percent accuracy for the proposed model. Thus, the proposed model iPhosH-PseAAC is well able to predict PhosH sites in given proteins.
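The two held-out validation schemes named in this abstract differ only in fold size: jackknife testing is the special case of k-fold cross-validation in which each fold holds a single sample. The sketch below shows that relationship under stated assumptions. The nearest-centroid classifier, the helper names (`kfold_accuracy`, `fit_centroids`, `predict_centroid`), and the one-dimensional toy data are all hypothetical stand-ins, not the iPhosH-PseAAC model or its features.

```python
def kfold_accuracy(X, y, k, fit, predict):
    """Accuracy over k folds; sample i is assigned to fold i % k."""
    n = len(X)
    correct = 0
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / n


def fit_centroids(X, y):
    """Toy classifier: store the mean feature value of each class."""
    groups = {}
    for value, label in zip(X, y):
        groups.setdefault(label, []).append(value)
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}


def predict_centroid(model, value):
    """Return the label whose centroid is closest to the value."""
    return min(model, key=lambda label: abs(model[label] - value))


# Hypothetical one-dimensional features for two well-separated classes.
X = [0.1, 0.2, 5.0, 5.1, 0.3, 0.0, 5.2, 5.3]
y = [0, 0, 1, 1, 0, 0, 1, 1]

four_fold = kfold_accuracy(X, y, 4, fit_centroids, predict_centroid)
jackknife = kfold_accuracy(X, y, len(X), fit_centroids, predict_centroid)
print(four_fold, jackknife)
```

Self-consistency testing, by contrast, evaluates on the same samples used for training, which is why it tends to report the highest figure of the three.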
|
49
|
Dowell AC, Munford H, Goel A, Gordon NS, James ND, Cheng KK, Zeegers MP, Ward DG, Bryan RT. PD-L2 Is Constitutively Expressed in Normal and Malignant Urothelium. Front Oncol 2021; 11:626748. [PMID: 33718196 PMCID: PMC7951139 DOI: 10.3389/fonc.2021.626748] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/06/2020] [Accepted: 01/15/2021] [Indexed: 11/14/2022]
Abstract
The use of immune checkpoint blockade, in particular PD-1 and PD-L1 inhibitors, is now commonplace in many clinical settings including the treatment of muscle-invasive bladder cancer (MIBC). Notwithstanding, little information exists regarding the expression of the alternative PD-1 ligand, PD-L2 in urothelial bladder cancer (UBC). We therefore set out to characterise the expression of PD-L2 in comparison to PD-L1. Firstly, we assessed PD-L2 expression by immunohistochemistry and found widespread expression of PD-L2 in UBC, albeit with reduced expression in MIBC. We further investigated these findings using RNA-seq data from a cohort of 575 patients demonstrating that PDCD1LG2 (PD-L2) is widely expressed in UBC and correlated with CD274 (PD-L1). However, in contrast to our immunohistochemistry findings, expression was significantly increased in advanced disease. We have also provided detailed evidence of constitutive PD-L2 expression in normal urothelium and propose a mechanism by which PD-L2 is cleaved from the cell surface in MIBC. These data provide a comprehensive assessment of PD-L2 in UBC, showing PD-L2 is abundant in UBC and, importantly, constitutively present in normal urothelium. These data have implications for future development of immune checkpoint blockade, and also the understanding of the function of the immune system in the normal urinary bladder.
Affiliation(s)
- Alexander C Dowell
- Institute of Immunology and Immunotherapy, University of Birmingham, Birmingham, United Kingdom
- Haydn Munford
- Institute of Immunology and Immunotherapy, University of Birmingham, Birmingham, United Kingdom
- Anshita Goel
- Bladder Cancer Research Centre, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom
- Naheema S Gordon
- Bladder Cancer Research Centre, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom
- Nicholas D James
- Prostate and Bladder Cancer Research Team, The Institute of Cancer Research, London, United Kingdom
- K K Cheng
- School of Health and Population Sciences, University of Birmingham, Birmingham, United Kingdom
- Maurice P Zeegers
- Department of Complex Genetics and Epidemiology, School of Nutrition and Translational Research in Metabolism, Maastricht University, Maastricht, Netherlands; CAPHRI School for Public Health and Primary Care, University of Maastricht, Maastricht, Netherlands
- Douglas G Ward
- Bladder Cancer Research Centre, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom
- Richard T Bryan
- Bladder Cancer Research Centre, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom
|
50
|
iDRP-PseAAC: Identification of DNA Replication Proteins Using General PseAAC and Position Dependent Features. Int J Pept Res Ther 2021; 27:1315-1329. [PMID: 33584161 PMCID: PMC7869428 DOI: 10.1007/s10989-021-10170-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/18/2021] [Indexed: 10/25/2022]
Abstract
DNA replication is one of the specific processes to be considered in all living organisms, specifically eukaryotes. The prevalence of DNA replication is significant for the evolutionary transition at the beginning of life. DNA replication proteins are the proteins that support the replication process, and they are also reported to be important in drug design and discovery. DNA replication proteins therefore play a very important role in the human body; however, studying their mechanisms first requires identifying them. Experimental identification is time-consuming, costly, and laborious. To cope with this issue, a computational method is required to predict these proteins; however, no prior method exists. This study constructs a novel prediction model for this purpose. The model is based on an artificial neural network trained on position relative features and sequence statistical moments integrated into PseAAC. The highest overall accuracies, obtained through tenfold cross-validation and jackknife testing, were 96.22% and 98.56%, respectively. Our experimental results demonstrate that the proposed predictor surpasses the existing models and can serve as a time- and cost-effective strategy for designing novel drugs against contemporary bacterial infections.
|