1. Zheng D, Zhou S, Chen L, Pang G, Yang J. A deep learning method to predict bacterial ADP-ribosyltransferase toxins. Bioinformatics 2024; 40:btae378. [PMID: 38885365 PMCID: PMC11219481 DOI: 10.1093/bioinformatics/btae378] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2024] [Revised: 06/03/2024] [Accepted: 06/13/2024] [Indexed: 06/20/2024]
Abstract
MOTIVATION ADP-ribosylation is a critical modification involved in regulating diverse cellular processes, including chromatin structure regulation, RNA transcription, and cell death. Bacterial ADP-ribosyltransferase toxins (bARTTs) serve as potent virulence factors that orchestrate the manipulation of host cell functions to facilitate bacterial pathogenesis. Despite their pivotal role, the bioinformatic identification of novel bARTTs poses a formidable challenge due to limited verified data and the inherent sequence diversity among bARTT members. RESULTS We proposed a deep learning-based model, ARTNet, specifically engineered to predict bARTTs from bacterial genomes. Initially, we introduced an effective data augmentation method to address the issue of data scarcity in training ARTNet. Subsequently, we employed a data optimization strategy by utilizing ART-related domain subsequences instead of the primary full sequences, thereby significantly enhancing the performance of ARTNet. ARTNet achieved a Matthews correlation coefficient (MCC) of 0.9351 and an F1-score (macro) of 0.9666 on repeated independent test datasets, outperforming three other deep learning models and six traditional machine learning models in terms of time efficiency and accuracy. Furthermore, we empirically demonstrated the ability of ARTNet to predict novel bARTTs across domain superfamilies without sequence similarity. We anticipate that ARTNet will greatly facilitate the screening and identification of novel bARTTs from bacterial genomes. AVAILABILITY AND IMPLEMENTATION ARTNet is publicly accessible at http://www.mgc.ac.cn/ARTNet/. The source code of ARTNet is freely available at https://github.com/zhengdd0422/ARTNet/.
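The two metrics reported for ARTNet, the Matthews correlation coefficient (MCC) and the macro-averaged F1-score, can be made concrete with a small sketch. It assumes binary labels (1 = bARTT, 0 = non-bARTT) and takes "macro" F1 to mean the unweighted mean of the per-class F1 scores, which is the usual definition; it is an illustration, not code from the ARTNet repository.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(y_true, y_pred):
    """Matthews correlation coefficient: (TP*TN - FP*FN) / sqrt of the four marginals."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is undefined when a marginal is empty; report 0.0 then.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def f1(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN), with 0.0 when the class never appears."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def macro_f1(y_true, y_pred):
    """Unweighted mean of the F1 scores computed for each class in turn."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    f1_pos = f1(tp, fp, fn)
    # For the negative class, TN act as true positives, FN as false positives
    # and FP as false negatives.
    f1_neg = f1(tn, fn, fp)
    return (f1_pos + f1_neg) / 2
```

For example, with 2 TP, 4 TN, 1 FP and 1 FN, MCC = (2·4 − 1·1)/√(3·3·5·5) = 7/15 ≈ 0.467, while macro F1 averages 2/3 (positive class) and 0.8 (negative class). Unlike accuracy, MCC stays low when a model simply favours the majority class, which is one reason it is often reported for imbalanced toxin datasets.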
Affiliation(s)
- Dandan Zheng
- NHC Key Laboratory of Systems Biology of Pathogens, National Institute of Pathogen Biology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 102629, China
- Siyu Zhou
- NHC Key Laboratory of Systems Biology of Pathogens, National Institute of Pathogen Biology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 102629, China
- Lihong Chen
- NHC Key Laboratory of Systems Biology of Pathogens, National Institute of Pathogen Biology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 102629, China
- Guansong Pang
- School of Computing and Information Systems, Singapore Management University, Singapore 178902, Singapore
- Jian Yang
- NHC Key Laboratory of Systems Biology of Pathogens, National Institute of Pathogen Biology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 102629, China
2. Wang J, Li J, Stubenrauch CJ. Use of Bastion for the Identification of Secreted Substrates. Methods Mol Biol 2024; 2715:519-531. [PMID: 37930548 DOI: 10.1007/978-1-0716-3445-5_31] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2023]
Abstract
Bacteria use secretion systems to translocate numerous proteins into and across cell membranes, and they have also evolved more specialized secretion systems that can disrupt the normal cellular processes of host cells, compete with other bacteria, or protect the bacteria from host defenses. Gram-negative bacteria, in particular, use type I to type VI secretion systems to transfer a variety of substrate proteins into target cells or the surrounding environment, where these substrates play key roles in disease and survival. These secreted proteins have therefore attracted considerable research attention. The first step in characterizing new substrates of secretion systems is typically to identify candidates bioinformatically, and the Bastion series of substrate predictors provides biologists with machine learning tools that can accurately predict these substrates. This chapter explains how to use the Bastion series to identify and analyze secreted substrates in Gram-negative bacteria.
Affiliation(s)
- Jiawei Wang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Infection Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC, Australia
- Centre to Impact AMR, Monash University, Melbourne, VIC, Australia
- Jiahui Li
- Infection Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC, Australia
- Centre to Impact AMR, Monash University, Melbourne, VIC, Australia
- Christopher J Stubenrauch
- Infection Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC, Australia
- Centre to Impact AMR, Monash University, Melbourne, VIC, Australia
3. Jiménez-Guerrero I, López-Baena FJ, Medina C. Multitask Approach to Localize Rhizobial Type Three Secretion System Effector Proteins Inside Eukaryotic Cells. PLANTS (BASEL, SWITZERLAND) 2023; 12:plants12112133. [PMID: 37299112 DOI: 10.3390/plants12112133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 05/25/2023] [Accepted: 05/25/2023] [Indexed: 06/12/2023]
Abstract
Rhizobia can establish mutually beneficial interactions with legume plants by colonizing their roots to induce the formation of a specialized structure known as a nodule, inside which the bacteria are able to fix atmospheric nitrogen. It is well established that the compatibility of such interactions is mainly determined by the bacterial recognition of flavonoids secreted by the plants; in response to these flavonoids, the bacteria synthesize the Nod factors that drive the nodulation process. Other bacterial signals, such as extracellular polysaccharides and some secreted proteins, are also involved in the recognition and the efficiency of this interaction. Some rhizobial strains inject proteins through the type III secretion system into the cytosol of legume root cells during the nodulation process. These proteins, called type III-secreted effectors (T3E), exert their function in the host cell and are involved, among other tasks, in the attenuation of host defense responses to facilitate infection, contributing to the specificity of the process. One of the main challenges in studying rhizobial T3E is the inherent difficulty of localizing them in vivo in the different subcellular compartments of their host cells, since in addition to their low concentration under physiological conditions, it is not always known when or where they are produced and secreted. In this paper, we use a well-known rhizobial T3E, named NopL, to illustrate with a multitask approach where it localizes in heterologous host models, such as tobacco leaf cells, and also, for the first time, in transfected and/or Salmonella-infected animal cells. The consistency of our results serves as an example of how to study the location of effectors inside eukaryotic cells in distinct hosts, using handling techniques that can be applied in almost every research laboratory.
Affiliation(s)
- Irene Jiménez-Guerrero
- Departamento de Microbiología, Universidad de Sevilla, Avenida de Reina Mercedes, 6, 41012 Sevilla, Spain
- Carlos Medina
- Departamento de Microbiología, Universidad de Sevilla, Avenida de Reina Mercedes, 6, 41012 Sevilla, Spain
4. Computational prediction of secreted proteins in gram-negative bacteria. Comput Struct Biotechnol J 2021; 19:1806-1828. [PMID: 33897982 PMCID: PMC8047123 DOI: 10.1016/j.csbj.2021.03.019] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 11/19/2020] [Revised: 03/18/2021] [Accepted: 03/18/2021] [Indexed: 12/29/2022]
Abstract
Gram-negative bacteria harness multiple protein secretion systems and secrete a large proportion of their proteome. Proteins can be exported to the periplasmic space, integrated into the membrane, transported into the extracellular milieu, or translocated into the cytoplasm of contacting cells. Accurate, genome-wide annotation of the secreted proteins and their secretion pathways is therefore important. In this review, we systematically classified the secreted proteins according to the types of secretion systems in Gram-negative bacteria, summarized the known features of these proteins, and reviewed the algorithms and tools for their prediction.
5. Grogan C, Bennett M, Moore S, Lampe D. Novel Asaia bogorensis Signal Sequences for Plasmodium Inhibition in Anopheles stephensi. Front Microbiol 2021; 12:633667. [PMID: 33664722 PMCID: PMC7921796 DOI: 10.3389/fmicb.2021.633667] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/25/2020] [Accepted: 01/27/2021] [Indexed: 12/11/2022]
Abstract
Mosquitoes vector many pathogens that cause human disease, such as malaria, which is caused by parasites in the genus Plasmodium. Current strategies to control vector-transmitted diseases are hindered by mosquito and pathogen resistance, so research has turned to altering the microbiota of the vectors. In this strategy, called paratransgenesis, symbiotic bacteria are genetically modified to alter the mosquito's phenotype by engineering them to deliver antiplasmodial effector molecules into the midgut to kill parasites. One paratransgenesis candidate is Asaia bogorensis, a Gram-negative, rod-shaped bacterium colonizing the midgut, ovaries, and salivary glands of Anopheles sp. mosquitoes. However, common secretion signals from E. coli and closely related species do not function in Asaia. Here, we report an evaluation of 20 native Asaia N-terminal signal sequences, predicted bioinformatically, for their ability to increase the levels of antiplasmodial effector molecules directed to the periplasm and ultimately outside the cell. We tested the hypothesis that increasing the amount of antiplasmodials released from the cell would also increase parasite-killing power. We scanned the Asaia bogorensis SF2.1 genome to identify signal sequences from extra-cytoplasmic proteins and fused these to the reporter protein alkaline phosphatase. Six signals resulted in significant levels of protein released from the Asaia bacterium. Three signals were successfully used to drive the release of the antimicrobial peptide scorpine. Further testing in mosquitoes demonstrated that these three Asaia strains were able to suppress the number of oocysts formed after a blood meal containing P. berghei to a significantly greater degree than wild-type Asaia, although prevalence was not decreased beyond levels obtained with a previously isolated siderophore receptor signal sequence.
We interpret these results to indicate that there is a maximum level of suppression that can be achieved when the effectors are constitutively driven due to stress on the symbionts. This suggests that simply increasing the amount of antiplasmodial effector molecules in the midgut is insufficient to create superior paratransgenic bacterial strains and that symbiont fitness must be considered as well.
Affiliation(s)
- Christina Grogan
- Department of Biological Sciences, Bayer School of Natural and Environmental Sciences, Duquesne University, Pittsburgh, PA, United States
- Marissa Bennett
- Department of Biological Sciences, Bayer School of Natural and Environmental Sciences, Duquesne University, Pittsburgh, PA, United States
- Shannon Moore
- Department of Biological Sciences, Bayer School of Natural and Environmental Sciences, Duquesne University, Pittsburgh, PA, United States
- David Lampe
- Department of Biological Sciences, Bayer School of Natural and Environmental Sciences, Duquesne University, Pittsburgh, PA, United States
6. Wang J, Li J, Hou Y, Dai W, Xie R, Marquez-Lago TT, Leier A, Zhou T, Torres V, Hay I, Stubenrauch C, Zhang Y, Song J, Lithgow T. BastionHub: a universal platform for integrating and analyzing substrates secreted by Gram-negative bacteria. Nucleic Acids Res 2021; 49:D651-D659. [PMID: 33084862 PMCID: PMC7778982 DOI: 10.1093/nar/gkaa899] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 08/30/2020] [Revised: 09/22/2020] [Accepted: 10/01/2020] [Indexed: 01/08/2023]
Abstract
Gram-negative bacteria use secretion systems to export substrates into their surrounding environment or directly into neighboring cells. These substrates are proteins that promote bacterial survival by facilitating nutrient collection, disabling competitor species or, for pathogens, disabling host defenses. Following the rapid development of computational techniques, a growing number of substrates have been discovered and subsequently validated by wet-lab experiments. To date, several online databases have been developed to catalogue these substrates, but they offer limited user options for in-depth analysis and typically focus on a single type of secreted substrate. We therefore developed a universal platform, BastionHub, that incorporates extensive functional modules to facilitate substrate analysis and integrates the five major Gram-negative secreted substrate types (i.e. from types I-IV and VI secretion systems). To our knowledge, BastionHub is not only the most comprehensive online database available, but also the first to incorporate substrates secreted by type I or type II secretion systems. By providing the most up-to-date details of secreted substrates and state-of-the-art prediction and visualized relationship analysis tools, BastionHub will be an important platform that can assist biologists in uncovering novel substrates and formulating new hypotheses. BastionHub is freely available at http://bastionhub.erc.monash.edu/.
Affiliation(s)
- Jiawei Wang
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, VIC 3800, Australia
- Jiahui Li
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, VIC 3800, Australia
- Department of Clinical Laboratory, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang Province, China
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Yi Hou
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Wei Dai
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, VIC 3800, Australia
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Ruopeng Xie
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Tatiana T Marquez-Lago
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- André Leier
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- Tieli Zhou
- Department of Clinical Laboratory, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang Province, China
- Von Torres
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, VIC 3800, Australia
- Iain Hay
- School of Biological Sciences, The University of Auckland, Auckland 1010, New Zealand
- Christopher Stubenrauch
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, VIC 3800, Australia
- Yanju Zhang
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Jiangning Song
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, VIC 3800, Australia
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, VIC 3800, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, VIC 3800, Australia
- Trevor Lithgow
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, VIC 3800, Australia
7. Yu L, Liu F, Li Y, Luo J, Jing R. DeepT3_4: A Hybrid Deep Neural Network Model for the Distinction Between Bacterial Type III and IV Secreted Effectors. Front Microbiol 2021; 12:605782. [PMID: 33552038 PMCID: PMC7858263 DOI: 10.3389/fmicb.2021.605782] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/13/2020] [Accepted: 01/04/2021] [Indexed: 01/17/2023]
Abstract
Gram-negative bacteria can deliver secreted proteins (also known as secreted effectors) directly into host cells through the type III secretion system (T3SS), type IV secretion system (T4SS), and type VI secretion system (T6SS) and cause various diseases. These secreted effectors are heavily involved in the interactions between bacteria and host cells, so their identification is crucial for the discovery and development of novel antibacterial drugs. It is currently challenging to accurately distinguish type III secreted effectors (T3SEs) from type IV secreted effectors (T4SEs) because neither T3SEs nor T4SEs contain N-terminal signal peptides, and some of these effectors have similar evolutionarily conserved profiles and sequence motifs. To address this challenge, we developed a deep learning (DL) approach called DeepT3_4 to correctly classify T3SEs and T4SEs. We generated an amino-acid character dictionary and sequence-based features extracted from effector proteins and fed these features into a hybrid model that integrates recurrent neural networks (RNNs) and deep neural networks (DNNs). After training, the hybrid neural network classifies secreted effectors into the two classes with an accuracy, F-value, and recall of over 80.0%. Our approach is the first DL approach for the classification of T3SEs and T4SEs, providing a promising supplementary tool for further secretome studies.
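The "amino-acid character dictionary" step usually means mapping each residue to an integer index and padding or truncating every sequence to a fixed length so it can feed an embedding layer or RNN. The sketch below shows that generic encoding; the index assignments, the unknown-residue token and the `max_len` value are assumptions for illustration, not DeepT3_4's published scheme.

```python
# The 20 standard amino acids. Index 0 is reserved for padding and index 21
# for any non-standard residue (e.g. X, B, Z, U); both choices are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD_INDEX = 0
UNKNOWN_INDEX = len(AMINO_ACIDS) + 1
CHAR_TO_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode_sequence(seq: str, max_len: int) -> list[int]:
    """Map a protein sequence to a fixed-length list of integer indices.

    Only the first max_len residues are kept; shorter sequences are
    right-padded with PAD_INDEX.
    """
    indices = [CHAR_TO_INDEX.get(aa, UNKNOWN_INDEX) for aa in seq.upper()[:max_len]]
    return indices + [PAD_INDEX] * (max_len - len(indices))


if __name__ == "__main__":
    print(encode_sequence("MKX", 5))
```

Which end of a long sequence to keep is a real design choice here: T3SE secretion signals tend to sit near the N-terminus, whereas T4SE signals are often C-terminal, so a model meant to separate the two may need both ends rather than a plain truncation.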
Affiliation(s)
- Lezheng Yu
- School of Chemistry and Materials Science, Guizhou Education University, Guiyang, China
- Fengjuan Liu
- School of Geography and Resources, Guizhou Education University, Guiyang, China
- Yizhou Li
- College of Cybersecurity, Sichuan University, Chengdu, China
- Jiesi Luo
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, China
- Runyu Jing
- College of Cybersecurity, Sichuan University, Chengdu, China
8. Zheng D, Pang G, Liu B, Chen L, Yang J. Learning transferable deep convolutional neural networks for the classification of bacterial virulence factors. Bioinformatics 2020; 36:3693-3702. [PMID: 32251507 DOI: 10.1093/bioinformatics/btaa230] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/02/2019] [Revised: 03/25/2020] [Accepted: 04/01/2020] [Indexed: 12/23/2022]
Abstract
MOTIVATION Identification of virulence factors (VFs) is critical to the elucidation of bacterial pathogenesis and the prevention of related infectious diseases. Current computational methods for VF prediction focus on binary classification or cover only a few VF classes with sufficient samples. However, thousands of VF classes are present in real-world scenarios, and many of them have only a very limited number of samples available. RESULTS We first construct a large VF dataset, covering 3446 VF classes with 160 495 sequences, and then propose deep convolutional neural network models for VF classification. We show that (i) for common VF classes with sufficient samples, our models can achieve state-of-the-art performance with an overall accuracy of 0.9831 and an F1-score of 0.9803; (ii) for uncommon VF classes with limited samples, our models can learn transferable features from auxiliary data and achieve good performance with accuracy ranging from 0.9277 to 0.9512 and F1-score ranging from 0.9168 to 0.9446 when combined with different predefined features, outperforming traditional classifiers by 1-13% in accuracy and by 1-16% in F1-score. AVAILABILITY AND IMPLEMENTATION All of our datasets are made publicly available at http://www.mgc.ac.cn/VFNet/, and the source code of our models is publicly available at https://github.com/zhengdd0422/VFNet. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dandan Zheng
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100176, China
- Guansong Pang
- Australian Institute for Machine Learning, The University of Adelaide, Adelaide, SA 5005, Australia
- Bo Liu
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100176, China
- Lihong Chen
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100176, China
- Jian Yang
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100176, China
9. Chen T, Wang X, Chu Y, Wang Y, Jiang M, Wei DQ, Xiong Y. T4SE-XGB: Interpretable Sequence-Based Prediction of Type IV Secreted Effectors Using eXtreme Gradient Boosting Algorithm. Front Microbiol 2020; 11:580382. [PMID: 33072049 PMCID: PMC7541839 DOI: 10.3389/fmicb.2020.580382] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 07/06/2020] [Accepted: 08/21/2020] [Indexed: 12/19/2022]
Abstract
Type IV secreted effectors (T4SEs) can be translocated into the cytosol of host cells via the type IV secretion system (T4SS) and cause diseases. However, experimental approaches to identifying T4SEs are time- and resource-consuming, and the existing computational tools based on machine learning techniques have some obvious limitations, such as the lack of interpretability of the prediction models. In this study, we proposed a new model, T4SE-XGB, which uses the eXtreme gradient boosting (XGBoost) algorithm to accurately identify type IV effectors from an optimal set of protein sequence-based features. After testing 20 different types of features with 5-fold cross-validation, the best performance was achieved when all features were fed into XGBoost, compared with other machine learning methods. The ReliefF algorithm was then used to obtain the optimal feature set for our dataset, which further improved model performance. T4SE-XGB exhibited the highest predictive performance on the independent test set and outperformed other published prediction tools. Furthermore, the SHAP method was used to interpret the contribution of features to model predictions. The identification of key features can contribute to an improved understanding of the multifactorial contributors to host-pathogen interactions and bacterial pathogenesis. Beyond type IV effector prediction, we believe that the proposed framework can provide guidance for similar studies constructing prediction methods for related biological problems. The data and source code of this study can be freely accessed at https://github.com/CT001002/T4SE-XGB.
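T4SE-XGB ranks features with ReliefF. The idea behind that family of methods can be sketched with the simpler original Relief algorithm: a feature gains weight when it differs between an instance and its nearest neighbour of the other class ("miss") and loses weight when it differs from its nearest same-class neighbour ("hit"). This is an illustrative re-implementation, not T4SE-XGB's code; it assumes binary labels, at least two instances per class, range-scaled Manhattan distance, and a deterministic pass over every instance, whereas ReliefF averages over k neighbours and handles multiple classes.

```python
def relief_weights(X, y):
    """Basic Relief feature weights for a binary problem.

    X: list of feature vectors (lists of floats); y: list of 0/1 labels.
    Each class needs at least two instances so a nearest hit exists.
    """
    n, d = len(X), len(X[0])
    # Per-feature range, used to scale differences into [0, 1];
    # constant features get a span of 1.0 to avoid division by zero.
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / span[j]

    def distance(a, b):
        return sum(diff(j, a, b) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda k: distance(X[i], X[k]))
        miss = min(misses, key=lambda k: distance(X[i], X[k]))
        for j in range(d):
            # Reward separation from the nearest other-class neighbour and
            # penalize separation from the nearest same-class neighbour.
            weights[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / n
    return weights


if __name__ == "__main__":
    X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [1.0, 0.4], [0.9, 0.8], [0.8, 0.2]]
    y = [0, 0, 0, 1, 1, 1]
    print(relief_weights(X, y))
```

In the toy data, the first feature separates the two classes and the second is noise, so the first receives a clearly positive weight and the second a negative one. Features would then be kept or dropped by ranking these weights.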
Affiliation(s)
- Tianhang Chen
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Xiangeng Wang
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Department of Biomedical Sciences, City University of Hong Kong, Hong Kong, China
- Yanyi Chu
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Peng Cheng Laboratory, Shenzhen, China
- Yanjing Wang
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Mingming Jiang
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Peng Cheng Laboratory, Shenzhen, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
10.
Abstract
Many Gram-negative bacteria infect hosts and cause diseases by translocating a variety of type III secreted effectors (T3SEs) into the host cell cytoplasm. However, despite a dramatic increase in the number of available whole-genome sequences, accurate prediction of T3SEs remains challenging. Traditional prediction models have focused on atypical sequence features buried in the N-terminal peptides of T3SEs, but unfortunately, these models have had high false-positive rates. In this research, we integrated promoter information with characteristic protein features of signal regions, chaperone-binding domains, and effector domains for T3SE prediction. Machine learning algorithms, including deep learning, were adopted to predict the atypical features mainly buried in the signal sequences of T3SEs, followed by the development of a voting-based ensemble model integrating the individual prediction results. We assembled this into a unified T3SE prediction pipeline, T3SEpp, which integrated the results of the individual modules, resulting in high accuracy (i.e., ∼0.94) and a >1-fold reduction in the false-positive rate compared to state-of-the-art software tools. The T3SEpp pipeline and the sequence features observed here will facilitate the accurate identification of new T3SEs, with numerous benefits for future studies of host-pathogen interactions.
IMPORTANCE Type III secreted effector (T3SE) prediction remains a major computational challenge. In practical applications, current software tools often suffer from high false-positive rates. One contributing factor could be the relatively limited range of biological features used to design and train the models. In this research, we conducted a comprehensive survey of the sequence-based features of T3SEs, including signal sequences, chaperone-binding domains, effector domains, and transcription factor-binding promoter sites, and assembled a unified prediction pipeline integrating these multi-aspect biological features within homology-based and multiple machine learning models. To our knowledge, this is the most comprehensive biological sequence feature analysis of T3SEs compiled to date. The T3SEpp pipeline, which integrates a variety of features and assembles different models, showed high accuracy and should facilitate more accurate identification of T3SEs in new and existing bacterial whole-genome sequences.
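The voting-based ensemble described above can be sketched as follows. The abstract does not give T3SEpp's exact voting rule or module weights, so this assumes each module emits a yes/no call per protein and that a strict majority (more than half of the modules) decides; the module names used in the usage example (`signal`, `chaperone`, `domain`, `promoter`) are hypothetical.

```python
def vote(predictions: dict[str, bool], threshold: float = 0.5) -> bool:
    """Return True (predicted effector) when the fraction of modules voting
    True exceeds `threshold`; the default 0.5 gives a strict majority vote."""
    if not predictions:
        raise ValueError("need at least one module prediction")
    positive = sum(predictions.values())  # True counts as 1, False as 0
    return positive / len(predictions) > threshold


def ensemble_calls(calls_by_protein: dict[str, dict[str, bool]],
                   threshold: float = 0.5) -> dict[str, bool]:
    """Apply the vote to every protein.

    calls_by_protein maps protein ID -> {module name: module's yes/no call}.
    """
    return {pid: vote(calls, threshold) for pid, calls in calls_by_protein.items()}


if __name__ == "__main__":
    calls = {
        "P1": {"signal": True, "chaperone": True, "domain": False, "promoter": True},
        "P2": {"signal": True, "chaperone": False, "domain": False, "promoter": False},
    }
    print(ensemble_calls(calls))
```

Here P1 (three of four modules positive) is called an effector and P2 (one of four) is not; with a strict majority, a 2-of-4 tie is rejected. Raising `threshold` trades sensitivity for fewer false positives, which is the error type the T3SEpp authors aimed to reduce.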
11. Lee YW, Wang J, Newton HJ, Lithgow T. Mapping bacterial effector arsenals: in vivo and in silico approaches to defining the protein features dictating effector secretion by bacteria. Curr Opin Microbiol 2020; 57:13-21. [PMID: 32505919 DOI: 10.1016/j.mib.2020.04.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 02/02/2020] [Revised: 04/20/2020] [Accepted: 04/26/2020] [Indexed: 12/25/2022]
Abstract
Many bacterial pathogens rely on dedicated secretion systems to translocate virulence proteins termed 'effectors' into host cells. These effectors engage and manipulate host cellular functions to support bacterial colonization and propagation. The secretion systems are molecular machines that recognize targeting 'features' in these effector proteins in vivo to secrete them selectively and efficiently. The joint analysis of whole-genome sequencing data and computational predictions of the amino acid characteristics of effector proteins has produced extensive lists of candidate effectors for many bacterial pathogens; among these, the Dot/Icm type IVB secretion system of Legionella pneumophila stands out, with the largest number of effectors identified to date. This system is also used by the causative agent of Q fever, Coxiella burnetii, to secrete a large pool of distinct effectors. By comparing these two pathogens, we provide an understanding of the rationale behind effector repertoire expansion. We also discuss recent bioinformatic advances facilitating high-throughput discovery of secreted effectors through in silico 'feature' recognition, and the current challenge of substantiating the biological relevance and bona fide nature of effectors identified in silico.
Affiliation(s)
- Yi Wei Lee
- Department of Microbiology and Immunology, University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, 3000 Victoria, Australia
- Jiawei Wang
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Clayton, 3800 Victoria, Australia
- Hayley J Newton
- Department of Microbiology and Immunology, University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, 3000 Victoria, Australia
- Trevor Lithgow
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Clayton, 3800 Victoria, Australia
12. Carreón-Anguiano KG, Islas-Flores I, Vega-Arreguín J, Sáenz-Carbonell L, Canto-Canché B. EffHunter: A Tool for Prediction of Effector Protein Candidates in Fungal Proteomic Databases. Biomolecules 2020; 10:biom10050712. [PMID: 32375409 PMCID: PMC7277995 DOI: 10.3390/biom10050712] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 01/09/2020] [Revised: 03/17/2020] [Accepted: 03/21/2020] [Indexed: 11/16/2022]
Abstract
Pathogens are able to deliver small, secreted, cysteine-rich proteins into plant cells to enable infection. The computational prediction of effector proteins remains one of the most challenging areas in the study of plant-fungus interactions. Several bioinformatic programs can currently help identify these proteins; however, in most cases these programs are run independently of one another. Here, we present EffHunter, an easy and fast bioinformatics tool for the identification of effectors. This predictor was used to identify putative effectors in 88 proteomes using characteristics such as size, cysteine residue content, secretion signal and transmembrane domains.
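The kind of feature-based filtering described above can be illustrated with a short Python sketch. It is not EffHunter's implementation: the length and cysteine thresholds are hypothetical placeholders, and the signal-peptide and transmembrane calls are assumed to be precomputed by external predictors and passed in as fields of a hypothetical `Protein` record.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Protein:
    """Hypothetical record for one predicted protein."""
    name: str
    sequence: str
    has_signal_peptide: bool  # assumed output of an external signal-peptide predictor
    tm_domains: int           # assumed output of an external transmembrane-helix predictor


def is_effector_candidate(
    protein: Protein,
    max_length: int = 300,    # hypothetical "small protein" cutoff (residues)
    min_cysteines: int = 4,   # hypothetical "cysteine-rich" cutoff
    max_tm_domains: int = 0,  # hypothetical: secreted candidates carry no TM helices
) -> bool:
    """Apply the size, cysteine, secretion-signal and TM filters in turn."""
    seq = protein.sequence.upper()
    if len(seq) > max_length:
        return False
    if seq.count("C") < min_cysteines:
        return False
    if not protein.has_signal_peptide:
        return False
    return protein.tm_domains <= max_tm_domains


def filter_candidates(proteins: List[Protein], **thresholds) -> List[str]:
    """Return the names of proteins that pass every filter."""
    return [p.name for p in proteins if is_effector_candidate(p, **thresholds)]


if __name__ == "__main__":
    proteome = [
        Protein("small_cys_rich", "M" + "C" * 6 + "A" * 50, True, 0),
        Protein("too_long", "M" + "C" * 6 + "A" * 400, True, 0),
        Protein("no_signal_peptide", "M" + "C" * 6 + "A" * 50, False, 0),
    ]
    print(filter_candidates(proteome))  # ['small_cys_rich']
```

Keeping the thresholds as keyword arguments makes it easy to see how sensitive the candidate list is to each cutoff, for example by rerunning `filter_candidates(proteome, max_length=500)`.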
Affiliation(s)
- Karla Gisel Carreón-Anguiano
  - Unidad de Biotecnología, Centro de Investigación Científica de Yucatán, A.C., Calle 43 No. 130 X 32 y 34, Col. Chuburná de Hidalgo, C.P. 97205 Mérida, México
- Ignacio Islas-Flores
  - Unidad de Bioquímica y Biología Molecular de Plantas, Centro de Investigación Científica de Yucatán, A.C., Calle 43 No. 130 X 32 y 34, Col. Chuburná de Hidalgo, C.P. 97205 Mérida, México
- Julio Vega-Arreguín
  - Laboratorio de Ciencias AgroGenómicas, Escuela Nacional de Estudios Superiores-UNAM, León, México
- Luis Sáenz-Carbonell
  - Unidad de Biotecnología, Centro de Investigación Científica de Yucatán, A.C., Calle 43 No. 130 X 32 y 34, Col. Chuburná de Hidalgo, C.P. 97205 Mérida, México
- Blondy Canto-Canché
  - Unidad de Biotecnología, Centro de Investigación Científica de Yucatán, A.C., Calle 43 No. 130 X 32 y 34, Col. Chuburná de Hidalgo, C.P. 97205 Mérida, México
13
Hong J, Luo Y, Mou M, Fu J, Zhang Y, Xue W, Xie T, Tao L, Lou Y, Zhu F. Convolutional neural network-based annotation of bacterial type IV secretion system effectors with enhanced accuracy and reduced false discovery. Brief Bioinform 2019; 21:1825-1836. [PMID: 31860715 DOI: 10.1093/bib/bbz120] [Citation(s) in RCA: 87] [Impact Index Per Article: 17.4] [Received: 06/23/2019] [Revised: 08/12/2019] [Accepted: 08/21/2019] [Indexed: 12/20/2022]
Abstract
The type IV bacterial secretion system (SS) is reported to be one of the most ubiquitous SSs in nature and can induce serious conditions by secreting type IV SS effectors (T4SEs) into host cells. Recent studies mainly focus on annotating new T4SEs from the huge amount of sequencing data, and various computational tools have therefore been developed to accelerate T4SE annotation. However, these tools are reported to depend heavily on the selected methods, and their annotation performance needs further improvement. Herein, a convolutional neural network (CNN) technique was used to annotate T4SEs by integrating multiple protein encoding strategies. First, the annotation accuracies of nine encoding strategies integrated with CNN were assessed and compared with those of popular T4SE annotation tools on an independent benchmark. Second, the false discovery rates of various models were systematically evaluated by (1) scanning the genome of Legionella pneumophila subsp. ATCC 33152 and (2) predicting real-world non-T4SEs validated in published experiments. Based on these analyses, three encoding strategies, (a) position-specific scoring matrix (PSSM), (b) protein secondary structure & solvent accessibility (PSSSA) and (c) the one-hot encoding scheme (Onehot), were identified as performing well when integrated with CNN. Finally, a novel strategy that collectively considers the three well-performing models (CNN-PSSM, CNN-PSSSA and CNN-Onehot) was proposed, and a new tool (CNN-T4SE, https://idrblab.org/cnnt4se/) was constructed to facilitate T4SE annotation. Overall, this study comprehensively analyzed the performance of a collection of encoding strategies integrated with CNN, which could facilitate the suppression of T4SS in infection and limit the spread of antimicrobial resistance.
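One-hot encoding, one of the three well-performing strategies named above, can be sketched in Python as follows. This is a generic illustration, not the CNN-T4SE code: the fixed length `max_len`, the zero-padding scheme and the `keep_c_terminus` switch are assumptions chosen for the example. The switch exists because type IV secretion signals are often reported near the C terminus, so a real pipeline may prefer to keep that end when truncating long proteins.

```python
from typing import List

# The 20 standard amino acids; any other symbol (e.g. X, B, Z) maps to an all-zero row.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(
    sequence: str, max_len: int = 1000, keep_c_terminus: bool = False
) -> List[List[int]]:
    """Encode a protein as a max_len x 20 binary matrix (a list of rows).

    Sequences longer than max_len are truncated (keep_c_terminus selects which
    end survives); shorter ones are zero-padded at the end.
    """
    seq = sequence.upper()
    if len(seq) > max_len:
        seq = seq[len(seq) - max_len:] if keep_c_terminus else seq[:max_len]
    matrix = []
    for aa in seq:
        row = [0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(aa)
        if idx is not None:
            row[idx] = 1
        matrix.append(row)
    # Zero-pad so every protein yields the same input shape for the CNN.
    matrix.extend([0] * len(AMINO_ACIDS) for _ in range(max_len - len(matrix)))
    return matrix


if __name__ == "__main__":
    m = one_hot_encode("ACX", max_len=5)
    print(len(m), m[0][:3], m[1][:3], sum(m[2]))  # 5 [1, 0, 0] [0, 1, 0] 0
```

Each row corresponds to one residue position and each column to one amino acid, which is the layout a one-dimensional convolution sliding along the sequence would consume.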
Affiliation(s)
- Jiajun Hong
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yongchao Luo
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Minjie Mou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Jianbo Fu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yang Zhang
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Weiwei Xue
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Tian Xie
  - Key Laboratory of Elemene Class Anti-Cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou 310036, China
- Lin Tao
  - Key Laboratory of Elemene Class Anti-Cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou 310036, China
- Yan Lou
  - Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou 310000, Zhejiang, China
- Feng Zhu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
14
Esna Ashari Z, Brayton KA, Broschat SL. Prediction of T4SS Effector Proteins for Anaplasma phagocytophilum Using OPT4e, A New Software Tool. Front Microbiol 2019; 10:1391. [PMID: 31293540 PMCID: PMC6598457 DOI: 10.3389/fmicb.2019.01391] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 03/05/2019] [Accepted: 06/03/2019] [Indexed: 01/01/2023]
Abstract
Type IV secretion systems (T4SS) are used by a number of bacterial pathogens to attack the host cell. The complex protein structure of the T4SS directly translocates effector proteins into host cells, often causing fatal diseases in humans and animals. Identifying effector proteins is the first step in understanding how they function to cause virulence and pathogenicity. Accurate prediction of effector proteins via a machine learning approach can assist in their identification. The main goal of this study is to predict a set of candidate effectors for the tick-borne pathogen Anaplasma phagocytophilum, the causative agent of anaplasmosis in humans. To our knowledge, this is the first computational study of effector prediction focused on A. phagocytophilum. In a previous study, we systematically selected a set of optimal features from more than 1,000 possible protein characteristics for predicting T4SS effector candidates. This was followed by a study of those features using the proteome of Legionella pneumophila strain Philadelphia deduced from its complete genome. In this manuscript we introduce the OPT4e software package (Optimal-features Predictor for T4SS Effector proteins). An earlier version of OPT4e was verified using cross-validation tests, accuracy tests, and comparison with previous results for L. pneumophila. We use OPT4e to predict candidate effectors from the proteomes of A. phagocytophilum strains HZ and HGE-1, predicting 48 and 46 candidates, respectively, with 16 and 18 deemed most probable to be effectors. The latter include the three known validated effectors of A. phagocytophilum.
Affiliation(s)
- Zhila Esna Ashari
  - School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, United States
- Kelly A Brayton
  - School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, United States
  - Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, WA, United States
  - Paul G. Allen School for Global Animal Health, Washington State University, Pullman, WA, United States
- Shira L Broschat
  - School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, United States
  - Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, WA, United States
  - Paul G. Allen School for Global Animal Health, Washington State University, Pullman, WA, United States
15
Wang J, Yang B, An Y, Marquez-Lago T, Leier A, Wilksch J, Hong Q, Zhang Y, Hayashida M, Akutsu T, Webb GI, Strugnell RA, Song J, Lithgow T. Systematic analysis and prediction of type IV secreted effector proteins by machine learning approaches. Brief Bioinform 2019; 20:931-951. [PMID: 29186295 PMCID: PMC6585386 DOI: 10.1093/bib/bbx164] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 08/20/2017] [Revised: 11/08/2017] [Indexed: 12/13/2022]
Abstract
In the course of infecting their hosts, pathogenic bacteria secrete numerous effectors, namely, bacterial proteins that pervert host cell biology. Many Gram-negative bacteria, including context-dependent human pathogens, use a type IV secretion system (T4SS) to translocate effectors directly into the cytosol of host cells. Various type IV secreted effectors (T4SEs) have been experimentally validated to play crucial roles in virulence by manipulating host cell gene expression and other processes. Consequently, the identification of novel effector proteins is an important step in increasing our understanding of host-pathogen interactions and bacterial pathogenesis. Here, we train and compare six machine learning models, namely, Naïve Bayes (NB), K-nearest neighbor (KNN), logistic regression (LR), random forest (RF), support vector machines (SVMs) and multilayer perceptron (MLP), for the identification of T4SEs using 10 types of selected features and 5-fold cross-validation. Our study shows that: (1) including different but complementary features generally enhances the predictive performance for T4SEs; (2) ensemble models, obtained by integrating individual single-feature models, exhibit significantly improved predictive performance; and (3) the 'majority voting strategy' yielded more stable and accurate classification when used to combine distinct single-feature models into an ensemble. We further developed a new method to effectively predict T4SEs, Bastion4 (Bacterial secretion effector predictor for T4SS), and show that our ensemble classifier clearly outperforms two recent prediction tools. In summary, we developed a state-of-the-art T4SE predictor by conducting a comprehensive performance evaluation of different machine learning algorithms along with a detailed analysis of single- and multi-feature selections.
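The 'majority voting strategy' mentioned above can be illustrated with a minimal Python sketch. It is not Bastion4's code: each single-feature model is stood in for by its hard 0/1 labels, the nested-list input layout is an assumption, and resolving ties as non-effector is a hypothetical conservative choice.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(labels: Sequence[int]) -> int:
    """Combine binary votes (1 = effector, 0 = non-effector) for one protein.

    Ties are resolved as non-effector (an assumed, conservative rule).
    """
    counts = Counter(labels)
    return 1 if counts[1] > counts[0] else 0


def ensemble_predict(model_outputs: List[List[int]]) -> List[int]:
    """model_outputs[m][i] is single-feature model m's label for protein i.

    zip(*model_outputs) regroups the votes per protein before voting.
    """
    return [majority_vote(votes) for votes in zip(*model_outputs)]


if __name__ == "__main__":
    # Three hypothetical single-feature models scoring four proteins.
    outputs = [
        [1, 0, 1, 0],
        [1, 0, 0, 1],
        [0, 1, 1, 0],
    ]
    print(ensemble_predict(outputs))  # [1, 0, 1, 0]
```

With an odd number of models, ties cannot occur for binary labels, which is one reason odd-sized ensembles are a common choice for hard voting.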
Affiliation(s)
- Jiawei Wang
  - Biomedicine Discovery Institute and the Department of Microbiology at Monash University, Australia
- Bingjiao Yang
  - National Engineering Research Center for Equipment and Technology of Cold Strip Rolling, College of Mechanical Engineering from Yanshan University, China
- Yi An
  - College of Information Engineering, Northwest A&F University, China
- Tatiana Marquez-Lago
  - Department of Genetics, University of Alabama at Birmingham (UAB) School of Medicine, USA
- André Leier
  - Department of Genetics and the Informatics Institute, University of Alabama at Birmingham (UAB) School of Medicine, USA
- Jonathan Wilksch
  - Department of Microbiology and Immunology at the University of Melbourne, Australia
- Yang Zhang
  - Computer Science and Engineering in 2015 from Northwestern Polytechnical University, China
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Japan
- Geoffrey I Webb
  - Faculty of Information Technology, Monash Centre for Data Science, Monash University
- Richard A Strugnell
  - Department of Microbiology and Immunology, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne
- Jiangning Song
  - Department of Biochemistry and Molecular Biology, Monash University, Australia
- Trevor Lithgow
  - Department of Microbiology at Monash University, Australia
16
Using an optimal set of features with a machine learning-based approach to predict effector proteins for Legionella pneumophila. PLoS One 2019; 14:e0202312. [PMID: 30682021 PMCID: PMC6347213 DOI: 10.1371/journal.pone.0202312] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/17/2018] [Accepted: 01/12/2019] [Indexed: 12/26/2022]
Abstract
Type IV secretion systems exist in a number of bacterial pathogens and are used to secrete effector proteins directly into host cells, altering the host environment to make it hospitable for the bacteria. In recent years, several machine learning algorithms have been developed to predict effector proteins, potentially facilitating experimental verification. However, their results are inconsistent with one another. Previously, we analysed the disparate sets of predictive features used in these algorithms to determine an optimal set of 370 features for effector prediction. This study focuses on the best way to use these optimal features: we design three machine learning classifiers, compare our results with those of others, and obtain de novo results. We chose the pathogen Legionella pneumophila strain Philadelphia-1, a cause of Legionnaires' disease, because it has many validated effector proteins and others have developed machine learning prediction tools for it. While all of our models give good results, indicating that our optimal features are quite robust, Model 1, which uses all 370 features with a support vector machine, has slightly better accuracy. Moreover, Model 1 predicted 472 proteins deemed highly probable to be effectors, which include 94% of known effectors. Although the results of our three models agree well with those of other researchers, their models predicted only 126 and 311 candidate effectors.