1
Hu Y, Wang Y, Hu X, Chao H, Li S, Ni Q, Zhu Y, Hu Y, Zhao Z, Chen M. T4SEpp: A pipeline integrating protein language models to predict bacterial type IV secreted effectors. Comput Struct Biotechnol J 2024; 23:801-812. [PMID: 38328004 PMCID: PMC10847861 DOI: 10.1016/j.csbj.2024.01.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Revised: 01/20/2024] [Accepted: 01/20/2024] [Indexed: 02/09/2024]
Abstract
Many pathogenic bacteria use type IV secretion systems (T4SSs) to deliver effectors (T4SEs) into the cytoplasm of eukaryotic cells, causing diseases. The identification of effectors is a crucial step in understanding the mechanisms of bacterial pathogenicity, but this remains a major challenge. In this study, we used the full-length embedding features generated by six pre-trained protein language models to train classifiers predicting T4SEs and compared their performance. We integrated three modules into a model called T4SEpp. The first module searched for full-length homologs of known T4SEs, signal sequences, and effector domains; the second module fine-tuned a machine learning model using data for a signal sequence feature; and the third module used the three best-performing pre-trained protein language models. T4SEpp outperformed other state-of-the-art (SOTA) software tools, achieving ∼0.98 accuracy at a high specificity of ∼0.99, based on the assessment of an independent validation dataset. T4SEpp predicted 13 T4SEs from Helicobacter pylori, including the well-known CagA and 12 other potential ones, among which eleven could potentially interact with human proteins. This suggests that these potential T4SEs may be associated with the pathogenicity of H. pylori. Overall, T4SEpp provides a better solution to assist in the identification of bacterial T4SEs and facilitates studies of bacterial pathogenicity. T4SEpp is freely accessible at https://bis.zju.edu.cn/T4SEpp.
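Accuracy and specificity, the two figures reported for T4SEpp, are standard confusion-matrix quantities. The minimal Python sketch below shows how they are computed, together with sensitivity, for binary predictions in which 1 marks a T4SE and 0 a non-T4SE. It is not T4SEpp code, and the labels in any call are illustrative.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = T4SE, 0 = non-T4SE)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # effectors correctly predicted
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-effectors correctly rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-effectors wrongly called effectors
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # effectors that were missed
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }
```

Specificity depends only on the negative set. A tool can therefore keep false positives rare, which is what a specificity near 0.99 indicates, largely independently of how many true effectors it recovers.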
Affiliation(s)
- Yueming Hu
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Yejun Wang
  - Youth Innovation Team of Medical Bioinformatics, Shenzhen University Medical School, Shenzhen, China
  - Department of Cell Biology and Genetics, College of Basic Medicine, Shenzhen University Medical School, Shenzhen, China
- Xiaotian Hu
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Haoyu Chao
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Sida Li
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Qinyang Ni
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Yanyan Zhu
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Yixue Hu
  - Youth Innovation Team of Medical Bioinformatics, Shenzhen University Medical School, Shenzhen, China
- Ziyi Zhao
  - Youth Innovation Team of Medical Bioinformatics, Shenzhen University Medical School, Shenzhen, China
- Ming Chen
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
  - Institute of Hematology, Zhejiang University School of Medicine, The First Affiliated Hospital, Zhejiang University, Hangzhou 310058, China
2
Tang X, Luo L, Wang S. TSE-ARF: An adaptive prediction method of effectors across secretion system types. Anal Biochem 2024; 686:115407. [PMID: 38030053 DOI: 10.1016/j.ab.2023.115407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Revised: 11/12/2023] [Accepted: 11/20/2023] [Indexed: 12/01/2023]
Abstract
Bacterial effector proteins are secreted by a variety of protein secretion systems and play an important role in the interaction between host and pathogenic bacteria. A fast and inexpensive method for discovering bacterial effectors is therefore important. In this study, we propose a multi-type secretion effector adaptive random forest (TSE-ARF) that adaptively identifies secreted effectors of types T1SE-T4SE and T6SE based only on protein sequences. First, we proposed two new feature descriptors that capture characteristic protein information and fused them with several universal features to form a 290-dimensional feature vector with good versatility. Then, TSE-ARF, which integrates the Shuffled Frog Leaping Algorithm with random forest, adapts its parameters to each type of secreted effector to make classification predictions. The strong performance of TSE-ARF across different datasets and settings shows considerable generalization ability, and the model was used to screen additional candidate effectors across whole genomes. Source code is available at https://github.com/AIMOVE/TSE-ARF.
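The abstract pairs the Shuffled Frog Leaping Algorithm (SFLA) with random forest for parameter adaptation. The sketch below is a generic, minimal SFLA that minimizes a continuous objective. It is not the TSE-ARF code. In a setting like TSE-ARF, the objective would presumably be a random forest's validation error as a function of its hyperparameters. Here any callable `f` can stand in for it, and the function name, population sizes and step counts are hypothetical.

```python
import random


def sfla_minimize(f, dim, bounds, n_frogs=30, n_memeplexes=5,
                  n_shuffles=50, local_steps=10, seed=0):
    """Minimal Shuffled Frog Leaping Algorithm: returns (best_point, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(x):
        # Keep every coordinate inside the search box.
        return [min(max(v, lo), hi) for v in x]

    def random_frog():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    def leap(frog, target):
        # Move a random fraction of the way from frog toward target.
        r = rng.random()
        return clip([a + r * (b - a) for a, b in zip(frog, target)])

    frogs = [random_frog() for _ in range(n_frogs)]
    best = min(frogs, key=f)
    for _ in range(n_shuffles):
        # Rank frogs (lowest f first) and deal them round-robin into memeplexes.
        frogs.sort(key=f)
        global_best = frogs[0]
        memeplexes = [frogs[i::n_memeplexes] for i in range(n_memeplexes)]
        for mem in memeplexes:
            for _ in range(local_steps):
                mem.sort(key=f)
                worst = mem[-1]
                # 1) Leap toward the memeplex best; 2) else toward the global best;
                # 3) else replace the worst frog with a random one.
                cand = leap(worst, mem[0])
                if f(cand) >= f(worst):
                    cand = leap(worst, global_best)
                    if f(cand) >= f(worst):
                        cand = random_frog()
                mem[-1] = cand
        # Shuffle: merge the memeplexes back into one population.
        frogs = [frog for mem in memeplexes for frog in mem]
        best = min(frogs + [best], key=f)
    return best, f(best)
```

The sketch keeps the core cycle of SFLA: rank the frogs, partition them into memeplexes, improve the worst frog locally, then reshuffle. Published variants differ in details such as maximum step size, and those details are not modeled here.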
Affiliation(s)
- Xianjun Tang
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
- Longfei Luo
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
- Shunfang Wang
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
  - Yunnan Key Laboratory of Intelligent Systems and Computing, Yunnan University, Kunming, Yunnan, China
3
Akhter S, Miller JH. BaPreS: a software tool for predicting bacteriocins using an optimal set of features. BMC Bioinformatics 2023; 24:313. [PMID: 37592230 PMCID: PMC10433575 DOI: 10.1186/s12859-023-05330-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/27/2022] [Accepted: 05/09/2023] [Indexed: 08/19/2023]
Abstract
BACKGROUND Antibiotic resistance is a major public health concern around the globe. As a result, researchers continually search for new compounds to develop antibiotic drugs against antibiotic-resistant bacteria. Bacteriocins have become promising antimicrobial agents against antibiotic resistance because they can have either broad or narrow killing spectra. Sequence matching methods are widely used to identify bacteriocins by comparing them with known bacteriocin sequences; however, these methods often fail to detect new bacteriocin sequences because of their high diversity. A machine learning approach can help find new, highly dissimilar bacteriocins for developing highly effective antibiotic drugs. The aim of this work is to develop a machine learning-based software tool called BaPreS (Bacteriocin Prediction Software) that uses an optimal set of features to detect bacteriocin protein sequences with high accuracy. We extracted potential features from known bacteriocin and non-bacteriocin sequences by considering the physicochemical and structural properties of the protein sequences. We then reduced the feature set using statistical justifications and the recursive feature elimination technique. Finally, we built support vector machine (SVM) and random forest (RF) models using the selected features and used the best machine learning model to implement the software tool. RESULTS We applied BaPreS to an established dataset and evaluated its prediction performance. The results show that the software tool achieves a prediction accuracy of 95.54% on the test protein sequences. The tool allows users to add new bacteriocin or non-bacteriocin sequences to the training dataset to further enhance its predictive power. We compared the prediction performance of BaPreS with a popular sequence matching-based tool and a deep learning-based method, and our software tool outperformed both.
CONCLUSIONS BaPreS is a bacteriocin prediction tool that can be used to discover new highly dissimilar bacteriocins for developing highly effective antibiotic drugs. This software tool can be used with Windows, Linux and macOS operating systems. The open-source software package and its user manual are available at https://github.com/suraiya14/BaPreS .
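The abstract says BaPreS draws on physicochemical and structural properties of protein sequences but does not list them. As an illustration only, the sketch below computes two common sequence descriptors of that kind: amino-acid composition and the grand average of hydropathy (GRAVY) on the Kyte-Doolittle scale. The choice of descriptors and the helper names are assumptions, not the BaPreS feature set.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KYTE_DOOLITTLE)


def aa_composition(seq):
    """Fraction of each standard amino acid; non-standard letters are ignored."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for ch in seq.upper():
        if ch in counts:
            counts[ch] += 1
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def gravy(seq):
    """Grand average of hydropathy (mean Kyte-Doolittle value per residue)."""
    values = [KYTE_DOOLITTLE[ch] for ch in seq.upper() if ch in KYTE_DOOLITTLE]
    return sum(values) / len(values) if values else 0.0


def feature_vector(seq):
    """20 composition values + GRAVY + sequence length (22 features)."""
    return aa_composition(seq) + [gravy(seq), float(len(seq))]
```

Vectors like these would then go through feature reduction and into the SVM or RF models. In scikit-learn, for example, `sklearn.feature_selection.RFE` performs recursive feature elimination around an estimator that exposes coefficients, such as `sklearn.svm.SVC(kernel="linear")`. The abstract does not say which library BaPreS itself uses.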
Affiliation(s)
- Suraiya Akhter
  - School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, USA.
  - School of Engineering and Applied Sciences, Washington State University Tri-Cities, Richland, WA, USA.
- John H Miller
  - School of Engineering and Applied Sciences, Washington State University Tri-Cities, Richland, WA, USA.
4
Wagner N, Alburquerque M, Ecker N, Dotan E, Zerah B, Pena MM, Potnis N, Pupko T. Natural language processing approach to model the secretion signal of type III effectors. Front Plant Sci 2022; 13:1024405. [PMID: 36388586 PMCID: PMC9659976 DOI: 10.3389/fpls.2022.1024405] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/21/2022] [Accepted: 10/11/2022] [Indexed: 06/16/2023]
Abstract
Type III effectors are proteins injected by Gram-negative bacteria into eukaryotic hosts. In many plant and animal pathogens, these effectors manipulate host cellular processes to the benefit of the bacteria. Type III effectors are secreted by a type III secretion system that must "classify" each bacterial protein into one of two categories: either the protein should be translocated or it should not. It was previously shown that type III effectors have a secretion signal within their N-terminus; however, despite numerous efforts, the exact biochemical identity of this secretion signal is generally unknown. Computational characterization of the secretion signal is important for identifying novel effectors and for better understanding the molecular translocation mechanism. In this work we developed novel machine-learning algorithms for characterizing the secretion signal in both plant and animal pathogens. Specifically, we represented each protein as a vector in high-dimensional space using Facebook's protein language model. Classification algorithms were then used to separate effectors from non-effector proteins. We subsequently curated a benchmark dataset of hundreds of effectors and thousands of non-effector proteins. We showed that on this curated dataset, our novel approach yielded substantially better classification accuracy than previously developed methodologies. We also tested the hypothesis that plant and animal pathogen effectors are characterized by different secretion signals. Finally, we integrated the novel approach into Effectidor, a web server for predicting type III effector proteins, leading to a more accurate classification of effectors from non-effectors.
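The abstract describes two steps: representing each protein as a vector with a protein language model, then separating effectors from non-effectors with classification algorithms. The sketch below assumes per-residue embeddings have already been produced by such a model; that step is not shown. It pools the N-terminal residues, where the secretion signal is said to reside, into one fixed-length vector and classifies it with a nearest-centroid rule. The window length, the mean pooling and the nearest-centroid classifier are stand-ins chosen for brevity, not the choices made in the paper.

```python
import math


def mean_pool(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def protein_vector(residue_embeddings, n_terminal=100):
    """Pool per-residue embeddings of the N-terminal window into one vector.

    The window length is a hypothetical choice, not a value from the paper.
    """
    return mean_pool(residue_embeddings[:n_terminal])


class NearestCentroid:
    """Assign a vector to the class whose mean training vector is closest."""

    def fit(self, X, y):
        groups = {}
        for x, label in zip(X, y):
            groups.setdefault(label, []).append(x)
        self.centroids = {label: mean_pool(xs) for label, xs in groups.items()}
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda label: math.dist(x, self.centroids[label]))
```

In practice, any standard classifier could replace `NearestCentroid`. The part that reflects the abstract is turning variable-length proteins into fixed-length language-model vectors before classification.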
Affiliation(s)
- Naama Wagner
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Michael Alburquerque
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Noa Ecker
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Edo Dotan
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Ben Zerah
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Michelle Mendonca Pena
  - Department of Entomology and Plant Pathology, Auburn University, Auburn, AL, United States
- Neha Potnis
  - Department of Entomology and Plant Pathology, Auburn University, Auburn, AL, United States
- Tal Pupko
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
5
Computational prediction of secreted proteins in gram-negative bacteria. Comput Struct Biotechnol J 2021; 19:1806-1828. [PMID: 33897982 PMCID: PMC8047123 DOI: 10.1016/j.csbj.2021.03.019] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 11/19/2020] [Revised: 03/18/2021] [Accepted: 03/18/2021] [Indexed: 12/29/2022]
Abstract
Gram-negative bacteria harness multiple protein secretion systems and secrete a large proportion of their proteome. Proteins can be exported to the periplasmic space, integrated into the membrane, transported into the extracellular milieu, or translocated into the cytoplasm of contacting cells. Accurate, genome-wide annotation of the secreted proteins and their secretion pathways is therefore important. In this review, we systematically classified the secreted proteins according to the types of secretion systems in Gram-negative bacteria, summarized the known features of these proteins, and reviewed the algorithms and tools for their prediction.
6
Yu L, Liu F, Li Y, Luo J, Jing R. DeepT3_4: A Hybrid Deep Neural Network Model for the Distinction Between Bacterial Type III and IV Secreted Effectors. Front Microbiol 2021; 12:605782. [PMID: 33552038 PMCID: PMC7858263 DOI: 10.3389/fmicb.2021.605782] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/13/2020] [Accepted: 01/04/2021] [Indexed: 01/17/2023]
Abstract
Gram-negative bacteria can deliver secreted proteins (also known as secreted effectors) directly into host cells through the type III secretion system (T3SS), type IV secretion system (T4SS), and type VI secretion system (T6SS), causing various diseases. These secreted effectors are heavily involved in the interactions between bacteria and host cells, so their identification is crucial for the discovery and development of novel antibacterial drugs. Accurately distinguishing type III secreted effectors (T3SEs) from type IV secreted effectors (T4SEs) is currently challenging because neither T3SEs nor T4SEs contain N-terminal signal peptides, and some of these effectors have similar evolutionarily conserved profiles and sequence motifs. To address this challenge, we developed a deep learning (DL) approach called DeepT3_4 to classify T3SEs and T4SEs. We generated an amino-acid character dictionary and sequence-based features from effector proteins and fed these features into a hybrid model that integrates recurrent neural networks (RNNs) and deep neural networks (DNNs). After training, the hybrid neural network classifies secreted effectors into the two classes with accuracy, F-value, and recall all above 80.0%. Our approach is the first DL approach for classifying T3SEs and T4SEs and provides a promising supplementary tool for further secretome studies.
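DeepT3_4 encodes each effector sequence with an amino-acid character dictionary before passing it to recurrent layers. The abstract gives neither the exact dictionary nor the handling of sequence length. The sketch below therefore assumes a common scheme:

- each of the 20 standard residues maps to an integer from 1 to 20;
- 0 is reserved for padding;
- 21 marks any other letter;
- sequences are truncated to their first `max_len` residues or padded so they can be batched.

The truncation direction and the constant names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# 0 is reserved for padding, so residues are numbered from 1.
CHAR_TO_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
UNKNOWN = len(AMINO_ACIDS) + 1  # 21: any non-standard letter (e.g. X, B, Z)


def encode(seq, max_len):
    """Map a protein sequence to a fixed-length list of integer token ids."""
    ids = [CHAR_TO_INDEX.get(ch, UNKNOWN) for ch in seq.upper()[:max_len]]
    return ids + [0] * (max_len - len(ids))  # right-pad with the padding id
```

An embedding layer in front of the RNN would then map each integer id to a learned vector. Reserving 0 lets padded positions be masked out.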
Affiliation(s)
- Lezheng Yu
  - School of Chemistry and Materials Science, Guizhou Education University, Guiyang, China
- Fengjuan Liu
  - School of Geography and Resources, Guizhou Education University, Guiyang, China
- Yizhou Li
  - College of Cybersecurity, Sichuan University, Chengdu, China
- Jiesi Luo
  - Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, China
- Runyu Jing
  - College of Cybersecurity, Sichuan University, Chengdu, China
7
Zhang J, Lv L, Lu D, Kong D, Al-Alashaari MAA, Zhao X. Variable selection from a feature representing protein sequences: a case of classification on bacterial type IV secreted effectors. BMC Bioinformatics 2020; 21:480. [PMID: 33109082 PMCID: PMC7590791 DOI: 10.1186/s12859-020-03826-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/21/2020] [Accepted: 10/19/2020] [Indexed: 12/13/2022]
Abstract
Background Classification of proteins with specific functions is important for biological research. Approaches for encoding protein sequences into features play an important role in protein classification. Many computational methods (namely classifiers) are used to classify protein sequences according to various encoding approaches. Commonly, protein sequences carry certain labels corresponding to different categories of biological function (e.g., bacterial type IV secreted effector or not), which makes protein prediction a fantasy. For protein prediction, a kernel set of protein sequences with labels certified by biological experiments should exist in advance. However, such a set is rarely available in current research. Therefore, unsupervised learning rather than supervised learning (e.g., classification) should be considered. For protein classification, various classifiers can help evaluate the effectiveness of different encoding approaches. In addition, variable selection from an encoded feature representing protein sequences is an important issue that also needs to be considered. Results Focusing on the latter problem, we propose a new method for variable selection from an encoded feature representing protein sequences. Using as a case study a benchmark dataset of 1947 protein sequences (399 T4SEs and 1548 non-T4SEs), we conducted experiments to identify bacterial type IV secreted effectors (T4SEs) from protein sequences. Comparable, quantified results are obtained using only certain components of the encoded feature, i.e., the position-specific scoring matrix (PSSM), which indicates the effectiveness of our method. Conclusions Certain variables, rather than the whole encoded feature to which they belong, are effective for discriminating between different types of proteins. In addition, ensemble classifiers with automatic assignment of different base classifiers achieve a better classification result.
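The abstract names the position-specific scoring matrix (PSSM) as the encoded feature but gives neither the aggregation nor the selection rule. One widely used way to turn an L x 20 PSSM into a fixed-length vector is to sum its rows by residue type into a 20 x 20 matrix, giving 400 values. The sketch below assumes that encoding and shows variable selection simply as keeping a chosen subset of components. Choosing those components is exactly what the authors' method does, so the indices in any call are hypothetical, as are the helper names.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def pssm_400(seq, pssm):
    """Sum PSSM rows by residue type (20 x 20), normalise by length, flatten to 400."""
    if len(seq) != len(pssm):
        raise ValueError("sequence and PSSM must have the same length")
    matrix = [[0.0] * 20 for _ in range(20)]
    for residue, row in zip(seq.upper(), pssm):
        if residue in INDEX:  # skip non-standard residues
            for j, score in enumerate(row):
                matrix[INDEX[residue]][j] += score
    return [value / len(seq) for r in matrix for value in r]


def select_variables(feature, indices):
    """Keep only the chosen components of an encoded feature."""
    return [feature[i] for i in indices]
```

Component `20 * i + j` of the flattened vector holds the averaged PSSM score in column `j` over positions occupied by residue `AMINO_ACIDS[i]`. This indexing is what a variable-selection step would refer to.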
Affiliation(s)
- Jian Zhang
  - College of Artificial Intelligence, Wuxi Vocational College of Science and Technology, No. 8 Xinxi Road, Wuxi, 214028, China
- Lixin Lv
  - College of Artificial Intelligence, Wuxi Vocational College of Science and Technology, No. 8 Xinxi Road, Wuxi, 214028, China
- Donglei Lu
  - College of Artificial Intelligence, Wuxi Vocational College of Science and Technology, No. 8 Xinxi Road, Wuxi, 214028, China
- Denan Kong
  - College of Information and Computer Engineering, Northeast Forestry University, No. 26 Hexing Road, Harbin, 150040, China
- Xudong Zhao
  - College of Information and Computer Engineering, Northeast Forestry University, No. 26 Hexing Road, Harbin, 150040, China.
8
Chen T, Wang X, Chu Y, Wang Y, Jiang M, Wei DQ, Xiong Y. T4SE-XGB: Interpretable Sequence-Based Prediction of Type IV Secreted Effectors Using eXtreme Gradient Boosting Algorithm. Front Microbiol 2020; 11:580382. [PMID: 33072049 PMCID: PMC7541839 DOI: 10.3389/fmicb.2020.580382] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 07/06/2020] [Accepted: 08/21/2020] [Indexed: 12/19/2022]
Abstract
Type IV secreted effectors (T4SEs) can be translocated into the cytosol of host cells via the type IV secretion system (T4SS) and cause diseases. However, experimental approaches to identifying T4SEs are time- and resource-consuming, and existing computational tools based on machine learning techniques have obvious limitations, such as the lack of interpretability of their prediction models. In this study, we proposed a new model, T4SE-XGB, which uses the eXtreme gradient boosting (XGBoost) algorithm to accurately identify type IV effectors from an optimal set of features derived from protein sequences. After testing 20 different types of features, we found that feeding all features into XGBoost gave the best performance in 5-fold cross-validation compared with other machine learning methods. The ReliefF algorithm was then used to obtain the optimal feature set on our dataset, which further improved model performance. T4SE-XGB achieved the highest predictive performance on the independent test set and outperformed other published prediction tools. Furthermore, the SHAP method was used to interpret the contribution of features to model predictions. Identifying key features can improve understanding of the multifactorial contributors to host-pathogen interactions and bacterial pathogenesis. Beyond type IV effector prediction, we believe the proposed framework can guide similar studies that construct prediction methods for related biological problems. The data and source code of this study are freely available at https://github.com/CT001002/T4SE-XGB.
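T4SE-XGB used the ReliefF algorithm to pick an optimal feature subset before training XGBoost. As a hedged illustration, the sketch below computes ReliefF weights for a two-class problem in pure Python:

- a feature is rewarded when it differs between a sample and its nearest neighbours of the other class (misses);
- a feature is penalised when it differs from the sample's nearest neighbours of the same class (hits).

Several settings are assumptions rather than values reported by the authors: the neighbour count `k`, the range-scaled Manhattan distance, and the use of every sample instead of a random subsample.

```python
def relieff(X, y, k=1):
    """ReliefF feature weights for a binary-labelled dataset (list of rows, list of labels)."""
    n, d = len(X), len(X[0])
    # Range of each feature, used to scale per-feature differences into [0, 1].
    spans = []
    for j in range(d):
        column = [row[j] for row in X]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def distance(a, b):
        return sum(diff(j, a, b) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        # Neighbours of sample i ordered by distance (ties broken by index).
        ranked = sorted((distance(X[i], X[m]), m) for m in range(n) if m != i)
        hits = [m for _, m in ranked if y[m] == y[i]][:k]
        misses = [m for _, m in ranked if y[m] != y[i]][:k]
        for j in range(d):
            weights[j] -= sum(diff(j, X[i], X[h]) for h in hits) / (n * k)
            weights[j] += sum(diff(j, X[i], X[m]) for m in misses) / (n * k)
    return weights
```

Features would then be ranked by weight, and the top-scoring subset passed to the classifier. How many features to keep is a tuning choice.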
Affiliation(s)
- Tianhang Chen
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Xiangeng Wang
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
  - Department of Biomedical Sciences, City University of Hong Kong, Hong Kong, China
- Yanyi Chu
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
  - Peng Cheng Laboratory, Shenzhen, China
- Yanjing Wang
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Mingming Jiang
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Dong-Qing Wei
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
  - Peng Cheng Laboratory, Shenzhen, China
- Yi Xiong
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
9
Carreón-Anguiano KG, Islas-Flores I, Vega-Arreguín J, Sáenz-Carbonell L, Canto-Canché B. EffHunter: A Tool for Prediction of Effector Protein Candidates in Fungal Proteomic Databases. Biomolecules 2020; 10:biom10050712. [PMID: 32375409 PMCID: PMC7277995 DOI: 10.3390/biom10050712] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 01/09/2020] [Revised: 03/17/2020] [Accepted: 03/21/2020] [Indexed: 11/16/2022]
Abstract
Pathogens are able to deliver small, secreted, cysteine-rich proteins into plant cells to enable infection. The computational prediction of effector proteins remains one of the most challenging areas in the study of plant-fungus interactions. Several bioinformatic programs can currently help identify these proteins; however, in most cases these programs are run independently of one another. Here, we present EffHunter, an easy and fast bioinformatics tool for the identification of effectors. This predictor was used to identify putative effectors in 88 proteomes using characteristics such as size, cysteine residue content, secretion signal and transmembrane domains.
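EffHunter screens proteomes using size, cysteine content, secretion signal and transmembrane domains. The sketch below expresses such a screen as a single Python predicate over precomputed annotations. It assumes the signal-peptide call and the transmembrane-domain count come from separate prediction tools. Its thresholds (300 residues, 4 cysteines) are placeholders, not the cut-offs EffHunter actually applies, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    name: str
    sequence: str
    has_signal_peptide: bool  # assumed to come from an external signal-peptide predictor
    n_tm_domains: int         # assumed to come from an external transmembrane predictor


def is_effector_candidate(p, max_length=300, min_cysteines=4):
    """Keep small, cysteine-rich, secreted proteins without transmembrane domains.

    max_length and min_cysteines are hypothetical thresholds, not EffHunter's values.
    """
    seq = p.sequence.upper()
    return (
        len(seq) <= max_length
        and seq.count("C") >= min_cysteines
        and p.has_signal_peptide
        and p.n_tm_domains == 0
    )
```

Applied to a whole proteome, the screen is simply `[p for p in proteins if is_effector_candidate(p)]`.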
Affiliation(s)
- Karla Gisel Carreón-Anguiano
  - Unidad de Biotecnología, Centro de Investigación Científica de Yucatán, A.C., Calle 43 No. 130 X 32 y 34, Col. Chuburná de Hidalgo, C.P. 97205 Mérida, México
- Ignacio Islas-Flores
  - Unidad de Bioquímica y Biología Molecular de Plantas, Centro de Investigación Científica de Yucatán, A.C., Calle 43 No. 130 X 32 y 34, Col. Chuburná de Hidalgo, C.P. 97205 Mérida, México
- Julio Vega-Arreguín
  - Laboratorio de Ciencias AgroGenómicas, Escuela Nacional de Estudios Superiores-UNAM, León, México
- Luis Sáenz-Carbonell
  - Unidad de Biotecnología, Centro de Investigación Científica de Yucatán, A.C., Calle 43 No. 130 X 32 y 34, Col. Chuburná de Hidalgo, C.P. 97205 Mérida, México
- Blondy Canto-Canché
  - Unidad de Biotecnología, Centro de Investigación Científica de Yucatán, A.C., Calle 43 No. 130 X 32 y 34, Col. Chuburná de Hidalgo, C.P. 97205 Mérida, México
10
Comparative genomic analysis of Erwinia amylovora reveals novel insights in phylogenetic arrangement, plasmid diversity, and streptomycin resistance. Genomics 2020; 112:3762-3772. [PMID: 32259573 DOI: 10.1016/j.ygeno.2020.04.001] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 02/07/2020] [Revised: 03/16/2020] [Accepted: 04/01/2020] [Indexed: 01/06/2023]
Abstract
Erwinia amylovora is a destructive pathogen of Rosaceous plants and an economic concern worldwide. Herein, we report 93 new E. amylovora genomes from North America, Europe, the Mediterranean, and New Zealand. This new genomic information demonstrates the existence of three primary clades of Amygdaloideae-infecting (apple and pear) E. amylovora and suggests that all three originated independently in North America. The comprehensive sequencing also identified and confirmed the presence of 7 novel plasmids ranging in size from 2.9 to 34.7 kbp. Although the function of the novel plasmids is unknown, plasmids pEAR27, pEAR28, and pEAR35 encode type IV secretion systems. The strA-strB gene pair and the K43R point mutation at codon 43 of the rpsL gene have previously been documented to confer streptomycin resistance. Among the sequenced isolates, rpsL-based streptomycin resistance was more common and was most frequent in the Western North American clade.
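The abstract distinguishes two streptomycin-resistance mechanisms: acquisition of the strA-strB gene pair, and the K43R substitution in RpsL. The function below sketches the second check by inspecting residue 43 of a translated RpsL sequence. It assumes the sequence is full length and numbered from the initiator methionine as residue 1. A real pipeline would typically align to a reference RpsL before reading the position. The function name is hypothetical.

```python
def rpsl_residue43_status(rpsl_protein):
    """Report the amino acid at position 43 (1-based) of an RpsL protein sequence."""
    if len(rpsl_protein) < 43:
        return "too short"
    residue = rpsl_protein[42].upper()  # index 42 is position 43 when counting from 1
    if residue == "K":
        return "K43 (wild type)"
    if residue == "R":
        return "K43R (streptomycin resistance)"
    return f"{residue}43 (other)"
```

Detecting strA-strB would instead be a presence/absence search for those genes, for example by sequence similarity against known references. That search is not shown here.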
11
Esna Ashari Z, Brayton KA, Broschat SL. Prediction of T4SS Effector Proteins for Anaplasma phagocytophilum Using OPT4e, A New Software Tool. Front Microbiol 2019; 10:1391. [PMID: 31293540 PMCID: PMC6598457 DOI: 10.3389/fmicb.2019.01391] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 03/05/2019] [Accepted: 06/03/2019] [Indexed: 01/01/2023]
Abstract
Type IV secretion systems (T4SS) are used by a number of bacterial pathogens to attack the host cell. The complex protein structure of the T4SS is used to directly translocate effector proteins into host cells, often causing fatal diseases in humans and animals. Identifying effector proteins is the first step in understanding how they cause virulence and pathogenicity. Accurate prediction of effector proteins via a machine learning approach can assist in their identification. The main goal of this study is to predict a set of candidate effectors for the tick-borne pathogen Anaplasma phagocytophilum, the causative agent of anaplasmosis in humans. To our knowledge, this is the first computational study of effector prediction focused on A. phagocytophilum. In a previous study, we systematically selected a set of optimal features from more than 1,000 possible protein characteristics for predicting T4SS effector candidates. This was followed by a study of these features using the proteome of Legionella pneumophila strain Philadelphia deduced from its complete genome. In this manuscript we introduce the OPT4e (Optimal-features Predictor for T4SS Effector proteins) software package. An earlier version of OPT4e was verified using cross-validation tests, accuracy tests, and comparison with previous results for L. pneumophila. We used OPT4e to predict candidate effectors from the proteomes of A. phagocytophilum strains HZ and HGE-1, obtaining 48 and 46 candidates, respectively, of which 16 and 18 were deemed most probable to be effectors. These latter include the three known validated effectors of A. phagocytophilum.
Affiliation(s)
- Zhila Esna Ashari
  - School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, United States
- Kelly A Brayton
  - School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, United States
  - Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, WA, United States
  - Paul G. Allen School for Global Animal Health, Washington State University, Pullman, WA, United States
- Shira L Broschat
  - School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, United States
  - Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, WA, United States
  - Paul G. Allen School for Global Animal Health, Washington State University, Pullman, WA, United States
12
Using an optimal set of features with a machine learning-based approach to predict effector proteins for Legionella pneumophila. PLoS One 2019; 14:e0202312. [PMID: 30682021 PMCID: PMC6347213 DOI: 10.1371/journal.pone.0202312] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/17/2018] [Accepted: 01/12/2019] [Indexed: 12/26/2022]
Abstract
Type IV secretion systems exist in a number of bacterial pathogens and are used to secrete effector proteins directly into host cells, altering the host environment to make it hospitable for the bacteria. In recent years, several machine learning algorithms have been developed to predict effector proteins, potentially facilitating experimental verification. However, their results are inconsistent with one another. Previously we analysed the disparate sets of predictive features used in these algorithms to determine an optimal set of 370 features for effector prediction. This study focuses on the best way to use these optimal features: we designed three machine learning classifiers, compared our results with those of others, and obtained de novo results. We chose the pathogen Legionella pneumophila strain Philadelphia-1, a cause of Legionnaires' disease, because it has many validated effector proteins and others have developed machine learning prediction tools for it. All of our models give good results, indicating that our optimal features are quite robust; Model 1, which uses all 370 features with a support vector machine, has slightly better accuracy. Moreover, Model 1 predicted 472 proteins that are deemed highly probable to be effectors, including 94% of known effectors. Although the results of our three models agree well with those of other researchers, their models predicted only 126 and 311 candidate effectors.