1
Nielsen H. Protein Sorting Prediction. Methods Mol Biol 2024; 2715:27-63. [PMID: 37930519 DOI: 10.1007/978-1-0716-3445-5_2]
Abstract
Many computational methods are available for predicting protein sorting in bacteria. When comparing them, it is important to know that they can be grouped into three fundamentally different approaches: signal-based, global property-based, and homology-based prediction. In this chapter, the strengths and drawbacks of each of these approaches are described through many examples of methods that predict secretion, integration into membranes, or subcellular locations in general. The aim of this chapter is to provide a user-level introduction to the field with a minimum of computational theory.
Affiliation(s)
- Henrik Nielsen
- Department of Health Technology, Technical University of Denmark, Lyngby, Denmark.
2
Wagner N, Alburquerque M, Ecker N, Dotan E, Zerah B, Pena MM, Potnis N, Pupko T. Natural language processing approach to model the secretion signal of type III effectors. Front Plant Sci 2022; 13:1024405. [PMID: 36388586 PMCID: PMC9659976 DOI: 10.3389/fpls.2022.1024405]
Abstract
Type III effectors are proteins injected by Gram-negative bacteria into eukaryotic hosts. In many plant and animal pathogens, these effectors manipulate host cellular processes to the benefit of the bacteria. Type III effectors are secreted by a type III secretion system that must "classify" each bacterial protein into one of two categories: either the protein should be translocated or it should not. It was previously shown that type III effectors have a secretion signal within their N-terminus; however, despite numerous efforts, the exact biochemical identity of this secretion signal is generally unknown. Computational characterization of the secretion signal is important for the identification of novel effectors and for better understanding the molecular translocation mechanism. In this work we developed novel machine-learning algorithms for characterizing the secretion signal in both plant and animal pathogens. Specifically, we represented each protein as a vector in high-dimensional space using Facebook's protein language model. Classification algorithms were then used to separate effectors from non-effector proteins. We subsequently curated a benchmark dataset of hundreds of effectors and thousands of non-effector proteins. We showed that on this curated dataset, our novel approach yielded substantially better classification accuracy than previously developed methodologies. We also tested the hypothesis that plant and animal pathogen effectors are characterized by different secretion signals. Finally, we integrated the novel approach into Effectidor, a web server for predicting type III effector proteins, leading to more accurate classification of effectors from non-effectors.
Affiliation(s)
- Naama Wagner
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Michael Alburquerque
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Noa Ecker
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Edo Dotan
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Ben Zerah
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Michelle Mendonca Pena
- Department of Entomology and Plant Pathology, Auburn University, Auburn, AL, United States
- Neha Potnis
- Department of Entomology and Plant Pathology, Auburn University, Auburn, AL, United States
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
3
Jing R, Wen T, Liao C, Xue L, Liu F, Yu L, Luo J. DeepT3 2.0: improving type III secreted effector predictions by an integrative deep learning framework. NAR Genom Bioinform 2021; 3:lqab086. [PMID: 34617013 PMCID: PMC8489581 DOI: 10.1093/nargab/lqab086]
Abstract
Type III secretion systems (T3SSs) are bacterial membrane-embedded nanomachines that allow a number of human, plant and animal pathogens to inject virulence factors directly into the cytoplasm of eukaryotic cells. Export of effectors through T3SSs is critical for motility and virulence of most Gram-negative pathogens. Current computational methods can predict type III secreted effectors (T3SEs) from amino acid sequences, but due to algorithmic constraints, reliable and large-scale prediction of T3SEs in Gram-negative bacteria remains a challenge. Here, we present DeepT3 2.0 (http://advintbioinforlab.com/deept3/), a novel web server that integrates different deep learning models for genome-wide prediction of T3SEs from a bacterium of interest. DeepT3 2.0 combines various deep learning architectures, including convolutional, recurrent, convolutional-recurrent and multilayer neural networks, to learn N-terminal representations of proteins specifically for T3SE prediction. Outputs from the different models are processed and integrated to discriminate T3SEs from non-T3SEs. Because it leverages diverse models and an integrative deep learning framework, DeepT3 2.0 outperforms existing methods on validation datasets. In addition, the features learned by the networks are analyzed and visualized to explain how the models make their predictions. We propose DeepT3 2.0 as an integrated and accurate tool for the discovery of T3SEs.
Affiliation(s)
- Runyu Jing
- School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
- Tingke Wen
- School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
- Chengxiang Liao
- School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
- Li Xue
- School of Public Health, Southwest Medical University, Luzhou 646000, China
- Fengjuan Liu
- School of Geography and Resources, Guizhou Education University, Guiyang 550018, China
- Lezheng Yu
- School of Chemistry and Materials Science, Guizhou Education University, Guiyang 550018, China
- Jiesi Luo
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou 646000, China
4
Hasan MM, Alam MA, Shoombuatong W, Deng HW, Manavalan B, Kurata H. NeuroPred-FRL: an interpretable prediction model for identifying neuropeptide using feature representation learning. Brief Bioinform 2021; 22:6272801. [PMID: 33975333 DOI: 10.1093/bib/bbab167]
Abstract
Neuropeptides (NPs) are the most versatile neurotransmitters in the immune systems that regulate various central anxious hormones. An efficient and effective bioinformatics tool for rapid and accurate large-scale identification of NPs is critical in immunoinformatics, which is indispensable for basic research and drug development. Although a few NP prediction tools have been developed, their prediction performance still needs to be improved. In this study, we developed a machine learning-based meta-predictor called NeuroPred-FRL by employing a feature representation learning approach. First, we generated 66 optimal baseline models by employing 11 different encodings, six different classifiers and a two-step feature selection approach. The NP probability scores predicted by the 66 baseline models were combined to form the input feature vector. Second, to enhance the feature representation ability, we applied the two-step feature selection approach to optimize the 66-D probability feature vector and then fed the optimized vector into a random forest classifier to construct the final meta-model (NeuroPred-FRL). Benchmarking experiments based on both cross-validation and independent tests indicate that NeuroPred-FRL achieves superior NP prediction performance compared with other state-of-the-art predictors. We believe that NeuroPred-FRL can serve as a powerful tool for large-scale identification of NPs, facilitating the characterization of their functional mechanisms and expediting their applications in clinical therapy. Moreover, we interpreted some of the model mechanisms of NeuroPred-FRL by leveraging the robust SHapley Additive exPlanation (SHAP) algorithm.
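The stacking step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the NeuroPred-FRL code: the encoding and classifier names are hypothetical placeholders, the baseline "models" are deterministic stand-ins rather than trained classifiers, and both the two-step feature selection and the random forest meta-model are omitted. It only shows how 11 encodings x 6 classifiers yield one 66-dimensional probability vector per peptide.

```python
from typing import Callable, Dict, List

# A baseline model maps a peptide sequence to a probability of being a neuropeptide.
BaselineModel = Callable[[str], float]

ENCODINGS = [f"enc{i}" for i in range(11)]    # hypothetical names for the 11 encodings
CLASSIFIERS = [f"clf{j}" for j in range(6)]   # hypothetical names for the 6 classifiers


def make_placeholder_model(seed: int) -> BaselineModel:
    """Stand-in for a trained baseline model; returns a deterministic value in [0, 1)."""
    def predict(seq: str) -> float:
        return (sum(map(ord, seq)) * (seed + 1) % 997) / 997
    return predict


def build_baselines() -> Dict[str, BaselineModel]:
    """One baseline per (encoding, classifier) pair: 11 x 6 = 66 models."""
    models: Dict[str, BaselineModel] = {}
    for i, enc in enumerate(ENCODINGS):
        for j, clf in enumerate(CLASSIFIERS):
            models[f"{enc}|{clf}"] = make_placeholder_model(i * len(CLASSIFIERS) + j)
    return models


def probability_features(seq: str, baselines: Dict[str, BaselineModel]) -> List[float]:
    """Concatenate baseline probabilities, in a fixed key order, into one meta-feature vector."""
    return [baselines[name](seq) for name in sorted(baselines)]


if __name__ == "__main__":
    vector = probability_features("ACDEFG", build_baselines())
    print(len(vector))  # 66 entries, one per baseline model
```

Fixing the key order matters: the meta-classifier only sees positions in the vector, so every peptide must be scored by the baselines in the same order.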
Affiliation(s)
- Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan
- Md Ashad Alam
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, 70112 USA
- Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Hong-Wen Deng
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, 70112 USA
- Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
5
Computational prediction of secreted proteins in gram-negative bacteria. Comput Struct Biotechnol J 2021; 19:1806-1828. [PMID: 33897982 PMCID: PMC8047123 DOI: 10.1016/j.csbj.2021.03.019]
Abstract
Gram-negative bacteria harness multiple protein secretion systems and secrete a large proportion of their proteome. Proteins can be exported to the periplasmic space, integrated into the membrane, transported into the extracellular milieu, or translocated into the cytoplasm of contacting cells. Accurate, genome-wide annotation of the secreted proteins and their secretion pathways is therefore important. In this review, we systematically classified the secreted proteins according to the types of secretion systems in Gram-negative bacteria, summarized the known features of these proteins, and reviewed the algorithms and tools for their prediction.
6
Yu L, Liu F, Li Y, Luo J, Jing R. DeepT3_4: A Hybrid Deep Neural Network Model for the Distinction Between Bacterial Type III and IV Secreted Effectors. Front Microbiol 2021; 12:605782. [PMID: 33552038 PMCID: PMC7858263 DOI: 10.3389/fmicb.2021.605782]
Abstract
Gram-negative bacteria can deliver secreted proteins (also known as secreted effectors) directly into host cells through the type III secretion system (T3SS), type IV secretion system (T4SS), and type VI secretion system (T6SS), causing various diseases. These secreted effectors are heavily involved in the interactions between bacteria and host cells, so their identification is crucial for the discovery and development of novel antibacterial drugs. It is currently challenging to accurately distinguish type III secreted effectors (T3SEs) from type IV secreted effectors (T4SEs) because neither T3SEs nor T4SEs contain N-terminal signal peptides, and some of these effectors have similar evolutionarily conserved profiles and sequence motifs. To address this challenge, we developed a deep learning (DL) approach called DeepT3_4 to correctly classify T3SEs and T4SEs. We generated an amino-acid character dictionary and sequence-based features extracted from effector proteins and subsequently fed these features into a hybrid model that integrates recurrent neural networks (RNNs) and deep neural networks (DNNs). After training, the hybrid neural network classifies secreted effectors into the two classes with an accuracy, F-value, and recall of over 80.0%. Our approach is the first DL approach for the classification of T3SEs and T4SEs, providing a promising supplementary tool for further secretome studies.
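An amino-acid character dictionary of the kind mentioned above can be sketched as an integer lookup table that turns a sequence into a fixed-length index vector for an embedding or recurrent layer. This is a hypothetical illustration, not the DeepT3_4 implementation: the index assignment (0 for padding, 1-20 for the standard residues, 21 for any other symbol) and the default length of 100 are assumptions chosen for the example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = 0                                # assumed index reserved for padding
UNK = len(AMINO_ACIDS) + 1             # assumed index (21) for non-standard symbols such as X, B, Z
CHAR_TO_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode(seq: str, max_len: int = 100) -> list[int]:
    """Map residues to integer indices, then truncate or right-pad to max_len."""
    ids = [CHAR_TO_INDEX.get(aa, UNK) for aa in seq.upper()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


if __name__ == "__main__":
    print(encode("MKV", max_len=5))  # [11, 9, 18, 0, 0]
```

Reserving 0 for padding keeps padded positions distinguishable from real residues, which many embedding layers can then be told to mask.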
Affiliation(s)
- Lezheng Yu
- School of Chemistry and Materials Science, Guizhou Education University, Guiyang, China
- Fengjuan Liu
- School of Geography and Resources, Guizhou Education University, Guiyang, China
- Yizhou Li
- College of Cybersecurity, Sichuan University, Chengdu, China
- Jiesi Luo
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, China
- Runyu Jing
- College of Cybersecurity, Sichuan University, Chengdu, China
7
iT3SE-PX: Identification of Bacterial Type III Secreted Effectors Using PSSM Profiles and XGBoost Feature Selection. Comput Math Methods Med 2021; 2021:6690299. [PMID: 33505516 PMCID: PMC7806399 DOI: 10.1155/2021/6690299]
Abstract
Identification of bacterial type III secreted effectors (T3SEs) has become a popular research topic in bioinformatics due to their crucial role in understanding host-pathogen interactions and developing better therapeutic targets against pathogens. However, recognizing all effector proteins with traditional experimental approaches is often time-consuming and laborious. Therefore, computational methods that accurately predict putative novel effectors are important for reducing the number of biological experiments needed for validation. In this study, we proposed a method, called iT3SE-PX, to identify T3SEs solely based on protein sequences. First, three kinds of features were extracted from position-specific scoring matrix (PSSM) profiles to train a machine learning (ML) model. Then, the extreme gradient boosting (XGBoost) algorithm was used to rank these features by their classification ability. Finally, the optimal features were selected as inputs to a support vector machine (SVM) classifier to predict T3SEs. On the two benchmark datasets, we conducted 100 repetitions of randomized 5-fold cross-validation (CV) and an independent test, respectively. The experimental results demonstrated that the proposed method achieved superior performance compared to most existing methods and could serve as a useful tool for identifying putative T3SEs given only sequence information.
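The repeated randomized 5-fold cross-validation protocol can be sketched as reshuffling the sample indices on every repetition and then cutting them into k folds. This is a minimal, unstratified sketch for illustration only; the abstract does not say whether iT3SE-PX stratified folds by class or how it aggregated metrics, so the function name `repeated_kfold`, the seeding and the fold assignment are assumptions.

```python
import random
from typing import Iterator, List, Tuple


def repeated_kfold(
    n: int, k: int = 5, repeats: int = 100, seed: int = 0
) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (repeat, train_indices, test_indices) for `repeats` shuffled k-fold splits."""
    rng = random.Random(seed)  # seeded so the splits are reproducible
    for r in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
            yield r, train, test


if __name__ == "__main__":
    splits = list(repeated_kfold(20, k=5, repeats=100))
    print(len(splits))  # 500 train/test splits in total
```

A performance metric would be computed on each of the 100 x 5 test folds and then averaged; class stratification could be added by running the same fold assignment separately on the effector and non-effector indices.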
8
Jing R, Li Y, Xue L, Liu F, Li M, Luo J. autoBioSeqpy: A Deep Learning Tool for the Classification of Biological Sequences. J Chem Inf Model 2020; 60:3755-3764. [PMID: 32786512 DOI: 10.1021/acs.jcim.0c00409]
Abstract
Deep learning has proven to be a powerful method with applications in various fields, including image, language, and biomedical data. Thanks to libraries and toolkits such as TensorFlow, PyTorch, and Keras, researchers can use different deep learning architectures and data sets for rapid modeling. However, the available neural network implementations built with these toolkits are usually designed for a specific study and are difficult to transfer to other work. Here, we present autoBioSeqpy, a tool that uses deep learning for biological sequence classification. The advantage of this tool is its simplicity: users only need to prepare the input data set and then use a command line interface. autoBioSeqpy then automatically executes a series of customizable steps, including text reading, parameter initialization, sequence encoding, model loading, training, and evaluation. In addition, the tool provides various ready-to-use and adaptable model templates to improve the usability of these networks. We introduce the application of autoBioSeqpy to three biological sequence problems: the prediction of type III secreted proteins, protein subcellular localization, and CRISPR/Cas9 sgRNA activity. autoBioSeqpy is freely available with examples at https://github.com/jingry/autoBioSeqpy.
Affiliation(s)
- Runyu Jing
- College of Cybersecurity, Sichuan University, Chengdu 610065, China
- Yizhou Li
- College of Cybersecurity, Sichuan University, Chengdu 610065, China
- Li Xue
- School of Public Health, Southwest Medical University, Luzhou, Sichuan 646000, China
- Fengjuan Liu
- School of Geography and Resources, Guizhou Education University, Guiyang 550018, China
- Menglong Li
- College of Chemistry, Sichuan University, Chengdu 610065, China
- Jiesi Luo
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, Sichuan 646000, China
9
ACNNT3: Attention-CNN Framework for Prediction of Sequence-Based Bacterial Type III Secreted Effectors. Comput Math Methods Med 2020; 2020:3974598. [PMID: 32328150 PMCID: PMC7157791 DOI: 10.1155/2020/3974598]
Abstract
The type III secretion system (T3SS) is a specialized protein delivery system in Gram-negative bacteria that delivers T3SS-secreted effectors (T3SEs) to host cells, causing pathological changes. Numerous experiments have verified that T3SEs play important roles in many biological activities and in host-pathogen interactions. Accurate identification of T3SEs is therefore essential to help understand the pathogenic mechanisms of bacteria; however, many existing experimental methods are time-consuming and expensive. Deep-learning methods have recently been applied successfully to T3SE recognition, but improving recognition accuracy remains a challenge. In this study, we developed a new deep-learning framework, ACNNT3, based on the attention mechanism. We converted the 100 N-terminal residues of each protein sequence into a fused feature vector combining primary structure information (one-hot encoding) and the position-specific scoring matrix (PSSM), which serves as the input to the network model. We then embedded an attention layer into a CNN to learn the characteristic preferences of type III effector proteins, allowing any protein to be classified directly as either a T3SE or a non-T3SE. We found that introducing new protein features can improve the recognition accuracy of the model. Our method combines the advantages of CNNs and the attention mechanism and is superior on many indicators compared with other popular methods. On a common independent dataset, our method is more accurate than previous methods, with an improvement of 4.1-20.0%.
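The N-terminal input described above can be sketched in Python as a 100 x 20 one-hot matrix fused position by position with a PSSM. This is an illustrative assumption rather than the ACNNT3 code: treating padding positions and non-standard residues as all-zero rows is a choice made here, and the PSSM (normally produced by a PSI-BLAST search) is taken as a precomputed 100 x 20 matrix supplied by the caller. The function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_n_terminal(seq: str, length: int = 100) -> list[list[int]]:
    """Return a length x 20 one-hot matrix for the first `length` residues.

    Positions past the end of a short sequence, and non-standard residues,
    are left as all-zero rows (an assumption made for this sketch).
    """
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(length)]
    for pos, aa in enumerate(seq.upper()[:length]):
        if aa in INDEX:
            matrix[pos][INDEX[aa]] = 1
    return matrix


def fuse(one_hot: list[list[int]], pssm: list[list[float]]) -> list[list[float]]:
    """Concatenate one-hot and PSSM rows position by position (length x 40)."""
    if len(one_hot) != len(pssm):
        raise ValueError("one-hot and PSSM must cover the same positions")
    return [oh_row + list(pssm_row) for oh_row, pssm_row in zip(one_hot, pssm)]


if __name__ == "__main__":
    placeholder_pssm = [[0.0] * 20 for _ in range(100)]  # stand-in for a real PSSM
    features = fuse(one_hot_n_terminal("MKA"), placeholder_pssm)
    print(len(features), len(features[0]))  # 100 40
```

Keeping the fused features as a position-by-channel matrix, rather than flattening them, preserves the sequence axis that a convolutional layer slides over.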
10
Li J, Wei L, Guo F, Zou Q. EP3: an ensemble predictor that accurately identifies type III secreted effectors. Brief Bioinform 2020; 22:1918-1928. [PMID: 32043137 DOI: 10.1093/bib/bbaa008]
Abstract
Type III secretion systems (T3SS) can be found in many pathogenic bacteria, such as Dysentery bacillus, Salmonella typhimurium, Vibrio cholerae and pathogenic Escherichia coli. The infection routes of these bacteria include the T3SS-mediated transfer of a large number of type III secreted effectors (T3SEs) into host cells, thereby blocking or adjusting the communication channels of the host cells. Therefore, the accurate identification of T3SEs is a prerequisite for further study of pathogenic bacteria. In this article, a new T3SE ensemble predictor was developed that can accurately distinguish T3SEs from any unknown protein. Methods and models were rigorously trained and tested. Compared with other methods, EP3 demonstrates better performance, including the absence of overfitting, strong robustness and powerful predictive ability. EP3 (an ensemble predictor that accurately identifies T3SEs) is designed to simplify users' (especially non-specialist users') access to T3SE prediction for further investigation, which will have a significant impact on understanding the progression of pathogenic bacterial infections. Based on the proposed integrated model, a web server has been established to distinguish T3SEs from non-T3SEs; it offers two models, EP3_1 and EP3_2, and users can choose a model according to the species of the samples to be tested. Our related tools and data can be accessed at http://lab.malab.cn/~lijing/EP3.html.
11
Park J, Tae Eom G, Young Oh J, Hyun Park J, Chang Kim S, Kwang Song J, Hoon Ahn J. High-Level Production of Bacteriotoxic Phospholipase A1 in Bacterial Host Pseudomonas fluorescens Via ABC Transporter-Mediated Secretion and Inducible Expression. Microorganisms 2020; 8:microorganisms8020239. [PMID: 32053917 PMCID: PMC7074900 DOI: 10.3390/microorganisms8020239]
Abstract
Bacterial phospholipase A1 (PLA1) is used in various industrial fields because it can catalyze the hydrolysis, esterification, and transesterification of phospholipids to their functional derivatives. It also has a role in the degumming of crude plant oils. However, bacterial expression of the foreign PLA1-encoding gene has generally been hampered because intracellularly expressed PLA1 is inherently toxic and damages the phospholipid membrane. In this study, we report that secretion-based production of recombinant PlaA, a bacterial PLA1, or co-expression of PlaS, an accessory gene, minimizes this harmful effect. We achieved high-level PlaA production via secretion-based protein production. Here, TliD/TliE/TliF, an ABC transporter complex of Pseudomonas fluorescens SIK-W1, was used to secrete recombinant proteins into the extracellular medium. To control protein expression by induction, a new P. fluorescens strain carrying the lac operon repressor gene lacI was constructed and named ZYAI. The bacteriotoxic PlaA protein was successfully produced in a bacterial host with the help of ABC transporter-mediated secretion, induction-controlled protein expression, and fermentation. The final protein product is capable of degumming oil efficiently, signifying its application potential.
Affiliation(s)
- Jiyeon Park
- Korea Science Academy of Korea Advanced Institute of Science and Technology, Busan 47162, Korea
- Intelligent Synthetic Biology Center, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701, Korea
- Gyeong Tae Eom
- Research Center for Bio-based Chemistry, Korea Research Institute of Chemical Technology (KRICT), Ulsan 44429, Korea
- Joon Young Oh
- Research Center for Bio-based Chemistry, Korea Research Institute of Chemical Technology (KRICT), Daejeon 34114, Korea
- Ji Hyun Park
- Research Center for Bio-based Chemistry, Korea Research Institute of Chemical Technology (KRICT), Daejeon 34114, Korea
- Sun Chang Kim
- Intelligent Synthetic Biology Center, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701, Korea
- Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea
- Jae Kwang Song
- Research Center for Bio-based Chemistry, Korea Research Institute of Chemical Technology (KRICT), Daejeon 34114, Korea
- Jung Hoon Ahn
- Korea Science Academy of Korea Advanced Institute of Science and Technology, Busan 47162, Korea
- Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea
- Correspondence: Tel.: +82-51-606-2335
12
Fu X, Yang Y. WEDeepT3: predicting type III secreted effectors based on word embedding and deep learning. Quant Biol 2019. [DOI: 10.1007/s40484-019-0184-7]
13
Zeng C, Zou L. An account of in silico identification tools of secreted effector proteins in bacteria and future challenges. Brief Bioinform 2019; 20:110-129. [PMID: 28981574 DOI: 10.1093/bib/bbx078]
Abstract
Bacterial pathogens secrete numerous effector proteins via six secretion systems, type I to type VI, to adapt to new environments or to promote virulence through bacterium-host interactions. Many computational approaches have been used to identify effector proteins before experimental verification because they avoid laborious biological procedures and are genome-scale, automated and highly efficient. Prevalent examples include machine learning methods and statistical techniques. In this article, we summarize the computational progress toward predicting secreted effector proteins in bacteria, opening with an introduction to the features used to discriminate effectors from non-effectors. The mechanisms, contributions and deficiencies of previously developed detection tools are presented, and the tools are further benchmarked on a curated testing data set. Based on the benchmarking results, potential improvements in prediction performance are discussed, including (1) more informative features for discriminating effectors from non-effectors; (2) the construction of comprehensive training data sets for machine learning algorithms; (3) the advancement of reliable prediction methods; and (4) a better interpretation of the mechanisms behind the molecular processes. The future of in silico identification of bacterial secreted effectors includes both opportunities and challenges.
Affiliation(s)
- Cong Zeng
- Bioinformatics Center, Third Military Medical University (TMMU), China
14
Hasan MM, Rashid MM, Khatun MS, Kurata H. Computational identification of microbial phosphorylation sites by the enhanced characteristics of sequence information. Sci Rep 2019; 9:8258. [PMID: 31164681 PMCID: PMC6547684 DOI: 10.1038/s41598-019-44548-x]
Abstract
Protein phosphorylation on serine (S) and threonine (T) has emerged as a key mechanism in the control of many biological processes. Recently, phosphorylation in microbial organisms has attracted much attention for its critical roles in various cellular processes such as cell growth and cell division. Here, a novel machine learning predictor, MPSite (Microbial Phosphorylation Site predictor), was developed to identify microbial phosphorylation sites using enhanced characteristics of sequence features. The final feature vectors were optimized via a Wilcoxon rank-sum test. A random forest classifier was then trained on the optimal features to build the predictor. Benchmarking using 5-fold cross-validation and independent dataset tests showed that MPSite achieves robust performance in S- and T-phosphorylation site prediction. It also outperformed other existing methods on the comprehensive independent datasets. We anticipate that MPSite will be a powerful tool for proteome-wide prediction of microbial phosphorylation sites and will facilitate hypothesis-driven functional interrogation of phosphorylated proteins. A web application with the curated datasets is freely available at http://kurata14.bio.kyutech.ac.jp/MPSite/.
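The Wilcoxon rank-sum filtering step can be sketched with the standard library. The abstract does not give MPSite's exact selection rule, so this is a hypothetical sketch: each feature is scored with the normal approximation to the rank-sum statistic (average ranks for ties, no tie correction of the variance), and features whose two-sided p-value falls below an assumed threshold of 0.05 are kept, ordered from most to least significant. The function names are illustrative.

```python
import math
from typing import List, Sequence


def rank_sum_z(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Standardized Wilcoxon rank-sum statistic of `pos` against `neg`.

    Uses average ranks for tied values and the normal approximation,
    without a tie correction of the variance.
    """
    pooled = sorted([(v, 0) for v in pos] + [(v, 1) for v in neg])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of this tie block
        for t in range(i, j + 1):
            ranks[t] = avg_rank
        i = j + 1
    n1, n2 = len(pos), len(neg)
    w = sum(r for r, (_, label) in zip(ranks, pooled) if label == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean) / sd


def select_features(
    pos_rows: List[List[float]], neg_rows: List[List[float]], alpha: float = 0.05
) -> List[int]:
    """Return indices of features with two-sided p < alpha, most significant first."""
    scored = []
    for f in range(len(pos_rows[0])):
        z = rank_sum_z([r[f] for r in pos_rows], [r[f] for r in neg_rows])
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under N(0, 1)
        scored.append((p, f))
    scored.sort()
    return [f for p, f in scored if p < alpha]
```

With many candidate features, a multiple-testing correction (for example, Bonferroni) would usually be applied to `alpha`; that refinement is left out of this sketch.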
Affiliation(s)
- Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
- Md Mamunur Rashid
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
- Mst Shamima Khatun
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
- Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan; Biomedical Informatics R&D Center, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
15
Dhroso A, Eidson S, Korkin D. Genome-wide prediction of bacterial effector candidates across six secretion system types using a feature-based statistical framework. Sci Rep 2018; 8:17209. [PMID: 30464223 PMCID: PMC6249201 DOI: 10.1038/s41598-018-33874-1]
Abstract
Gram-negative bacteria are responsible for hundreds of millions of infections worldwide, including emerging hospital-acquired infections and neglected tropical diseases in third-world countries. Finding a fast and cheap way to understand the molecular mechanisms behind bacterial infections is critical for efficient diagnostics and treatment. An important step towards understanding these mechanisms is the discovery of bacterial effectors, the proteins secreted into the host through one of the six common secretion system types. Unfortunately, current prediction methods are designed to target one of three secretion systems specifically, and no accurate "secretion system-agnostic" method is available. Here, we present PREFFECTOR, a computational feature-based approach to discover effector candidates in Gram-negative bacteria without prior knowledge of the bacterial secretion system(s) or cryptic secretion signals. Our approach was first evaluated using several assessment protocols on a manually curated, balanced dataset of experimentally determined effectors across all six secretion systems, as well as non-effector proteins. The evaluation revealed high accuracy for the top-performing classifiers in PREFFECTOR, with a small false positive discovery rate across all six secretion systems. Our method was also applied to six bacteria for which little was known about virulence factors or secreted effectors. The PREFFECTOR web server is freely available at http://korkinlab.org/preffector.
Affiliation(s)
- Andi Dhroso
- Department of Computer Science, and Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA
- Samantha Eidson
- Mathematics and Computer Science Department, Fontbonne University, St. Louis, MO, USA
- Dmitry Korkin
- Department of Computer Science, and Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA.
16
Xue L, Tang B, Chen W, Luo J. DeepT3: deep convolutional neural networks accurately identify Gram-negative bacterial type III secreted effectors using the N-terminal sequence. Bioinformatics 2018; 35:2051-2057. [DOI: 10.1093/bioinformatics/bty931] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 06/26/2018] [Revised: 10/22/2018] [Accepted: 11/07/2018] [Indexed: 11/12/2022]
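DeepT3 identifies type III secreted effectors from the N-terminal sequence. As a minimal sketch of how such a region can be turned into input for a convolutional network, the Python below one-hot encodes the first residues of a protein into a fixed-size matrix. The window length (`N_TERM_LENGTH = 100`), the zero-padding, and the all-zero rows for non-standard residues are assumptions for illustration, not DeepT3's documented settings.

```python
# Minimal sketch (not DeepT3's code): one-hot encode a protein's N-terminal
# region as a fixed-size matrix suitable as CNN input.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
N_TERM_LENGTH = 100  # hypothetical window length


def one_hot_n_terminus(sequence: str, length: int = N_TERM_LENGTH) -> list[list[int]]:
    """Return a `length` x 20 binary matrix for the first `length` residues.

    Shorter sequences are zero-padded at the end; residues outside the 20
    standard amino acids (e.g. 'X') become all-zero rows.
    """
    matrix = []
    for position in range(length):
        row = [0] * len(AMINO_ACIDS)
        if position < len(sequence):
            index = AA_INDEX.get(sequence[position].upper())
            if index is not None:
                row[index] = 1
        matrix.append(row)
    return matrix
```

Longer proteins are simply truncated to the window, so every protein yields a matrix of the same shape, which is what a fixed-input convolutional layer expects.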
Affiliation(s)
- Li Xue
- School of Public Health, Southwest Medical University, Luzhou, Sichuan, PR China
- Bin Tang
- Basic Medical College of Southwest Medical University, Luzhou, Sichuan, PR China
- Wei Chen
- Integrative Genomics Core, City of Hope National Medical Center, Duarte, CA, USA
- Jiesi Luo
- Key Laboratory for Aging and Regenerative Medicine, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, Sichuan, China
17
Wang J, Li J, Yang B, Xie R, Marquez-Lago TT, Leier A, Hayashida M, Akutsu T, Zhang Y, Chou KC, Selkrig J, Zhou T, Song J, Lithgow T. Bastion3: a two-layer ensemble predictor of type III secreted effectors. Bioinformatics 2018; 35:2017-2028. [PMID: 30388198 PMCID: PMC7963071 DOI: 10.1093/bioinformatics/bty914] [Citation(s) in RCA: 60] [Impact Index Per Article: 10.0] [Received: 08/31/2018] [Revised: 10/15/2018] [Accepted: 10/31/2018] [Indexed: 01/31/2023]
Abstract
MOTIVATION Type III secreted effectors (T3SEs) can be injected into the host cell cytoplasm via type III secretion systems (T3SSs) to modulate interactions between Gram-negative bacterial pathogens and their hosts. Due to their relevance in pathogen-host interactions, significant computational efforts have been devoted to the identification of T3SEs, and these in turn have stimulated new T3SE discoveries. However, as T3SEs with new characteristics are discovered, these existing computational tools reveal important limitations: (i) most of the trained machine learning models are based on the N-terminus (or also incorporate the C-terminus) instead of the proteins' complete sequences, and (ii) the underlying models (trained with classic algorithms) employ only a few features, most of which are extracted from sequence information alone. To achieve better T3SE prediction, we must identify more powerful, informative features and investigate how to integrate these effectively into a comprehensive model. RESULTS In this work, we present Bastion3, a two-layer ensemble predictor developed to accurately identify type III secreted effectors from protein sequence data. In contrast with existing methods that employ single models with few features, Bastion3 explores a wide range of features of various types, trains single models based on these features and finally integrates these models through ensemble learning. We trained the models using a new gradient boosting machine, LightGBM, and further boosted the models' performance through a novel genetic algorithm (GA)-based two-step parameter optimization strategy. Our benchmark test demonstrates that Bastion3 performs much better than commonly used methods, with an ACC value of 0.959, F-value of 0.958, MCC value of 0.917 and AUC value of 0.956, outperforming all other toolkits by more than 5.6% in ACC value, 5.7% in F-value, 12.4% in MCC value and 5.8% in AUC value.
Based on our proposed two-layer ensemble model, we further developed a user-friendly online toolkit to make T3SE prediction as convenient as possible for experimental scientists. Designed to ease future discoveries of novel T3SEs and offering improved performance, Bastion3 is poised to become a widely used, state-of-the-art toolkit for T3SE prediction. AVAILABILITY AND IMPLEMENTATION http://bastion3.erc.monash.edu/. CONTACT selkrig@embl.de or wyztli@163.com or or trevor.lithgow@monash.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
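Bastion3 is benchmarked with ACC, F-value, MCC and AUC. The first three depend only on the confusion matrix at a fixed decision threshold, whereas AUC needs ranked scores. A minimal sketch of the threshold-based formulas in Python follows; sensitivity and specificity are included because they come from the same counts. Reporting 0.0 when a denominator is zero is a convention chosen here, not something taken from Bastion3.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Threshold-based metrics from confusion-matrix counts.

    Ratios with a zero denominator are reported as 0.0 (an assumed convention).
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # also called recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    # F-value: harmonic mean of precision and sensitivity.
    f_value = (
        2 * precision * sensitivity / (precision + sensitivity)
        if precision + sensitivity
        else 0.0
    )
    # Matthews correlation coefficient uses all four cells of the matrix.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_value": f_value,
        "mcc": mcc,
    }
```

For hypothetical counts tp=90, fp=10, tn=80, fn=20, accuracy is 0.85 while MCC is only about 0.70, which illustrates why MCC differences between tools can look larger than ACC differences.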
Affiliation(s)
- Jiawei Wang
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC, Australia
- Jiahui Li
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC, Australia; Department of Clinical Laboratory, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Bingjiao Yang
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China
- Ruopeng Xie
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China
- Tatiana T Marquez-Lago
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA; Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- André Leier
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA; Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- Morihiro Hayashida
- National Institute of Technology, Matsue College, Matsue, Shimane, Japan
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
- Yanju Zhang
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA, USA; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia
- Joel Selkrig
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Tieli Zhou
- Department of Clinical Laboratory, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Trevor Lithgow
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC, Australia
18
An Y, Wang J, Li C, Leier A, Marquez-Lago T, Wilksch J, Zhang Y, Webb GI, Song J, Lithgow T. Comprehensive assessment and performance improvement of effector protein predictors for bacterial secretion systems III, IV and VI. Brief Bioinform 2018; 19:148-161. [PMID: 27777222 DOI: 10.1093/bib/bbw100] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 07/30/2016] [Indexed: 11/15/2022]
Abstract
Bacterial effector proteins secreted by various protein secretion systems play crucial roles in host-pathogen interactions. In this context, computational tools capable of accurately predicting effector proteins of the various types of bacterial secretion systems are highly desirable. Existing computational approaches use different machine learning (ML) techniques and heterogeneous features derived from protein sequences and/or structural information. These predictors differ not only in the ML methods used but also in their curated data sets, feature selection and prediction performance. Here, we provide a comprehensive survey and benchmarking of currently available tools for the prediction of effector proteins of bacterial type III, IV and VI secretion systems (T3SS, T4SS and T6SS, respectively). We review core algorithms, feature selection techniques, tool availability and applicability, and evaluate prediction performance on carefully curated independent test data sets. In an effort to improve predictive performance, we constructed three ensemble models based on ML algorithms by integrating the outputs of all the individual predictors reviewed. Our benchmarks demonstrate that these ensemble models outperform all the reviewed tools for the prediction of effector proteins of T3SS and T4SS. The webserver of the proposed ensemble methods for T3SS and T4SS effector protein prediction is freely available at http://tbooster.erc.monash.edu/index.jsp. We anticipate that this survey will serve as a useful guide for interested users and that the new ensemble predictors will stimulate research into host-pathogen relationships and inspire the development of new bioinformatics tools for predicting effector proteins of T3SS, T4SS and T6SS.
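The ensemble idea in this study, integrating the outputs of several individual predictors, can be illustrated with the two simplest combiners: averaging scores and majority voting. This is a minimal sketch and a deliberate simplification; the paper builds its ensembles with ML algorithms, which this code does not reproduce. The tool names are hypothetical, and each tool's score is assumed to be already rescaled to [0, 1].

```python
from statistics import mean


def average_ensemble(tool_scores: dict[str, float], threshold: float = 0.5) -> tuple[float, bool]:
    """Unweighted mean of per-tool effector scores, plus a yes/no call.

    Assumes every tool's output has already been rescaled to [0, 1].
    """
    combined = mean(list(tool_scores.values()))
    return combined, combined >= threshold


def majority_vote(tool_calls: dict[str, bool]) -> bool:
    """True when strictly more than half of the tools call the protein an effector."""
    return 2 * sum(tool_calls.values()) > len(tool_calls)
```

A learned combiner, for example a logistic regression fitted on tool outputs for labeled proteins, would replace the unweighted mean with fitted weights; the averaging version is shown only because it needs no training data.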
19
Hasan MM, Kurata H. GPSuc: Global Prediction of Generic and Species-specific Succinylation Sites by aggregating multiple sequence features. PLoS One 2018; 13:e0200283. [PMID: 30312302 PMCID: PMC6193575 DOI: 10.1371/journal.pone.0200283] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Received: 03/15/2018] [Accepted: 06/22/2018] [Indexed: 01/09/2023]
Abstract
Lysine succinylation is one of the dominant post-translational modifications of proteins and contributes to many biological processes, including the cell cycle, growth and signal transduction pathways. Identification of succinylation sites is an important step toward understanding the function of proteins. The complicated sequence patterns of protein succinylation revealed by proteomic studies highlight the need for effective species-specific in silico strategies for the global prediction of succinylation sites. Here we developed a generic classifier and nine species-specific succinylation site classifiers by aggregating multiple complementary features. We optimized the consecutive features using the Wilcoxon-rank feature selection scheme. The final feature vectors were used to train a random forest (RF) classifier. By integrating the RF scores via logistic regression, the resulting predictor, termed GPSuc, achieved better performance than other existing generic and species-specific succinylation site predictors. Our predictor serves as a valuable resource for revealing the mechanism of succinylation and assisting hypothesis-driven experimental design. To support prediction on large-scale datasets, a web application was developed at http://kurata14.bio.kyutech.ac.jp/GPSuc/.
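GPSuc selects features with a Wilcoxon rank-based scheme before training the random forest. The sketch below shows one common way to do this: score each feature by the normal-approximation z of the Wilcoxon rank-sum statistic between positive and negative sites, then order features by |z|. It assumes each site is already a numeric feature vector; the variance has no tie correction, and the number of top features to keep is left open, so the details may differ from GPSuc's actual scheme.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def rank_sum_z(positives: list[float], negatives: list[float]) -> float:
    """Normal-approximation z score of the Wilcoxon rank-sum statistic.

    No tie correction is applied to the variance (a simplification).
    """
    n1, n2 = len(positives), len(negatives)
    ranks = average_ranks(list(positives) + list(negatives))
    w = sum(ranks[:n1])  # rank sum of the positive group
    expected = n1 * (n1 + n2 + 1) / 2
    variance = n1 * n2 * (n1 + n2 + 1) / 12
    return (w - expected) / math.sqrt(variance)


def rank_features(pos_rows: list[list[float]], neg_rows: list[list[float]]) -> list[int]:
    """Return feature indices ordered from most to least discriminative (|z|)."""
    n_features = len(pos_rows[0])
    scored = []
    for f in range(n_features):
        z = rank_sum_z([row[f] for row in pos_rows], [row[f] for row in neg_rows])
        scored.append((abs(z), f))
    return [f for _, f in sorted(scored, reverse=True)]
```

Keeping the first few indices returned by `rank_features` and training a classifier on those columns mirrors the select-then-train order described in the abstract; the cut-off itself is a free choice.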
Affiliation(s)
- Md. Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Kawazu, Iizuka, Fukuoka, Japan
- Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Kawazu, Iizuka, Fukuoka, Japan
- Biomedical Informatics R&D Center, Kyushu Institute of Technology, Kawazu, Iizuka, Fukuoka, Japan
20
Hasan MM, Khatun MS, Mollah MNH, Yong C, Dianjing G. NTyroSite: Computational Identification of Protein Nitrotyrosine Sites Using Sequence Evolutionary Features. Molecules 2018; 23:E1667. [PMID: 29987232 PMCID: PMC6099560 DOI: 10.3390/molecules23071667] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 05/22/2018] [Revised: 06/28/2018] [Accepted: 06/28/2018] [Indexed: 02/06/2023]
Abstract
Nitrotyrosine is a product of tyrosine nitration mediated by reactive nitrogen species. As an indicator of cell damage and inflammation, protein nitrotyrosine reveals biological changes associated with various diseases or oxidative stress. Accurate identification of nitrotyrosine sites provides an important foundation for further elucidating the mechanism of protein nitrotyrosination. However, experimental identification of nitrotyrosine sites through traditional methods is laborious and expensive. In silico prediction of nitrotyrosine sites based on protein sequence information is thus highly desirable. Here, we report a novel predictor, NTyroSite, for accurate prediction of nitrotyrosine sites using sequence evolutionary information. The generated features were optimized using a Wilcoxon rank-sum test. A random forest classifier was then trained on these features to build the predictor. The final NTyroSite predictor achieved an area under the receiver operating characteristic curve (AUC) of 0.904 in a 10-fold cross-validation test. It also significantly outperformed other existing implementations in an independent test. In addition, to better understand the prediction model, the predominant rules and informative features were extracted from the NTyroSite model to explain the prediction results. We expect that the NTyroSite predictor may serve as a useful computational resource for high-throughput nitrotyrosine site prediction. The online interface of the software is publicly available at https://biocomputer.bio.cuhk.edu.hk/NTyroSite/.
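NTyroSite reports an AUC of 0.904 under 10-fold cross-validation. The sketch below shows the two ingredients in plain Python: AUC computed as the probability that a randomly chosen positive site outscores a randomly chosen negative one (ties count half), and a simple 10-fold index split. It is an illustration only; the round-robin split without shuffling or stratification is an assumption made to keep the code short, and a real evaluation would normally shuffle and stratify by class.

```python
from collections.abc import Iterator


def roc_auc(scores: list[float], labels: list[int]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. O(n_pos * n_neg), which is fine for small sets.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def k_fold_splits(n_samples: int, k: int = 10) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, test) index lists; sample i goes to fold i % k (no shuffling)."""
    for fold in range(k):
        test = list(range(fold, n_samples, k))
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        yield train, test
```

In use, a classifier would be fitted on each `train` split and scored on the matching `test` split; the per-fold AUCs can then be averaged, or all held-out scores pooled into one AUC. Papers differ on which of the two they report.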
Affiliation(s)
- Md Mehedi Hasan
- School of Life Sciences and the State Key Lab of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong.
- Mst Shamima Khatun
- Laboratory of Bioinformatics, Department of Statistics, University of Rajshahi, Rajshahi 6205, Bangladesh.
- Md Nurul Haque Mollah
- Laboratory of Bioinformatics, Department of Statistics, University of Rajshahi, Rajshahi 6205, Bangladesh.
- Cao Yong
- Department of Mechanical Engineering and Automation, Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen 518000, China.
- Guo Dianjing
- School of Life Sciences and the State Key Lab of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong.
21
Abstract
Many computational methods are available for predicting protein sorting in bacteria. When comparing them, it is important to know that they can be grouped into three fundamentally different approaches: signal-based, global-property-based and homology-based prediction. In this chapter, the strengths and drawbacks of each of these approaches are described through many examples of methods that predict secretion, integration into membranes, or subcellular locations in general. The aim of this chapter is to provide a user-level introduction to the field with a minimum of computational theory.
Affiliation(s)
- Henrik Nielsen
- Technical University of Denmark, Kemitorvet, Building 208, DK-2800, Kgs. Lyngby, Denmark.
22
Hasan MM, Khatun MS, Mollah MNH, Yong C, Guo D. A systematic identification of species-specific protein succinylation sites using joint element features information. Int J Nanomedicine 2017; 12:6303-6315. [PMID: 28894368 PMCID: PMC5584904 DOI: 10.2147/ijn.s140875] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Indexed: 01/07/2023]
Abstract
Lysine succinylation, an important type of protein posttranslational modification, plays significant roles in many cellular processes. Accurate identification of succinylation sites can facilitate our understanding of the molecular mechanism and potential roles of lysine succinylation. However, even in well-studied systems, a majority of succinylation sites remain undetected because traditional experimental approaches to succinylation site identification are often costly, time-consuming, and laborious. In silico approaches, on the other hand, offer a potential alternative strategy for predicting succinylation substrates. In this paper, a novel computational predictor, SuccinSite2.0, was developed for predicting generic and species-specific protein succinylation sites. This predictor uses profile-based amino acid composition and orthogonal binary features to train a random forest classifier. We demonstrated that the proposed SuccinSite2.0 predictor outperformed other existing implementations on a complementary independent dataset. Furthermore, the features that contribute most to species-specific and cross-species prediction of protein succinylation sites were analyzed. The proposed predictor is anticipated to be a useful computational resource for lysine succinylation site prediction. The integrated species-specific online tool of SuccinSite2.0 is publicly accessible.
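Site predictors such as SuccinSite2.0 classify each lysine from a sequence window centred on it, and "orthogonal binary features" usually means one indicator vector per window position. The sketch below extracts padded windows around every lysine and encodes them in that way. The half-width of 15 residues, the 'X' padding symbol and the 21-letter alphabet are assumptions for illustration, not SuccinSite2.0's published settings.

```python
# Alphabet for the binary code: 20 standard residues plus 'X' for padding/unknown.
ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"


def lysine_windows(sequence: str, half_width: int = 15) -> list[tuple[int, str]]:
    """Return (1-based position, window) for every lysine, padded with 'X'."""
    seq = sequence.upper()
    padded = "X" * half_width + seq + "X" * half_width
    windows = []
    for i, residue in enumerate(seq):
        if residue == "K":
            # Residue i sits at padded[i + half_width]; take half_width on each side.
            windows.append((i + 1, padded[i : i + 2 * half_width + 1]))
    return windows


def orthogonal_binary(window: str) -> list[int]:
    """Concatenate one 21-bit indicator vector per window position."""
    vector = []
    for residue in window:
        bits = [0] * len(ALPHABET)
        bits[ALPHABET.index(residue if residue in ALPHABET else "X")] = 1
        vector.extend(bits)
    return vector
```

With the default half-width, each window has 31 positions and therefore encodes to 31 x 21 = 651 binary features; the lysine itself always sits at the centre of the window.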
Affiliation(s)
- Md Mehedi Hasan
- School of Life Sciences and the State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, New Territory, Hong Kong, People's Republic of China
- Mst Shamima Khatun
- Laboratory of Bioinformatics, Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh
- Md Nurul Haque Mollah
- Laboratory of Bioinformatics, Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh
- Cao Yong
- Department of Mechanical Engineering and Automation, Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen, People's Republic of China
- Dianjing Guo
- School of Life Sciences and the State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, New Territory, Hong Kong, People's Republic of China
23
An Y, Wang J, Li C, Revote J, Zhang Y, Naderer T, Hayashida M, Akutsu T, Webb GI, Lithgow T, Song J. SecretEPDB: a comprehensive web-based resource for secreted effector proteins of the bacterial types III, IV and VI secretion systems. Sci Rep 2017; 7:41031. [PMID: 28112271 PMCID: PMC5253721 DOI: 10.1038/srep41031] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 10/18/2016] [Accepted: 12/14/2016] [Indexed: 12/28/2022]
Abstract
Bacteria translocate effector molecules into host cells through highly evolved secretion systems. By definition, the function of these effector proteins is to manipulate host cell biology, and sequence, structural and functional annotations of these proteins will provide a better understanding of how bacterial secretion systems promote bacterial survival and virulence. Here we developed a knowledgebase, termed SecretEPDB (Bacterial Secreted Effector Protein DataBase), for effector proteins of the type III secretion system (T3SS), type IV secretion system (T4SS) and type VI secretion system (T6SS). SecretEPDB provides enriched annotations of these three classes of effector proteins by manually extracting and integrating structural and functional information from currently available databases and the literature. The database is conservative and strictly curated to ensure that every effector protein entry is supported by experimental evidence demonstrating that it is secreted by a T3SS, T4SS or T6SS. The annotations of effector proteins in SecretEPDB cover protein characteristics, protein function, protein secondary structure, Pfam domains, metabolic pathways and evolutionary details. We hope that this integrated knowledgebase will serve as a useful resource for biological investigation and for generating new hypotheses in research on bacterial secretion systems.
Affiliation(s)
- Yi An
- College of Information Engineering, Northwest A&F University, Yangling 712100, China; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Jiawei Wang
- School of Electronic and Computer Engineering, Peking University, Beijing 100871, China
- Chen Li
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia
- Jerico Revote
- Monash Bioinformatics Platform, Monash University, Melbourne, VIC 3800, Australia
- Yang Zhang
- College of Information Engineering, Northwest A&F University, Yangling 712100, China
- Thomas Naderer
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Morihiro Hayashida
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
- Geoffrey I Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Trevor Lithgow
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia
- Jiangning Song
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia; Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
24
Hasan MM, Guo D, Kurata H. Computational identification of protein S-sulfenylation sites by incorporating the multiple sequence features information. MOLECULAR BIOSYSTEMS 2017; 13:2545-2550. [DOI: 10.1039/c7mb00491e] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Indexed: 12/13/2022]
Abstract
Cysteine S-sulfenylation is a major type of posttranslational modification that contributes to protein structure and function regulation in many cellular processes.
Affiliation(s)
- Md. Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Japan
- Dianjing Guo
- School of Life Sciences and the State Key Lab of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong
- Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Japan
- Biomedical Informatics R&D Center
25
Scheibner F, Schulz S, Hausner J, Marillonnet S, Büttner D. Type III-Dependent Translocation of HrpB2 by a Nonpathogenic hpaABC Mutant of the Plant-Pathogenic Bacterium Xanthomonas campestris pv. vesicatoria. Appl Environ Microbiol 2016; 82:3331-3347. [PMID: 27016569 PMCID: PMC4959247 DOI: 10.1128/aem.00537-16] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 02/18/2016] [Accepted: 03/21/2016] [Indexed: 11/20/2022]
Abstract
The plant-pathogenic bacterium Xanthomonas campestris pv. vesicatoria employs a type III secretion (T3S) system to translocate effector proteins into plant cells. The T3S apparatus spans both bacterial membranes and is associated with an extracellular pilus and a channel-like translocon in the host plasma membrane. T3S is controlled by the switch protein HpaC, which suppresses secretion and translocation of the predicted inner rod protein HrpB2 and promotes secretion of translocon and effector proteins. We previously reported that HrpB2 interacts with HpaC and the cytoplasmic domain of the inner membrane protein HrcU (C. Lorenz, S. Schulz, T. Wolsch, O. Rossier, U. Bonas, and D. Büttner, PLoS Pathog 4:e1000094, 2008, http://dx.doi.org/10.1371/journal.ppat.1000094). However, the molecular mechanisms underlying the control of HrpB2 secretion are not yet understood. Here, we located a T3S and translocation signal in the N-terminal 40 amino acids of HrpB2. Complementation experiments with HrpB2 deletion derivatives revealed that the T3S signal of HrpB2 is essential for protein function. Furthermore, interaction studies showed that the N-terminal region of HrpB2 interacts with the cytoplasmic domain of HrcU, suggesting that the T3S signal of HrpB2 contributes to substrate docking. Translocation of HrpB2 is suppressed not only by HpaC but also by the T3S chaperone HpaB and its secreted regulator, HpaA. Deletion of hpaA, hpaB, and hpaC leads to a loss of pathogenicity but allows the translocation of fusion proteins between the HrpB2 T3S signal and effector proteins into leaves of host and non-host plants. IMPORTANCE The T3S system of the plant-pathogenic bacterium Xanthomonas campestris pv. vesicatoria is essential for pathogenicity and delivers effector proteins into plant cells. T3S depends on HrpB2, which is a component of the predicted periplasmic inner rod structure of the secretion apparatus.
HrpB2 is secreted during the early stages of the secretion process and interacts with the cytoplasmic domain of the inner membrane protein HrcU. Here, we localized the secretion and translocation signal of HrpB2 to the N-terminal 40 amino acids and showed that this region is sufficient for the interaction with the cytoplasmic domain of HrcU. Our results suggest that the T3S signal of HrpB2 is required for the docking of HrpB2 to the secretion apparatus. Furthermore, we provide experimental evidence that the N-terminal region of HrpB2 is sufficient to target effector proteins for translocation in a nonpathogenic X. campestris pv. vesicatoria strain.
Affiliation(s)
- Felix Scheibner
- Institute of Biology, Department of Genetics, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Steve Schulz
- Institute of Biology, Department of Genetics, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Jens Hausner
- Institute of Biology, Department of Genetics, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Daniela Büttner
- Institute of Biology, Department of Genetics, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
26
Sonah H, Deshmukh RK, Bélanger RR. Computational Prediction of Effector Proteins in Fungi: Opportunities and Challenges. FRONTIERS IN PLANT SCIENCE 2016; 7:126. [PMID: 26904083 PMCID: PMC4751359 DOI: 10.3389/fpls.2016.00126] [Citation(s) in RCA: 71] [Impact Index Per Article: 8.9] [Received: 10/27/2015] [Accepted: 01/23/2016] [Indexed: 05/20/2023]
Abstract
Effector proteins are mostly secreted proteins that stimulate plant infection by manipulating the host response. Identifying fungal effector proteins and understanding their function is of great importance in efforts to curb losses to plant diseases. Recent advances in high-throughput sequencing technologies have made several fungal genomes and thousands of transcriptomes available. As a result, the growing amount of genomic information has provided great opportunities to identify putative effector proteins in different fungal species. There is little consensus on the annotation and functionality of effector proteins, and small secreted proteins are mostly considered effector proteins, a concept that tends to overestimate the number of proteins involved in a plant-pathogen interaction. With the characterization of Avr genes, criteria for the computational prediction of effector proteins are becoming more efficient. Hundreds of tools are available for the identification of conserved motifs, signature sequences and structural features in proteins. Many pipelines and online servers that combine several tools are available to perform genome-wide identification of effector proteins. In this review, available tools and pipelines and their strengths and limitations for the effective identification of fungal effector proteins are discussed. We also present an exhaustive list of classically secreted proteins, along with their key conserved motifs, found in 12 common plant pathogens (11 fungi and one oomycete) through an analytical pipeline.
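The review points out that treating small secreted proteins as effectors by default tends to overestimate effector numbers. To make that naive rule concrete, here is a minimal sketch of it as a filter. The thresholds (at most 300 residues, at least four cysteines) are illustrative assumptions rather than values from the review, and the signal-peptide flag is assumed to come from an external predictor run beforehand.

```python
from dataclasses import dataclass

MAX_LENGTH = 300  # illustrative threshold, not from the review
MIN_CYSTEINES = 4  # illustrative threshold, not from the review


@dataclass
class Protein:
    name: str
    sequence: str
    has_signal_peptide: bool  # assumed output of an external signal-peptide predictor


def is_small_secreted(protein: Protein) -> bool:
    """Naive 'small, cysteine-rich, secreted' rule for effector candidacy."""
    return (
        protein.has_signal_peptide
        and len(protein.sequence) <= MAX_LENGTH
        and protein.sequence.upper().count("C") >= MIN_CYSTEINES
    )


def candidate_effectors(proteins: list[Protein]) -> list[str]:
    """Names of proteins that pass the naive rule, in input order."""
    return [p.name for p in proteins if is_small_secreted(p)]
```

Because every secreted, short, cysteine-rich protein passes, such a filter yields a broad candidate list; motif, structural and expression evidence of the kind the review surveys would be needed to narrow it.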
Affiliation(s)
- Richard R. Bélanger
- Département de Phytologie, Faculté des Sciences de l’Agriculture et de l’Alimentation, Centre de Recherche en Horticulture, Université Laval, Québec, QC, Canada
27
Dong X, Lu X, Zhang Z. BEAN 2.0: an integrated web resource for the identification and functional analysis of type III secreted effectors. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2015; 2015:bav064. [PMID: 26120140 PMCID: PMC4483310 DOI: 10.1093/database/bav064] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 01/07/2015] [Accepted: 06/02/2015] [Indexed: 11/13/2022]
Abstract
Gram-negative pathogenic bacteria inject type III secreted effectors (T3SEs) into host cells to sabotage their immune signaling networks. Because T3SEs constitute a meeting point of pathogen virulence and host defense, they are of keen interest to the host-pathogen interaction research community. To accelerate the identification and functional understanding of T3SEs, we present BEAN 2.0, an integrated web resource to predict, analyse and store T3SEs. BEAN 2.0 includes three major components. First, it provides an accurate T3SE predictor based on a hybrid approach. Using independent testing data, we show that BEAN 2.0 achieves a sensitivity of 86.05% and a specificity of 100%. Second, it integrates a set of online sequence analysis tools, so users can seamlessly carry out further functional analysis of putative T3SEs, such as subcellular location prediction, functional domain scans and disorder region annotation. Third, it compiles a database covering 1215 experimentally verified T3SEs and constructs two T3SE-related networks that can be used to explore the relationships among T3SEs. Taken together, by presenting a one-stop T3SE bioinformatics resource, we hope BEAN 2.0 can promote a comprehensive understanding of the function and evolution of T3SEs.
Affiliation(s)
- Xiaobao Dong
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Xiaotian Lu
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Ziding Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
28
Hasan MM, Zhou Y, Lu X, Li J, Song J, Zhang Z. Computational Identification of Protein Pupylation Sites by Using Profile-Based Composition of k-Spaced Amino Acid Pairs. PLoS One 2015; 10:e0129635. [PMID: 26080082 PMCID: PMC4469302 DOI: 10.1371/journal.pone.0129635] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Received: 02/05/2015] [Accepted: 05/10/2015] [Indexed: 11/20/2022]
Abstract
Prokaryotic proteins are regulated by pupylation, a type of post-translational modification that contributes to cellular function in bacteria. In pupylation, tagging with the prokaryotic ubiquitin-like protein (Pup) is functionally analogous to ubiquitination: it marks target proteins for proteasomal degradation. To date, several experimental methods have been developed to identify pupylated proteins and their pupylation sites, but these methods are generally laborious and costly. Therefore, computational methods that can accurately predict potential pupylation sites from protein sequence information are highly desirable. In this paper, a novel predictor termed pbPUP has been developed for accurate prediction of pupylation sites. In particular, a sophisticated sequence encoding scheme [i.e. the profile-based composition of k-spaced amino acid pairs (pbCKSAAP)] is used to represent the sequence patterns and evolutionary information of the sequence fragments surrounding pupylation sites. A Support Vector Machine (SVM) classifier is then trained using the pbCKSAAP encoding scheme. The final pbPUP predictor achieves an AUC value of 0.849 in 10-fold cross-validation tests and outperforms other existing predictors on a comprehensive independent test dataset. The proposed method is anticipated to be a helpful computational resource for the prediction of pupylation sites. The web server and curated datasets in this study are freely available at http://protein.cau.edu.cn/pbPUP/.
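pbPUP builds on the composition of k-spaced amino acid pairs (CKSAAP). The profile-based variant derives residues from evolutionary profiles, which requires alignment tools not shown here, so the sketch below implements only the plain CKSAAP encoding on the raw sequence fragment, as a simpler stand-in. For each gap size k from 0 to `k_max` (assumed here to be 3), it counts the 400 ordered residue pairs separated by k positions and divides by the number of such pairs in the fragment.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIRS)}


def cksaap(fragment: str, k_max: int = 3) -> list[float]:
    """Composition of k-spaced amino acid pairs for k = 0..k_max.

    For each k, count residue pairs separated by k intervening residues and
    divide by the number of such pairs in the fragment (len - k - 1). Pairs
    involving non-standard residues are counted in the denominator only.
    """
    fragment = fragment.upper()
    features = []
    for k in range(k_max + 1):
        counts = [0] * len(PAIRS)
        n_pairs = len(fragment) - k - 1
        for i in range(max(n_pairs, 0)):
            index = PAIR_INDEX.get(fragment[i] + fragment[i + k + 1])
            if index is not None:
                counts[index] += 1
        features.extend(c / n_pairs if n_pairs > 0 else 0.0 for c in counts)
    return features
```

The resulting vector has 400 x (k_max + 1) entries, 1,600 with the assumed default, and is the kind of fixed-length input an SVM expects.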
Affiliation(s)
- Md. Mehedi Hasan
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Yuan Zhou
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Xiaotian Lu
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Jinyan Li
- Advanced Analytics Institute and Centre for Health Technologies, University of Technology, Sydney, 81 Broadway, NSW 2007, Australia
- Jiangning Song
- National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
- Monash Bioinformatics Platform and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC 3800, Australia
- Ziding Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
29
Luo J, Li W, Liu Z, Guo Y, Pu X, Li M. A sequence-based two-level method for the prediction of type I secreted RTX proteins. Analyst 2015; 140:3048-56. [PMID: 25800819 DOI: 10.1039/c5an00311c] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 12/31/2022]
Abstract
Many Gram-negative bacteria use the type I secretion system (T1SS) to translocate a wide range of substrates (type I secreted RTX proteins, T1SRPs) from the cytoplasm across the inner and outer membranes to the extracellular space in one step. Since T1SRPs play an important role in pathogen-host interactions, identifying them is crucial for a full understanding of the pathogenic mechanism of T1SS. However, experimental identification is often time-consuming and expensive. In the post-genomic era, with new proteins being identified in high-throughput mode, it has become imperative to predict new T1SRPs from amino acid sequence information alone. In this study, we report a two-level method, the first attempt to identify T1SRPs using sequence-derived features and the random forest (RF) algorithm. At the full-length sequence level, the results show that the distinguishing feature of T1SRPs is the presence of variable numbers of calcium-binding RTX repeats. These RTX repeats have strong predictive power, so T1SRPs can be clearly distinguished from non-T1SRPs. At the secretion signal level, we find that a sequence segment located in the last 20-30 C-terminal amino acids may contain important signal information for T1SRP secretion, because clear differences were observed between T1SRPs and non-T1SRPs at the corresponding positions in terms of amino acid and secondary structure compositions. Using five-fold cross-validation, overall accuracies of 97% at the full-length sequence level and 89% at the secretion signal level were achieved through feature evaluation and optimization. On an independent benchmark dataset, our method correctly predicted 63 and 66 of the 74 T1SRPs at the full-length sequence and secretion signal levels, respectively. We believe that this study will be useful in elucidating the secretion mechanism of T1SS and in facilitating hypothesis-driven experimental design and validation.
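The two feature groups described above, RTX repeats over the full sequence and composition of the C-terminal tail, can be illustrated with a small sketch. It assumes a simplified regular expression for the glycine-rich core `GG.G.D` of the RTX nonapeptide repeat (commonly written GGXGXDXUX, with U a large hydrophobic residue) and a 30-residue tail; the pattern, the tail length and the function names are hypothetical, and the paper's actual feature set, which also includes secondary structure composition, is richer.

```python
import re
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Simplified, hypothetical pattern for the glycine-rich core "GGxGxD" of the
# RTX nonapeptide repeat. The lookahead lets overlapping matches be counted.
RTX_CORE = re.compile(r"(?=GG.G.D)")


def count_rtx_cores(sequence: str) -> int:
    """Count (possibly overlapping) occurrences of the simplified RTX core."""
    return len(RTX_CORE.findall(sequence))


def c_terminal_composition(sequence: str, tail_length: int = 30) -> dict[str, float]:
    """Fraction of each standard amino acid in the last `tail_length` residues."""
    tail = sequence[-tail_length:]
    total = len(tail)
    counts = Counter(tail)
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}


def features(sequence: str) -> list[float]:
    """One vector: RTX core count followed by 20 C-terminal composition values."""
    composition = c_terminal_composition(sequence)
    return [float(count_rtx_cores(sequence))] + [composition[aa] for aa in AMINO_ACIDS]
```

Vectors built this way could then be passed to a random forest, the classifier used in the study (for example scikit-learn's `RandomForestClassifier`).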
Affiliation(s)
- Jiesi Luo
- College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, PR China.
30
Yang X, Guo Y, Luo J, Pu X, Li M. Effective identification of Gram-negative bacterial type III secreted effectors using position-specific residue conservation profiles. PLoS One 2013; 8:e84439. [PMID: 24391954 PMCID: PMC3877298 DOI: 10.1371/journal.pone.0084439] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 02/20/2013] [Accepted: 11/07/2013] [Indexed: 11/18/2022]
Abstract
BACKGROUND Type III secretion systems (T3SSs) are central to pathogenesis and specifically deliver their secreted substrates (type III secreted proteins, T3SPs) into host cells. Since T3SPs play a crucial role in pathogen-host interactions, identifying them is essential to our understanding of the pathogenic mechanisms of T3SSs. This study reports a novel and effective method for identifying the distinctive residues of T3SPs, whose conservation differs from that of other SPs, for use in T3SP prediction. Moreover, the importance of several sequence features was evaluated, and a promising prediction model was constructed. RESULTS Based on conservation profiles constructed from a position-specific scoring matrix (PSSM), 52 distinctive residues were identified. To our knowledge, this is the first attempt to identify the distinctive residues of T3SPs. The 52 distinctive residues include all of the first 30 residue positions, which is consistent with previous studies reporting that the secretion signal generally occurs within the first 30 residues. However, our method also showed that the remaining 22 positions, spanning residues 30-100, contain important signal information for T3SP secretion, because the translocation of many effectors also depends on the chaperone-binding residues that follow the secretion signal. For further feature optimisation and compression, permutation importance analysis was conducted to select 62 optimal sequence features. A prediction model across 16 species was developed using random forest to classify T3SPs and non-T3SPs, with an area under the receiver operating characteristic curve of 0.93 in 10-fold cross-validation and an accuracy of 94.29% on the test set. Moreover, on a common independent dataset, the results demonstrate that our method outperforms all others published to date. Finally, novel, experimentally confirmed T3 effectors were used to further demonstrate the model's applicability.
The model and all data used in this paper are freely available at http://cic.scu.edu.cn/bioinformatics/T3SPs.zip.
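The idea of locating N-terminal positions whose residue distribution separates T3SPs from other proteins can be sketched with a simpler stand-in. The study builds conservation profiles from a PSSM; the sketch below instead uses raw per-position residue frequencies from two sets of sequences and scores each position by total variation distance. This substitution, the function names and the default window of 100 residues (matching the range up to residue 100 discussed above) are assumptions made for illustration only.

```python
from collections import Counter


def positional_frequencies(seqs: list[str], length: int) -> list[Counter]:
    """Per-position residue counts over the first `length` residues of each sequence."""
    columns = [Counter() for _ in range(length)]
    for s in seqs:
        for pos, residue in enumerate(s[:length]):
            columns[pos][residue] += 1
    return columns


def distinctive_positions(pos_seqs: list[str], neg_seqs: list[str],
                          length: int = 100, top: int = 5) -> list[tuple[int, float]]:
    """Rank N-terminal positions by how differently residues are distributed.

    Score = total variation distance between the two per-position residue
    distributions (0 = identical, 1 = disjoint). Positions are 1-based.
    """
    pos_cols = positional_frequencies(pos_seqs, length)
    neg_cols = positional_frequencies(neg_seqs, length)
    scores = []
    for i, (p, n) in enumerate(zip(pos_cols, neg_cols)):
        p_total, n_total = sum(p.values()), sum(n.values())
        if p_total == 0 or n_total == 0:
            continue  # position not covered by sequences in one of the sets
        residues = set(p) | set(n)
        tvd = 0.5 * sum(abs(p[r] / p_total - n[r] / n_total) for r in residues)
        scores.append((i + 1, tvd))
    # Stable sort: ties keep their original (N- to C-terminal) order.
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top]
```

Positions scoring near 1 have almost non-overlapping residue distributions in the two sets; a PSSM-based profile, as used in the study, would additionally capture evolutionary information that raw frequencies miss.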
Affiliation(s)
- Xiaojiao Yang
- College of Chemistry, Sichuan University, Chengdu, P.R. China
- Yanzhi Guo
- College of Chemistry, Sichuan University, Chengdu, P.R. China
- Jiesi Luo
- College of Chemistry, Sichuan University, Chengdu, P.R. China
- Xuemei Pu
- College of Chemistry, Sichuan University, Chengdu, P.R. China
- Menglong Li
- College of Chemistry, Sichuan University, Chengdu, P.R. China
31
Tung CW. Prediction of pupylation sites using the composition of k-spaced amino acid pairs. J Theor Biol 2013; 336:11-7. [PMID: 23871866 DOI: 10.1016/j.jtbi.2013.07.009] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 04/04/2013] [Revised: 07/05/2013] [Accepted: 07/10/2013] [Indexed: 11/24/2022]
Abstract
Pupylation is an important post-translational modification in prokaryotes. A prokaryotic ubiquitin-like protein (Pup) is attached to proteins as a signal for selective degradation by the proteasome. Several proteomics methods have been developed for the identification of pupylated proteins and pupylation sites. However, the pupylation sites of many experimentally identified pupylated proteins are still unknown. The development of sequence-based prediction methods can help accelerate the identification of pupylation sites and provide insights into the substrate specificity and regulatory functions of pupylation. A novel tool, iPUP, is developed for the computational identification of pupylation sites. The composition of k-spaced amino acid pairs is used to represent a peptide sequence. Top-ranked k-spaced amino acid pairs are subsequently selected using a sequential backward feature elimination algorithm. The 10-fold cross-validation performance of iPUP, trained using support vector machines and the composition of the 150 top-ranked k-spaced amino acid pairs, is 0.83 for the area under the receiver operating characteristic curve. The importance analysis of k-spaced amino acid pairs shows that terminal space-containing pairs are useful for discriminating pupylation sites from non-pupylation sites. A sequence analysis confirms that lysines close to the C-terminus tend to be pupylated, whereas lysines close to the N-terminus are less likely to be pupylated. The iPUP tool can predict pupylation sites with probability scores for prioritizing promising pupylation sites. Both the online server and the standalone software of iPUP are freely available for academic use at http://cwtung.kmu.edu.tw/ipup.
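The composition of k-spaced amino acid pairs (CKSAAP) used by iPUP can be written down compactly: for each gap k, count every ordered residue pair separated by k intervening residues and normalise by the number of such pairs. The sketch below follows that general definition; the default `k_max = 3` and the `alphabet` parameter are illustrative assumptions, and the feature selection step (sequential backward elimination down to 150 pairs) is not included.

```python
from itertools import product

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def cksaap(peptide: str, k_max: int = 3, alphabet: str = STANDARD_AA) -> list[float]:
    """Composition of k-spaced amino acid pairs for k = 0..k_max.

    For each gap k, every pair (peptide[i], peptide[i + k + 1]) is counted and
    divided by the number of such pairs, len(peptide) - k - 1. Pairs with a
    symbol outside `alphabet` are not counted but remain in the denominator.
    """
    pairs = ["".join(p) for p in product(alphabet, repeat=2)]  # ordered pairs
    features: list[float] = []
    for k in range(k_max + 1):
        n_pairs = len(peptide) - k - 1  # pairs with exactly k residues between them
        counts = dict.fromkeys(pairs, 0)
        for i in range(max(n_pairs, 0)):
            pair = peptide[i] + peptide[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        features.extend(counts[p] / n_pairs if n_pairs > 0 else 0.0 for p in pairs)
    return features
```

With the standard 20-letter alphabet the vector has 400 × (k_max + 1) entries. Pairs containing symbols outside `alphabet`, such as a padding character, are left out of the counts; passing an extended alphabet turns them into features of their own.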
Affiliation(s)
- Chun-Wei Tung
- School of Pharmacy, Kaohsiung Medical University, Kaohsiung 807, Taiwan; PhD Program in Toxicology, Kaohsiung Medical University, Kaohsiung 807, Taiwan.
32
More Evidence for Secretion Signals within the mRNA of Type 3 Secreted Effectors. J Bacteriol 2013; 195:2117-8. [DOI: 10.1128/jb.00303-13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]