1
Ding H, Feng PM, Chen W, Lin H. Identification of bacteriophage virion proteins by the ANOVA feature selection and analysis. Molecular BioSystems 2014; 10:2229-35. [PMID: 24931825 DOI: 10.1039/c4mb00316k] [Citation(s) in RCA: 106] [Impact Index Per Article: 11.8] [Indexed: 11/21/2022]
Abstract
The bacteriophage virion proteins play extremely important roles in the fate of host bacterial cells. Accurate identification of bacteriophage virion proteins is very important for understanding their functions and clarifying the lysis mechanism of bacterial cells. In this study, a new sequence-based method was developed to identify phage virion proteins. In the new method, the protein sequences were initially formulated by the g-gap dipeptide compositions. Subsequently, the analysis of variance (ANOVA) with incremental feature selection (IFS) was used to search for the optimal feature set. It was observed that, in jackknife cross-validation, the optimal feature set including 160 optimized features can produce the maximum accuracy of 85.02%. By performing feature analysis, we found that the correlation between two amino acids with one gap was more important than other correlations for phage virion protein prediction and that some of the 1-gap dipeptides were important and mainly contributed to the virion protein prediction. This analysis will provide novel insights into the function of phage virion proteins. On the basis of the proposed method, an online web-server, PVPred, was established and can be freely accessed from the website (http://lin.uestc.edu.cn/server/PVPred). We believe that the PVPred will become a powerful tool to study phage virion proteins and to guide the related experimental validations.
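As a rough illustration of the g-gap dipeptide composition and the ANOVA ranking described in this abstract, the following Python sketch computes a 400-dimensional g-gap feature vector for one sequence and an F-score for one feature across classes. The function names, the `AMINO_ACIDS` constant, and the choice to normalise by the number of counted pairs are assumptions made for illustration; they are not taken from the paper or from PVPred.

```python
from itertools import product

# The 20 standard amino acids (assumed alphabet; other symbols are skipped).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def g_gap_dipeptide_composition(sequence, g):
    """Return 400 frequencies of residue pairs separated by g positions."""
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(sequence) - g - 1):
        pair = sequence[i] + sequence[i + g + 1]
        if pair in counts:  # ignore pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(PAIRS)
    # Assumption: normalise by the number of pairs actually counted.
    return [counts[p] / total for p in PAIRS]


def anova_f_score(groups):
    """One-way ANOVA F statistic for one feature; groups holds one list of values per class."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more samples than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)) / (k - 1)
    within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n - k)
    return between / within if within > 0 else float("inf")
```

Sorting the 400 features by F-score and adding them one at a time to a classifier would correspond to the incremental feature selection step; the classifier and the jackknife evaluation are omitted here.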
Affiliation(s)
- Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China.
2
Dehzangi A, Heffernan R, Sharma A, Lyons J, Paliwal K, Sattar A. Gram-positive and Gram-negative protein subcellular localization by incorporating evolutionary-based descriptors into Chou's general PseAAC. J Theor Biol 2014; 364:284-94. [PMID: 25264267 DOI: 10.1016/j.jtbi.2014.09.029] [Citation(s) in RCA: 178] [Impact Index Per Article: 17.8] [Received: 05/15/2014] [Revised: 08/11/2014] [Accepted: 09/17/2014] [Indexed: 11/17/2022]
Abstract
Protein subcellular localization is defined as predicting the functioning location of a given protein in the cell. It is considered an important step towards protein function prediction and drug design. Recent studies have shown that relying on Gene Ontology (GO) for feature extraction can improve protein subcellular localization prediction performance. However, relying solely on GO, this problem remains unsolved. At the same time, the impact of other sources of features especially evolutionary-based features has not been explored adequately for this task. In this study, we aim to extract discriminative evolutionary features to tackle this problem. To do this, we propose two segmentation based feature extraction methods to explore potential local evolutionary-based information for Gram-positive and Gram-negative subcellular localizations. We will show that by applying a Support Vector Machine (SVM) classifier to our extracted features, we are able to enhance Gram-positive and Gram-negative subcellular localization prediction accuracies by up to 6.4% better than previous studies including the studies that used GO for feature extraction.
Affiliation(s)
- Abdollah Dehzangi
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia; National ICT Australia (NICTA), Brisbane, Australia.
- Rhys Heffernan
- School of Engineering, Griffith University, Brisbane, Australia
- Alok Sharma
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia; School of Engineering and Physics, University of the South Pacific, Fiji
- James Lyons
- School of Engineering, Griffith University, Brisbane, Australia
- Kuldip Paliwal
- School of Engineering, Griffith University, Brisbane, Australia
- Abdul Sattar
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia; National ICT Australia (NICTA), Brisbane, Australia
3
Li X, Wu X, Wu G. Robust feature generation for protein subchloroplast location prediction with a weighted GO transfer model. J Theor Biol 2014; 347:84-94. [PMID: 24423409 DOI: 10.1016/j.jtbi.2014.01.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/12/2013] [Revised: 10/17/2013] [Accepted: 01/03/2014] [Indexed: 10/25/2022]
Abstract
Chloroplasts are crucial organelles of green plants and eukaryotic algae since they conduct photosynthesis. Predicting the subchloroplast location of a protein can provide important insights for understanding its biological functions. The performance of subchloroplast location prediction algorithms often depends on deriving predictive and succinct features from genomic and proteomic data. In this work, a novel weighted Gene Ontology (GO) transfer model is proposed to generate discriminating features from sequence data and GO Categories. This model contains two components. First, we transfer the GO terms of the homologous protein, and then assign the bit-score as weights to GO features. Second, we employ term-selection methods to determine weights for GO terms. This model is capable of improving prediction accuracy due to the tolerance of the noise derived from homolog knowledge transfer. The proposed weighted GO transfer method based on bit-score and a logarithmic transformation of CHI-square (WS-LCHI) performs better than the baseline models, and also outperforms the four off-the-shelf subchloroplast prediction methods.
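The two-component weighting outlined in this abstract can be sketched loosely in Python: GO terms transferred from a homologous protein are weighted by the alignment bit-score, and each term is further scaled by a log-transformed term-selection score. The function name, the GO identifiers, the use of `log(1 + score)` as the logarithmic transformation, and the multiplication of the two weights are all assumptions for illustration; the abstract does not specify the exact WS-LCHI formula.

```python
import math


def weighted_go_features(homolog_go_terms, bit_score, term_scores, vocabulary):
    """Build one feature vector over a fixed GO vocabulary.

    homolog_go_terms: set of GO terms transferred from the homologous protein
    bit_score:        alignment bit-score of that homolog (first weight)
    term_scores:      per-term selection scores, e.g. chi-square values (second weight)
    vocabulary:       ordered list of GO terms defining the feature positions
    """
    features = []
    for term in vocabulary:
        if term in homolog_go_terms:
            # Assumed combination: bit-score times log(1 + term score).
            features.append(bit_score * math.log1p(term_scores.get(term, 0.0)))
        else:
            features.append(0.0)
    return features


# Placeholder identifiers and scores, not real annotations.
vocabulary = ["GO:0000001", "GO:0000002", "GO:0000003"]
example = weighted_go_features(
    homolog_go_terms={"GO:0000001", "GO:0000003"},
    bit_score=100.0,
    term_scores={"GO:0000001": math.e - 1.0, "GO:0000003": 0.0},
    vocabulary=vocabulary,
)
```

Terms absent from the homolog stay at zero, so a weak or noisy homolog contributes proportionally less to the feature vector than a strong one.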
Affiliation(s)
- Xiaomei Li
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, PR China.
- Xindong Wu
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, PR China; Department of Computer Science, University of Vermont, Burlington, VT 50405, USA.
- Gongqing Wu
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, PR China.
5
Liu X, Zhang X, Tang WH, Chen L, Zhao XM. eFG: an electronic resource for Fusarium graminearum. Database (Oxford) 2013; 2013:bat042. [PMID: 23798489 PMCID: PMC3690120 DOI: 10.1093/database/bat042] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 12/09/2012] [Revised: 05/04/2013] [Accepted: 05/20/2013] [Indexed: 11/23/2022]
Abstract
Fusarium graminearum is a plant pathogen, which causes crop diseases and further leads to huge economic damage worldwide in past decades. Recently, the accumulation of different types of molecular data provides insights into the pathogenic mechanism of F. graminearum, and might help develop efficient strategies to combat this destructive fungus. Unfortunately, most available molecular data related to F. graminearum are distributed in various media, where each single source only provides limited information on the complex biological systems of the fungus. In this work, we present a comprehensive database, namely eFG (Electronic resource for Fusarium graminearum), to the community for further understanding this destructive pathogen. In particular, a large amount of functional genomics data generated by our group is deposited in eFG, including protein subcellular localizations, protein-protein interactions and orthologous genes in other model organisms. This valuable knowledge can not only help to disclose the molecular underpinnings of pathogenesis of the destructive fungus F. graminearum but also help the community to develop efficient strategies to combat this pathogen. To our best knowledge, eFG is the most comprehensive functional genomics database for F. graminearum until now. The eFG database is freely accessible at http://csb.shu.edu.cn/efg/ with a user-friendly and interactive interface, and all data can be downloaded freely. DATABASE URL: http://csb.shu.edu.cn/efg/
Affiliation(s)
- Xiaoping Liu
- Department of Computer Science, School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China, Key Laboratory of Systems Biology, SIBS-Novo Nordisk Translational Research Centre for PreDiabetes, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China, Institute of Systems Biology, Shanghai University, Shanghai 200444, China, National Key Laboratory of Plant Molecular Genetics, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China, Collaborative Research Center for Innovative Mathematical Modelling, Institute of Industrial Science, University of Tokyo, Tokyo 153-8505, Japan
- Xiaodong Zhang
- Department of Computer Science, School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China, Key Laboratory of Systems Biology, SIBS-Novo Nordisk Translational Research Centre for PreDiabetes, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China, Institute of Systems Biology, Shanghai University, Shanghai 200444, China, National Key Laboratory of Plant Molecular Genetics, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China, Collaborative Research Center for Innovative Mathematical Modelling, Institute of Industrial Science, University of Tokyo, Tokyo 153-8505, Japan
- Wei-Hua Tang
- Department of Computer Science, School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China, Key Laboratory of Systems Biology, SIBS-Novo Nordisk Translational Research Centre for PreDiabetes, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China, Institute of Systems Biology, Shanghai University, Shanghai 200444, China, National Key Laboratory of Plant Molecular Genetics, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China, Collaborative Research Center for Innovative Mathematical Modelling, Institute of Industrial Science, University of Tokyo, Tokyo 153-8505, Japan
- Luonan Chen
- Department of Computer Science, School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China, Key Laboratory of Systems Biology, SIBS-Novo Nordisk Translational Research Centre for PreDiabetes, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China, Institute of Systems Biology, Shanghai University, Shanghai 200444, China, National Key Laboratory of Plant Molecular Genetics, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China, Collaborative Research Center for Innovative Mathematical Modelling, Institute of Industrial Science, University of Tokyo, Tokyo 153-8505, Japan
- Xing-Ming Zhao
- Department of Computer Science, School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China, Key Laboratory of Systems Biology, SIBS-Novo Nordisk Translational Research Centre for PreDiabetes, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China, Institute of Systems Biology, Shanghai University, Shanghai 200444, China, National Key Laboratory of Plant Molecular Genetics, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China, Collaborative Research Center for Innovative Mathematical Modelling, Institute of Industrial Science, University of Tokyo, Tokyo 153-8505, Japan
6
Prediction of S-glutathionylation sites based on protein sequences. PLoS One 2013; 8:e55512. [PMID: 23418443 PMCID: PMC3572087 DOI: 10.1371/journal.pone.0055512] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 12/26/2011] [Accepted: 12/30/2012] [Indexed: 01/10/2023]
Abstract
S-glutathionylation, the reversible formation of mixed disulfides between glutathione (GSH) and cysteine residues in proteins, is a specific form of post-translational modification that plays important roles in various biological processes, including signal transduction, redox homeostasis, and metabolism inside cells. Experimentally identifying S-glutathionylation sites is labor-intensive and time-consuming, whereas bioinformatics methods provide an alternative way to this problem by predicting S-glutathionylation sites in silico. The bioinformatics approaches give not only candidate sites for further experimental verification but also bio-chemical insights into the mechanism of S-glutathionylation. In this paper, we firstly collect experimentally determined S-glutathionylated proteins and their corresponding modification sites from the literature, and then propose a new method for predicting S-glutathionylation sites by employing machine learning methods based on protein sequence data. Promising results are obtained by our method with an AUC (area under ROC curve) score of 0.879 in 5-fold cross-validation, which demonstrates the predictive power of our proposed method. The datasets used in this work are available at http://csb.shu.edu.cn/SGDB.
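For readers unfamiliar with the evaluation reported in this abstract, the following Python sketch shows one common way to compute an AUC and to split samples into five folds. It is a generic illustration, not the authors' pipeline: the function names, the pairwise-comparison form of the AUC, and the round-robin fold assignment are assumptions, and the toy labels and scores are invented.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def k_fold_indices(n_samples, k=5):
    """Assign sample indices to k folds in round-robin order (no shuffling)."""
    return [list(range(fold, n_samples, k)) for fold in range(k)]


# Toy data: two modified sites (label 1) and two unmodified sites (label 0).
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
toy_folds = k_fold_indices(10, k=5)
```

In a 5-fold run, each fold in turn would serve as the held-out test set while a model is trained on the other four, and the AUC would be computed on the held-out predictions.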
8
Spanu F, Pasquali M, Scherm B, Balmas V, Marcello A, Ortu G, Dufresne M, Hoffmann L, Daboussi MJ, Migheli Q. Transposition of the miniature inverted-repeat transposable element mimp1 in the wheat pathogen Fusarium culmorum. Molecular Plant Pathology 2012; 13:1149-1155. [PMID: 22897438 PMCID: PMC6638673 DOI: 10.1111/j.1364-3703.2012.00823.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 05/29/2023]
Abstract
High-throughput methods are needed for functional genomics analysis in Fusarium culmorum, the cause of crown and foot rot on wheat and a type B trichothecene producer. Our aim was to develop and test the efficacy of a double-component system based on the ability of the impala transposase to transactivate the miniature inverted-repeat transposable element mimp1 of Fusarium oxysporum. We report, for the first time, the application of a tagging system based on a heterologous transposon and of splinkerette-polymerase chain reaction to identify mimp1 flanking regions in the filamentous fungus F. culmorum. Similar to previous observations in Fusarium graminearum, mimp1 transposes in F. culmorum by a cut-and-paste mechanism into TA dinucleotides, which are duplicated on insertion. mimp1 was reinserted in open reading frames in 16.4% (i.e. 10 of 61) of the strains analysed, probably spanning throughout the entire genome of F. culmorum. The effectiveness of the mimp1/impala double-component system for gene tagging in F. culmorum was confirmed phenotypically for a putative aurofusarin gene. This system also allowed the identification of two genes putatively involved in oxidative stress-coping capabilities in F. culmorum, as well as a sequence specific to this fungus, thus suggesting the valuable exploratory role of this tool.
Affiliation(s)
- Francesca Spanu
- Dipartimento di Agraria - Sezione di Patologia Vegetale ed Entomologia and Centro interdisciplinare per lo sviluppo della ricerca biotecnologica e per lo studio della biodiversità della Sardegna e dell'area Mediterranea, Università degli Studi di Sassari, Via E. De Nicola 9, I-07100 Sassari, Italy