1
Kumar R, Dhanda SK. Bird Eye View of Protein Subcellular Localization Prediction. Life (Basel) 2020; 10:E347. [PMID: 33327400 PMCID: PMC7764902 DOI: 10.3390/life10120347] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 11/24/2020] [Revised: 12/11/2020] [Accepted: 12/11/2020] [Indexed: 12/12/2022]
Abstract
Proteins are made up of long chains of amino acids and perform a variety of functions in different organisms. The activity of a protein is determined by the nucleotide sequence of its gene and by its 3D structure. In addition, proteins must be targeted to their specific locations or compartments to carry out their functions. The challenge of computationally predicting the subcellular localization of proteins has been addressed by various in silico methods. In this review, we summarize the progress in this field and offer a bird's-eye view consisting of a comprehensive listing of tools, the types of input features explored, the machine learning approaches employed, and the evaluation metrics applied. We hope the review will be useful for researchers working on protein localization prediction.
Affiliation(s)
- Ravindra Kumar
- Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, NIH, 9609 Medical Center Drive, Rockville, MD 20850, USA
- Sandeep Kumar Dhanda
- Department of Oncology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA

2
Sarkar D, Maranas CD. SNPeffect: identifying functional roles of SNPs using metabolic networks. Plant J 2020; 103:512-531. [PMID: 32167625 PMCID: PMC9328443 DOI: 10.1111/tpj.14746] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/30/2019] [Accepted: 02/20/2020] [Indexed: 05/04/2023]
Abstract
Genetic sources of phenotypic variation have been a focus of plant studies aimed at improving agricultural yield and understanding adaptive processes. Genome-wide association studies identify the genetic background behind a trait by examining associations between phenotypes and single-nucleotide polymorphisms (SNPs). Although such studies are common, biological interpretation of the results remains a challenge, especially due to the confounding nature of population structure and the systematic biases it introduces. Here, we propose a complementary analysis (SNPeffect) that offers putative genotype-to-phenotype mechanistic interpretations by integrating biochemical knowledge encoded in metabolic models. SNPeffect is used to explain differential growth rate and metabolite accumulation in A. thaliana and P. trichocarpa accessions as the outcome of SNPs in enzyme-coding genes. To this end, we also constructed a genome-scale metabolic model for Populus trichocarpa, the first for a perennial woody tree. As expected, our results indicate that growth is a complex polygenic trait governed by carbon and energy partitioning. The predicted sets of functional SNPs in both species are associated with experimentally characterized growth-determining genes and also suggest putative new ones. Functional SNPs were found in pathways such as amino acid metabolism, nucleotide biosynthesis, and cellulose and lignin biosynthesis, in line with breeding strategies that target pathways governing carbon and energy partitioning.
Affiliation(s)
- Debolina Sarkar
- Department of Chemical Engineering, Pennsylvania State University, University Park, PA, USA
- Costas D. Maranas
- Department of Chemical Engineering, Pennsylvania State University, University Park, PA, USA

3
Pfau T, Christian N, Masakapalli SK, Sweetlove LJ, Poolman MG, Ebenhöh O. The intertwined metabolism during symbiotic nitrogen fixation elucidated by metabolic modelling. Sci Rep 2018; 8:12504. [PMID: 30131500 PMCID: PMC6104047 DOI: 10.1038/s41598-018-30884-x] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 03/19/2018] [Accepted: 08/07/2018] [Indexed: 11/09/2022]
Abstract
Genome-scale metabolic network models can be used for various analyses, including the prediction of metabolic responses to environmental changes. Legumes are well known for their rhizobial symbiosis, which introduces nitrogen into the global nutrient cycle. Here, we describe a fully compartmentalised, mass- and charge-balanced, genome-scale model of the clover Medicago truncatula, which has been adopted as a model organism for legumes. We employed flux balance analysis to demonstrate that the network is capable of producing biomass components in experimentally observed proportions, during both day and night. By connecting the plant model to a model of its rhizobial symbiont, Sinorhizobium meliloti, we were able to investigate the effects of the symbiosis on metabolic fluxes and plant growth, and could demonstrate how oxygen availability influences metabolic exchanges between plant and symbiont, thus elucidating potential benefits of inter-organism amino acid cycling. We thus provide a modelling framework in which the interlinked metabolism of plants and nodules can be studied from a theoretical perspective.
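Flux balance analysis, named in the abstract above, is conventionally posed as a linear program. The formulation below is the standard textbook form and not necessarily the exact objective or constraint set used by the authors (their day/night and plant-symbiont coupling constraints are not shown). Here S is the m × n stoichiometric matrix, v the vector of reaction fluxes, c a weight vector that selects the biomass reaction, and the bounds encode reaction reversibility and nutrient uptake limits.

```latex
\begin{aligned}
\max_{\mathbf{v}\in\mathbb{R}^{n}} \quad & \mathbf{c}^{\top}\mathbf{v} \\
\text{subject to} \quad & S\,\mathbf{v} = \mathbf{0} \qquad \text{(steady state: no net accumulation of any metabolite)} \\
& \mathbf{v}_{\min} \le \mathbf{v} \le \mathbf{v}_{\max} \qquad \text{(reversibility and uptake bounds)}
\end{aligned}
```

Producing biomass components "in experimentally observed proportions" is typically encoded as a biomass pseudo-reaction whose stoichiometric coefficients follow the measured composition; whether the authors used exactly this construction is an assumption here.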
Affiliation(s)
- Thomas Pfau
- Institute of Complex Systems and Mathematical Biology, University of Aberdeen, Aberdeen, UK
- Life Sciences Research Unit, University of Luxembourg, Belvaux, Luxembourg
- Nils Christian
- Institute of Complex Systems and Mathematical Biology, University of Aberdeen, Aberdeen, UK
- Shyam K Masakapalli
- School of Basic Sciences, Indian Institute of Technology Mandi, Mandi, India
- Lee J Sweetlove
- Department of Plant Sciences, University of Oxford, Oxford, UK
- Mark G Poolman
- Department of Biological and Medical Sciences, Oxford Brookes University, Oxford, UK
- Oliver Ebenhöh
- Institute of Quantitative and Theoretical Biology, Cluster of Excellence on Plant Sciences CEPLAS, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany

4
Mirzaei Mehrabad E, Hassanzadeh R, Eslahchi C. PMLPR: A novel method for predicting subcellular localization based on recommender systems. Sci Rep 2018; 8:12006. [PMID: 30104743 PMCID: PMC6089892 DOI: 10.1038/s41598-018-30394-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 11/25/2016] [Accepted: 07/30/2018] [Indexed: 12/16/2022]
Abstract
Protein subcellular localization matters because a protein's function depends on the cellular compartment in which it resides. Moreover, predicting subcellular locations helps to identify potential molecular targets for drugs and plays an important role in genome annotation. Most existing prediction methods assign only one location to each protein. However, since some proteins move between different subcellular locations, they can have multiple locations. In recent years, several multiple-location predictors have been introduced; however, their accuracy is still limited and there is much room for improvement. In this paper, we introduce a method, PMLPR, that predicts a list of locations for each protein based on recommender systems and is therefore well suited to the multiple-location prediction problem. To evaluate the performance of PMLPR, we considered six datasets: RAT, FLY, HUMAN, Du et al., DBMLoc and Höglund. The performance of this algorithm is compared with six state-of-the-art algorithms: YLoc, WOLF-PSORT, prediction channel, MDLoc, Du et al. and MultiLoc2-HighRes. The results indicate that our proposed method is significantly superior on RAT and FLY proteins and performs decently on HUMAN proteins. Moreover, on the datasets introduced by Du et al., DBMLoc and Höglund, PMLPR achieves comparable results. As a case study, we applied the algorithms to 8 proteins that are important in cancer research. The results of the comparison with other methods indicate the efficiency of PMLPR.
Affiliation(s)
- Elnaz Mirzaei Mehrabad
- Department of Computer Science, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
- Reza Hassanzadeh
- Department of Engineering Sciences, Faculty of Advanced Technologies, University of Mohaghegh Ardabili, Namin, Iran
- Department of Bioinformatics, Faculty of Computer Engineering and Information Technology, Sabalan University of Advanced Technologies (SUAT), Namin, Iran
- Changiz Eslahchi
- Department of Computer Science, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
- School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran

5
Predicting protein subcellular localization based on information content of gene ontology terms. Comput Biol Chem 2016; 65:1-7. [PMID: 27665466 DOI: 10.1016/j.compbiolchem.2016.09.009] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 04/11/2016] [Revised: 07/10/2016] [Accepted: 09/11/2016] [Indexed: 01/11/2023]
Abstract
Predicting where a protein resides within a cell is important in cell biology. Computational approaches to this problem have attracted increasing attention from the biomedical community. Among the protein features used to predict subcellular localization, features derived from Gene Ontology (GO) have been shown to be superior to others. However, most work in this field focuses on the presence or absence of predefined GO terms. We propose a method that derives information from the intrinsic structure of the GO graph. Each element of the feature vector represents the information content of a GO term annotating the protein under investigation, and a support vector machine (SVM) was used as the classifier to test the extracted features. Evaluation experiments were conducted on three protein datasets, and the results show that our method improves eukaryotic and human subcellular location prediction accuracy by up to 1.1% over previous studies that also used GO-based features. In particular, when cellular component annotations are absent, our method achieves satisfactory results, with an overall accuracy of more than 87%.
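The abstract does not give the exact information-content (IC) formula. The sketch below assumes one common intrinsic formulation that uses only the GO graph, IC(t) = 1 - log(|desc(t)| + 1) / log(N), where desc(t) is the set of descendants of t and N is the number of terms. The function names and the toy graph are hypothetical. Each protein then becomes a vector whose entry for term t is IC(t) if the protein is annotated with t and 0 otherwise, which is the kind of input an SVM classifier would consume.

```python
import math
from typing import Dict, Iterable, List, Set


def descendants(term: str, children: Dict[str, List[str]]) -> Set[str]:
    """Return every term reachable below `term` in the GO DAG (parent -> children map)."""
    seen: Set[str] = set()
    stack = list(children.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(children.get(t, []))
    return seen


def intrinsic_ic(children: Dict[str, List[str]], all_terms: Iterable[str]) -> Dict[str, float]:
    """Assumed intrinsic IC: 1 - log(|desc(t)| + 1) / log(N).

    Leaves (no descendants) get IC = 1; a root covering every other term gets IC = 0.
    """
    terms = list(all_terms)
    n = len(terms)
    if n < 2:
        raise ValueError("need at least two GO terms")
    return {t: 1.0 - math.log(len(descendants(t, children)) + 1) / math.log(n) for t in terms}


def ic_feature_vector(protein_terms: Set[str], vocabulary: List[str],
                      ic: Dict[str, float]) -> List[float]:
    """One entry per vocabulary term: IC(t) if the protein carries t, else 0.0."""
    return [ic[t] if t in protein_terms else 0.0 for t in vocabulary]


# Toy DAG with hypothetical term IDs: root -> {a, b}, a -> {c}
children = {"root": ["a", "b"], "a": ["c"]}
vocab = ["root", "a", "b", "c"]
ic = intrinsic_ic(children, vocab)
features = ic_feature_vector({"a", "c"}, vocab, ic)  # -> approximately [0.0, 0.5, 0.0, 1.0]
```

Specific terms deep in the graph score higher than general ones, so a protein annotated with a narrow cellular-component term contributes a stronger signal than one annotated only near the root.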

6
van Heck RGA, Ganter M, Martins dos Santos VAP, Stelling J. Efficient Reconstruction of Predictive Consensus Metabolic Network Models. PLoS Comput Biol 2016; 12:e1005085. [PMID: 27563720 PMCID: PMC5001716 DOI: 10.1371/journal.pcbi.1005085] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 01/26/2016] [Accepted: 07/29/2016] [Indexed: 01/08/2023]
Abstract
Understanding cellular function requires accurate, comprehensive representations of metabolism. Genome-scale, constraint-based metabolic models (GSMs) provide such representations, but their usability is often hampered by inconsistencies at various levels, in particular between concurrent models. COMMGEN, our tool for COnsensus Metabolic Model GENeration, automatically identifies inconsistencies between concurrent models and semi-automatically resolves them, thereby helping to consolidate knowledge of metabolic function. Tests of COMMGEN for four organisms showed that automatically generated consensus models were predictive and that they substantially increased the coherence of knowledge representation. COMMGEN should be particularly useful for complex scenarios in which manual curation does not scale, such as eukaryotic organisms, microbial communities, and host-pathogen interactions.
Affiliation(s)
- Ruben G. A. van Heck
- Department of Biosystems Science and Engineering and Swiss Institute of Bioinformatics, ETH Zurich, Basel, Switzerland
- Laboratory of Systems and Synthetic Biology, Wageningen University, Wageningen, The Netherlands
- Mathias Ganter
- Department of Biosystems Science and Engineering and Swiss Institute of Bioinformatics, ETH Zurich, Basel, Switzerland
- Vitor A. P. Martins dos Santos
- Laboratory of Systems and Synthetic Biology, Wageningen University, Wageningen, The Netherlands
- LifeGlimmer GmbH, Berlin, Germany
- Joerg Stelling
- Department of Biosystems Science and Engineering and Swiss Institute of Bioinformatics, ETH Zurich, Basel, Switzerland

7
Ganter M, Kaltenbach HM, Stelling J. Predicting network functions with nested patterns. Nat Commun 2014; 5:3006. [PMID: 24398547 DOI: 10.1038/ncomms4006] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 06/06/2013] [Accepted: 11/24/2013] [Indexed: 12/20/2022]
Abstract
Identifying suitable patterns in complex biological interaction networks helps in understanding network functions and allows predictions at the pattern level: by recognizing a known pattern, one can assign its previously established function. However, current approaches fail for previously unseen patterns, when patterns overlap, and when they are embedded in a new network context. Here we show how to conceptually extend pattern-based approaches. We define metabolite patterns in metabolic networks that formalize co-occurrences of metabolites. Our probabilistic framework decodes the implicit information in the networks' metabolite patterns to predict metabolic functions. We demonstrate its predictive power by identifying 'indicator patterns', for instance for enzyme classification, by predicting the directions of novel reactions and of known reactions in new network contexts, and by ranking candidate network extensions for gap filling. Beyond their use in improving genome annotations and metabolic network models, we expect these concepts to transfer to other network types.
Affiliation(s)
- Mathias Ganter
- Department of Biosystems Science & Engineering and Swiss Institute of Bioinformatics, ETH Zurich, Mattenstr. 26, 4058 Basel, Switzerland
- Hans-Michael Kaltenbach
- Department of Biosystems Science & Engineering and Swiss Institute of Bioinformatics, ETH Zurich, Mattenstr. 26, 4058 Basel, Switzerland
- Jörg Stelling
- Department of Biosystems Science & Engineering and Swiss Institute of Bioinformatics, ETH Zurich, Mattenstr. 26, 4058 Basel, Switzerland

8
Töpfer N, Kleessen S, Nikoloski Z. Integration of metabolomics data into metabolic networks. Front Plant Sci 2015; 6:49. [PMID: 25741348 PMCID: PMC4330704 DOI: 10.3389/fpls.2015.00049] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 10/21/2014] [Accepted: 01/19/2015] [Indexed: 05/08/2023]
Abstract
Metabolite levels, together with their corresponding metabolic fluxes, are integrative outcomes of biochemical transformations and regulatory processes, and they can be used to characterize the response of biological systems to genetic and/or environmental changes. However, while changes in transcript levels, and to some extent protein levels, can usually be traced back to one or several responsible genes, changes in fluxes, and particularly changes in metabolite levels, do not follow such a rationale and are often the outcome of complex interactions among several components. The increasing quality and coverage of metabolomics technologies have fostered the development of computational approaches for integrating metabolic read-outs with large-scale models to predict the physiological state of a system. Constraint-based approaches, relying on the stoichiometry of the considered reactions, provide a modeling framework amenable to the analysis of large-scale systems and to the integration of high-throughput data. Here we review the existing approaches that integrate metabolomics data into variants of constraint-based approaches to refine model reconstructions, to constrain flux predictions in metabolic models, and to relate network structural properties to metabolite levels. Finally, we discuss the challenges and perspectives in the development of constraint-based modeling approaches driven by metabolomics data.
Affiliation(s)
- Nadine Töpfer
- Systems Biology and Mathematical Modeling Group, Department Willmitzer, Max-Planck Institute of Molecular Plant Physiology, Potsdam, Germany
- Department of Plant Sciences, Weizmann Institute of Science, Rehovot, Israel
- Sabrina Kleessen
- Systems Biology and Mathematical Modeling Group, Department Willmitzer, Max-Planck Institute of Molecular Plant Physiology, Potsdam, Germany
- Targenomix GmbH, Potsdam, Germany
- Zoran Nikoloski
- Systems Biology and Mathematical Modeling Group, Department Willmitzer, Max-Planck Institute of Molecular Plant Physiology, Potsdam, Germany
- Correspondence: Zoran Nikoloski, Systems Biology and Mathematical Modeling Group, Department Willmitzer, Max-Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany

9
Recent advances in the reconstruction of metabolic models and integration of omics data. Curr Opin Biotechnol 2014; 29:39-45. [DOI: 10.1016/j.copbio.2014.02.011] [Citation(s) in RCA: 102] [Impact Index Per Article: 10.2] [Received: 01/07/2014] [Revised: 02/04/2014] [Accepted: 02/13/2014] [Indexed: 11/22/2022]

10
Pitkänen E, Jouhten P, Hou J, Syed MF, Blomberg P, Kludas J, Oja M, Holm L, Penttilä M, Rousu J, Arvas M. Comparative genome-scale reconstruction of gapless metabolic networks for present and ancestral species. PLoS Comput Biol 2014; 10:e1003465. [PMID: 24516375 PMCID: PMC3916221 DOI: 10.1371/journal.pcbi.1003465] [Citation(s) in RCA: 64] [Impact Index Per Article: 6.4] [Received: 07/07/2013] [Accepted: 12/18/2013] [Indexed: 12/12/2022]
Abstract
We introduce a novel computational approach, CoReCo, for comparative metabolic reconstruction and provide genome-scale metabolic network models for 49 important fungal species. Leveraging the exponential growth in the availability of sequenced genomes, our method reconstructs genome-scale gapless metabolic networks simultaneously for a large number of species by integrating sequence data in a probabilistic framework. High reconstruction accuracy is demonstrated by comparisons to the well-curated Saccharomyces cerevisiae consensus model and to large-scale knock-out experiments. Our comparative approach is particularly useful when the quality of available sequence data is poor and when reconstructing evolutionarily distant species. Moreover, the reconstructed networks are fully carbon mapped, allowing their use in 13C flux analysis. We demonstrate the functionality and usability of the reconstructed fungal models with computational steady-state biomass production experiments, as these fungi include some of the most important production organisms in industrial biotechnology. In contrast to many existing reconstruction techniques, only minimal manual effort is required before the reconstructed models are usable in flux balance experiments. CoReCo is available at http://esaskar.github.io/CoReCo/.

Advances in next-generation sequencing technologies are revolutionizing molecular biology. Cost-effective, sequencing-enabled characterization of microbial genomes is a particularly exciting development in metabolic engineering, where considerable effort has been put into reconstructing genome-scale metabolic networks that describe the hundreds to thousands of biochemical reactions available to a microbial cell. These network models are instrumental in understanding microbial metabolism and in guiding metabolic engineering efforts to improve biochemical yields.
We have developed a novel computational method, CoReCo, which bridges the growing gap between the number of sequenced genomes and the number of reconstructed metabolic networks. The method reconstructs genome-scale metabolic networks simultaneously for related microbial species, using the sequencing data available from these species to correct for incomplete and missing data. We used the method to reconstruct metabolic networks for a set of 49 fungal species, providing it with protein sequence data and a phylogenetic tree describing the evolutionary relationships between the species. We demonstrate the applicability of the method by comparing a metabolic reconstruction of Saccharomyces cerevisiae to the manually curated, high-quality consensus network. We also provide an easy-to-use implementation of the method, usable both on a single computer and in distributed computing environments.
Affiliation(s)
- Esa Pitkänen
- Department of Computer Science, University of Helsinki, Helsinki, Finland
- Department of Medical Genetics, Genome-Scale Biology Research Program, University of Helsinki, Helsinki, Finland
- Paula Jouhten
- VTT Technical Research Centre of Finland, Espoo, Finland
- Jian Hou
- Department of Computer Science, University of Helsinki, Helsinki, Finland
- Department of Information and Computer Science, Aalto University, Espoo, Finland
- Peter Blomberg
- VTT Technical Research Centre of Finland, Espoo, Finland
- Jana Kludas
- Department of Information and Computer Science, Aalto University, Espoo, Finland
- Merja Oja
- VTT Technical Research Centre of Finland, Espoo, Finland
- Liisa Holm
- Institute of Biotechnology & Department of Biosciences, University of Helsinki, Helsinki, Finland
- Merja Penttilä
- VTT Technical Research Centre of Finland, Espoo, Finland
- Juho Rousu
- Department of Information and Computer Science, Aalto University, Espoo, Finland
- Mikko Arvas
- VTT Technical Research Centre of Finland, Espoo, Finland

11
Stekhoven DJ, Omasits U, Quebatte M, Dehio C, Ahrens CH. Proteome-wide identification of predominant subcellular protein localizations in a bacterial model organism. J Proteomics 2014; 99:123-37. [PMID: 24486812 DOI: 10.1016/j.jprot.2014.01.015] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 09/12/2013] [Revised: 01/12/2014] [Accepted: 01/15/2014] [Indexed: 01/04/2023]
Abstract
Proteomics data provide unique insights into biological systems, including the predominant subcellular localization (SCL) of proteins, which can reveal important clues about their functions. Here we analyzed data from a complete prokaryotic proteome expressed under two conditions mimicking the interaction of the emerging pathogen Bartonella henselae with its mammalian host. Normalized spectral count data from cytoplasmic, total membrane, inner membrane and outer membrane fractions allowed us to identify the predominant SCL for 82% of the identified proteins. The spectral count proportion of total membrane versus cytoplasmic fractions revealed the propensity of cytoplasmic proteins to co-fractionate with the inner membrane, and enabled us to distinguish cytoplasmic, peripheral inner membrane and bona fide inner membrane proteins. Principal component analysis and k-nearest neighbor classification trained on selected marker proteins or predominantly localized proteins allowed us to compile an extensive catalog of at least 74 expressed outer membrane proteins, and to extend the SCL assignment to 94% of the identified proteins, including 18% for which in silico methods gave no prediction. Suitable experimental proteomics data combined with straightforward computational approaches can thus identify the predominant SCL on a proteome-wide scale. Finally, we present a conceptual approach to identify proteins that potentially change their SCL in a condition-dependent fashion.

Biological significance: The work presented here describes the first prokaryotic proteome-wide subcellular localization (SCL) dataset for the emerging pathogen B. henselae (Bhen). The study indicates that suitable subcellular fractionation experiments, combined with straightforward computational analyses of the proportion of spectral counts observed in different subcellular fractions, are powerful for determining the predominant SCL of a large percentage of the experimentally observed proteins. This includes numerous cases where in silico prediction methods do not provide any prediction. When harsh treatment conditions are avoided, cytoplasmic proteins tend to co-fractionate with proteins of the inner membrane fraction, indicative of close functional interactions. The spectral count proportion (SCP) of total membrane versus cytoplasmic fractions gave a good indication of the relative proximity of individual protein complex members to the inner membrane. Using principal component analysis and k-nearest neighbor approaches, we were able to extend the percentage of proteins with a predominant experimental localization to over 90% of all expressed proteins and identified a set of at least 74 outer membrane (OM) proteins. In general, OM proteins represent a rich source of candidates for the development of urgently needed new therapeutics to combat the resurgence of infectious disease and multi-drug-resistant bacteria. Finally, by comparing the data from two conditions relevant to infection biology, we conceptually explore methods to identify and visualize potential candidates that may partially change their SCL under these different conditions. The data are made available to researchers as an SCL compendium for Bhen and as an aid to further improving in silico SCL prediction algorithms.
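The abstract reports k-nearest neighbor classification on spectral count data but does not specify the feature definition, distance measure, or k. The sketch below assumes that each protein is described by its spectral counts normalized across the four fractions (cytoplasmic, total membrane, inner membrane, outer membrane), Euclidean distance, and a plain majority vote. The fraction keys, marker values, and k = 3 are hypothetical, and the principal component analysis step is omitted.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical fraction keys, in a fixed order so feature vectors line up.
FRACTIONS = ("cytoplasmic", "total_membrane", "inner_membrane", "outer_membrane")


def spectral_count_proportions(counts: Dict[str, float]) -> List[float]:
    """Normalize a protein's spectral counts across fractions so they sum to 1."""
    total = sum(counts.get(f, 0.0) for f in FRACTIONS)
    if total == 0:
        raise ValueError("protein has no spectral counts in any fraction")
    return [counts.get(f, 0.0) / total for f in FRACTIONS]


def knn_predict(query: Sequence[float],
                markers: List[Tuple[Sequence[float], str]],
                k: int = 3) -> str:
    """Assign the majority location among the k marker proteins nearest to `query`."""
    nearest = sorted(markers, key=lambda m: math.dist(query, m[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical marker proteins with known predominant localization.
markers = [
    ([0.90, 0.10, 0.00, 0.00], "cytoplasm"),
    ([0.85, 0.15, 0.00, 0.00], "cytoplasm"),
    ([0.10, 0.50, 0.40, 0.00], "inner membrane"),
    ([0.05, 0.45, 0.50, 0.00], "inner membrane"),
    ([0.00, 0.40, 0.00, 0.60], "outer membrane"),
    ([0.05, 0.35, 0.00, 0.60], "outer membrane"),
]
query = spectral_count_proportions(
    {"cytoplasmic": 2, "total_membrane": 10, "inner_membrane": 0, "outer_membrane": 8})
prediction = knn_predict(query, markers)  # -> "outer membrane"
```

Training on a small set of confidently localized markers and then labeling the remaining proteins by proximity is what lets such an approach extend assignments to proteins that sequence-based predictors leave unlabeled.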
Affiliation(s)
- Daniel J Stekhoven
- Quantitative Model Organism Proteomics (Q-MOP), Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
- Ulrich Omasits
- Quantitative Model Organism Proteomics (Q-MOP), Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland; Institute of Molecular Systems Biology, ETH Zurich, Auguste-Piccard-Hof 1, 8093 Zurich, Switzerland
- Maxime Quebatte
- Biozentrum, University of Basel, Klingelbergstrasse 50/70, 4056 Basel, Switzerland
- Christoph Dehio
- Biozentrum, University of Basel, Klingelbergstrasse 50/70, 4056 Basel, Switzerland
- Christian H Ahrens
- Quantitative Model Organism Proteomics (Q-MOP), Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland

12
Predicting human protein subcellular locations by the ensemble of multiple predictors via protein-protein interaction network with edge clustering coefficients. PLoS One 2014; 9:e86879. [PMID: 24466278 PMCID: PMC3900678 DOI: 10.1371/journal.pone.0086879] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 10/14/2013] [Accepted: 12/18/2013] [Indexed: 12/14/2022]
Abstract
One of the fundamental tasks in biology is to identify the functions of all proteins in order to reveal the primary machinery of a cell. Knowledge of the subcellular locations of proteins provides key hints for revealing their functions and for understanding the intricate pathways that regulate biological processes at the cellular level. Protein subcellular location prediction has been studied extensively over the past two decades, and many methods have been developed based on protein primary sequences as well as on protein-protein interaction networks. In this paper, we propose using the protein-protein interaction network as an infrastructure to integrate existing sequence-based predictors. When predicting the subcellular locations of a given protein, we consider not only the protein itself but also all of its interacting partners. Unlike existing methods, our method requires neither comprehensive knowledge of the protein-protein interaction network nor experimentally annotated subcellular locations for most proteins in the network. In addition, our method can be used as a framework to integrate multiple predictors. Our method achieved an absolute-true rate of 56% on the human proteome, which is higher than that of state-of-the-art methods.
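The absolute-true rate quoted in the abstract is a strict multi-label metric: a protein counts as correct only if its predicted set of locations equals its true set exactly. A minimal sketch of that definition follows; the function name and toy labels are illustrative, and the sketch does not reproduce the paper's network-based ensemble.

```python
from typing import Iterable, Sequence


def absolute_true_rate(true_sets: Sequence[Iterable[str]],
                       predicted_sets: Sequence[Iterable[str]]) -> float:
    """Fraction of proteins whose predicted location set exactly equals the true set."""
    if len(true_sets) != len(predicted_sets):
        raise ValueError("true and predicted lists must have the same length")
    if not true_sets:
        raise ValueError("no proteins to evaluate")
    hits = sum(set(t) == set(p) for t, p in zip(true_sets, predicted_sets))
    return hits / len(true_sets)


# Hypothetical labels for four proteins.
true_sets = [{"nucleus"}, {"cytoplasm", "nucleus"}, {"mitochondrion"}, {"membrane"}]
predicted = [{"nucleus"}, {"nucleus"}, {"mitochondrion"}, {"membrane", "cytoplasm"}]
rate = absolute_true_rate(true_sets, predicted)  # -> 0.5
```

Partially correct predictions, such as those for the second and fourth proteins, earn no credit, so this rate never exceeds the set-overlap accuracy |Y ∩ Ŷ| / |Y ∪ Ŷ| averaged over the same proteins.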

13
Li X, Wu X, Wu G. Robust feature generation for protein subchloroplast location prediction with a weighted GO transfer model. J Theor Biol 2014; 347:84-94. [PMID: 24423409 DOI: 10.1016/j.jtbi.2014.01.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/12/2013] [Revised: 10/17/2013] [Accepted: 01/03/2014] [Indexed: 10/25/2022]
Abstract
Chloroplasts are crucial organelles of green plants and eukaryotic algae because they conduct photosynthesis. Predicting the subchloroplast location of a protein can provide important insights into its biological functions. The performance of subchloroplast location prediction algorithms often depends on deriving predictive and succinct features from genomic and proteomic data. In this work, a novel weighted Gene Ontology (GO) transfer model is proposed to generate discriminating features from sequence data and GO categories. The model has two components. First, we transfer the GO terms of homologous proteins and assign their bit-scores as weights to the GO features. Second, we employ term-selection methods to determine weights for GO terms. The model improves prediction accuracy because it tolerates the noise introduced by homology-based knowledge transfer. The proposed weighted GO transfer method based on bit-scores and a logarithmic transformation of the chi-square statistic (WS-LCHI) performs better than the baseline models and also outperforms the four off-the-shelf subchloroplast prediction methods.
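The abstract names two ingredients of WS-LCHI, homolog bit-scores as weights and a logarithmic transformation of the chi-square term-selection statistic, without giving the exact way they are combined. The sketch below computes the standard 2 × 2 chi-square statistic for a GO term and a location class and, as an assumption, combines it with the bit-score as bit-score × log(1 + χ²). The helper names, the input dictionaries, the placeholder term IDs, and this particular product form are all hypothetical.

```python
import math
from typing import Dict, List


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a 2x2 table of GO term vs. location class.

    a: proteins in the class annotated with the term
    b: proteins outside the class annotated with the term
    c: proteins in the class without the term
    d: proteins outside the class without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def weighted_go_features(transferred_bits: Dict[str, float],
                         term_chi2: Dict[str, float],
                         vocabulary: List[str]) -> List[float]:
    """Per vocabulary term: bit-score * log(1 + chi2) if the term was transferred, else 0.

    transferred_bits: GO term -> best BLAST bit-score among homologs carrying it (hypothetical input)
    term_chi2: GO term -> chi-square score of the term (e.g. maximum over location classes)
    """
    return [transferred_bits[t] * math.log1p(term_chi2.get(t, 0.0)) if t in transferred_bits else 0.0
            for t in vocabulary]


# 10 * (15 - 1)**2 / (4 * 6 * 4 * 6) = 1960 / 576
chi2 = chi_square(a=3, b=1, c=1, d=5)
# "GO:A" and "GO:B" are placeholder term IDs.
features = weighted_go_features({"GO:A": 250.0}, {"GO:A": chi2}, ["GO:A", "GO:B"])
```

Damping χ² with a logarithm keeps a single highly class-specific term from dominating the feature vector, while the bit-score factor down-weights terms inherited from distant homologs.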
Affiliation(s)
- Xiaomei Li
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, PR China
- Xindong Wu
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, PR China; Department of Computer Science, University of Vermont, Burlington, VT 05405, USA
- Gongqing Wu
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, PR China

14
Du P, Xu C. Predicting multisite protein subcellular locations: progress and challenges. Expert Rev Proteomics 2013; 10:227-37. [DOI: 10.1586/epr.13.16] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 11/08/2022]

15
Lee K, Byun K, Hong W, Chuang HY, Pack CG, Bayarsaikhan E, Paek SH, Kim H, Shin HY, Ideker T, Lee B. Proteome-wide discovery of mislocated proteins in cancer. Genome Res 2013; 23:1283-94. [PMID: 23674306 PMCID: PMC3730102 DOI: 10.1101/gr.155499.113] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Indexed: 11/25/2022]
Abstract
Several studies have sought systematically to identify protein subcellular locations, but an even larger task is to map which of these proteins conditionally relocates in disease (the mislocalizome). Here, we report an integrative computational framework for mapping conditional location and mislocation of proteins on a proteome-wide scale, called a conditional location predictor (CoLP). Using CoLP, we mapped the locations of over 10,000 proteins in normal human brain and in glioma. The prediction showed 0.9 accuracy using 100 location tests of 20 randomly selected proteins. Of the 10,000 proteins, over 150 have a strong likelihood of mislocation under glioma, which is striking considering that few mislocation events have been identified in this disease previously. Using immunofluorescence and Western blotting in both primary cells and tissues, we successfully experimentally confirmed 15 mislocations. The most common type of mislocation occurs between the endoplasmic reticulum and the nucleus; for example, for RNF138, TLX3, and NFRKB. In particular, we found that the gene for the mislocating protein GFRA4 had a nonsynonymous point mutation in exon 2. Moreover, redirection of GFRA4 to its normal location, the plasma membrane, led to marked reductions in phospho-STAT3 and proliferation of glioma cells. This framework has the potential to track changes in protein location in many human diseases.
Affiliation(s)
- KiYoung Lee
- Department of Biomedical Informatics, Ajou University School of Medicine, Suwon 443-749, Korea.
16
Reconstruction of Danio rerio metabolic model accounting for subcellular compartmentalisation. PLoS One 2012; 7:e49903. [PMID: 23166792 PMCID: PMC3498201 DOI: 10.1371/journal.pone.0049903] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 07/08/2012] [Accepted: 10/16/2012] [Indexed: 11/19/2022]
Abstract
Plant and microbial metabolic engineering is commonly used in the production of functional foods and quality trait improvement. Computational model-based approaches have been used in this important endeavour. However, to date, fish metabolic models have been developed only scarcely and partially, in marked contrast to the prominent success of such models in plant and microbial metabolic engineering. In this study we present the reconstruction of fully compartmentalised models of Danio rerio (zebrafish) on a global scale. This reconstruction involves extraction of known biochemical reactions in D. rerio for both primary and secondary metabolism and the implementation of methods for determining subcellular localisation and assignment of enzymes. The reconstructed model (ZebraGEM) is amenable to constraint-based modelling analysis and accounts for 4,988 genes coding for 2,406 gene-associated reactions and only 418 non-gene-associated reactions. A set of computational validations (i.e., simulations of known metabolic functionalities and experimental data) strongly testifies to the predictive ability of the model. Overall, the reconstructed model is expected to lay the foundations for computationally based rational design of fish metabolic engineering in aquaculture.
17
Schmidt BJ, Papin JA, Musante CJ. Mechanistic systems modeling to guide drug discovery and development. Drug Discov Today 2012; 18:116-27. [PMID: 22999913 DOI: 10.1016/j.drudis.2012.09.003] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Received: 05/25/2012] [Revised: 08/17/2012] [Accepted: 09/05/2012] [Indexed: 01/24/2023]
Abstract
A crucial question that must be addressed in the drug development process is whether the proposed therapeutic target will yield the desired effect in the clinical population. Pharmaceutical and biotechnology companies invest heavily in research and development long before confirmatory data are available from human trials. Basic science has greatly expanded the computable knowledge of disease processes, both through the generation of large omics data sets and through a compendium of studies assessing cellular and systemic responses to physiologic and pathophysiologic stimuli. Given the inherent uncertainties in drug development, mechanistic systems models can better inform target selection and the decision process for advancing compounds through preclinical and clinical research.
Affiliation(s)
- Brian J Schmidt
- Department of Bioengineering, University of California at San Diego, La Jolla, CA 92093-0412, USA
18
de Oliveira Dal'Molin CG, Nielsen LK. Plant genome-scale metabolic reconstruction and modelling. Curr Opin Biotechnol 2012; 24:271-7. [PMID: 22947602 DOI: 10.1016/j.copbio.2012.08.007] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.8] [Received: 08/13/2012] [Revised: 08/15/2012] [Accepted: 08/17/2012] [Indexed: 11/26/2022]
Abstract
Genome-scale metabolic reconstructions are used extensively in the study of microbial metabolism and have proven to be powerful tools for guiding rational pathway design of industrial strains. Generation and curation of plant genome-scale metabolic models has proven far more challenging, not least because of our incomplete knowledge of compartmentation and organelle transporters in plants. Conversely, the potential value of modelling is far greater when exploring a complex, multi-organelle and multi-tissue metabolism. The first generation of plant genome-scale metabolic reconstructions has proven surprisingly functional and robust, as well as capable of predicting many observed complex phenotypes. With further refinement, the application of these models promises to make important contributions to plant biology and metabolic engineering.
19
Reumann S, Buchwald D, Lingner T. PredPlantPTS1: A Web Server for the Prediction of Plant Peroxisomal Proteins. Front Plant Sci 2012; 3:194. [PMID: 22969783 PMCID: PMC3427985 DOI: 10.3389/fpls.2012.00194] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 05/16/2012] [Accepted: 08/06/2012] [Indexed: 05/04/2023]
Abstract
Prediction of subcellular protein localization is essential to correctly assign unknown proteins to cell organelle-specific protein networks and to ultimately determine protein function. For metazoa, several computational approaches have been developed in the past decade to predict peroxisomal proteins carrying the peroxisome targeting signal type 1 (PTS1). However, plant-specific PTS1 protein prediction methods have been lacking up to now, and pre-existing methods generally were incapable of correctly predicting low-abundance plant proteins possessing non-canonical PTS1 patterns. Recently, we presented a machine learning approach that is able to predict PTS1 proteins for higher plants (spermatophytes) with high accuracy and which can correctly identify unknown targeting patterns, i.e., novel PTS1 tripeptides and tripeptide residues. Here we describe the first plant-specific web server PredPlantPTS1 for the prediction of plant PTS1 proteins using the above-mentioned underlying models. The server allows the submission of protein sequences from diverse spermatophytes and also performs well for mosses and algae. The easy-to-use web interface provides detailed output in terms of (i) the peroxisomal targeting probability of the given sequence, (ii) information whether a particular non-canonical PTS1 tripeptide has already been experimentally verified, and (iii) the prediction scores for the single C-terminal 14 amino acid residues. The latter allows identification of predicted residues that inhibit peroxisome targeting and which can be optimized using site-directed mutagenesis to raise the peroxisome targeting efficiency. The prediction server will be instrumental in identifying low-abundance and stress-inducible peroxisomal proteins and defining the entire peroxisomal proteome of Arabidopsis and agronomically important crop plants. PredPlantPTS1 is freely accessible at ppp.gobics.de.
Affiliation(s)
- Sigrun Reumann
- Center for Organelle Research, University of Stavanger, Stavanger, Norway
- Daniela Buchwald
- Department of Bioinformatics, University of Göttingen, Göttingen, Germany
- Thomas Lingner
- Department of Bioinformatics, University of Göttingen, Göttingen, Germany
20
Chowdhary G, Kataya ARA, Lingner T, Reumann S. Non-canonical peroxisome targeting signals: identification of novel PTS1 tripeptides and characterization of enhancer elements by computational permutation analysis. BMC Plant Biol 2012; 12:142. [PMID: 22882975 PMCID: PMC3487989 DOI: 10.1186/1471-2229-12-142] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 02/24/2012] [Accepted: 07/13/2012] [Indexed: 05/04/2023]
Abstract
BACKGROUND High-accuracy prediction tools are essential in the post-genomic era to define organellar proteomes in their full complexity. We recently applied a discriminative machine learning approach to predict plant proteins carrying peroxisome targeting signals (PTS) type 1 from genome sequences. For Arabidopsis thaliana, 392 gene models were predicted to be peroxisome-targeted. The predictions were extensively tested in vivo, resulting in a high experimental verification rate of Arabidopsis proteins previously not known to be peroxisomal. RESULTS In this study, we experimentally validated the predictions in greater depth by focusing on the most challenging Arabidopsis proteins with unknown non-canonical PTS1 tripeptides and prediction scores close to the threshold. By in vivo subcellular targeting analysis, three novel PTS1 tripeptides (QRL>, SQM>, and SDL>) and two novel tripeptide residues (Q at position -3 and D at position -2) were identified. To understand why, among many Arabidopsis proteins carrying the same C-terminal tripeptides, these proteins were specifically predicted as peroxisomal, the residues upstream of the PTS1 tripeptide were computationally permuted and the changes in prediction scores were analyzed. The newly identified Arabidopsis proteins were found to contain four to five amino acid residues with high predicted targeting-enhancing properties at positions -4 to -12 in front of the non-canonical PTS1 tripeptide. The identity of the predicted targeting-enhancing residues was unexpectedly diverse, comprising not only basic residues but also proline, hydroxylated (Ser, Thr), hydrophobic (Ala, Val), and even acidic residues. CONCLUSIONS Our computational and experimental analyses demonstrate that the plant PTS1 tripeptide motif is more diverse than previously thought, including an increasing number of non-canonical sequences and allowed residues.
Specific targeting-enhancing elements can be predicted for particular sequences of interest and are far more diverse in amino acid composition and positioning than previously assumed. Machine learning methods are becoming indispensable for predicting which specific proteins, among numerous candidates carrying the same non-canonical PTS1 tripeptide, contain sufficient enhancer elements in terms of number, positioning and total strength to cause peroxisome targeting.
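The computational permutation analysis described above can be illustrated, loosely, as an in silico substitution scan. Every residue at positions -4 to -12 is replaced by each of the other 19 amino acids, and the change in predicted score is recorded. In this Python sketch the predictor is passed in as a function. `toy_enhancer_score`, which only counts Lys/Arg in that window, is a hypothetical stand-in and not the authors' machine learning model.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def permutation_scan(seq, score_fn, positions=range(-12, -3)):
    """Substitute every residue at the given C-terminal positions
    (-1 = last residue; the default covers -12 to -4) and return
    {(position, new_residue): change in score relative to the wild type}."""
    base = score_fn(seq)
    deltas = {}
    for pos in positions:
        idx = len(seq) + pos  # convert C-terminal position to a 0-based index
        for aa in AMINO_ACIDS:
            if aa == seq[idx]:
                continue  # skip the wild-type residue
            mutant = seq[:idx] + aa + seq[idx + 1:]
            deltas[(pos, aa)] = score_fn(mutant) - base
    return deltas

def toy_enhancer_score(seq):
    """Hypothetical stand-in predictor: counts basic residues (K, R)
    at positions -12 to -4, upstream of the PTS1 tripeptide."""
    return sum(1 for r in seq[-12:-3] if r in "KR")

if __name__ == "__main__":
    seq = "A" * 15 + "SKL"  # made-up sequence ending in the SKL tripeptide
    deltas = permutation_scan(seq, toy_enhancer_score)
    best = max(deltas, key=deltas.get)
    print("largest score gain:", best, deltas[best])
```

With a real predictor in place of the toy scorer, positive deltas would mark candidate targeting-enhancing substitutions, and negative deltas would mark inhibitory ones.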
Affiliation(s)
- Gopal Chowdhary
- Centre for Organelle Research, University of Stavanger, N-4036, Stavanger, Norway
- KIIT School of Biotechnology, Campus XI, KIIT University, Bhubaneswar, 751024, India
- Amr RA Kataya
- Centre for Organelle Research, University of Stavanger, N-4036, Stavanger, Norway
- Thomas Lingner
- Department of Bioinformatics, Institute for Microbiology and Genetics, D-37077, Goettingen, Germany
- Sigrun Reumann
- Centre for Organelle Research, University of Stavanger, N-4036, Stavanger, Norway
21
Seaver SMD, Henry CS, Hanson AD. Frontiers in metabolic reconstruction and modeling of plant genomes. J Exp Bot 2012; 63:2247-58. [PMID: 22238452 DOI: 10.1093/jxb/err371] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.7] [Indexed: 05/20/2023]
Abstract
A major goal of post-genomic biology is to reconstruct and model in silico the metabolic networks of entire organisms. Work on bacteria is well advanced, and is now under way for plants and other eukaryotes. Genome-scale modelling in plants is much more challenging than in bacteria. The challenges come from features characteristic of higher organisms (subcellular compartmentation, tissue differentiation) and also from the particular severity in plants of a general problem: genome content whose functions remain undiscovered. This problem results in thousands of genes for which no function is known ('undiscovered genome content') and hundreds of enzymatic and transport functions for which no gene is yet identified. The severity of the undiscovered genome content problem in plants reflects their genome size and complexity. To bring the challenges of plant genome-scale modelling into focus, we first summarize the current status of plant genome-scale models. We then highlight the challenges - and ways to address them - in three areas: identifying genes for missing processes, modelling tissues as opposed to single cells, and finding metabolic functions encoded by undiscovered genome content. We also discuss the emerging view that a significant fraction of undiscovered genome content encodes functions that counter damage to metabolites inflicted by spontaneous chemical reactions or enzymatic mistakes.
Affiliation(s)
- Samuel M D Seaver
- Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA
22
Lewis NE, Nagarajan H, Palsson BO. Constraining the metabolic genotype-phenotype relationship using a phylogeny of in silico methods. Nat Rev Microbiol 2012; 10:291-305. [PMID: 22367118 DOI: 10.1038/nrmicro2737] [Citation(s) in RCA: 537] [Impact Index Per Article: 44.8] [Indexed: 02/06/2023]
Abstract
Reconstructed microbial metabolic networks facilitate a mechanistic description of the genotype-phenotype relationship through the deployment of constraint-based reconstruction and analysis (COBRA) methods. As reconstructed networks leverage genomic data for insight and phenotype prediction, the development of COBRA methods has accelerated following the advent of whole-genome sequencing. Here, we describe a phylogeny of COBRA methods that has rapidly evolved from the few early methods, such as flux balance analysis and elementary flux mode analysis, into a repertoire of more than 100 methods. These methods have enabled genome-scale analysis of microbial metabolism for numerous basic and applied uses, including antibiotic discovery, metabolic engineering and modelling of microbial community behaviour.
Affiliation(s)
- Nathan E Lewis
- Department of Bioengineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0412, USA
23
Kleessen S, Araújo WL, Fernie AR, Nikoloski Z. Model-based confirmation of alternative substrates of mitochondrial electron transport chain. J Biol Chem 2012; 287:11122-31. [PMID: 22334689 DOI: 10.1074/jbc.m111.310383] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 11/06/2022]
Abstract
Discrimination of metabolic models based on high-throughput metabolomics data, reflecting various internal and external perturbations, is essential for identifying the components that contribute to the emergent behavior of metabolic processes. Here, we investigate 12 different models of the mitochondrial electron transport chain (ETC) in Arabidopsis thaliana during dark-induced senescence in order to elucidate the alternative substrates of this metabolic pathway. Our findings demonstrate that coupling the proposed computational approach, based on dynamic flux balance analysis, with time-resolved metabolomics data yields model-based confirmation of the hypotheses that, during dark-induced senescence in Arabidopsis, (i) under conditions where the main substrates for the ETC are not fully available, isovaleryl-CoA dehydrogenase and 2-hydroxyglutarate dehydrogenase are able to donate electrons to the ETC, (ii) phytanoyl-CoA does not act even as an indirect substrate of the electron transfer flavoprotein/electron-transfer flavoprotein:ubiquinone oxidoreductase complex, and (iii) the mitochondrial γ-aminobutyric acid transporter has functional significance in maintaining mitochondrial metabolism. Our study provides a basic framework for future in silico studies of alternative pathways in mitochondrial metabolism under extended darkness, whereby the roles of its components can be computationally discriminated based on available molecular profile data.
Affiliation(s)
- Sabrina Kleessen
- Max-Planck Institute of Molecular Plant Physiology, 14476 Potsdam-Golm, Germany
24
Lingner T, Kataya ARA, Reumann S. Experimental and statistical post-validation of positive example EST sequences carrying peroxisome targeting signals type 1 (PTS1). Plant Signal Behav 2012; 7:263-8. [PMID: 22415050 PMCID: PMC3405698 DOI: 10.4161/psb.18720] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 05/09/2023]
Abstract
We recently developed the first algorithms specifically for plants to predict proteins carrying peroxisome targeting signals type 1 (PTS1) from genome sequences. As validated experimentally, the prediction methods are able to correctly predict unknown peroxisomal Arabidopsis proteins and to infer novel PTS1 tripeptides. The high prediction performance is primarily determined by the large number and sequence diversity of the underlying positive example sequences, which were mainly derived from EST databases. However, a few constructs remained cytosolic in experimental validation studies, indicating sequencing errors in some ESTs. To identify erroneous sequences, we validated subcellular targeting of additional positive example sequences in the present study. Moreover, we analyzed the distribution of prediction scores separately for each orthologous group of PTS1 proteins; these distributions generally resembled normal distributions with group-specific mean values. The cytosolic sequences commonly represented outliers with low prediction scores and were located at the very tail of a fitted normal distribution. Three statistical methods for identifying outliers were compared in terms of sensitivity and specificity. Their combined application allows elimination of erroneous ESTs from positive example data sets. This new post-validation method will further improve the prediction accuracy of both PTS1 and PTS2 protein prediction models for plants, fungi, and mammals.
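The abstract does not name the three outlier-detection methods that were compared. As an illustration only, this Python sketch shows two standard choices for flagging low-scoring sequences within one orthologous group. The first is a lower-tail z-score cut on a fitted normal distribution; the second is Tukey's interquartile-range fence. The cut-off values (`z_cut=-2.5`, `k=1.5`) and the function names are assumptions, not the settings used in the study.

```python
import statistics

def zscore_outliers(scores, z_cut=-2.5):
    """Fit a normal distribution (sample mean and SD) to one orthologous
    group's prediction scores and flag scores in the low tail below z_cut.
    Requires at least two scores that are not all identical."""
    mu = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [s for s in scores if (s - mu) / sd < z_cut]

def tukey_low_outliers(scores, k=1.5):
    """Flag scores below the lower Tukey fence Q1 - k * IQR."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return [s for s in scores if s < q1 - k * (q3 - q1)]
```

Only the low tail is checked in both functions because, per the abstract, the erroneous (cytosolic) sequences appeared as low-score outliers. `statistics.quantiles` requires Python 3.8 or later.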
Affiliation(s)
- Thomas Lingner
- Institute for Microbiology; Department of Bioinformatics; Goettingen, Germany
- Amr R. A. Kataya
- Centre for Organelle Research; University of Stavanger; Stavanger, Norway
- Sigrun Reumann
- Centre for Organelle Research; University of Stavanger; Stavanger, Norway
- Corresponding author: Sigrun Reumann
25
Reconstruction of Arabidopsis metabolic network models accounting for subcellular compartmentalization and tissue-specificity. Proc Natl Acad Sci U S A 2011; 109:339-44. [PMID: 22184215 DOI: 10.1073/pnas.1100358109] [Citation(s) in RCA: 202] [Impact Index Per Article: 15.5] [Indexed: 01/05/2023]
Abstract
Plant metabolic engineering is commonly used in the production of functional foods and quality trait improvement. However, to date, computational model-based approaches have been used only scarcely in this important endeavor, in marked contrast to their prominent success in microbial metabolic engineering. In this study we present a computational pipeline for the reconstruction of fully compartmentalized tissue-specific models of Arabidopsis thaliana on a genome scale. This reconstruction involves automatic extraction of known biochemical reactions in Arabidopsis for both primary and secondary metabolism, automatic gap-filling, and the implementation of methods for determining subcellular localization and tissue assignment of enzymes. The reconstructed tissue models are amenable to constraint-based modeling analysis and significantly extend previous model reconstructions. A set of computational validations (i.e., cross-validation tests, simulations of known metabolic functionalities) and experimental validations (comparison with experimental metabolomics datasets across various compartments and tissues) strongly testify to the predictive ability of the models. The utility of the derived models was demonstrated in the prediction of measured fluxes in metabolically engineered seed strains and the design of genetic manipulations that are expected to increase vitamin E content, a significant nutrient for human health. Overall, the reconstructed tissue models are expected to lay the foundations for computationally based rational design of plant metabolic engineering. The reconstructed compartmentalized Arabidopsis tissue models are MIRIAM-compliant and are available upon request.
26
Osterlund T, Nookaew I, Nielsen J. Fifteen years of large scale metabolic modeling of yeast: developments and impacts. Biotechnol Adv 2011; 30:979-88. [PMID: 21846501 DOI: 10.1016/j.biotechadv.2011.07.021] [Citation(s) in RCA: 88] [Impact Index Per Article: 6.8] [Received: 03/09/2011] [Accepted: 07/26/2011] [Indexed: 10/17/2022]
Abstract
Since the first large-scale reconstruction of the Saccharomyces cerevisiae metabolic network 15 years ago, the development of yeast metabolic models has progressed rapidly, resulting in no fewer than nine different yeast genome-scale metabolic models. Here we review the historical development of large-scale mathematical modeling of yeast metabolism and the growing scope and impact of applications of these models in four different areas: as a guide for metabolic engineering and strain improvement, as a tool for biological interpretation and discovery, as a testbed for novel computational frameworks, and for evolutionary studies.
Affiliation(s)
- Tobias Osterlund
- Department of Chemical and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
27
Lingner T, Kataya AR, Antonicelli GE, Benichou A, Nilssen K, Chen XY, Siemsen T, Morgenstern B, Meinicke P, Reumann S. Identification of novel plant peroxisomal targeting signals by a combination of machine learning methods and in vivo subcellular targeting analyses. Plant Cell 2011; 23:1556-72. [PMID: 21487095 PMCID: PMC3101550 DOI: 10.1105/tpc.111.084095] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.0] [Received: 02/04/2011] [Revised: 02/04/2011] [Accepted: 03/24/2011] [Indexed: 05/18/2023]
Abstract
In the postgenomic era, accurate prediction tools are essential for identification of the proteomes of cell organelles. Prediction methods have been developed for peroxisome-targeted proteins in animals and fungi but are missing specifically for plants. For development of a predictor for plant proteins carrying peroxisome targeting signals type 1 (PTS1), we assembled more than 2500 homologous plant sequences, mainly from EST databases. We applied a discriminative machine learning approach to derive two different prediction methods, both of which showed high prediction accuracy and recognized specific targeting-enhancing patterns in the regions upstream of the PTS1 tripeptides. Upon application of these methods to the Arabidopsis thaliana genome, 392 gene models were predicted to be peroxisome targeted. These predictions were extensively tested in vivo, resulting in a high experimental verification rate of Arabidopsis proteins previously not known to be peroxisomal. The prediction methods were able to correctly infer novel PTS1 tripeptides, which even included novel residues. Twenty-three newly predicted PTS1 tripeptides were experimentally confirmed, and a high variability of the plant PTS1 motif was discovered. These prediction methods will be instrumental in identifying low-abundance and stress-inducible peroxisomal proteins and defining the entire peroxisomal proteome of Arabidopsis and agronomically important crop plants.
Affiliation(s)
- Thomas Lingner
- Georg-August University of Goettingen, Institute for Microbiology, Department of Bioinformatics, D-37077 Goettingen, Germany
- Centre for Organelle Research, University of Stavanger, N-4021 Stavanger, Norway
- Amr R. Kataya
- Centre for Organelle Research, University of Stavanger, N-4021 Stavanger, Norway
- Gerardo E. Antonicelli
- Centre for Organelle Research, University of Stavanger, N-4021 Stavanger, Norway
- Georg-August-University of Goettingen, Department of Plant Biochemistry, D-37077 Goettingen, Germany
- Aline Benichou
- Centre for Organelle Research, University of Stavanger, N-4021 Stavanger, Norway
- Kjersti Nilssen
- Centre for Organelle Research, University of Stavanger, N-4021 Stavanger, Norway
- Xiong-Yan Chen
- Centre for Organelle Research, University of Stavanger, N-4021 Stavanger, Norway
- Tanja Siemsen
- Georg-August-University of Goettingen, Department of Plant Biochemistry, D-37077 Goettingen, Germany
- Burkhard Morgenstern
- Georg-August University of Goettingen, Institute for Microbiology, Department of Bioinformatics, D-37077 Goettingen, Germany
- Peter Meinicke
- Georg-August University of Goettingen, Institute for Microbiology, Department of Bioinformatics, D-37077 Goettingen, Germany
- Sigrun Reumann
- Centre for Organelle Research, University of Stavanger, N-4021 Stavanger, Norway
- Georg-August-University of Goettingen, Department of Plant Biochemistry, D-37077 Goettingen, Germany
28
Abstract
Background Understanding cellular systems requires knowledge of a protein's subcellular localization (SCL). Although experimental and predicted data for protein SCL are archived in various databases, SCL prediction remains a non-trivial problem in genome annotation. Current SCL prediction tools use amino-acid sequence features and text-mining approaches. A comprehensive analysis of protein SCL in human PPI and metabolic networks for various subcellular compartments is necessary for developing a robust SCL prediction methodology. Results Based on protein-protein interaction (PPI) and metabolite-linked protein interaction (MLPI) networks of proteins, we have compared, contrasted and analysed the statistical properties across different subcellular compartments. We integrated PPI and metabolic datasets with SCL information of human proteins from LOCATE and GOA (Gene Ontology Annotation) and estimated three statistical properties: the chi-square (χ²) test, the Paired Localisation Correlation Profile (PLCP) and network topological measures. For the PPI network, Pearson's chi-square test shows that, for the same SCL category, twice as many interacting protein pairs are observed as expected when compared to non-interacting protein pairs (χ² = 1270.19, P-value < 2.2 × 10⁻¹⁶), whereas for the MLPI network, metabolite-linked protein pairs having the same SCL are observed 20% more often than expected, compared to non-metabolite-linked proteins (χ² = 110.02, P-value < 2.2 × 10⁻¹⁶). To address the issue of proteins with multiple SCLs, we have specifically used the PLCP measure. PLCP analysis revealed that protein interactions are largely restricted to the same SCL, though significant cross-compartment interactions are seen for nuclear proteins. Metabolite-linked protein pairs are restricted to specific compartments such as the mitochondrion (P-value < 6.0 × 10⁻⁷), the lysosome (P-value < 4.7 × 10⁻⁵) and the Golgi apparatus (P-value < 1.0 × 10⁻¹⁵).
These findings indicate that the metabolic network complements the PPI network as a source of information on the localisation of proteins in human subcellular compartments. Conclusions The MLPI network differs significantly from the PPI network in its SCL distribution. The PPI network shows passive protein interactions across different subcellular compartments, possibly due to its high false-positive rate; such interactions seem to be absent from the MLPI network, which has evolved to maintain high substrate specificity for proteins.
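The χ² values quoted above come from Pearson's chi-square test on contingency tables of protein pairs. Below is a minimal Python sketch for a 2×2 table, with rows for interacting vs. non-interacting pairs and columns for same vs. different SCL. It also reports the observed/expected ratio for interacting pairs with the same SCL, which is the kind of quantity behind the "twice as many" and "20% more" figures. The function name is hypothetical and no continuity correction is applied. The p-value uses the closed form for one degree of freedom, P = erfc(√(χ²/2)).

```python
import math

def chi2_2x2(table):
    """Pearson chi-square test (no continuity correction) on a 2x2 table.

    table = [[a, b], [c, d]]
      rows:    interacting pairs, non-interacting pairs
      columns: same SCL, different SCL
    Assumes every row and column total is non-zero.
    Returns (chi2, p_value, observed/expected ratio for cell a).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_tot = (a + b, c + d)
    col_tot = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # count under independence
            chi2 += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail, 1 degree of freedom
    ratio = a / (row_tot[0] * col_tot[0] / n)
    return chi2, p_value, ratio
```

An observed/expected ratio near 2 would correspond to the PPI result, and a ratio near 1.2 to the MLPI result. The actual pair counts are not given in the abstract.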
Affiliation(s)
- Gaurav Kumar
- ARC Centre of Excellence in Bioinformatics and Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney NSW, Australia.
29
Abstract
Recently, many mathematical methods have been developed to model and analyze genome-scale metabolic networks. Among them, methods based on graph models enable us to quickly perform large-scale analyses of large metabolic networks. However, it can be difficult for parasitologists to select the graph model and methods suited to their biological questions. In this review, after briefly addressing the problem of metabolic network reconstruction, we propose an overview of the graph-based approaches used in whole metabolic network analyses. Applications highlight the usefulness of this kind of approach in the field of parasitology, especially in suggesting metabolic targets for new drugs. Developing such drugs still represents a major challenge in the fight against the numerous diseases caused by parasites.
30
Bourguignon PY, Samal A, Képès F, Jost J, Martin OC. Challenges in experimental data integration within genome-scale metabolic models. Algorithms Mol Biol 2010; 5:20. [PMID: 20412574 PMCID: PMC2865480 DOI: 10.1186/1748-7188-5-20] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/06/2010] [Accepted: 04/22/2010] [Indexed: 11/10/2022]
Abstract
A report of the meeting "Challenges in experimental data integration within genome-scale metabolic models", Institut Henri Poincaré, Paris, October 10-11 2009, organized by the CNRS-MPG joint program in Systems Biology.
31
Oberhardt MA, Palsson BØ, Papin JA. Applications of genome-scale metabolic reconstructions. Mol Syst Biol 2009; 5:320. [PMID: 19888215 PMCID: PMC2795471 DOI: 10.1038/msb.2009.77] [Citation(s) in RCA: 577] [Impact Index Per Article: 38.5] [Received: 07/23/2009] [Accepted: 09/22/2009] [Indexed: 12/12/2022]
Abstract
The availability and utility of genome-scale metabolic reconstructions have exploded since the first genome-scale reconstruction was published a decade ago. Reconstructions have now been built for a wide variety of organisms, and have been used toward five major ends: (1) contextualization of high-throughput data, (2) guidance of metabolic engineering, (3) directing hypothesis-driven discovery, (4) interrogation of multi-species relationships, and (5) network property discovery. In this review, we examine the many uses and future directions of genome-scale metabolic reconstructions, and we highlight trends and opportunities in the field that will make the greatest impact on many fields of biology.
Affiliation(s)
- Matthew A Oberhardt
- Department of Biomedical Engineering, University of Virginia, Health System, Charlottesville, VA, USA