26
|
Swainston N, Hastings J, Dekker A, Muthukrishnan V, May J, Steinbeck C, Mendes P. libChEBI: an API for accessing the ChEBI database. J Cheminform 2016; 8:11. [PMID: 26933452 PMCID: PMC4772646 DOI: 10.1186/s13321-016-0123-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2015] [Accepted: 02/16/2016] [Indexed: 01/29/2023] Open
Abstract
Background ChEBI is a database and ontology of chemical entities of biological interest. It is widely used as a source of identifiers to facilitate unambiguous reference to chemical entities within biological models, databases, ontologies and literature. ChEBI contains a wealth of chemical data, covering over 46,500 distinct chemical entities, and related data such as chemical formula, charge, molecular mass, structure, synonyms and links to external databases. Furthermore, ChEBI is an ontology, and thus provides meaningful links between chemical entities. Unlike many other resources, ChEBI is fully human-curated, providing a reliable, non-redundant collection of chemical entities and related data. While ChEBI is supported by a web service for programmatic access and a number of download files, it does not have an API library to facilitate the use of ChEBI and its data in cheminformatics software. Results To provide
this missing functionality, libChEBI, a comprehensive API library for accessing ChEBI data, is introduced. libChEBI is available in Java, Python and MATLAB versions from http://github.com/libChEBI, and provides full programmatic access to all data held within the ChEBI database through a simple and documented API. libChEBI is reliant upon the (automated) download and regular update of flat files that are held locally. As such, libChEBI can be embedded in both on- and off-line software applications. Conclusions libChEBI allows better support of ChEBI and its data in the development of new cheminformatics software. Covering three key programming languages, it allows for the entirety of the ChEBI database to be accessed easily and quickly through a simple API. All code is open access and freely available. Electronic supplementary material The online version of this article (doi:10.1186/s13321-016-0123-9) contains supplementary material, which is available to authorized users.
Collapse
|
27
|
Swainston N, Smallbone K, Hefzi H, Dobson PD, Brewer J, Hanscho M, Zielinski DC, Ang KS, Gardiner NJ, Gutierrez JM, Kyriakopoulos S, Lakshmanan M, Li S, Liu JK, Martínez VS, Orellana CA, Quek LE, Thomas A, Zanghellini J, Borth N, Lee DY, Nielsen LK, Kell DB, Lewis NE, Mendes P. Recon 2.2: from reconstruction to model of human metabolism. Metabolomics 2016; 12:109. [PMID: 27358602 PMCID: PMC4896983 DOI: 10.1007/s11306-016-1051-4] [Citation(s) in RCA: 182] [Impact Index Per Article: 22.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/29/2016] [Accepted: 05/27/2016] [Indexed: 11/30/2022]
Abstract
INTRODUCTION The human genome-scale metabolic reconstruction details all known metabolic reactions occurring in humans, and thereby holds substantial promise for studying complex diseases and phenotypes. Capturing the whole human metabolic reconstruction is an on-going task and since the last community effort generated a consensus reconstruction, several updates have been developed. OBJECTIVES We report a new consensus version, Recon 2.2, which integrates various alternative versions with significant additional updates. In addition to re-establishing a consensus reconstruction, further key objectives included providing more comprehensive annotation of metabolites and genes, ensuring full mass and charge balance in all reactions, and developing a model that correctly predicts ATP production on a range of carbon sources. METHODS Recon 2.2 has been developed through a combination of manual curation and automated error checking. Specific and significant manual updates include a respecification of fatty acid metabolism, oxidative phosphorylation and a coupling of the electron transport chain to ATP synthase activity. All metabolites have definitive chemical formulae and charges specified, and these are used to ensure full mass and charge reaction balancing through an automated linear programming approach. Additionally, improved integration with transcriptomics and proteomics data has been facilitated with the updated curation of relationships between genes, proteins and reactions. RESULTS Recon 2.2 now represents the most predictive model of human metabolism to date as demonstrated here. Extensive manual curation has increased the reconstruction size to 5324 metabolites, 7785 reactions and 1675 associated genes, which now are mapped to a single standard. The focus upon mass and charge balancing of all reactions, along with better representation of energy generation, has produced a flux model that correctly predicts ATP yield on different carbon sources. CONCLUSION Through these updates we have achieved the most complete and best annotated consensus human metabolic reconstruction available, thereby increasing the ability of this resource to provide novel insights into normal and disease states in human. The model is freely available from the Biomodels database (http://identifiers.org/biomodels.db/MODEL1603150001).
Collapse
|
28
|
Quinn JY, Cox RS, Adler A, Beal J, Bhatia S, Cai Y, Chen J, Clancy K, Galdzicki M, Hillson NJ, Le Novère N, Maheshwari AJ, McLaughlin JA, Myers CJ, P U, Pocock M, Rodriguez C, Soldatova L, Stan GBV, Swainston N, Wipat A, Sauro HM. SBOL Visual: A Graphical Language for Genetic Designs. PLoS Biol 2015; 13:e1002310. [PMID: 26633141 PMCID: PMC4669170 DOI: 10.1371/journal.pbio.1002310] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023] Open
Abstract
Synthetic Biology Open Language (SBOL) Visual is a graphical standard for genetic engineering. It consists of symbols representing DNA subsequences, including regulatory elements and DNA assembly features. These symbols can be used to draw illustrations for communication and instruction, and as image assets for computer-aided design. SBOL Visual is a community standard, freely available for personal, academic, and commercial use (Creative Commons CC0 license). We provide prototypical symbol images that have been used in scientific publications and software tools. We encourage users to use and modify them freely, and to join the SBOL Visual community: http://www.sbolstandard.org/visual.
Collapse
|
29
|
Hastings J, Owen G, Dekker A, Ennis M, Kale N, Muthukrishnan V, Turner S, Swainston N, Mendes P, Steinbeck C. ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Res 2015; 44:D1214-9. [PMID: 26467479 PMCID: PMC4702775 DOI: 10.1093/nar/gkv1031] [Citation(s) in RCA: 518] [Impact Index Per Article: 57.6] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2015] [Accepted: 09/28/2015] [Indexed: 12/31/2022] Open
Abstract
ChEBI is a database and ontology containing information about chemical entities of biological interest. It currently includes over 46,000 entries, each of which is classified within the ontology and assigned multiple annotations including (where relevant) a chemical structure, database cross-references, synonyms and literature citations. All content is freely available and can be accessed online at http://www.ebi.ac.uk/chebi. In this update paper, we describe recent improvements and additions to the ChEBI offering. We have substantially extended our collection of endogenous metabolites for several organisms including human, mouse, Escherichia coli and yeast. Our front-end has also been reworked and updated, improving the user experience, removing our dependency on Java applets in favour of embedded JavaScript components and moving from a monthly release update to a 'live' website. Programmatic access has been improved by the introduction of a library, libChEBI, in Java, Python and Matlab. Furthermore, we have added two new tools, namely an analysis tool, BiNChE, and a query tool for the ontology, OntoQuery.
Collapse
|
30
|
Stanford NJ, Millard P, Swainston N. RobOKoD: microbial strain design for (over)production of target compounds. Front Cell Dev Biol 2015; 3:17. [PMID: 25853130 PMCID: PMC4371745 DOI: 10.3389/fcell.2015.00017] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2014] [Accepted: 02/25/2015] [Indexed: 11/16/2022] Open
Abstract
Sustainable production of target compounds such as biofuels and high-value chemicals for pharmaceutical, agrochemical, and chemical industries is becoming an increasing priority given their current dependency upon diminishing petrochemical resources. Designing these strains is difficult, with current methods focusing primarily on knocking-out genes, dismissing other vital steps of strain design including the overexpression and dampening of genes. The design predictions from current methods also do not translate well-into successful strains in the laboratory. Here, we introduce RobOKoD (Robust, Overexpression, Knockout and Dampening), a method for predicting strain designs for overproduction of targets. The method uses flux variability analysis to profile each reaction within the system under differing production percentages of target-compound and biomass. Using these profiles, reactions are identified as potential knockout, overexpression, or dampening targets. The identified reactions are ranked according to their suitability, providing flexibility in strain design for users. The software was tested by designing a butanol-producing Escherichia coli strain, and was compared against the popular OptKnock and RobustKnock methods. RobOKoD shows favorable design predictions, when predictions from these methods are compared to a successful butanol-producing experimentally-validated strain. Overall RobOKoD provides users with rankings of predicted beneficial genetic interventions with which to support optimized strain design.
Collapse
|
31
|
Currin A, Swainston N, Day PJ, Kell DB. Synthetic biology for the directed evolution of protein biocatalysts: navigating sequence space intelligently. Chem Soc Rev 2015; 44:1172-239. [PMID: 25503938 PMCID: PMC4349129 DOI: 10.1039/c4cs00351a] [Citation(s) in RCA: 247] [Impact Index Per Article: 27.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2014] [Indexed: 12/21/2022]
Abstract
The amino acid sequence of a protein affects both its structure and its function. Thus, the ability to modify the sequence, and hence the structure and activity, of individual proteins in a systematic way, opens up many opportunities, both scientifically and (as we focus on here) for exploitation in biocatalysis. Modern methods of synthetic biology, whereby increasingly large sequences of DNA can be synthesised de novo, allow an unprecedented ability to engineer proteins with novel functions. However, the number of possible proteins is far too large to test individually, so we need means for navigating the 'search space' of possible protein sequences efficiently and reliably in order to find desirable activities and other properties. Enzymologists distinguish binding (Kd) and catalytic (kcat) steps. In a similar way, judicious strategies have blended design (for binding, specificity and active site modelling) with the more empirical methods of classical directed evolution (DE) for improving kcat (where natural evolution rarely seeks the highest values), especially with regard to residues distant from the active site and where the functional linkages underpinning enzyme dynamics are both unknown and hard to predict. Epistasis (where the 'best' amino acid at one site depends on that or those at others) is a notable feature of directed evolution. The aim of this review is to highlight some of the approaches that are being developed to allow us to use directed evolution to improve enzyme properties, often dramatically. We note that directed evolution differs in a number of ways from natural evolution, including in particular the available mechanisms and the likely selection pressures. Thus, we stress the opportunities afforded by techniques that enable one to map sequence to (structure and) activity in silico, as an effective means of modelling and exploring protein landscapes. Because known landscapes may be assessed and reasoned about as a whole, simultaneously, this offers opportunities for protein improvement not readily available to natural evolution on rapid timescales. Intelligent landscape navigation, informed by sequence-activity relationships and coupled to the emerging methods of synthetic biology, offers scope for the development of novel biocatalysts that are both highly active and robust.
Collapse
|
32
|
O′Hagan S, Swainston N, Handl J, Kell DB. A 'rule of 0.5' for the metabolite-likeness of approved pharmaceutical drugs. Metabolomics 2014; 11:323-339. [PMID: 25750602 PMCID: PMC4342520 DOI: 10.1007/s11306-014-0733-z] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/04/2014] [Accepted: 09/08/2014] [Indexed: 12/20/2022]
Abstract
We exploit the recent availability of a community reconstruction of the human metabolic network ('Recon2') to study how close in structural terms are marketed drugs to the nearest known metabolite(s) that Recon2 contains. While other encodings using different kinds of chemical fingerprints give greater differences, we find using the 166 Public MDL Molecular Access (MACCS) keys that 90 % of marketed drugs have a Tanimoto similarity of more than 0.5 to the (structurally) 'nearest' human metabolite. This suggests a 'rule of 0.5' mnemonic for assessing the metabolite-like properties that characterise successful, marketed drugs. Multiobjective clustering leads to a similar conclusion, while artificial (synthetic) structures are seen to be less human-metabolite-like. This 'rule of 0.5' may have considerable predictive value in chemical biology and drug discovery, and may represent a powerful filter for decision making processes.
Collapse
|
33
|
Currin A, Swainston N, Day PJ, Kell DB. SpeedyGenes: an improved gene synthesis method for the efficient production of error-corrected, synthetic protein libraries for directed evolution. Protein Eng Des Sel 2014; 27:273-80. [PMID: 25108914 PMCID: PMC4140418 DOI: 10.1093/protein/gzu029] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open
Abstract
The de novo synthesis of genes is becoming increasingly common in synthetic biology studies. However, the inherent error rate (introduced by errors incurred during oligonucleotide synthesis) limits its use in synthesising protein libraries to only short genes. Here we introduce SpeedyGenes, a PCR-based method for the synthesis of diverse protein libraries that includes an error-correction procedure, enabling the efficient synthesis of large genes for use directly in functional screening. First, we demonstrate an accurate gene synthesis method by synthesising and directly screening (without pre-selection) a 747 bp gene for green fluorescent protein (yielding 85% fluorescent colonies) and a larger 1518 bp gene (a monoamine oxidase, producing 76% colonies with full catalytic activity, a 4-fold improvement over previous methods). Secondly, we show that SpeedyGenes can accommodate multiple and combinatorial variant sequences while maintaining efficient enzymatic error correction, which is particularly crucial for larger genes. In its first application for directed evolution, we demonstrate the use of SpeedyGenes in the synthesis and screening of large libraries of MAO-N variants. Using this method, libraries are synthesised, transformed and screened within 3 days. Importantly, as each mutation we introduce is controlled by the oligonucleotide sequence, SpeedyGenes enables the synthesis of large, diverse, yet controlled variant sequences for the purposes of directed evolution.
Collapse
|
34
|
Swainston N, Currin A, Day PJ, Kell DB. GeneGenie: optimized oligomer design for directed evolution. Nucleic Acids Res 2014; 42:W395-400. [PMID: 24782527 PMCID: PMC4086129 DOI: 10.1093/nar/gku336] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/04/2023] Open
Abstract
GeneGenie, a new online tool available at http://www.gene-genie.org, is introduced to support the design and self-assembly of synthetic genes and constructs. GeneGenie allows for the design of oligonucleotide cohorts encoding the gene sequence optimized for expression in any suitable host through an intuitive, easy-to-use web interface. The tool ensures consistent oligomer overlapping melting temperatures, minimizes the likelihood of misannealing, optimizes codon usage for expression in a selected host, allows for specification of forward and reverse cloning sequences (for downstream ligation) and also provides support for mutagenesis or directed evolution studies. Directed evolution studies are enabled through the construction of variant libraries via the optional specification of ‘variant codons’, containing mixtures of bases, at any position. For example, specifying the variant codon TNT (where N is any nucleotide) will generate an equimolar mixture of the codons TAT, TCT, TGT and TTT at that position, encoding a mixture of the amino acids Tyr, Ser, Cys and Phe. This facility is demonstrated through the use of GeneGenie to develop and synthesize a library of enhanced green fluorescent protein variants.
Collapse
|
35
|
Büchel F, Rodriguez N, Swainston N, Wrzodek C, Czauderna T, Keller R, Mittag F, Schubert M, Glont M, Golebiewski M, van Iersel M, Keating S, Rall M, Wybrow M, Hermjakob H, Hucka M, Kell DB, Müller W, Mendes P, Zell A, Chaouiya C, Saez-Rodriguez J, Schreiber F, Laibe C, Dräger A, Le Novère N. Path2Models: large-scale generation of computational models from biochemical pathway maps. BMC SYSTEMS BIOLOGY 2013; 7:116. [PMID: 24180668 PMCID: PMC4228421 DOI: 10.1186/1752-0509-7-116] [Citation(s) in RCA: 126] [Impact Index Per Article: 11.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/19/2013] [Accepted: 10/23/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND Systems biology projects and omics technologies have led to a growing number of biochemical pathway models and reconstructions. However, the majority of these models are still created de novo, based on literature mining and the manual processing of pathway data. RESULTS To increase the efficiency of model creation, the Path2Models project has automatically generated mathematical models from pathway representations using a suite of freely available software. Data sources include KEGG, BioCarta, MetaCyc and SABIO-RK. Depending on the source data, three types of models are provided: kinetic, logical and constraint-based. Models from over 2 600 organisms are encoded consistently in SBML, and are made freely available through BioModels Database at http://www.ebi.ac.uk/biomodels-main/path2models. Each model contains the list of participants, their interactions, the relevant mathematical constructs, and initial parameter values. Most models are also available as easy-to-understand graphical SBGN maps. CONCLUSIONS To date, the project has resulted in more than 140 000 freely available models. Such a resource can tremendously accelerate the development of mathematical models by providing initial starting models for simulation and analysis, which can be subsequently curated and further parameterized.
Collapse
|
36
|
Swainston N, Mendes P, Kell DB. An analysis of a 'community-driven' reconstruction of the human metabolic network. Metabolomics 2013; 9:757-764. [PMID: 23888127 PMCID: PMC3715687 DOI: 10.1007/s11306-013-0564-3] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/16/2013] [Accepted: 06/28/2013] [Indexed: 12/22/2022]
Abstract
Following a strategy similar to that used in baker's yeast (Herrgård et al. Nat Biotechnol 26:1155-1160, 2008). A consensus yeast metabolic network obtained from a community approach to systems biology (Herrgård et al. 2008; Dobson et al. BMC Syst Biol 4:145, 2010). Further developments towards a genome-scale metabolic model of yeast (Dobson et al. 2010; Heavner et al. BMC Syst Biol 6:55, 2012). Yeast 5-an expanded reconstruction of the Saccharomyces cerevisiae metabolic network (Heavner et al. 2012) and in Salmonella typhimurium (Thiele et al. BMC Syst Biol 5:8, 2011). A community effort towards a knowledge-base and mathematical model of the human pathogen Salmonellatyphimurium LT2 (Thiele et al. 2011), a recent paper (Thiele et al. Nat Biotechnol 31:419-425, 2013). A community-driven global reconstruction of human metabolism (Thiele et al. 2013) described a much improved 'community consensus' reconstruction of the human metabolic network, called Recon 2, and the authors (that include the present ones) have made it freely available via a database at http://humanmetabolism.org/ and in SBML format at Biomodels (http://identifiers.org/biomodels.db/MODEL1109130000). This short analysis summarises the main findings, and suggests some approaches that will be able to exploit the availability of this model to advantage.
Collapse
|
37
|
Smallbone K, Messiha HL, Carroll KM, Winder CL, Malys N, Dunn WB, Murabito E, Swainston N, Dada JO, Khan F, Pir P, Simeonidis E, Spasić I, Wishart J, Weichart D, Hayes NW, Jameson D, Broomhead DS, Oliver SG, Gaskell SJ, McCarthy JEG, Paton NW, Westerhoff HV, Kell DB, Mendes P. A model of yeast glycolysis based on a consistent kinetic characterisation of all its enzymes. FEBS Lett 2013; 587:2832-41. [PMID: 23831062 PMCID: PMC3764422 DOI: 10.1016/j.febslet.2013.06.043] [Citation(s) in RCA: 76] [Impact Index Per Article: 6.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2013] [Revised: 06/24/2013] [Accepted: 06/25/2013] [Indexed: 11/17/2022]
Abstract
We present an experimental and computational pipeline for the generation of kinetic models of metabolism, and demonstrate its application to glycolysis in Saccharomyces cerevisiae. Starting from an approximate mathematical model, we employ a “cycle of knowledge” strategy, identifying the steps with most control over flux. Kinetic parameters of the individual isoenzymes within these steps are measured experimentally under a standardised set of conditions. Experimental strategies are applied to establish a set of in vivo concentrations for isoenzymes and metabolites. The data are integrated into a mathematical model that is used to predict a new set of metabolite concentrations and reevaluate the control properties of the system. This bottom-up modelling study reveals that control over the metabolic network most directly involved in yeast glycolysis is more widely distributed than previously thought.
Collapse
|
38
|
Lee D, Smallbone K, Dunn WB, Murabito E, Winder CL, Kell DB, Mendes P, Swainston N. Improving metabolic flux predictions using absolute gene expression data. BMC SYSTEMS BIOLOGY 2012; 6:73. [PMID: 22713172 PMCID: PMC3477026 DOI: 10.1186/1752-0509-6-73] [Citation(s) in RCA: 107] [Impact Index Per Article: 8.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/11/2012] [Accepted: 06/05/2012] [Indexed: 12/13/2022]
Abstract
Background Constraint-based analysis of genome-scale metabolic models typically relies upon maximisation of a cellular objective function such as the rate or efficiency of biomass production. Whilst this assumption may be valid in the case of microorganisms growing under certain conditions, it is likely invalid in general, and especially for multicellular organisms, where cellular objectives differ greatly both between and within cell types. Moreover, for the purposes of biotechnological applications, it is normally the flux to a specific metabolite or product that is of interest rather than the rate of production of biomass per se. Results An alternative objective function is presented, that is based upon maximising the correlation between experimentally measured absolute gene expression data and predicted internal reaction fluxes. Using quantitative transcriptomics data acquired from Saccharomyces cerevisiae cultures under two growth conditions, the method outperforms traditional approaches for predicting experimentally measured exometabolic flux that are reliant upon maximisation of the rate of biomass production. Conclusion Due to its improved prediction of experimentally measured metabolic fluxes, and of its lack of a requirement for knowledge of the biomass composition of the organism under the conditions of interest, the approach is likely to be of rather general utility. The method has been shown to predict fluxes reliably in single cellular systems. Subsequent work will investigate the method’s ability to generate condition- and tissue-specific flux predictions in multicellular organisms.
Collapse
|
39
|
Krause F, Schulz M, Swainston N, Liebermeister W. Sustainable model building the role of standards and biological semantics. Methods Enzymol 2012; 500:371-95. [PMID: 21943907 DOI: 10.1016/b978-0-12-385118-5.00019-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/20/2023]
Abstract
Systems biology models can be reused within new simulation scenarios, as parts of more complex models or as sources of biochemical knowledge. Reusability does not come by itself but has to be ensured while creating a model. Most important, models should be designed to remain valid in different contexts-for example, for different experimental conditions-and be published in a standardized and well-documented form. Creating reusable models is worthwhile, but it requires some efforts when a model is developed, implemented, documented, and published. Minimum requirements for published systems biology models have been formulated by the MIRIAM initiative. Main criteria are completeness of information and documentation, availability of machine-readable models in standard formats, and semantic annotations connecting the model elements with entries in biological Web resources. In this chapter, we discuss the assumptions behind bottom-up modeling; present important standards like MIRIAM, the Systems Biology Markup Language (SBML), and the Systems Biology Graphical Notation (SBGN); and describe software tools and services for handling semantic annotations. Finally, we show how standards can facilitate the construction of large metabolic network models.
Collapse
|
40
|
Swainston N, Smallbone K, Mendes P, Kell DB, Paton NW. The SuBliMinaL Toolbox: automating steps in the reconstruction of metabolic networks. J Integr Bioinform 2011. [DOI: 10.1515/jib-2011-186] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
Summary The generation and use of metabolic network reconstructions has increased over recent years. The development of such reconstructions has typically involved a time-consuming, manual process. Recent work has shown that steps undertaken in reconstructing such metabolic networks are amenable to automation. The SuBliMinaL Toolbox (http://www.mcisb.org/subliminal/) facilitates the reconstruction process by providing a number of independent modules to perform common tasks, such as generating draft reconstructions, determining metabolite protonation state, mass and charge balancing reactions, suggesting intracellular compartmentalisation, adding transport reactions and a biomass function, and formatting the reconstruction to be used in third-party analysis packages. The individual modules manipulate reconstructions encoded in Systems Biology Markup Language (SBML), and can be chained to generate a reconstruction pipeline, or used individually during a manual curation process. This work describes the individual modules themselves, and a study in which the modules were used to develop a metabolic reconstruction of Saccharomyces cerevisiae from the existing data resources KEGG and MetaCyc. The automatically generated reconstruction is analysed for blocked reactions, and suggestions for future improvements to the toolbox are discussed.
Collapse
|
41
|
Thiele I, Hyduke DR, Steeb B, Fankam G, Allen DK, Bazzani S, Charusanti P, Chen FC, Fleming RMT, Hsiung CA, De Keersmaecker SCJ, Liao YC, Marchal K, Mo ML, Özdemir E, Raghunathan A, Reed JL, Shin SI, Sigurbjörnsdóttir S, Steinmann J, Sudarsan S, Swainston N, Thijs IM, Zengler K, Palsson BO, Adkins JN, Bumann D. A community effort towards a knowledge-base and mathematical model of the human pathogen Salmonella Typhimurium LT2. BMC SYSTEMS BIOLOGY 2011; 5:8. [PMID: 21244678 PMCID: PMC3032673 DOI: 10.1186/1752-0509-5-8] [Citation(s) in RCA: 102] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/26/2010] [Accepted: 01/18/2011] [Indexed: 01/05/2023]
Abstract
Background Metabolic reconstructions (MRs) are common denominators in systems biology and represent biochemical, genetic, and genomic (BiGG) knowledge-bases for target organisms by capturing currently available information in a consistent, structured manner. Salmonella enterica subspecies I serovar Typhimurium is a human pathogen, causes various diseases and its increasing antibiotic resistance poses a public health problem. Results Here, we describe a community-driven effort, in which more than 20 experts in S. Typhimurium biology and systems biology collaborated to reconcile and expand the S. Typhimurium BiGG knowledge-base. The consensus MR was obtained starting from two independently developed MRs for S. Typhimurium. Key results of this reconstruction jamboree include i) development and implementation of a community-based workflow for MR annotation and reconciliation; ii) incorporation of thermodynamic information; and iii) use of the consensus MR to identify potential multi-target drug therapy approaches. Conclusion Taken together, with the growing number of parallel MRs a structured, community-driven approach will be necessary to maximize quality while increasing adoption of MRs in experimental design and interpretation.
Collapse
|
42
|
Swainston N, Jameson D, Carroll K. A QconCAT informatics pipeline for the analysis, visualization and sharing of absolute quantitative proteomics data. Proteomics 2010; 11:329-33. [PMID: 21204260 DOI: 10.1002/pmic.201000454] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2010] [Revised: 09/29/2010] [Accepted: 10/18/2010] [Indexed: 11/08/2022]
Abstract
Absolute protein concentration determination is becoming increasingly important in a number of fields including diagnostics, biomarker discovery and systems biology modeling. The recently introduced quantification concatamer methodology provides a novel approach to performing such determinations, and it has been applied to both microbial and mammalian systems. While a number of software tools exist for performing analyses of quantitative data generated by related methodologies such as SILAC, there is currently no analysis package dedicated to the quantification concatamer approach. Furthermore, most tools that are currently available in the field of quantitative proteomics do not manage storage and dissemination of such data sets.
Collapse
|
43
|
Dobson PD, Smallbone K, Jameson D, Simeonidis E, Lanthaler K, Pir P, Lu C, Swainston N, Dunn WB, Fisher P, Hull D, Brown M, Oshota O, Stanford NJ, Kell DB, King RD, Oliver SG, Stevens RD, Mendes P. Further developments towards a genome-scale metabolic model of yeast. BMC SYSTEMS BIOLOGY 2010; 4:145. [PMID: 21029416 PMCID: PMC2988745 DOI: 10.1186/1752-0509-4-145] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/02/2010] [Accepted: 10/28/2010] [Indexed: 12/15/2022]
Abstract
BACKGROUND To date, several genome-scale network reconstructions have been used to describe the metabolism of the yeast Saccharomyces cerevisiae, each differing in scope and content. The recent community-driven reconstruction, while rigorously evidenced and well annotated, under-represented metabolite transport, lipid metabolism and other pathways, and was not amenable to constraint-based analyses because of lack of pathway connectivity. RESULTS We have expanded the yeast network reconstruction to incorporate many new reactions from the literature and represented these in a well-annotated and standards-compliant manner. The new reconstruction comprises 1102 unique metabolic reactions involving 924 unique metabolites--significantly larger in scope than any previous reconstruction. The representation of lipid metabolism in particular has improved, with 234 out of 268 enzymes linked to lipid metabolism now present in at least one reaction. Connectivity is emphatically improved, with more than 90% of metabolites now reachable from the growth medium constituents. The present updates allow constraint-based analyses to be performed; viability predictions of single knockouts are comparable to results from in vivo experiments and to those of previous reconstructions. CONCLUSIONS We report the development of the most complete reconstruction of yeast metabolism to date that is based upon reliable literature evidence and richly annotated according to MIRIAM standards. The reconstruction is available in the Systems Biology Markup Language (SBML) and via a publicly accessible database http://www.comp-sys-bio.org/yeastnet/.
Collapse
|
44
|
Radrich K, Tsuruoka Y, Dobson P, Gevorgyan A, Swainston N, Baart G, Schwartz JM. Integration of metabolic databases for the reconstruction of genome-scale metabolic networks. BMC SYSTEMS BIOLOGY 2010; 4:114. [PMID: 20712863 PMCID: PMC2930596 DOI: 10.1186/1752-0509-4-114] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/27/2010] [Accepted: 08/16/2010] [Indexed: 01/13/2023]
Abstract
BACKGROUND Genome-scale metabolic reconstructions have been recognised as a valuable tool for a variety of applications ranging from metabolic engineering to evolutionary studies. However, the reconstruction of such networks remains an arduous process requiring a high level of human intervention. This process is further complicated by occurrences of missing or conflicting information and the absence of common annotation standards between different data sources. RESULTS In this article, we report a semi-automated methodology aimed at streamlining the process of metabolic network reconstruction by enabling the integration of different genome-wide databases of metabolic reactions. We present results obtained by applying this methodology to the metabolic network of the plant Arabidopsis thaliana. A systematic comparison of compounds and reactions between two genome-wide databases allowed us to obtain a high-quality core consensus reconstruction, which was validated for stoichiometric consistency. A lower level of consensus led to a larger reconstruction, which has a lower quality standard but provides a baseline for further manual curation. CONCLUSION This semi-automated methodology may be applied to other organisms and help to streamline the process of genome-scale network reconstruction in order to accelerate the transfer of such models to applications.
Collapse
|
45
|
Swainston N, Golebiewski M, Messiha HL, Malys N, Kania R, Kengne S, Krebs O, Mir S, Sauer-Danzwith H, Smallbone K, Weidemann A, Wittig U, Kell DB, Mendes P, Müller W, Paton NW, Rojas I. Enzyme kinetics informatics: from instrument to browser. FEBS J 2010; 277:3769-79. [PMID: 20738395 DOI: 10.1111/j.1742-4658.2010.07778.x] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
A limited number of publicly available resources provide access to enzyme kinetic parameters. These have been compiled through manual data mining of published papers, not from the original, raw experimental data from which the parameters were calculated. This is largely due to the lack of software or standards to support the capture, analysis, storage and dissemination of such experimental data. Introduced here is an integrative system to manage experimental enzyme kinetics data from instrument to browser. The approach is based on two interrelated databases: the existing SABIO-RK database, containing kinetic data and corresponding metadata, and the newly introduced experimental raw data repository, MeMo-RK. Both systems are publicly available by web browser and web service interfaces and are configurable to ensure privacy of unpublished data. Users of this system are provided with the ability to view both kinetic parameters and the experimental raw data from which they are calculated, providing increased confidence in the data. A data analysis and submission tool, the kineticswizard, has been developed to allow the experimentalist to perform data collection, analysis and submission to both data resources. The system is designed to be extensible, allowing integration with other manufacturer instruments covering a range of analytical techniques.
Collapse
|
46
|
Smallbone K, Simeonidis E, Swainston N, Mendes P. Towards a genome-scale kinetic model of cellular metabolism. BMC SYSTEMS BIOLOGY 2010; 4:6. [PMID: 20109182 PMCID: PMC2829494 DOI: 10.1186/1752-0509-4-6] [Citation(s) in RCA: 120] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/27/2009] [Accepted: 01/28/2010] [Indexed: 11/10/2022]
Abstract
Background Advances in bioinformatic techniques and analyses have led to the availability of genome-scale metabolic reconstructions. The size and complexity of such networks often means that their potential behaviour can only be analysed with constraint-based methods. Whilst requiring minimal experimental data, such methods are unable to give insight into cellular substrate concentrations. Instead, the long-term goal of systems biology is to use kinetic modelling to characterize fully the mechanics of each enzymatic reaction, and to combine such knowledge to predict system behaviour. Results We describe a method for building a parameterized genome-scale kinetic model of a metabolic network. Simplified linlog kinetics are used and the parameters are extracted from a kinetic model repository. We demonstrate our methodology by applying it to yeast metabolism. The resultant model has 956 metabolic reactions involving 820 metabolites, and, whilst approximative, has considerably broader remit than any existing models of its type. Control analysis is used to identify key steps within the system. Conclusions Our modelling framework may be considered a stepping-stone toward the long-term goal of a fully-parameterized model of yeast metabolism. The model is available in SBML format from the BioModels database (BioModels ID: MODEL1001200000) and at http://www.mcisb.org/resources/genomescale/.
Collapse
|
47
|
Jameson D, Turner DA, Ankers J, Kennedy S, Ryan S, Swainston N, Griffiths T, Spiller DG, Oliver SG, White MRH, Kell DB, Paton NW. Information management for high content live cell imaging. BMC Bioinformatics 2009; 10:226. [PMID: 19622144 PMCID: PMC2723092 DOI: 10.1186/1471-2105-10-226] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2009] [Accepted: 07/21/2009] [Indexed: 01/24/2023] Open
Abstract
BACKGROUND High content live cell imaging experiments are able to track the cellular localisation of labelled proteins in multiple live cells over a time course. Experiments using high content live cell imaging will generate multiple large datasets that are often stored in an ad-hoc manner. This hinders identification of previously gathered data that may be relevant to current analyses. Whilst solutions exist for managing image data, they are primarily concerned with storage and retrieval of the images themselves and not the data derived from the images. There is therefore a requirement for an information management solution that facilitates the indexing of experimental metadata and results of high content live cell imaging experiments. RESULTS We have designed and implemented a data model and information management solution for the data gathered through high content live cell imaging experiments. Many of the experiments to be stored measure the translocation of fluorescently labelled proteins from cytoplasm to nucleus in individual cells. The functionality of this database has been enhanced by the addition of an algorithm that automatically annotates results of these experiments with the timings of translocations and periods of any oscillatory translocations as they are uploaded to the repository. Testing has shown the algorithm to perform well with a variety of previously unseen data. CONCLUSION Our repository is a fully functional example of how high throughput imaging data may be effectively indexed and managed to address the requirements of end users. By implementing the automated analysis of experimental results, we have provided a clear impetus for individuals to ensure that their data forms part of that which is stored in the repository. Although focused on imaging, the solution provided is sufficiently generic to be applied to other functional proteomics and genomics experiments. The software is available from: fhttp://code.google.com/p/livecellim/
Collapse
|
48
|
Abstract
SUMMARY The Systems Biology Markup Language (SBML) is an established community XML format for the markup of biochemical models. With the introduction of SBML level 2 version 3, specific model entities, such as species or reactions, can now be annotated using ontological terms. These annotations, which are encoded using the resource description framework (RDF), provide the facility to specify definite terms to individual components, allowing software to unambiguously identify such components and thus link the models to existing data resources. libSBML is an application programming interface library for the manipulation of SBML files. While libSBML provides the facilities for reading and writing such annotations from and to models, it is beyond the scope of libSBML to provide interpretation of these terms. The libAnnotationSBML library introduced here acts as a layer on top of libSBML linking SBML annotations to the web services that describe these ontological terms. Two applications that use this library are described: SbmlSynonymExtractor finds name synonyms of SBML model entities and SbmlReactionBalancer checks SBML files to determine whether specifed reactions are elementally balanced.
Collapse
|
49
|
Brown M, Dunn WB, Dobson P, Patel Y, Winder CL, Francis-McIntyre S, Begley P, Carroll K, Broadhurst D, Tseng A, Swainston N, Spasic I, Goodacre R, Kell DB. Mass spectrometry tools and metabolite-specific databases for molecular identification in metabolomics. Analyst 2009; 134:1322-32. [PMID: 19562197 DOI: 10.1039/b901179j] [Citation(s) in RCA: 219] [Impact Index Per Article: 14.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Abstract
The chemical identification of mass spectrometric signals in metabolomic applications is important to provide conversion of analytical data to biological knowledge about metabolic pathways. The complexity of electrospray mass spectrometric data acquired from a range of samples (serum, urine, yeast intracellular extracts, yeast metabolic footprints, placental tissue metabolic footprints) has been investigated and has defined the frequency of different ion types routinely detected. Although some ion types were expected (protonated and deprotonated peaks, isotope peaks, multiply charged peaks) others were not expected (sodium formate adduct ions). In parallel, the Manchester Metabolomics Database (MMD) has been constructed with data from genome scale metabolic reconstructions, HMDB, KEGG, Lipid Maps, BioCyc and DrugBank to provide knowledge on 42,687 endogenous and exogenous metabolite species. The combination of accurate mass data for a large collection of metabolites, theoretical isotope abundance data and knowledge of the different ion types detected provided a greater number of electrospray mass spectrometric signals which were putatively identified and with greater confidence in the samples studied. To provide definitive identification metabolite-specific mass spectral libraries for UPLC-MS and GC-MS have been constructed for 1,065 commercially available authentic standards. The MMD data are available at http://dbkgroup.org/MMD/.
Collapse
|
50
|
Lau KW, Jones AR, Swainston N, Siepen JA, Hubbard SJ. Capture and analysis of quantitative proteomic data. Proteomics 2007; 7:2787-99. [PMID: 17640002 PMCID: PMC2260796 DOI: 10.1002/pmic.200700127] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2007] [Indexed: 12/21/2022]
Abstract
Whilst the array of techniques available for quantitative proteomics continues to grow, the attendant bioinformatic software tools are similarly expanding in number. The data capture and analysis of such quantitative data is obviously crucial to the experiment and the methods used to process it will critically affect the quality of the data obtained. These tools must deal with a variety of issues, including identification of labelled and unlabelled peptide species, location of the corresponding MS scans in the experiment, construction of representative ion chromatograms, location of the true peptide ion chromatogram start and end, elimination of background signal in the mass spectrum and chromatogram and calculation of both peptide and protein ratios/abundances. A variety of tools and approaches are available, in part restricted by the nature of the experiment to be performed and available instrumentation. Currently, although there is no single consensus on precisely how to calculate protein and peptide abundances, many common themes have emerged which identify and reduce many of the key sources of error. These issues will be discussed, along with those relating to deposition of quantitative data. At present, mature data standards for quantitative proteomics are not yet available, although formats are beginning to emerge.
Collapse
|