Total Articles: 127 (from Reference Citation Analysis)

Article PDFs (35)

Cited by > 0 (120)

Searched Name: Michael P H Stumpf

Results Analysis

Number	Citation Analysis
51	Lakatos E, Ale A, Kirk PDW, Stumpf MPH. Multivariate moment closure techniques for stochastic kinetic models. J Chem Phys 2015;143:094107. [DOI: 10.1063/1.4929837] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Indexed: 11/14/2022]
52	Sim A, Liepe J, Stumpf MPH. Goldstein-Kac telegraph processes with random speeds: Path probabilities, likelihoods, and reported Lévy flights. Phys Rev E Stat Nonlin Soft Matter Phys 2015;91:042115. [PMID: 25974447; DOI: 10.1103/physreve.91.042115] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/06/2014] [Indexed: 06/04/2023] Abstract: The Goldstein-Kac telegraph process describes the one-dimensional motion of particles with constant speed undergoing random changes in direction. Despite its resemblance to numerous real-world phenomena, the singular nature of the resultant spatial distribution of each particle precludes the possibility of any a posteriori empirical validation of this random-walk model from data. Here we show that by simply allowing for random speeds, the ballistic terms are regularized and that the diffusion component can be well-approximated via the unscented transform. The result is a computationally efficient yet robust evaluation of the full particle path probabilities and, hence, the parameter likelihoods of this generalized telegraph process. We demonstrate how a population diffusing under such a model can lead to non-Gaussian asymptotic spatial distributions, thereby mimicking the behavior of an ensemble of Lévy walkers. MeSH Headings: Computer Simulation; Models, Economic; Models, Theoretical; Motion; Probability. Grants: NC/K001949/1 National Centre for the Replacement, Refinement and Reduction of Animals in Research.
53	Johnson R, Kirk P, Stumpf MPH. SYSBIONS: nested sampling for systems biology. Bioinformatics 2015;31:604-5. [PMID: 25399028; PMCID: PMC4325544; DOI: 10.1093/bioinformatics/btu675] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 04/09/2014] [Revised: 07/30/2014] [Accepted: 10/13/2014] [Indexed: 01/30/2023] Abstract: MOTIVATION: Model selection is a fundamental part of the scientific process in systems biology. Given a set of competing hypotheses, we routinely wish to choose the one that best explains the observed data. In the Bayesian framework, models are compared via Bayes factors (the ratio of evidences), where a model's evidence is the support given to the model by the data. A parallel interest is inferring the distribution of the parameters that define a model. Nested sampling is a method for the computation of a model's evidence and the generation of samples from the posterior parameter distribution. RESULTS: We present a C-based, GPU-accelerated implementation of nested sampling that is designed for biological applications. The algorithm follows a standard routine with optional extensions and additional features. We provide a number of methods for sampling from the prior subject to a likelihood constraint. AVAILABILITY AND IMPLEMENTATION: The software SYSBIONS is available from http://www.theosysbio.bio.ic.ac.uk/resources/sysbions/ CONTACT: m.stumpf@imperial.ac.uk, robert.johnson11@imperial.ac.uk. MeSH Headings: Algorithms; Bayes Theorem; Models, Biological; Probability; Software; Systems Biology/methods. Grants: BB/G007934/1 Biotechnology and Biological Sciences Research Council.
54	MacLean AL, Harrington HA, Stumpf MPH, Hansen MDH. Epithelial-Mesenchymal Transition in Metastatic Cancer Cell Populations Affects Tumor Dormancy in a Simple Mathematical Model. Biomedicines 2014;2:384-402. [PMID: 28548077; PMCID: PMC5344274; DOI: 10.3390/biomedicines2040384] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/29/2014] [Revised: 11/07/2014] [Accepted: 11/28/2014] [Indexed: 02/06/2023] Abstract: Signaling from the c-Met receptor tyrosine kinase is associated with progression and metastasis of epithelial tumors. c-Met, the receptor for hepatocyte growth factor, triggers epithelial-mesenchymal transition (EMT) of cultured cells, which is thought to drive migration of tumor cells and confer on them critical stem cell properties. Here, we employ mathematical modeling to better understand how EMT affects population dynamics in metastatic tumors. We find that without intervention, micrometastatic tumors reach a steady-state population. While the rates of proliferation, senescence and death only have subtle effects on the steady state, changes in the frequency of EMT dramatically alter population dynamics towards exponential growth. We also find that therapies targeting cell proliferation or cell death are markedly more successful when combined with one that prevents EMT, though such therapies do little when used alone. Stochastic modeling reveals the probability of tumor recurrence from small numbers of residual differentiated tumor cells. EMT events in metastatic tumors provide a plausible mechanism by which clinically detectable tumors can arise from dormant micrometastatic tumors. Modeling the dynamics of this process demonstrates the benefit of a treatment that eradicates tumor cells and reduces the rate of EMT simultaneously. Key Words: cancer growth; chemotherapy; mathematical modeling; metastasis.
55	Mishto M, Liepe J, Textoris-Taube K, Keller C, Henklein P, Weberruß M, Dahlmann B, Enenkel C, Voigt A, Kuckelkorn U, Stumpf MPH, Kloetzel PM. Proteasome isoforms exhibit only quantitative differences in cleavage and epitope generation. Eur J Immunol 2014;44:3508-21. [PMID: 25231383; DOI: 10.1002/eji.201444902] [Citation(s) in RCA: 83] [Impact Index Per Article: 8.3] [Received: 06/03/2014] [Revised: 08/01/2014] [Accepted: 09/12/2014] [Indexed: 11/09/2022] Abstract: Immunoproteasomes are considered to be optimised to process Ags and to alter the peptide repertoire by generating a qualitatively different set of MHC class I epitopes. Whether the immunoproteasome, at the biochemical level, influences the quality rather than the quantity of the immunogenic peptide pool is still unclear. Here, we quantified the cleavage-site usage by human standard- and immunoproteasomes, and proteasomes from immuno-subunit-deficient mice, as well as the peptides generated from model polypeptides. We show in this study that the different proteasome isoforms can exert significant quantitative differences in the cleavage-site usage and MHC class I restricted epitope production. However, independent of the proteasome isoform and substrates studied, no evidence was obtained for the abolishment of the specific cleavage-site usage, or for differences in the quality of the peptides generated. Thus, we conclude that the observed differences in MHC class I restricted Ag presentation between standard- and immunoproteasomes are due to quantitative differences in the proteasome-generated antigenic peptides. Key Words: Antigen presentation; Immunoproteasome; MHC class I restricted epitopes; Proteasome; Proteolysis.
56	Huvet M, Stumpf MPH. Overlapping genes: a window on gene evolvability. BMC Genomics 2014;15:721. [PMID: 25159814; PMCID: PMC4161906; DOI: 10.1186/1471-2164-15-721] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 11/21/2013] [Accepted: 08/18/2014] [Indexed: 11/13/2022] Abstract: Background: The forces underlying genome architecture and organization are still only poorly understood in detail. Overlapping genes (genes partially or entirely overlapping) represent a genomic feature that is shared widely across biological organisms ranging from viruses to multi-cellular organisms. In bacteria, a third of the annotated genes are involved in an overlap. Despite the widespread nature of this arrangement, its evolutionary origins and biological ramifications have so far eluded explanation. Results: Here we present a comparative approach using information from 699 bacterial genomes that sheds light on the evolutionary dynamics of overlapping genes. We show that these structures exhibit high levels of plasticity. Conclusions: We propose a simple model allowing us to explain the observed properties of overlapping genes based on the importance of initiation and termination of transcriptional and translational processes. We believe that taking into account the processes leading to the expression of protein-coding genes holds the key to the understanding of overlapping gene structures.
57	Liepe J, Holzhütter HG, Kloetzel PM, Stumpf MPH, Mishto M. Modelling proteasome and proteasome regulator activities. Biomolecules 2014;4:585-99. [PMID: 24970232; PMCID: PMC4101499; DOI: 10.3390/biom4020585] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 03/25/2014] [Revised: 05/28/2014] [Accepted: 05/30/2014] [Indexed: 02/07/2023] Abstract: Proteasomes are key proteases involved in a variety of processes ranging from the clearance of damaged proteins to the presentation of antigens to CD8+ T-lymphocytes. Which cleavage sites are used within the target proteins and how fast these proteins are degraded have a profound impact on immune system function and many cellular metabolic processes. The regulation of proteasome activity involves different mechanisms, such as the substitution of the catalytic subunits, the binding of regulatory complexes to proteasome gates and the proteasome conformational modifications triggered by the target protein itself. Mathematical models are invaluable in the analysis, and potentially allow us to predict the complex interactions of proteasome regulatory mechanisms and the final outcomes of the protein degradation rate and MHC class I epitope generation. The pioneering attempts that have been made to mathematically model proteasome activity, cleavage preference variation and their modification by one of the regulatory mechanisms are reviewed here. MeSH Headings: Animals; Humans; Hydrolysis; Models, Biological; Oligopeptides/chemistry; Oligopeptides/metabolism; Proteasome Endopeptidase Complex/metabolism. Grants: NC/K001949/1 National Centre for the Replacement, Refinement and Reduction of Animals in Research; Wellcome Trust.
58	McMahon SS, Sim A, Filippi S, Johnson R, Liepe J, Smith D, Stumpf MPH. Information theory and signal transduction systems: from molecular information processing to network inference. Semin Cell Dev Biol 2014;35:98-108. [PMID: 24953199; DOI: 10.1016/j.semcdb.2014.06.011] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 01/14/2014] [Revised: 06/04/2014] [Accepted: 06/10/2014] [Indexed: 01/05/2023] Abstract: Sensing and responding to the environment are two essential functions that all biological organisms need to master for survival and successful reproduction. Developmental processes are marshalled by a diverse set of signalling and control systems, ranging from systems with simple chemical inputs and outputs to complex molecular and cellular networks with non-linear dynamics. Information theory provides a powerful and convenient framework in which such systems can be studied; but it also provides the means to reconstruct the structure and dynamics of molecular interaction networks underlying physiological and developmental processes. Here we supply a brief description of its basic concepts and introduce some useful tools for systems and developmental biologists. Along with a brief but thorough theoretical primer, we demonstrate the wide applicability and biological application-specific nuances by way of different illustrative vignettes. In particular, we focus on the characterisation of biological information processing efficiency, examining cell-fate decision making processes, gene regulatory network reconstruction, and efficient signal transduction experimental design. Key Words: Experimental design; Mutual information; Network inference; Noise; Signal processing.
59	Michailovici I, Harrington HA, Azogui HH, Yahalom-Ronen Y, Plotnikov A, Ching S, Stumpf MPH, Klein OD, Seger R, Tzahor E. Nuclear to cytoplasmic shuttling of ERK promotes differentiation of muscle stem/progenitor cells. Development 2014;141:2611-20. [PMID: 24924195; DOI: 10.1242/dev.107078] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.6] [Indexed: 01/28/2023] Abstract: The transition between the proliferation and differentiation of progenitor cells is a key step in organogenesis, and alterations in this process can lead to developmental disorders. The extracellular signal-regulated kinase 1/2 (ERK) signaling pathway is one of the most intensively studied signaling mechanisms that regulates both proliferation and differentiation. How a single molecule (e.g. ERK) can regulate two opposing cellular outcomes is still a mystery. Using both chick and mouse models, we shed light on the mechanism responsible for the switch from proliferation to differentiation of head muscle progenitors and implicate ERK subcellular localization. Manipulation of the fibroblast growth factor (FGF)-ERK signaling pathway in chick embryos in vitro and in vivo demonstrated that blockage of this pathway accelerated myogenic differentiation, whereas its activation diminished it. We next examined whether the spatial subcellular localization of ERK could act as a switch between proliferation (nuclear ERK) and differentiation (cytoplasmic ERK) of muscle progenitors. A myristoylated peptide that blocks importin 7-mediated ERK nuclear translocation induced robust myogenic differentiation of muscle progenitor/stem cells in both head and trunk. In the mouse, analysis of Sprouty mutant embryos revealed that increased ERK signaling suppressed both head and trunk myogenesis. Our findings, corroborated by mathematical modeling, suggest that ERK shuttling between the nucleus and the cytoplasm provides a switch-like transition between proliferation and differentiation of muscle progenitors. Key Words: Chick; ERK; FGF signaling; Mouse; Myogenesis.
60	Silk D, Kirk PDW, Barnes CP, Toni T, Stumpf MPH. Model selection in systems biology depends on experimental design. PLoS Comput Biol 2014;10:e1003650. [PMID: 24922483; PMCID: PMC4055659; DOI: 10.1371/journal.pcbi.1003650] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Received: 05/30/2013] [Accepted: 04/10/2014] [Indexed: 12/01/2022] Abstract: Experimental design attempts to maximise the information available for modelling tasks. An optimal experiment allows the inferred models or parameters to be chosen with the highest expected degree of confidence. If the true system is faithfully reproduced by one of the models, the merit of this approach is clear - we simply wish to identify it and the true parameters with the most certainty. However, in the more realistic situation where all models are incorrect or incomplete, the interpretation of model selection outcomes and the role of experimental design needs to be examined more carefully. Using a novel experimental design and model selection framework for stochastic state-space models, we perform high-throughput in-silico analyses on families of gene regulatory cascade models, to show that the selected model can depend on the experiment performed. We observe that experimental design thus makes confidence a criterion for model choice, but that this does not necessarily correlate with a model's predictive power or correctness. Finally, in the special case of linear ordinary differential equation (ODE) models, we explore how wrong a model has to be before it influences the conclusions of a model selection analysis. Different models of the same process represent distinct hypotheses about reality. These can be decided between within the framework of model selection, where the evidence for each is given by their ability to reproduce a set of experimental data. Even if one of the models is correct, the chances of identifying it can be hindered by the quality of the data, both in terms of its signal to measurement error ratio and the intrinsic discriminatory potential of the experiment undertaken. This potential can be predicted in various ways, and maximising it is one aim of experimental design. In this work we present a computationally efficient method of experimental design for model selection. We exploit the efficiency to consider the implications of the realistic case where all models are more or less incorrect, showing that experiments can be chosen that, considered individually, lead to unequivocal support for opposed hypotheses. MeSH Headings: Computational Biology; Computer Simulation; Mathematical Concepts; Models, Biological; Monte Carlo Method; Signal Transduction; Systems Biology. Grants: Wellcome Trust; BB/G007934/1 Biotechnology and Biological Sciences Research Council; BB/K003909/1 Biotechnology and Biological Sciences Research Council.
61	Liepe J, Kirk P, Filippi S, Toni T, Barnes CP, Stumpf MPH. A framework for parameter estimation and model selection from experimental data in systems biology using approximate Bayesian computation. Nat Protoc 2014;9:439-56. [PMID: 24457334; DOI: 10.1038/nprot.2014.025] [Citation(s) in RCA: 113] [Impact Index Per Article: 11.3] [Indexed: 11/09/2022] Abstract: As modeling becomes a more widespread practice in the life sciences and biomedical sciences, researchers need reliable tools to calibrate models against ever more complex and detailed data. Here we present an approximate Bayesian computation (ABC) framework and software environment, ABC-SysBio, which is a Python package that runs on Linux and Mac OS X systems and that enables parameter estimation and model selection in the Bayesian formalism by using sequential Monte Carlo (SMC) approaches. We outline the underlying rationale, discuss the computational and practical issues and provide detailed guidance as to how the important tasks of parameter inference and model selection can be performed in practice. Unlike other available packages, ABC-SysBio is highly suited for investigating, in particular, the challenging problem of fitting stochastic models to data. In order to demonstrate the use of ABC-SysBio, in this protocol we postulate the existence of an imaginary reaction network composed of seven interrelated biological reactions (involving a specific mRNA, the protein it encodes and a post-translationally modified version of the protein), a network that is defined by two files containing 'observed' data that we provide as supplementary information. In the first part of the PROCEDURE, ABC-SysBio is used to infer the parameters of this system, whereas in the second part we use ABC-SysBio's relevant functionality to discriminate between two different reaction network models, one of them being the 'true' one. Although computationally expensive, the additional insights gained in the Bayesian formalism more than make up for this cost, especially in complex problems.
62	Jetka T, Charzyńska A, Gambin A, Stumpf MPH, Komorowski M. StochDecomp--Matlab package for noise decomposition in stochastic biochemical systems. Bioinformatics 2013;30:137-8. [PMID: 24191070; DOI: 10.1093/bioinformatics/btt631] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/25/2023] Abstract: MOTIVATION: Stochasticity is an indispensable aspect of biochemical processes at the cellular level. Studies on how the noise enters and propagates in biochemical systems provided us with non-trivial insights into the origins of stochasticity; in total, however, they constitute a patchwork of different theoretical analyses. RESULTS: Here we present a flexible and widely applicable noise decomposition tool that allows us to calculate contributions of individual reactions to the total variability of a system's output. With the package it is, therefore, possible to quantify how the noise enters and propagates in biochemical systems. We also demonstrate and exemplify using the JAK-STAT signalling pathway that the noise contributions resulting from individual reactions can be inferred from experimental data along with Bayesian parameter inference. The method is based on the linear noise approximation, which is assumed to provide a reasonable representation of analyzed systems. AVAILABILITY AND IMPLEMENTATION: http://sourceforge.net/p/stochdecomp/
63	Komorowski M, Miękisz J, Stumpf MPH. Decomposing noise in biochemical signaling systems highlights the role of protein degradation. Biophys J 2013;104:1783-93. [PMID: 23601325; DOI: 10.1016/j.bpj.2013.02.027] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 08/23/2012] [Revised: 01/30/2013] [Accepted: 02/08/2013] [Indexed: 11/17/2022] Abstract: Stochasticity is an essential aspect of biochemical processes at the cellular level. We now know that living cells take advantage of stochasticity in some cases and counteract stochastic effects in others. Here we propose a method that allows us to calculate contributions of individual reactions to the total variability of a system's output. We demonstrate that reactions differ significantly in their relative impact on the total noise and we illustrate the importance of protein degradation on the overall variability for a range of molecular processes and signaling systems. With our flexible and generally applicable noise decomposition method, we are able to shed new, to our knowledge, light on the sources and propagation of noise in biochemical reaction networks; in particular, we are able to show how regulated protein degradation can be employed to reduce the noise in biochemical systems. MeSH Headings: Enzymes/metabolism; Gene Expression; Models, Biological; Proteolysis; Signal Transduction; Stochastic Processes. Grants: BB/G020434/1 Biotechnology and Biological Sciences Research Council.
64	Harrington HA, Feliu E, Wiuf C, Stumpf MPH. Cellular compartments cause multistability and allow cells to process more information. Biophys J 2013;104:1824-31. [PMID: 23601329; DOI: 10.1016/j.bpj.2013.02.028] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Received: 11/22/2012] [Revised: 02/03/2013] [Accepted: 02/08/2013] [Indexed: 11/30/2022] Abstract: Many biological, physical, and social interactions have a particular dependence on where they take place; e.g., in living cells, protein movement between the nucleus and cytoplasm affects cellular responses (i.e., proteins must be present in the nucleus to regulate their target genes). Here we use recent developments from dynamical systems and chemical reaction network theory to identify and characterize the key role of the spatial organization of eukaryotic cells in cellular information processing. In particular, the existence of distinct compartments plays a pivotal role in whether a system is capable of multistationarity (multiple response states), and is thus directly linked to the amount of information that the signaling molecules can represent in the nucleus. Multistationarity provides a mechanism for switching between different response states in cell signaling systems and enables multiple outcomes for cellular decision-making. We combine different mathematical techniques to provide a heuristic procedure to determine if a system has the capacity for multiple steady states, and find conditions that ensure that multiple steady states cannot occur. Notably, we find that introducing species localization can alter the capacity for multistationarity, and we mathematically demonstrate that shuttling confers flexibility for and greater control of the emergence of an all-or-nothing response of a cell.
65	Kirk P, Witkover A, Bangham CRM, Richardson S, Lewin AM, Stumpf MPH. Balancing the robustness and predictive performance of biomarkers. J Comput Biol 2013;20:979-89. [PMID: 23909374; DOI: 10.1089/cmb.2013.0018] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022] Abstract: Recent studies have highlighted the importance of assessing the robustness of putative biomarkers identified from experimental data. This has given rise to the concept of stable biomarkers, which are ones that are consistently identified regardless of small perturbations to the data. Since stability is not by itself a useful objective, we present a number of strategies that combine assessments of stability and predictive performance in order to identify biomarkers that are both robust and diagnostically useful. Moreover, by wrapping these strategies around logistic regression classifiers regularized by the elastic net penalty, we are able to assess the effects of correlations between biomarkers upon their perceived stability. We use a synthetic example to illustrate the properties of our proposed strategies. In this example, we find that: (i) assessments of stability can help to reduce the number of false-positive biomarkers, although potentially at the cost of missing some true positives; (ii) combining assessments of stability with assessments of predictive performance can improve the true positive rate; and (iii) correlations between biomarkers can have adverse effects on their stability and hence must be carefully taken into account when undertaking biomarker discovery. We then apply our strategies in a proteomics context to identify a number of robust candidate biomarkers for the human disease HTLV1-associated myelopathy/tropical spastic paraparesis (HAM/TSP).
66	Kirk P, Thorne T, Stumpf MPH. Model selection in systems and synthetic biology. Curr Opin Biotechnol 2013;24:767-74. [PMID: 23578462; DOI: 10.1016/j.copbio.2013.03.012] [Citation(s) in RCA: 87] [Impact Index Per Article: 7.9] [Received: 12/22/2012] [Revised: 03/07/2013] [Accepted: 03/14/2013] [Indexed: 11/17/2022] Abstract: Developing mechanistic models has become an integral aspect of systems biology, as has the need to differentiate between alternative models. Parameterizing mathematical models has been widely perceived as a formidable challenge, which has spurred the development of statistical and optimisation routines for parameter inference. But now focus is increasingly shifting to problems that require us to choose from among a set of different models to determine which one offers the best description of a given biological system. We will here provide an overview of recent developments in the area of model selection. We will focus on approaches that are practical and build on solid statistical principles, and outline the conceptual foundations and the scope for application of such methods in systems biology. MeSH Headings: Bayes Theorem; Likelihood Functions; Models, Biological; Synthetic Biology; Systems Biology. Grants: BB/F00513X/1 Biotechnology and Biological Sciences Research Council; BB/F005210/1 Biotechnology and Biological Sciences Research Council; BB/G007934/1 Biotechnology and Biological Sciences Research Council.
67	Ale A, Kirk P, Stumpf MPH. A general moment expansion method for stochastic kinetic models. J Chem Phys 2013;138:174101. [DOI: 10.1063/1.4802475] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Indexed: 11/14/2022]
68	Thorne T, Fratta P, Hanna MG, Cortese A, Plagnol V, Fisher EM, Stumpf MPH. Graphical modelling of molecular networks underlying sporadic inclusion body myositis. Mol Biosyst 2013;9:1736-42. [PMID: 23595110; DOI: 10.1039/c3mb25497f] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/21/2022] Abstract: Here we present a novel statistical methodology that allows us to analyze gene expression data that have been collected from a number of different cases or conditions in a unified framework. Using a Bayesian nonparametric framework we develop a hierarchical model wherein genes can maintain a shared set of interactions between different cases, whilst also exhibiting behaviour that is unique to specific cases, sets of conditions, or groups of data points. By doing so we are able to not only combine data from different cases but also to discern the unique regulatory interactions that differentiate the cases. We apply our method to clinical data collected from patients suffering from sporadic Inclusion Body Myositis (sIBM), as well as control samples, and demonstrate the ability of our method to infer regulatory interactions that are unique to the disease cases of interest. The method thus balances the statistical need to include as many patients and controls as possible, and the clinical need to maintain potentially cryptic differences among patients and between patients and controls at the regulatory level.
69	Filippi S, Barnes CP, Cornebise J, Stumpf MPH. On optimality of kernels for approximate Bayesian computation using sequential Monte Carlo. Stat Appl Genet Mol Biol 2013;12:87-107. [PMID: 23502346; DOI: 10.1515/sagmb-2012-0069] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Indexed: 11/15/2022] Abstract: Approximate Bayesian computation (ABC) has gained popularity over the past few years for the analysis of complex models arising in population genetics, epidemiology and systems biology. Sequential Monte Carlo (SMC) approaches have become work-horses in ABC. Here we discuss how to construct the perturbation kernels that are required in ABC SMC approaches, in order to construct a sequence of distributions that start out from a suitably defined prior and converge towards the unknown posterior. We derive optimality criteria for different kernels, which are based on the Kullback-Leibler divergence between a distribution and the distribution of the perturbed particles. We will show that for many complicated posterior distributions, locally adapted kernels tend to show the best performance. We find that the added moderate cost of adapting kernel functions is easily regained in terms of the higher acceptance rate. We demonstrate the computational efficiency gains in a range of toy examples which illustrate some of the challenges faced in real-world applications of ABC, before turning to two demanding parameter inference problems in molecular biology, which highlight the huge increases in efficiency that can be gained from choice of optimal kernels. We conclude with a general discussion of the rational choice of perturbation kernels in ABC SMC settings.
70	Liepe J, Filippi S, Komorowski M, Stumpf MPH. Maximizing the information content of experiments in systems biology. PLoS Comput Biol 2013;9:e1002888. [PMID: 23382663 PMCID: PMC3561087 DOI: 10.1371/journal.pcbi.1002888] [Citation(s) in RCA: 102] [Impact Index Per Article: 9.3] [Received: 07/27/2012] [Accepted: 11/30/2012] [Indexed: 12/12/2022] Open Abstract Our understanding of most biological systems is in its infancy. Learning their structure and intricacies is fraught with challenges, and often side-stepped in favour of studying the function of different gene products in isolation from their physiological context. Constructing and inferring global mathematical models from experimental data is, however, central to systems biology. Different experimental setups provide different insights into such systems. Here we show how we can combine concepts from Bayesian inference and information theory in order to identify experiments that maximize the information content of the resulting data. This approach allows us to incorporate preliminary information; it is global and not constrained to some local neighbourhood in parameter space and it readily yields information on parameter robustness and confidence. Here we develop the theoretical framework and apply it to a range of exemplary problems that highlight how we can improve experimental investigations into the structure and dynamics of biological systems and their behavior. For most biological signalling and regulatory systems we still lack reliable mechanistic models. And where such models exist, e.g. in the form of differential equations, we typically have only rough estimates for the parameters that characterize the biochemical reactions.
In order to improve our knowledge of such systems we require better estimates for these parameters and here we show how judicious choice of experiments, based on a combination of simulations and information theoretical analysis, can help us. Our approach builds on the available, frequently rudimentary information, and identifies which experimental set-up provides most additional information about all the parameters, or individual parameters. We will also consider the related but subtly different problem of which experiments need to be performed in order to decrease the uncertainty about the behaviour of the system under altered conditions. We develop the theoretical framework in the necessary detail before illustrating its use and applying it to the repressilator model, the regulation of Hes1 and signal transduction in the Akt pathway. MESH Headings: Bayes Theorem; Models, Theoretical; Systems Biology; Uncertainty. Grants: BB/G001863/1 Biotechnology and Biological Sciences Research Council; BB/G020434/1 Biotechnology and Biological Sciences Research Council; G1002092 Medical Research Council; Wellcome Trust; BB/G007934/1 Biotechnology and Biological Sciences Research Council.
71	MacLean AL, Lo Celso C, Stumpf MPH. Population dynamics of normal and leukaemia stem cells in the haematopoietic stem cell niche show distinct regimes where leukaemia will be controlled. J R Soc Interface 2013;10:20120968. [PMID: 23349436 PMCID: PMC3627104 DOI: 10.1098/rsif.2012.0968] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Indexed: 01/05/2023] Open Abstract Haematopoietic stem cells (HSCs) are responsible for maintaining immune cells, red blood cells and platelets throughout life. HSCs must be located in their ecological niche (the bone marrow) to function correctly, that is, to regenerate themselves and their progeny; the latter eventually exit the bone marrow and enter circulation. We propose that cells with oncogenic potential - cancer/leukaemia stem cells (LSC) - and their progeny will also occupy this niche. Mathematical models, which describe the dynamics of HSCs, LSCs and their progeny allow investigation into the conditions necessary for defeating a malignant invasion of the niche. Two such models are developed and analysed here. To characterize their behaviour, we use an inferential framework that allows us to study regions in parameter space that give rise to desired behaviour together with an assessment of the robustness of the dynamics. Using this approach, we map out conditions under which HSCs can outcompete LSCs. In therapeutic applications, we clearly want to drive haematopoiesis into such regimes and the current analysis provides some guidance as to how we can identify new therapeutic targets. Our results suggest that maintaining a viable population of HSCs and their progenies in the niche may often already be nearly sufficient to eradicate LSCs from the system.
72	Silk D, Filippi S, Stumpf MPH. Optimizing threshold-schedules for sequential approximate Bayesian computation: applications to molecular systems. Stat Appl Genet Mol Biol 2013;12:603-18. [DOI: 10.1515/sagmb-2012-0043] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Indexed: 11/15/2022]
73	Thorne T, Stumpf MPH. Inference of temporally varying Bayesian networks. Bioinformatics 2012;28:3298-305. [PMID: 23074260 PMCID: PMC3519458 DOI: 10.1093/bioinformatics/bts614] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 03/05/2012] [Revised: 10/04/2012] [Accepted: 10/11/2012] [Indexed: 11/12/2022] Open Abstract MOTIVATION When analysing gene expression time series data, an often overlooked but crucial aspect of the model is that the regulatory network structure may change over time. Although some approaches have addressed this problem previously in the literature, many are not well suited to the sequential nature of the data. RESULTS Here, we present a method that allows us to infer regulatory network structures that may vary between time points, using a set of hidden states that describe the network structure at a given time point. To model the distribution of the hidden states, we have applied the Hierarchical Dirichlet Process Hidden Markov Model, a non-parametric extension of the traditional Hidden Markov Model, which does not require us to fix the number of hidden states in advance. We apply our method to existing microarray expression data as well as demonstrating its efficacy on simulated test data. MESH Headings: Animals; Arabidopsis/genetics; Arabidopsis/metabolism; Bayes Theorem; Drosophila melanogaster/genetics; Drosophila melanogaster/growth & development; Drosophila melanogaster/metabolism; Gene Expression; Gene Expression Profiling; Gene Regulatory Networks; Markov Chains; Starch/metabolism. Grants: BB/F005210/2 Biotechnology and Biological Sciences Research Council.
74	Thorne T, Stumpf MPH. Graph spectral analysis of protein interaction network evolution. J R Soc Interface 2012;9:2653-66. [PMID: 22552917 PMCID: PMC3427518 DOI: 10.1098/rsif.2012.0220] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 03/19/2012] [Accepted: 04/10/2012] [Indexed: 11/12/2022] Open Abstract We present an analysis of protein interaction network data via the comparison of models of network evolution to the observed data. We take a Bayesian approach and perform posterior density estimation using an approximate Bayesian computation with sequential Monte Carlo method. Our approach allows us to perform model selection over a selection of potential network growth models. The methodology we apply uses a distance defined in terms of graph spectra which captures the network data more naturally than previously used summary statistics such as the degree distribution. Furthermore, we include the effects of sampling into the analysis, to properly correct for the incompleteness of existing datasets, and have analysed the performance of our method under various degrees of sampling. We consider a number of models focusing not only on the biologically relevant class of duplication models, but also including models of scale-free network growth that have previously been claimed to describe such data. We find a preference for a duplication-divergence with linear preferential attachment model in the majority of the interaction datasets considered. We also illustrate how our method can be used to perform multi-model inference of network parameters to estimate properties of the full network from sampled data.
Key Words: protein interaction networks; graph spectra; approximate Bayesian computation; network evolution; sequential Monte Carlo. MESH Headings: Animals; Bayes Theorem; Computer Simulation; Drosophila melanogaster/metabolism; Escherichia coli/metabolism; Evolution, Molecular; Helicobacter pylori/metabolism; Models, Statistical; Monte Carlo Method; Protein Interaction Maps/physiology; Saccharomyces cerevisiae/metabolism. Grants: BB/F005210/1 Biotechnology and Biological Sciences Research Council; BB/F013566/1 Biotechnology and Biological Sciences Research Council; BB/F005210/2 Biotechnology and Biological Sciences Research Council.
75	Komorowski M, Zurauskiene J, Stumpf MPH. StochSens--Matlab package for sensitivity analysis of stochastic chemical systems. Bioinformatics 2012;28:731-3. [PMID: 22378710 DOI: 10.1093/bioinformatics/btr714] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022] Abstract MOTIVATION The growing interest in the role of stochasticity in biochemical systems drives the demand for tools to analyse stochastic dynamical models of chemical reactions. One powerful tool to elucidate performance of dynamical systems is sensitivity analysis. Traditionally, however, the concept of sensitivity has mainly been applied to deterministic systems, and the difficulty to generalize these concepts for stochastic systems results from necessity of extensive Monte Carlo simulations. RESULTS Here we present a Matlab package, StochSens, that implements sensitivity analysis for stochastic chemical systems using the concept of the Fisher Information Matrix (FIM). It uses the linear noise approximation to represent the FIM in terms of solutions of ordinary differential equations. This is the first computational tool that allows for quick computation of the Information Matrix for stochastic systems without the need for Monte Carlo simulations. AVAILABILITY http://www.theosysbio.bio.ic.ac.uk/resources/stns SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
76	Toni T, Ozaki YI, Kirk P, Kuroda S, Stumpf MPH. Elucidating the in vivo phosphorylation dynamics of the ERK MAP kinase using quantitative proteomics data and Bayesian model selection. Mol Biosyst 2012;8:1921-9. [PMID: 22555461 DOI: 10.1039/c2mb05493k] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 01/10/2023] Abstract Ever since reversible protein phosphorylation was discovered, it has been clear that it plays a key role in the regulation of cellular processes. Proteins often undergo double phosphorylation, which can occur through two possible mechanisms: distributive or processive. Which phosphorylation mechanism is chosen for a particular cellular regulation bears biological significance, and it is therefore in our interest to understand these mechanisms. In this paper we study dynamics of the MEK/ERK phosphorylation. We employ a model selection algorithm based on approximate Bayesian computation to elucidate phosphorylation dynamics from quantitative time course data obtained from PC12 cells in vivo. The algorithm infers the posterior distribution over four proposed models for phosphorylation and dephosphorylation dynamics, and this distribution indicates the amount of support given to each model. We evaluate the robustness of our inferential framework by systematically exploring different ways of parameterizing the models and for different prior specifications. The models with the highest inferred posterior probability are the two models employing distributive dephosphorylation, whereas we are unable to choose decisively between the processive and distributive phosphorylation mechanisms.
MESH Headings: Algorithms; Animals; Bayes Theorem; Cell Line, Tumor; Extracellular Signal-Regulated MAP Kinases/metabolism; Models, Biological; PC12 Cells; Phosphorylation; Proteomics; Rats. Grants: BB/F005210/1 Biotechnology and Biological Sciences Research Council; BB/G007934/1 Biotechnology and Biological Sciences Research Council.
77	Harrington HA, Komorowski M, Beguerisse-Díaz M, Ratto GM, Stumpf MPH. Mathematical modeling reveals the functional implications of the different nuclear shuttling rates of Erk1 and Erk2. Phys Biol 2012;9:036001. [PMID: 22551942 DOI: 10.1088/1478-3975/9/3/036001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/01/2023] Abstract The mitogen-activated protein kinase (MAPK) family of proteins is involved in regulating cellular fates such as proliferation, differentiation and apoptosis. In particular, the dynamics of the Erk/Mek system, which has become the canonical example for MAPK signaling systems, have attracted considerable attention. Erk is encoded by two genes, Erk1 and Erk2, that until recently had been considered equivalent as they differ only subtly at the sequence level. However, these proteins exhibit radically different trafficking between cytoplasm and nucleus and this fact may have functional implications. Here we use spatially resolved data on Erk1/2 to develop and analyze spatio-temporal models of these cascades, and we discuss how sensitivity analysis can be used to discriminate between mechanisms. Our models elucidate some of the factors governing the interplay between signaling processes and the Erk1/2 localization in different cellular compartments, including competition between Erk1 and Erk2. Our approach is applicable to a wide range of signaling systems, such as activation cascades, where translocation of molecules occurs. Our study provides a first model of Erk1 and Erk2 activation and their nuclear shuttling dynamics, revealing a role in the regulation of the efficiency of nuclear signaling.
MESH Headings: Active Transport, Cell Nucleus; Animals; Cell Nucleus/metabolism; Enzyme Activation; HeLa Cells; Humans; MAP Kinase Signaling System; Mice; Mitogen-Activated Protein Kinase 1/analysis; Mitogen-Activated Protein Kinase 1/metabolism; Mitogen-Activated Protein Kinase 3/analysis; Mitogen-Activated Protein Kinase 3/metabolism; Models, Biological; NIH 3T3 Cells. Grants: BB/F005210/1 Biotechnology and Biological Sciences Research Council; BB/G020434/1 Biotechnology and Biological Sciences Research Council.
78	Stumpf MPH, Porter MA. Mathematics. Critical truths about power laws. Science 2012;335:665-6. [PMID: 22323807 DOI: 10.1126/science.1216142] [Citation(s) in RCA: 207] [Impact Index Per Article: 17.3] [Indexed: 11/02/2022]
79	Liepe J, Taylor H, Barnes CP, Huvet M, Bugeon L, Thorne T, Lamb JR, Dallman MJ, Stumpf MPH. Calibrating spatio-temporal models of leukocyte dynamics against in vivo live-imaging data using approximate Bayesian computation. Integr Biol (Camb) 2012;4:335-345. [PMID: 22327539 PMCID: PMC5058438 DOI: 10.1039/c2ib00175f] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 11/21/2022] Abstract In vivo studies allow us to investigate biological processes at the level of the organism. But not all aspects of in vivo systems are amenable to direct experimental measurements. In order to make the most of such data we therefore require statistical tools that allow us to obtain reliable estimates for e.g. kinetic in vivo parameters. Here we show how we can use approximate Bayesian computation approaches in order to analyse leukocyte migration in zebrafish embryos in response to injuries. We track individual leukocytes using live imaging following surgical injury to the embryos' tail-fins. The signalling gradient that leukocytes follow towards the site of the injury cannot be directly measured but we can estimate its shape and how it changes with time from the directly observed patterns of leukocyte migration. By coupling simple models of immune signalling and leukocyte migration with the unknown gradient shape into a single statistical framework we can gain detailed insights into the tissue-wide processes that are involved in the innate immune response to wound injury. In particular we find conclusive evidence for a temporally and spatially changing signalling gradient that modulates the changing activity of the leukocyte population in the embryos. We conclude with a robustness analysis which highlights the most important factors determining the leukocyte dynamics.
Our approach relies only on the ability to simulate numerically the process under investigation and is therefore also applicable in other in vivo contexts and studies. MESH Headings: Algorithms; Animals; Bayes Theorem; Cell Movement/physiology; Humans; Leukocytes/physiology; Models, Biological; Signal Transduction; Systems Biology; Time-Lapse Imaging; Zebrafish/embryology; Zebrafish/physiology. Grants: 086763 Wellcome Trust; BB/F005210/1 Biotechnology and Biological Sciences Research Council; BB/G007934/1 Biotechnology and Biological Sciences Research Council.
80	Kelly WP, Ingram PJ, Stumpf MPH. The degree distribution of networks: statistical model selection. Methods Mol Biol 2012;804:245-262. [PMID: 22144157 DOI: 10.1007/978-1-61779-361-5_13] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 05/31/2023] Abstract The degree distribution has been viewed as an important characteristic of network data. Many biological networks have been labelled scale-free as their degree distribution can be approximately described by a power-law probability distribution. This chapter presents a formal statistical model selection procedure that can determine which functional form, from a collection of specified models, best describes the degree distribution of network data. The degree distribution found for empirical data is viewed as belonging to a class of probability models and the model which best describes the data is determined in a maximum likelihood framework. In conclusion, it is important to note that these statistical tests do not confirm the true underlying distribution of the observed data, but instead show which models from a chosen set best describe the data. In reality, these approaches should be viewed as providing evidence for which probability models do not adequately (or optimally) describe the data, and give an indication of the underlying sampling and true interaction properties of the system considered. MESH Headings: Campylobacter jejuni/genetics; Likelihood Functions; Mathematical Concepts; Models, Statistical; Probability; Protein Interaction Maps/genetics; Saccharomyces cerevisiae/genetics; Software; Systems Biology/methods. Grants: BB/E01612X/1 Biotechnology and Biological Sciences Research Council; Wellcome Trust.
81	Harmston N, Filsell W, Stumpf MPH. Which species is it? Species-driven gene name disambiguation using random walks over a mixture of adjacency matrices. Bioinformatics 2011;28:254-60. [PMID: 22135416 DOI: 10.1093/bioinformatics/btr640] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022] Open Abstract MOTIVATION The scientific literature contains a wealth of information about biological systems. Manual curation lacks the scalability to extract this information due to the ever-increasing numbers of papers being published. The development and application of text mining technologies has been proposed as a way of dealing with this problem. However, the inter-species ambiguity of the genomic nomenclature makes mapping of gene mentions identified in text to their corresponding Entrez gene identifiers an extremely difficult task. We propose a novel method, which transforms a MEDLINE record into a mixture of adjacency matrices; by performing a random walk over the resulting graph, we can perform multi-class supervised classification allowing the assignment of taxonomy identifiers to individual gene mentions. The ability to achieve good performance at this task has a direct impact on the performance of normalizing gene mentions to Entrez gene identifiers. Such graph mixtures add flexibility and allow us to generate probabilistic classification schemes that naturally reflect the uncertainties inherent, even in literature-derived data. RESULTS Our method performs well in terms of both micro- and macro-averaged performance, achieving micro-F(1) of 0.76 and macro-F(1) of 0.36 on the publicly available DECA corpus. Re-curation of the DECA corpus was performed, with our method achieving 0.88 micro-F(1) and 0.51 macro-F(1).
Our method improves over standard classification techniques [such as support vector machines (SVMs)] in a number of ways: flexibility, interpretability and its resistance to the effects of class bias in the training data. Good performance is achieved without the need for computationally expensive parse tree generation or 'bag of words classification'.
82	Tang Y, Sheng X, Stumpf MPH. The roles of contact residue disorder and domain composition in characterizing protein-ligand binding specificity and promiscuity. Mol Biosyst 2011;7:3280-6. [PMID: 22002096 DOI: 10.1039/c1mb05325f] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/21/2022] Abstract Most protein chains interact with only one ligand but a small number of protein chains can bind several ligands, and many examples are available in the protein-ligand complex database of PDB. Among these proteins, some show preferences for the ligands or types of ligands they bind; however, so far we have only poor understanding of what determines protein-ligand binding and its specificity. Here we investigate the structural and functional properties of proteins in protein-ligand complexes. Analysis of the protein-ligand complex dataset from the PDB structure database reveals that proteins with more interactions have more disordered contact residues. Those proteins containing few disordered contact residues that bind multiple ligands have a tendency to consist of several domains. Analysis of physicochemical properties of hub contact residues binding multiple ligands indicates that they are enriched for hydrophilic, charged, polar and His-Asp catalytic triad residues. Finally, in order to differentiate proteins binding different classes of ligands, we mapped the three most prominent classes of ligands onto different superfamily domains. Our results demonstrate that contact residue disorder and ordered multiple domains are complementary factors that play a crucial role in determining ligand binding specificity and promiscuity.
83	Kirk PDW, Witkover A, Courtney A, Lewin AM, Wait R, Stumpf MPH, Richardson S, Taylor GP, Bangham CRM. Plasma proteome analysis in HTLV-1-associated myelopathy/tropical spastic paraparesis. Retrovirology 2011;8:81. [PMID: 21992623 PMCID: PMC3210102 DOI: 10.1186/1742-4690-8-81] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 07/19/2011] [Accepted: 10/12/2011] [Indexed: 11/13/2022] Open Abstract Background Human T lymphotropic virus Type 1 (HTLV-1) causes a chronic inflammatory disease of the central nervous system known as HTLV-1-associated myelopathy/tropical spastic paraparesis (HAM) which resembles chronic spinal forms of multiple sclerosis (MS). The pathogenesis of HAM remains uncertain. To aid in the differential diagnosis of HAM and to identify pathogenetic mechanisms, we analysed the plasma proteome in asymptomatic HTLV-1 carriers (ACs), patients with HAM, uninfected controls, and patients with MS. We used surface-enhanced laser desorption-ionization (SELDI) mass spectrometry to analyse the plasma proteome in 68 HTLV-1-infected individuals (in two non-overlapping sets, each comprising 17 patients with HAM and 17 ACs), 16 uninfected controls, and 11 patients with secondary progressive MS. Candidate biomarkers were identified by tandem Q-TOF mass spectrometry. Results The concentrations of three plasma proteins - high [β2-microglobulin], high [Calgranulin B], and low [apolipoprotein A2] - were specifically associated with HAM, independently of proviral load. The plasma [β2-microglobulin] was positively correlated with disease severity. Conclusions The results indicate that monocytes are activated by contact with activated endothelium in HAM.
Using β2-microglobulin and Calgranulin B alone we derive a diagnostic algorithm that correctly classified the disease status (presence or absence of HAM) in 81% of HTLV-1-infected subjects in the cohort.
84	Barnes CP, Silk D, Stumpf MPH. Bayesian design strategies for synthetic biology. Interface Focus 2011;1:895-908. [PMID: 23226588 DOI: 10.1098/rsfs.2011.0056] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 06/15/2011] [Accepted: 09/12/2011] [Indexed: 11/12/2022] Open Abstract We discuss how statistical inference techniques can be applied in the context of designing novel biological systems. Bayesian techniques have found widespread application and acceptance in the systems biology community, where they are used for both parameter estimation and model selection. Here we show that the same approaches can also be used in order to engineer synthetic biological systems by inferring the structure and parameters that are most likely to give rise to the dynamics that we require a system to exhibit. Problems that are shared between applications in systems and synthetic biology include the vast potential spaces that need to be searched for suitable models and model parameters; the complex forms of likelihood functions; and the interplay between noise at the molecular level and nonlinearity in the dynamics owing to often complex feedback structures. In order to meet these challenges, we have to develop suitable inferential tools and here, in particular, we illustrate the use of approximate Bayesian computation and unscented Kalman filtering-based approaches. These partly complementary methods allow us to tackle a number of recurring problems in the design of biological systems. After a brief exposition of these two methodologies, we focus on their application to oscillatory systems. Key Words: approximate Bayesian computation; synthetic biology; unscented Kalman filter.
85	Silk D, Kirk PDW, Barnes CP, Toni T, Rose A, Moon S, Dallman MJ, Stumpf MPH. Designing attractive models via automated identification of chaotic and oscillatory dynamical regimes. Nat Commun 2011;2:489. [PMID: 21971504 PMCID: PMC3207206 DOI: 10.1038/ncomms1496] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.4] [Received: 03/16/2011] [Accepted: 09/01/2011] [Indexed: 11/09/2022] Open Abstract Chaos and oscillations continue to capture the interest of both the scientific and public domains. Yet despite the importance of these qualitative features, most attempts at constructing mathematical models of such phenomena have taken an indirect, quantitative approach, for example, by fitting models to a finite number of data points. Here we develop a qualitative inference framework that allows us to both reverse-engineer and design systems exhibiting these and other dynamical behaviours by directly specifying the desired characteristics of the underlying dynamical attractor. This change in perspective from quantitative to qualitative dynamics provides fundamental and new insights into the properties of dynamical systems.
86	Barnes CP, Silk D, Sheng X, Stumpf MPH. Bayesian design of synthetic biological systems. Proc Natl Acad Sci U S A 2011;108:15190-5. [PMID: 21876136 PMCID: PMC3174594 DOI: 10.1073/pnas.1017972108] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.1] [Indexed: 11/18/2022] Open Abstract Here we introduce a new design framework for synthetic biology that exploits the advantages of Bayesian model selection. We will argue that the difference between inference and design is that in the former we try to reconstruct the system that has given rise to the data that we observe, whereas in the latter, we seek to construct the system that produces the data that we would like to observe, i.e., the desired behavior. Our approach allows us to exploit methods from Bayesian statistics, including efficient exploration of model spaces and high-dimensional parameter spaces, and the ability to rank models with respect to their ability to generate certain types of data. Bayesian model selection furthermore automatically strikes a balance between complexity and (predictive or explanatory) performance of mathematical models. To deal with the complexities of molecular systems we employ an approximate Bayesian computation scheme which only requires us to simulate from different competing models to arrive at rational criteria for choosing between them. We illustrate the advantages resulting from combining the design and modeling (or in silico prototyping) stages currently seen as separate in synthetic biology by reference to deterministic and stochastic model systems exhibiting adaptive and switch-like behavior, as well as bacterial two-component signaling systems.
Collapse Key Words biochemical circuits dynamical systems robustness Collapse MESH Headings Adaptation, Physiological Bacteria/metabolism Bayes Theorem Stochastic Processes Synthetic Biology Systems Biology Collapse Grants BB/C519670/1 Biotechnology and Biological Sciences Research Council BB/F005210/1 Biotechnology and Biological Sciences Research Council BB/G007934/1 Biotechnology and Biological Sciences Research Council BB/G020434/1 Biotechnology and Biological Sciences Research Council Collapse
87	Toni T, Jovanovic G, Huvet M, Buck M, Stumpf MPH. From qualitative data to quantitative models: analysis of the phage shock protein stress response in Escherichia coli. BMC Systems Biology 2011;5:69. [PMID: 21569396 PMCID: PMC3127791 DOI: 10.1186/1752-0509-5-69] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 04/19/2010] [Accepted: 05/12/2011] [Indexed: 01/05/2023] Abstract: Background: Bacteria have evolved a rich set of mechanisms for sensing and adapting to adverse conditions in their environment. These are crucial for their survival, which requires them to react to extracellular stresses such as heat shock, ethanol treatment or phage infection. Here we focus on studying the phage shock protein (Psp) stress response in Escherichia coli induced by a phage infection or other damage to the bacterial membrane. This system has not yet been theoretically modelled or analysed in silico. Results: We develop a model of the Psp response system, and illustrate how such models can be constructed and analyzed in light of available sparse and qualitative information in order to generate novel biological hypotheses about their dynamical behaviour. We analyze this model using tools from Petri-net theory and study its dynamical range that is consistent with currently available knowledge by conditioning model parameters on the available data in an approximate Bayesian computation (ABC) framework. Within this ABC approach we analyze stochastic and deterministic dynamics. This analysis allows us to identify different types of behaviour and these mechanistic insights can in turn be used to design new, more detailed and time-resolved experiments. Conclusions: We have developed the first mechanistic model of the Psp response in E. coli. This model allows us to predict the possible qualitative stochastic and deterministic dynamic behaviours of key molecular players in the stress response. Our inferential approach can be applied to stress response and signalling systems more generally: in the ABC framework we can condition mathematical models on qualitative data in order to delimit e.g. parameter ranges or the qualitative system dynamics in light of available end-point or qualitative information.
88	Zhou Y, Liepe J, Sheng X, Stumpf MPH, Barnes C. GPU accelerated biochemical network simulation. Bioinformatics 2011;27:874-6. [PMID: 21224286 PMCID: PMC3051321 DOI: 10.1093/bioinformatics/btr015] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.1] [Received: 07/28/2010] [Revised: 12/10/2010] [Accepted: 01/05/2011] [Indexed: 11/13/2022] Abstract: MOTIVATION: Mathematical modelling is central to systems and synthetic biology. Using simulations to calculate statistics or to explore parameter space is a common means for analysing these models and can be computationally intensive. However, in many cases, the simulations are easily parallelizable. Graphics processing units (GPUs) are capable of efficiently running highly parallel programs and outperform CPUs in terms of raw computing power. Despite their computational advantages, their adoption by the systems biology community is relatively slow, since differences in hardware architecture between GPUs and CPUs complicate the porting of existing code. RESULTS: We present a Python package, cuda-sim, that provides highly parallelized algorithms for the repeated simulation of biochemical network models on NVIDIA CUDA GPUs. Algorithms are implemented for the three popular types of model formalisms: the LSODA algorithm for ODE integration, the Euler-Maruyama algorithm for SDE simulation and the Gillespie algorithm for MJP simulation. No knowledge of GPU computing is required from the user. Models can be specified in SBML format or provided as CUDA code. For running a large number of simulations in parallel, up to 360-fold decrease in simulation runtime is attained when compared to single CPU implementations. AVAILABILITY: http://cuda-sim.sourceforge.net/ MeSH Headings: Algorithms; Computational Biology/methods; Computer Graphics; Computer Simulation; Models, Biological; Software; Systems Biology. Grants: BB/C519670/1 (Biotechnology and Biological Sciences Research Council); BB/F005210/1 (Biotechnology and Biological Sciences Research Council); BB/G007934/1 (Biotechnology and Biological Sciences Research Council); Wellcome Trust.
89	Erguler K, Stumpf MPH. Practical limits for reverse engineering of dynamical systems: a statistical analysis of sensitivity and parameter inferability in systems biology models. Molecular BioSystems 2011;7:1593-602. [PMID: 21380410 DOI: 10.1039/c0mb00107d] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.8] [Indexed: 11/21/2022] Abstract: The size and complexity of cellular systems make building predictive models an extremely difficult task. In principle dynamical time-course data can be used to elucidate the structure of the underlying molecular mechanisms, but a central and recurring problem is that many and very different models can be fitted to experimental data, especially when the latter are limited and subject to noise. Even given a model, estimating its parameters remains challenging in real-world systems. Here we present a comprehensive analysis of 180 systems biology models, which allows us to classify the parameters with respect to their contribution to the overall dynamical behaviour of the different systems. Our results reveal candidate elements of control in biochemical pathways that differentially contribute to dynamics. We introduce sensitivity profiles that concisely characterize parameter sensitivity and demonstrate how this can be connected to variability in data. Systematically linking data and model sloppiness allows us to extract features of dynamical systems that determine how well parameters can be estimated from time-course measurements, and associates the extent of data required for parameter inference with the model structure, and also with the global dynamical state of the system. The comprehensive analysis of so many systems biology models reaffirms the inability to estimate precisely most model or kinetic parameters as a generic feature of dynamical systems, and provides safe guidelines for performing better inferences and model predictions in the context of reverse engineering of mathematical models for biological systems.
90	Huvet M, Toni T, Sheng X, Thorne T, Jovanovic G, Engl C, Buck M, Pinney JW, Stumpf MPH. The evolution of the phage shock protein response system: interplay between protein function, genomic organization, and system function. Mol Biol Evol 2010;28:1141-55. [PMID: 21059793 PMCID: PMC3041696 DOI: 10.1093/molbev/msq301] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.7] [Indexed: 01/16/2023] Abstract: Sensing the environment and responding appropriately to it are key capabilities for the survival of an organism. All extant organisms must have evolved suitable sensors, signaling systems, and response mechanisms allowing them to survive under the conditions they are likely to encounter. Here, we investigate in detail the evolutionary history of one such system: The phage shock protein (Psp) stress response system is an important part of the stress response machinery in many bacteria, including Escherichia coli K12. Here, we use a systematic analysis of the genes that make up and regulate the Psp system in E. coli in order to elucidate the evolutionary history of the system. We compare gene sharing, sequence evolution, and conservation of protein-coding as well as noncoding DNA sequences and link these to comparative analyses of genome/operon organization across 698 bacterial genomes. Finally, we evaluate experimentally the biological advantage/disadvantage of a simplified version of the Psp system under different oxygen-related environments. Our results suggest that the Psp system evolved around a core response mechanism by gradually co-opting genes into the system to provide more nuanced sensory, signaling, and effector functionalities. We find that recruitment of new genes into the response machinery is closely linked to incorporation of these genes into a psp operon as is seen in E. coli, which contains the bulk of genes involved in the response. The organization of this operon allows for surprising levels of additional transcriptional control and flexibility. The results discussed here suggest that the components of such signaling systems will only be evolutionarily conserved if the overall functionality of the system can be maintained.
91	Stumpf MPH, Wiuf C. Incomplete and noisy network data as a percolation process. J R Soc Interface 2010;7:1411-9. [PMID: 20378609 PMCID: PMC2935600 DOI: 10.1098/rsif.2010.0044] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 02/01/2010] [Accepted: 03/18/2010] [Indexed: 11/12/2022] Abstract: We discuss the ramifications of noisy and incomplete observations of network data on the existence of a giant connected component (GCC). The existence of a GCC in a random graph can be described in terms of a percolation process, and building on general results for classes of random graphs with specified degree distributions we derive percolation thresholds above which GCCs exist. We show that sampling and noise can have a profound effect on the perceived existence of a GCC and find that both processes can destroy it. We also show that the absence of a GCC puts a theoretical upper bound on the false-positive rate and relate our percolation analysis to experimental protein-protein interaction data. Key Words: complex networks; random graphs; protein interaction networks; sampling problems. MeSH Headings: Computational Biology/methods; Data Interpretation, Statistical; Models, Biological; Protein Interaction Mapping. Grants: BB/C519670/1 (Biotechnology and Biological Sciences Research Council); BB/E01612X/1 (Biotechnology and Biological Sciences Research Council); BB/F005210/1 (Biotechnology and Biological Sciences Research Council).
92	Lèbre S, Becq J, Devaux F, Stumpf MPH, Lelandais G. Statistical inference of the time-varying structure of gene-regulation networks. BMC Systems Biology 2010;4:130. [PMID: 20860793 PMCID: PMC2955603 DOI: 10.1186/1752-0509-4-130] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.4] [Received: 03/22/2010] [Accepted: 09/22/2010] [Indexed: 01/08/2023] Abstract: Background: Biological networks are highly dynamic in response to environmental and physiological cues. This variability is in contrast to conventional analyses of biological networks, which have overwhelmingly employed static graph models which stay constant over time to describe biological systems and their underlying molecular interactions. Methods: To overcome these limitations, we propose here a new statistical modelling framework, the ARTIVA formalism (Auto Regressive TIme VArying models), and an associated inferential procedure that allows us to learn temporally varying gene-regulation networks from biological time-course expression data. ARTIVA simultaneously infers the topology of a regulatory network and how it changes over time. It allows us to recover the chronology of regulatory associations for individual genes involved in a specific biological process (development, stress response, etc.). Results: We demonstrate that the ARTIVA approach generates detailed insights into the function and dynamics of complex biological systems and exploits efficiently time-course data in systems biology. In particular, two biological scenarios are analyzed: the developmental stages of Drosophila melanogaster and the response of Saccharomyces cerevisiae to benomyl poisoning. Conclusions: ARTIVA does recover essential temporal dependencies in biological systems from transcriptional data, and provide a natural starting point to learn and investigate their dynamics in greater detail.
93	Kelly WP, Stumpf MPH. Trees on networks: resolving statistical patterns of phylogenetic similarities among interacting proteins. BMC Bioinformatics 2010;11:470. [PMID: 20854660 PMCID: PMC2955699 DOI: 10.1186/1471-2105-11-470] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 04/28/2010] [Accepted: 09/20/2010] [Indexed: 11/28/2022] Abstract: Background: Phylogenies capture the evolutionary ancestry linking extant species. Correlations and similarities among a set of species are mediated by and need to be understood in terms of the phylogenic tree. In a similar way it has been argued that biological networks also induce correlations among sets of interacting genes or their protein products. Results: We develop suitable statistical resampling schemes that can incorporate these two potential sources of correlation into a single inferential framework. To illustrate our approach we apply it to protein interaction data in yeast and investigate whether the phylogenetic trees of interacting proteins in a panel of yeast species are more similar than would be expected by chance. Conclusions: While we find only negligible evidence for such increased levels of similarities, our statistical approach allows us to resolve the previously reported contradictory results on the levels of co-evolution induced by protein-protein interactions. We conclude with a discussion as to how we may employ the statistical framework developed here in further functional and evolutionary analyses of biological networks and systems.
94	Joly N, Engl C, Jovanovic G, Huvet M, Toni T, Sheng X, Stumpf MPH, Buck M. Managing membrane stress: the phage shock protein (Psp) response, from molecular mechanisms to physiology. FEMS Microbiol Rev 2010;34:797-827. [PMID: 20636484 DOI: 10.1111/j.1574-6976.2010.00240.x] [Citation(s) in RCA: 168] [Impact Index Per Article: 12.0] [Indexed: 01/09/2023] Abstract: The bacterial phage shock protein (Psp) response functions to help cells manage the impacts of agents impairing cell membrane function. The system has relevance to biotechnology and to medicine. Originally discovered in Escherichia coli, Psp proteins and homologues are found in Gram-positive and Gram-negative bacteria, in archaea and in plants. Study of the E. coli and Yersinia enterocolitica Psp systems provides insights into how membrane-associated sensory Psp proteins might perceive membrane stress, signal to the transcription apparatus and use an ATP-hydrolysing transcription activator to produce effector proteins to overcome the stress. Progress in understanding the mechanism of signal transduction by the membrane-bound Psp proteins, regulation of the psp gene-specific transcription activator and the cell biology of the system is presented and discussed. Many features of the action of the Psp system appear to be dominated by states of self-association of the master effector, PspA, and the transcription activator, PspF, alongside a signalling pathway that displays strong conditionality in its requirement. MeSH Headings: Bacterial Physiological Phenomena; Bacterial Proteins/metabolism; Cell Membrane/metabolism; Escherichia coli/physiology; Escherichia coli Proteins/chemistry; Escherichia coli Proteins/metabolism; Heat-Shock Proteins/metabolism; Membrane Proteins/metabolism; Signal Transduction; Stress, Physiological; Trans-Activators/chemistry; Trans-Activators/metabolism. Grants: BB/D521922/1 (Biotechnology and Biological Sciences Research Council); BB/F005210/1 (Biotechnology and Biological Sciences Research Council); Wellcome Trust.
95	Liepe J, Barnes C, Cule E, Erguler K, Kirk P, Toni T, Stumpf MPH. ABC-SysBio--approximate Bayesian computation in Python with GPU support. Bioinformatics 2010;26:1797-9. [PMID: 20591907 PMCID: PMC2894518 DOI: 10.1093/bioinformatics/btq278] [Citation(s) in RCA: 104] [Impact Index Per Article: 7.4] [Received: 02/05/2010] [Revised: 04/16/2010] [Accepted: 05/24/2010] [Indexed: 11/16/2022] Abstract: MOTIVATION: The growing field of systems biology has driven demand for flexible tools to model and simulate biological systems. Two established problems in the modeling of biological processes are model selection and the estimation of associated parameters. A number of statistical approaches, both frequentist and Bayesian, have been proposed to answer these questions. RESULTS: Here we present a Python package, ABC-SysBio, that implements parameter inference and model selection for dynamical systems in an approximate Bayesian computation (ABC) framework. ABC-SysBio combines three algorithms: ABC rejection sampler, ABC SMC for parameter inference and ABC SMC for model selection. It is designed to work with models written in Systems Biology Markup Language (SBML). Deterministic and stochastic models can be analyzed in ABC-SysBio. AVAILABILITY: http://abc-sysbio.sourceforge.net MeSH Headings: Bayes Theorem; Software; Systems Biology/methods. Grants: BB/C519670/1 (Biotechnology and Biological Sciences Research Council); BB/F005210/1 (Biotechnology and Biological Sciences Research Council); Wellcome Trust; Medical Research Council.
96	Toni T, Stumpf MPH. Simulation-based model selection for dynamical systems in systems and population biology. Bioinformatics 2009;26:104-10. [PMID: 19880371 PMCID: PMC2796821 DOI: 10.1093/bioinformatics/btp619] [Citation(s) in RCA: 122] [Impact Index Per Article: 8.1] [Indexed: 11/13/2022] Abstract: MOTIVATION: Computer simulations have become an important tool across the biomedical sciences and beyond. For many important problems several different models or hypotheses exist and choosing which one best describes reality or observed data is not straightforward. We therefore require suitable statistical tools that allow us to choose rationally between different mechanistic models of, e.g. signal transduction or gene regulation networks. This is particularly challenging in systems biology where only a small number of molecular species can be assayed at any given time and all measurements are subject to measurement uncertainty. RESULTS: Here, we develop such a model selection framework based on approximate Bayesian computation and employing sequential Monte Carlo sampling. We show that our approach can be applied across a wide range of biological scenarios, and we illustrate its use on real data describing influenza dynamics and the JAK-STAT signalling pathway. Bayesian model selection strikes a balance between the complexity of the simulation models and their ability to describe observed data. The present approach enables us to employ the whole formal apparatus to any system that can be (efficiently) simulated, even when exact likelihoods are computationally intractable.
97	Secrier M, Toni T, Stumpf MPH. The ABC of reverse engineering biological signalling systems. Molecular BioSystems 2009;5:1925-35. [PMID: 19798456 DOI: 10.1039/b908951a] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Indexed: 11/21/2022] Abstract: Modelling biological systems would be straightforward if we knew the structure of the model and the parameters governing their dynamics. For the overwhelming majority of biological processes, however, such parameter values are unknown and often impossible to measure directly. This means that we have to estimate or infer these parameters from observed data. Here we argue that it is also important to appreciate the uncertainty inherent in these estimates. We discuss a statistical approach--approximate Bayesian computation (ABC)--which allows us to approximate the posterior distribution over parameters and show how this can add insights into our understanding of the system dynamics. We illustrate the application of this approach and how the resulting posterior distribution can be analyzed in the context of the mitogen-activated protein kinase phosphorylation cascade. Our analysis also highlights the added benefit of using the distribution of parameters rather than point estimates of parameter values when considering the notion of sloppy models in systems biology.
98	Kirk PDW, Stumpf MPH. Gaussian process regression bootstrapping: exploring the effects of uncertainty in time course data. Bioinformatics 2009;25:1300-6. [PMID: 19289448 PMCID: PMC2677737 DOI: 10.1093/bioinformatics/btp139] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.9] [Indexed: 11/16/2022] Abstract: Motivation: Although widely accepted that high-throughput biological data are typically highly noisy, the effects that this uncertainty has upon the conclusions we draw from these data are often overlooked. However, in order to assign any degree of confidence to our conclusions, we must quantify these effects. Bootstrap resampling is one method by which this may be achieved. Here, we present a parametric bootstrapping approach for time-course data, in which Gaussian process regression (GPR) is used to fit a probabilistic model from which replicates may then be drawn. This approach implicitly allows the time dependence of the data to be taken into account, and is applicable to a wide range of problems. Results: We apply GPR bootstrapping to two datasets from the literature. In the first example, we show how the approach may be used to investigate the effects of data uncertainty upon the estimation of parameters in an ordinary differential equations (ODE) model of a cell signalling pathway. Although we find that the parameter estimates inferred from the original dataset are relatively robust to data uncertainty, we also identify a distinct second set of estimates. In the second example, we use our method to show that the topology of networks constructed from time-course gene expression data appears to be sensitive to data uncertainty, although there may be individual edges in the network that are robust in light of present data. Availability: Matlab code for performing GPR bootstrapping is available from our web site: http://www3.imperial.ac.uk/theoreticalsystemsbiology/data-software/ Contact: paul.kirk@imperial.ac.uk, m.stumpf@imperial.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
99	Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf MPH. Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface 2009;6:187-202. [PMID: 19205079 DOI: 10.1098/rsif.2008.0172] [Citation(s) in RCA: 658] [Impact Index Per Article: 43.9] [Indexed: 11/12/2022] Abstract: Approximate Bayesian computation (ABC) methods can be used to evaluate posterior distributions without having to calculate likelihoods. In this paper, we discuss and apply an ABC method based on sequential Monte Carlo (SMC) to estimate parameters of dynamical models. We show that ABC SMC provides information about the inferability of parameters and model sensitivity to changes in parameters, and tends to perform better than other ABC approaches. The algorithm is applied to several well-known biological systems, for which parameters and their credible intervals are inferred. Moreover, we develop ABC SMC as a tool for model selection; given a range of different mathematical descriptions, ABC SMC is able to choose the best model using the standard Bayesian model selection apparatus.
100	Ingram PJ, Stumpf MPH, Stark J. Nonidentifiability of the source of intrinsic noise in gene expression from single-burst data. PLoS Comput Biol 2008;4:e1000192. [PMID: 18846201 PMCID: PMC2538572 DOI: 10.1371/journal.pcbi.1000192] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Received: 09/22/2006] [Accepted: 08/25/2008] [Indexed: 12/03/2022] Abstract: Over the last few years, experimental data on the fluctuations in gene activity between individual cells and within the same cell over time have confirmed that gene expression is a “noisy” process. This variation is in part due to the small number of molecules taking part in some of the key reactions that are involved in gene expression. One of the consequences of this is that protein production often occurs in bursts, each due to a single promoter or transcription factor binding event. Recently, the distribution of the number of proteins produced in such bursts has been experimentally measured, offering a unique opportunity to study the relative importance of different sources of noise in gene expression. Here, we provide a derivation of the theoretical probability distribution of these bursts for a wide variety of different models of gene expression. We show that there is a good fit between our theoretical distribution and that obtained from two different published experimental datasets. We then prove that, irrespective of the details of the model, the burst size distribution is always geometric and hence determined by a single parameter. Many different combinations of the biochemical rates for the constituent reactions of both transcription and translation will therefore lead to the same experimentally observed burst size distribution. It is thus impossible to identify different sources of fluctuations purely from protein burst size data or to use such data to estimate all of the model parameters. We explore methods of inferring these values when additional types of experimental data are available. Author Summary: Recent experimental data showing fluctuations in gene activity between individual cells and within the same cell over time confirm that gene expression is a “noisy” process. This variation is partly due to the small number of molecules involved in gene expression. One consequence is that protein production often occurs in bursts, each due to the binding of a single transcription factor. Recently, the distribution of the number of proteins produced in such bursts has been experimentally measured, offering a unique opportunity to study the relative importance of different sources of noise in gene expression. We derive the theoretical probability distribution of these bursts for a wide variety of gene expression models. We show a good fit between our theoretical distribution and experimental data and prove that, irrespective of the model details, the burst size distribution always has the same shape, determined by a single parameter. As different combinations of the reaction rates lead to the same observed distribution, it is impossible to estimate all kinetic parameters from protein burst size data. When additional data, such as protein equilibrium distributions, are available, these can be used to infer additional parameters. We present one approach to this, demonstrating its application to published data. MeSH Headings: Computational Biology; Data Interpretation, Statistical; Gene Expression; Gene Expression Profiling/statistics & numerical data; Models, Genetic; Models, Statistical; Proteins/genetics; Proteins/metabolism; RNA, Messenger/genetics; RNA, Messenger/metabolism. Grants: BB/F005210/1 (Biotechnology and Biological Sciences Research Council); Wellcome Trust; BB/C519670/1 (Biotechnology and Biological Sciences Research Council).