1. Wilsdorf P, Reinhardt O, Prike T, Hinsch M, Bijak J, Uhrmacher AM. Simulation studies of social systems: telling the story based on provenance patterns. Royal Society Open Science 2024; 11:240258. [PMID: 39113768 PMCID: PMC11304336 DOI: 10.1098/rsos.240258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2024] [Revised: 05/24/2024] [Accepted: 05/29/2024] [Indexed: 08/10/2024]
Abstract
Social simulation studies are complex. They typically combine various data sources and hypotheses about the system's mechanisms that are integrated by intertwined processes of model building, simulation experiment execution and analysis. Various documentation approaches exist to increase the transparency and traceability of complex social simulation studies. Provenance standards enable the formalization of information on sources and activities, which contribute to the generation of an entity, in a queryable and computationally accessible manner. Provenance patterns can be defined as constraints on the relationships between specific types of activities and entities of a simulation study. In this paper, we refine the provenance pattern-based approach to address specific challenges of social agent-based simulation studies. Specifically, we focus on the activities and entities involved in collecting and analysing primary data about human decisions, and the collection and quality assessment of secondary data. We illustrate the potential of this approach by applying it to central activities and results of an agent-based simulation project and by presenting its implementation in a web-based tool.
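The idea of a provenance pattern as a constraint on the relationships between typed activities and entities can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the class names, the type labels (`PrimaryData`, `AnalysingData`, `AnalysisResult`) and the `violations` helper are hypothetical and are not taken from the paper or its web-based tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    id: str
    kind: str       # "entity" or "activity"
    node_type: str  # hypothetical type label, e.g. "PrimaryData"


@dataclass
class ProvGraph:
    nodes: dict = field(default_factory=dict)
    used: set = field(default_factory=set)       # (activity_id, entity_id)
    generated: set = field(default_factory=set)  # (entity_id, activity_id)

    def add(self, node: Node) -> None:
        self.nodes[node.id] = node


@dataclass(frozen=True)
class Pattern:
    """Constraint: activities of one type must use and generate given entity types."""
    activity_type: str
    must_use: frozenset
    must_generate: frozenset


def violations(graph: ProvGraph, pattern: Pattern) -> list:
    """Return one message per required input or output type that is missing."""
    problems = []
    for node in graph.nodes.values():
        if node.kind != "activity" or node.node_type != pattern.activity_type:
            continue
        used_types = {graph.nodes[e].node_type for (a, e) in graph.used if a == node.id}
        made_types = {graph.nodes[e].node_type for (e, a) in graph.generated if a == node.id}
        for t in sorted(pattern.must_use - used_types):
            problems.append(f"{node.id}: missing input of type {t}")
        for t in sorted(pattern.must_generate - made_types):
            problems.append(f"{node.id}: missing output of type {t}")
    return problems


# Toy study: two analysis activities, only the first one records its result.
g = ProvGraph()
g.add(Node("survey", "entity", "PrimaryData"))
g.add(Node("analysis1", "activity", "AnalysingData"))
g.add(Node("analysis2", "activity", "AnalysingData"))
g.add(Node("result", "entity", "AnalysisResult"))
g.used.add(("analysis1", "survey"))
g.used.add(("analysis2", "survey"))
g.generated.add(("result", "analysis1"))

pattern = Pattern("AnalysingData", frozenset({"PrimaryData"}), frozenset({"AnalysisResult"}))
print(violations(g, pattern))
```

Storing the relations as plain sets of identifier pairs keeps the check queryable in the sense used in the abstract: a pattern is simply a question asked of the recorded graph.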
Affiliation(s)
- Pia Wilsdorf
  - Institute for Visual and Analytic Computing, University of Rostock, Rostock, Germany
- Oliver Reinhardt
  - Institute for Visual and Analytic Computing, University of Rostock, Rostock, Germany
- Toby Prike
  - School of Psychological Science, The University of Western Australia, Perth, Australia
- Martin Hinsch
  - MRC/CSO Social and Public Health Sciences Unit, University of Glasgow, Glasgow, UK
- Jakub Bijak
  - Department of Social Statistics and Demography, University of Southampton, Southampton, UK
- Adelinde M. Uhrmacher
  - Institute for Visual and Analytic Computing, University of Rostock, Rostock, Germany
2. Gricourt G, Duigou T, Dérozier S, Faulon JL. neo4jsbml: import systems biology markup language data into the graph database Neo4j. PeerJ 2024; 12:e16726. [PMID: 38250720 PMCID: PMC10798154 DOI: 10.7717/peerj.16726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Accepted: 12/05/2023] [Indexed: 01/23/2024] [Open Access]
Abstract
Systems Biology Markup Language (SBML) has emerged as a standard for representing biological models, facilitating model sharing and interoperability. It stores many types of data and complex relationships, complicating data management and analysis. Traditional database management systems struggle to effectively capture these complex networks of interactions within biological systems. Graph-oriented databases perform well in managing interactions between different entities. We present neo4jsbml, a new solution that bridges the gap between the Systems Biology Markup Language data and the Neo4j database, for storing, querying and analyzing data. The Systems Biology Markup Language organizes biological entities in a hierarchical structure, reflecting their interdependencies. The inherent graphical structure represents these hierarchical relationships, offering a natural and efficient means of navigating and exploring the model's components. Neo4j is an excellent solution for handling this type of data. By representing entities as nodes and their relationships as edges, Cypher, Neo4j's query language, efficiently traverses this type of graph representing complex biological networks. We have developed neo4jsbml, a Python library for importing Systems Biology Markup Language data into a Neo4j database using a user-defined schema. By leveraging Neo4j's graphical database technology, exploration of complex biological networks becomes intuitive and information retrieval efficient. Neo4jsbml is a tool designed to import Systems Biology Markup Language data into a Neo4j database. Only the desired data is loaded into the Neo4j database. neo4jsbml is user-friendly and can become a useful new companion for visualizing and analyzing metabolic models through the Neo4j graphical database. neo4jsbml is open source software and available at https://github.com/brsynth/neo4jsbml.
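To make the "entities as nodes, relationships as edges" idea concrete, here is a small sketch that parses a toy SBML fragment with Python's standard library and derives nodes, edges and illustrative Cypher statements. It does not use or reproduce the neo4jsbml API; the labels (`Model`, `Species`, `Reaction`) and relationship names (`HAS_SPECIES`, `REACTANT_OF`, `PRODUCES`) are assumptions chosen for illustration, whereas neo4jsbml itself relies on a user-defined schema.

```python
import xml.etree.ElementTree as ET

# Toy SBML Level 3 fragment (hypothetical model, for illustration only).
SBML = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy">
    <listOfSpecies>
      <species id="A" compartment="c"/>
      <species id="B" compartment="c"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="R1" reversible="false">
        <listOfReactants><speciesReference species="A"/></listOfReactants>
        <listOfProducts><speciesReference species="B"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""

NS = {"s": "http://www.sbml.org/sbml/level3/version1/core"}

model = ET.fromstring(SBML).find("s:model", NS)
model_id = model.get("id")

nodes = [("Model", model_id)]  # (label, id)
edges = []                     # (source_id, relationship, target_id)

for sp in model.findall("s:listOfSpecies/s:species", NS):
    nodes.append(("Species", sp.get("id")))
    edges.append((model_id, "HAS_SPECIES", sp.get("id")))

for rx in model.findall("s:listOfReactions/s:reaction", NS):
    rid = rx.get("id")
    nodes.append(("Reaction", rid))
    for ref in rx.findall("s:listOfReactants/s:speciesReference", NS):
        edges.append((ref.get("species"), "REACTANT_OF", rid))
    for ref in rx.findall("s:listOfProducts/s:speciesReference", NS):
        edges.append((rid, "PRODUCES", ref.get("species")))


def products_of(species_id):
    """Follow species -> reaction -> product, the kind of path a graph query traverses."""
    reactions = {dst for (src, rel, dst) in edges if rel == "REACTANT_OF" and src == species_id}
    return sorted(dst for (src, rel, dst) in edges if rel == "PRODUCES" and src in reactions)


# Cypher text one could send to a Neo4j instance (generated here, not executed).
cypher = [f"MERGE (:{label} {{id: '{ident}'}})" for label, ident in nodes]
cypher += [
    f"MATCH (a {{id: '{src}'}}), (b {{id: '{dst}'}}) MERGE (a)-[:{rel}]->(b)"
    for src, rel, dst in edges
]

print(products_of("A"))
print("\n".join(cypher))
```

The two-hop lookup in `products_of` is the in-memory analogue of a path pattern such as `(:Species)-[:REACTANT_OF]->(:Reaction)-[:PRODUCES]->(:Species)` in Cypher.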
Affiliation(s)
- Guillaume Gricourt
  - Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, Jouy-en-Josas, France
- Thomas Duigou
  - Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, Jouy-en-Josas, France
- Sandra Dérozier
  - Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France
- Jean-Loup Faulon
  - Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, Jouy-en-Josas, France
3. Gütebier L, Bleimehl T, Henkel R, Munro J, Müller S, Morgner A, Laenge J, Pachauer A, Erdl A, Weimar J, Walther Langendorf K, Vialard V, Liebig T, Preusse M, Waltemath D, Jarasch A. CovidGraph: A Graph to fight COVID-19. Bioinformatics 2022; 38:4843-4845. [PMID: 36040169 PMCID: PMC9563682 DOI: 10.1093/bioinformatics/btac592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2022] [Revised: 05/20/2022] [Accepted: 08/29/2022] [Indexed: 11/12/2022] [Open Access]
Abstract
Summary: Reliable and integrated data are prerequisites for effective research on the recent coronavirus disease 2019 (COVID-19) pandemic. The CovidGraph project integrates and connects heterogeneous COVID-19 data in a knowledge graph, referred to as ‘CovidGraph’. It provides easy access to multiple data sources through a single point of entry and enables flexible data exploration. Availability and Implementation: More information on CovidGraph is available from the project website: https://healthecco.org/covidgraph/. Source code and documentation are provided on GitHub: https://github.com/covidgraph. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lea Gütebier
  - Medical Informatics Laboratory, University Medicine Greifswald, Greifswald, 17475, Germany
  - HealthECCO, Freiburg, 79098, Germany
- Tim Bleimehl
  - HealthECCO, Freiburg, 79098, Germany
  - German Center for Diabetes Research (DZD), Neuherberg, 85764, Germany
- Ron Henkel
  - Medical Informatics Laboratory, University Medicine Greifswald, Greifswald, 17475, Germany
- Dagmar Waltemath
  - Medical Informatics Laboratory, University Medicine Greifswald, Greifswald, 17475, Germany
- Alexander Jarasch
  - HealthECCO, Freiburg, 79098, Germany
  - German Center for Diabetes Research (DZD), Neuherberg, 85764, Germany
4. Newcomb K, Smith ME, Donohue RE, Wyngaard S, Reinking C, Sweet CR, Levine MJ, Unnasch TR, Michael E. Iterative data-driven forecasting of the transmission and management of SARS-CoV-2/COVID-19 using social interventions at the county-level. Sci Rep 2022; 12:890. [PMID: 35042958 PMCID: PMC8766467 DOI: 10.1038/s41598-022-04899-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/06/2021] [Accepted: 12/23/2021] [Indexed: 12/24/2022] [Open Access]
Abstract
The control of the initial outbreak and spread of SARS-CoV-2/COVID-19 via the application of population-wide non-pharmaceutical mitigation measures has led to remarkable successes in dampening the pandemic globally. However, with countries beginning to ease or lift these measures fully to restart activities, concern is growing regarding the impacts that such reopening of societies could have on the subsequent transmission of the virus. While mathematical models of COVID-19 transmission have played important roles in evaluating the impacts of these measures for curbing virus transmission, a key need is for models that are able to effectively capture the effects of the spatial and social heterogeneities that drive the epidemic dynamics observed at the local community level. Iterative forecasting that uses new incoming epidemiological and social behavioral data to sequentially update locally-applicable transmission models can overcome this gap, potentially resulting in better predictions and policy actions. Here, we present the development of one such data-driven iterative modelling tool based on publicly available data and an extended SEIR model for forecasting SARS-CoV-2 at the county level in the United States. Using data from the state of Florida, we demonstrate the utility of such a system for exploring the outcomes of the social measures proposed by policy makers for containing the course of the pandemic. We provide comprehensive results showing how the locally identified models could be employed for assessing the impacts and societal tradeoffs of using specific social protective strategies. We conclude that it could have been possible to lift the more disruptive social interventions related to movement restriction/social distancing measures earlier if these were accompanied by widespread testing and contact tracing. These intensified social interventions could have potentially also brought about the control of the epidemic in low- and some medium-incidence county settings first, supporting the development and deployment of a geographically-phased approach to reopening the economy of Florida. We have made our data-driven forecasting system publicly available for policymakers and health officials to use in their own locales, so that a more efficient coordinated strategy for controlling SARS-CoV-2 region-wide can be developed and successfully implemented.
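For readers unfamiliar with the model family named in this abstract, the following is a minimal sketch of a basic (not extended) SEIR compartment model integrated with forward-Euler steps. The paper's extended model, its data-driven updating and its fitted parameters are not reproduced here; the function name, the parameter values and the "distancing" scenario are illustrative assumptions only.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, days, dt=0.1):
    """Basic SEIR model, forward-Euler integration, one history row per day.

    beta:  transmission rate per day
    sigma: rate of leaving the exposed class (1 / latent period)
    gamma: recovery rate (1 / infectious period)
    """
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    n = s + e + i + r
    steps_per_day = round(1 / dt)
    history = [(0.0, s, e, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((float(day), s, e, i, r))
    return history


def peak_infectious(history):
    """Largest number of simultaneously infectious people in a run."""
    return max(row[3] for row in history)


# Hypothetical county of 100,000 people; a social intervention is modelled
# crudely as a halved transmission rate (illustrative values, not fitted).
common = dict(sigma=0.2, gamma=0.1, s0=99_990, e0=0, i0=10, r0=0, days=180)
baseline = simulate_seir(beta=0.5, **common)
distancing = simulate_seir(beta=0.25, **common)

print(f"peak infectious, baseline:   {peak_infectious(baseline):.0f}")
print(f"peak infectious, distancing: {peak_infectious(distancing):.0f}")
```

Comparing runs that differ only in `beta` is the simplest way to express an intervention scenario; an iterative forecasting system of the kind described above would additionally re-estimate such parameters as new data arrive.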
Affiliation(s)
- Ken Newcomb
  - Center for Global Health Infectious Disease Research, University of South Florida, Tampa, FL, USA
- Morgan E Smith
  - Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA
- Rose E Donohue
  - Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA
- Sebastian Wyngaard
  - Center for Research Computing, University of Notre Dame, Notre Dame, IN, USA
- Caleb Reinking
  - Center for Research Computing, University of Notre Dame, Notre Dame, IN, USA
- Christopher R Sweet
  - Center for Research Computing, University of Notre Dame, Notre Dame, IN, USA
- Marissa J Levine
  - Center for Leadership in Public Health Practice, University of South Florida, Tampa, FL, USA
- Thomas R Unnasch
  - Center for Global Health Infectious Disease Research, University of South Florida, Tampa, FL, USA
- Edwin Michael
  - Center for Global Health Infectious Disease Research, University of South Florida, Tampa, FL, USA
5. Kim L, Yahia E, Segonds F, Véron P, Mallet A. i-Dataquest: A heterogeneous information retrieval tool using data graph for the manufacturing industry. Comput Ind 2021. [DOI: 10.1016/j.compind.2021.103527] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/30/2022]
6. Timón-Reina S, Rincón M, Martínez-Tomás R. An overview of graph databases and their applications in the biomedical domain. Database (Oxford) 2021; 2021:baab026. [PMID: 34003247 PMCID: PMC8130509 DOI: 10.1093/database/baab026] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/28/2020] [Revised: 03/24/2021] [Accepted: 04/30/2021] [Indexed: 01/18/2023]
Abstract
Over the past couple of decades, the explosion of densely interconnected data has stimulated the research, development and adoption of graph database technologies. From early graph models to more recent native graph databases, the landscape of implementations has evolved to cover enterprise-ready requirements. Because of the interconnected nature of its data, the biomedical domain has been one of the early adopters of graph databases, enabling more natural representation models and better data integration workflows, exploration and analysis facilities. In this work, we survey the literature to explore the evolution, performance and how the most recent graph database solutions are applied in the biomedical domain, compiling a great variety of use cases. With this evidence, we conclude that the available graph database management systems are fit to support data-intensive, integrative applications, targeted at both basic research and exploratory tasks closer to the clinic.
Affiliation(s)
- Santiago Timón-Reina
  - Departamento de Inteligencia Artificial, Universidad Nacional de Educación a Distancia (UNED), C/Juan del Rosal, 16 Ciudad Universitaria, Madrid 28040, Spain
- Mariano Rincón
  - Departamento de Inteligencia Artificial, Universidad Nacional de Educación a Distancia (UNED), C/Juan del Rosal, 16 Ciudad Universitaria, Madrid 28040, Spain
- Rafael Martínez-Tomás
  - Departamento de Inteligencia Artificial, Universidad Nacional de Educación a Distancia (UNED), C/Juan del Rosal, 16 Ciudad Universitaria, Madrid 28040, Spain
7. Rougny A, Touré V, Albanese J, Waltemath D, Shirshov D, Sorokin A, Bader GD, Blinov ML, Mazein A. SBGN Bricks Ontology as a tool to describe recurring concepts in molecular networks. Brief Bioinform 2021; 22:6184415. [PMID: 33758926 PMCID: PMC8425392 DOI: 10.1093/bib/bbab049] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/02/2020] [Revised: 01/13/2021] [Indexed: 11/14/2022] [Open Access]
Abstract
A comprehensible representation of a molecular network is key to communicating and understanding scientific results in systems biology. The Systems Biology Graphical Notation (SBGN) has emerged as the main standard to represent such networks graphically. It has been implemented by different software tools, and is now largely used to communicate maps in scientific publications. However, learning the standard, and using it to build large maps, can be tedious. Moreover, SBGN maps are not grounded on a formal semantic layer and therefore do not enable formal analysis. Here, we introduce a new set of patterns representing recurring concepts encountered in molecular networks, called SBGN bricks. The bricks are structured in a new ontology, the Bricks Ontology (BKO), to define clear semantics for each of the biological concepts they represent. We show the usefulness of the bricks and BKO for both the template-based construction and the semantic annotation of molecular networks. The SBGN bricks and BKO can be freely explored and downloaded at sbgnbricks.org.
Affiliation(s)
- Adrien Rougny (corresponding author)
  - Biotechnology Research Institute for Drug Discovery, National Institute of Advanced Industrial Science and Technology (AIST), Aomi, Tokyo, Japan
  - Com. Bio Big Data Open Innovation Lab. (CBBD-OIL), AIST, Aomi, Tokyo, Japan
- Vasundra Touré
  - Norwegian University of Science and Technology (NTNU), Høgskoleringen 5, Realfagbygget, 7491 Trondheim, Norway
- John Albanese
  - R. D. Berlin Center for Cell Analysis and Modeling, University of Connecticut School of Medicine, Farmington, CT 06030, USA
- Dagmar Waltemath
  - Medical Informatics Laboratory, Institute for Community Medicine, University Medicine Greifswald, D-17475 Greifswald, Germany
- Denis Shirshov
  - European Institute for Systems Biology and Medicine, CIRI UMR5308, CNRS-ENS-UCBL-INSERM, Université de Lyon, 50 Avenue Tony Garnier, 69007 Lyon, France
  - Institute of Cell Biophysics, Russian Academy of Sciences, 3 Institutskaya Street, Pushchino, Moscow Region, 142290, Russia
- Anatoly Sorokin
  - Institute of Cell Biophysics, Russian Academy of Sciences, 3 Institutskaya Street, Pushchino, Moscow Region, 142290, Russia
  - Moscow Institute of Physics and Technology, 9 Institutsky per., Dolgoprudny, Moscow Region, 141700, Russia
  - University of Liverpool, Liverpool L7 3EA, UK
- Gary D Bader
  - The Donnelly Centre, University of Toronto, M5S 3E1, Toronto, Canada
- Michael L Blinov (corresponding author)
  - R. D. Berlin Center for Cell Analysis and Modeling, University of Connecticut School of Medicine, Farmington, CT 06030, USA
- Alexander Mazein (corresponding author)
  - European Institute for Systems Biology and Medicine, CIRI UMR5308, CNRS-ENS-UCBL-INSERM, Université de Lyon, 50 Avenue Tony Garnier, 69007 Lyon, France
  - Institute of Cell Biophysics, Russian Academy of Sciences, 3 Institutskaya Street, Pushchino, Moscow Region, 142290, Russia
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 Avenue du Swing, L-4367 Belvaux, Luxembourg
8. Keating SM, Waltemath D, König M, Zhang F, Dräger A, Chaouiya C, Bergmann FT, Finney A, Gillespie CS, Helikar T, Hoops S, Malik‐Sheriff RS, Moodie SL, Moraru II, Myers CJ, Naldi A, Olivier BG, Sahle S, Schaff JC, Smith LP, Swat MJ, Thieffry D, Watanabe L, Wilkinson DJ, Blinov ML, Begley K, Faeder JR, Gómez HF, Hamm TM, Inagaki Y, Liebermeister W, Lister AL, Lucio D, Mjolsness E, Proctor CJ, Raman K, Rodriguez N, Shaffer CA, Shapiro BE, Stelling J, Swainston N, Tanimura N, Wagner J, Meier‐Schellersheim M, Sauro HM, Palsson B, Bolouri H, Kitano H, Funahashi A, Hermjakob H, Doyle JC, Hucka M. SBML Level 3: an extensible format for the exchange and reuse of biological models. Mol Syst Biol 2020; 16:e9110. [PMID: 32845085 PMCID: PMC8411907 DOI: 10.15252/msb.20199110] [Citation(s) in RCA: 126] [Impact Index Per Article: 25.2] [Received: 07/11/2019] [Revised: 06/24/2020] [Accepted: 07/09/2020] [Indexed: 12/25/2022] [Open Access]
Abstract
Systems biology has experienced dramatic growth in the number, size, and complexity of computational models. To reproduce simulation results and reuse models, researchers must exchange unambiguous model descriptions. We review the latest edition of the Systems Biology Markup Language (SBML), a format designed for this purpose. A community of modelers and software authors developed SBML Level 3 over the past decade. Its modular form consists of a core suited to representing reaction-based models and packages that extend the core with features suited to other model types including constraint-based models, reaction-diffusion models, logical network models, and rule-based models. The format leverages two decades of SBML and a rich software ecosystem that transformed how systems biologists build and interact with models. More recently, the rise of multiscale models of whole cells and organs, and new data sources such as single-cell measurements and live imaging, has precipitated new ways of integrating data with models. We provide our perspectives on the challenges presented by these developments and how SBML Level 3 provides the foundation needed to support this evolution.
9. Semantic Data Management for a Virtual Factory Collaborative Environment. Applied Sciences-Basel 2019. [DOI: 10.3390/app9224936] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
Abstract
Recent developments in the area of cyber-physical systems (CPSs) and Internet of Things (IoT) are among the drivers for the emergence of the Industry 4.0 concept, setting new requirements for the architecture, technology, and design approaches of modern industrial systems. Industry 4.0 assumes a higher level of intelligence, and thus autonomy of the systems and subsystems, and a larger focus on the analysis of gathered data for further utilization. The Virtual Factory Open Operating System (vf-OS) project is intended to respond to some of these key challenges, in particular for the smart factory application domain. Complementarily, data and knowledge storage and processing are also in the scope of vf-OS. This article introduces the semantic management component of vf-OS, which aims to analyze the interrelations among stored entities, as well as to define the closeness among them to generate meaningful suggestions, which can be later used by other subsystems or operators in a user-friendly way. The semantic managing system makes use of non relational approaches, namely a graph database, which enables data to be represented as graphs for further semantic querying. The developed prototype and an illustrative application case are also presented.
10. Neal ML, König M, Nickerson D, Mısırlı G, Kalbasi R, Dräger A, Atalag K, Chelliah V, Cooling MT, Cook DL, Crook S, de Alba M, Friedman SH, Garny A, Gennari JH, Gleeson P, Golebiewski M, Hucka M, Juty N, Myers C, Olivier BG, Sauro HM, Scharm M, Snoep JL, Touré V, Wipat A, Wolkenhauer O, Waltemath D. Harmonizing semantic annotations for computational models in biology. Brief Bioinform 2019; 20:540-550. [PMID: 30462164 PMCID: PMC6433895 DOI: 10.1093/bib/bby087] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 05/15/2018] [Revised: 08/08/2018] [Accepted: 08/17/2018] [Indexed: 02/06/2023] [Open Access]
Abstract
Life science researchers use computational models to articulate and test hypotheses about the behavior of biological systems. Semantic annotation is a critical component for enhancing the interoperability and reusability of such models as well as for the integration of the data needed for model parameterization and validation. Encoded as machine-readable links to knowledge resource terms, semantic annotations describe the computational or biological meaning of what models and data represent. These annotations help researchers find and repurpose models, accelerate model composition and enable knowledge integration across model repositories and experimental data stores. However, realizing the potential benefits of semantic annotation requires the development of model annotation standards that adhere to a community-based annotation protocol. Without such standards, tool developers must account for a variety of annotation formats and approaches, a situation that can become prohibitively cumbersome and which can defeat the purpose of linking model elements to controlled knowledge resource terms. Currently, no consensus protocol for semantic annotation exists among the larger biological modeling community. Here, we report on the landscape of current annotation practices among the COmputational Modeling in BIology NEtwork community and provide a set of recommendations for building a consensus approach to semantic annotation.
Affiliation(s)
- Maxwell Lewis Neal
  - Seattle Children’s Research Institute, Center for Global Infectious Disease Research, Seattle, USA
- Matthias König
  - Department of Biology, Humboldt-University Berlin, Institute for Theoretical Biology, Berlin, Germany
- David Nickerson
  - Auckland Bioengineering Institute, University of Auckland, Auckland, NZ
- Göksel Mısırlı
  - School of Computing and Mathematics, Keele University, Keele, UK
- Reza Kalbasi
  - Auckland Bioengineering Institute, University of Auckland, Auckland, NZ
- Andreas Dräger
  - Computational Systems Biology of Infection and Antimicrobial-Resistant Pathogens, Center for Bioinformatics Tübingen (ZBIT), University of Tübingen, Tübingen, Germany
  - Department of Computer Science, University of Tübingen, Tübingen, Germany
- Koray Atalag
  - Auckland Bioengineering Institute, University of Auckland, Auckland, NZ
- Vijayalakshmi Chelliah
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Michael T Cooling
  - Auckland Bioengineering Institute, University of Auckland, Auckland, NZ
- Daniel L Cook
  - Department of Physiology and Biophysics, University of Washington, Seattle, WA, USA
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
- Sharon Crook
  - School of Mathematical and Statistical Sciences, Arizona State University, Tempe, USA
- Miguel de Alba
  - German Federal Institute for Risk Assessment, Berlin, Germany
- Alan Garny
  - Auckland Bioengineering Institute, University of Auckland, Auckland, NZ
- John H Gennari
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
- Padraig Gleeson
  - Department of Neuroscience, Physiology and Pharmacology, University College London, London, UK
- Martin Golebiewski
  - Heidelberg Institute for Theoretical Studies (HITS gGmbH), Heidelberg, Germany
- Michael Hucka
  - Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
- Nick Juty
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Chris Myers
  - Department of Electrical and Computer Engineering, University of Utah, Salt Lake City, UT, USA
- Brett G Olivier
  - Systems Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
  - Modelling of Biological Processes, BioQUANT/COS, Heidelberg University, Germany
- Herbert M Sauro
  - Department of Bioengineering, University of Washington, Seattle, WA, USA
- Martin Scharm
  - Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
- Jacky L Snoep
  - Department of Biochemistry, Stellenbosch University, Matieland, South Africa
  - Department of Molecular Cell Physiology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
  - Manchester Institute for Biotechnology, University of Manchester, Manchester, UK
- Vasundra Touré
  - Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
- Anil Wipat
  - School of Computing Science, Newcastle University, Newcastle upon Tyne, UK
- Olaf Wolkenhauer
  - Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
  - Stellenbosch Institute for Advanced Study (STIAS), Stellenbosch, South Africa
- Dagmar Waltemath
  - Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
11. Yang PC, Purawat S, Ieong PU, Jeng MT, DeMarco KR, Vorobyov I, McCulloch AD, Altintas I, Amaro RE, Clancy CE. A demonstration of modularity, reuse, reproducibility, portability and scalability for modeling and simulation of cardiac electrophysiology using Kepler Workflows. PLoS Comput Biol 2019; 15:e1006856. [PMID: 30849072 PMCID: PMC6426265 DOI: 10.1371/journal.pcbi.1006856] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 08/21/2018] [Revised: 03/20/2019] [Accepted: 02/08/2019] [Indexed: 01/18/2023] [Open Access]
Abstract
Multi-scale computational modeling is a major branch of computational biology as evidenced by the US federal interagency Multi-Scale Modeling Consortium and major international projects. It invariably involves specific and detailed sequences of data analysis and simulation, often with multiple tools and datasets, and the community recognizes improved modularity, reuse, reproducibility, portability and scalability as critical unmet needs in this area. Scientific workflows are a well-recognized strategy for addressing these needs in scientific computing. While there are good examples of the use of scientific workflows in bioinformatics, medical informatics, biomedical imaging and data analysis, there are fewer examples in multi-scale computational modeling in general and cardiac electrophysiology in particular. Cardiac electrophysiology simulation is a mature area of multi-scale computational biology that serves as an excellent use case for developing and testing new scientific workflows. In this article, we develop, describe and test a computational workflow that serves as a proof of concept of a platform for the robust integration and implementation of a reusable and reproducible multi-scale cardiac cell and tissue model that is expandable, modular and portable. The workflow described leverages Python and the Kepler-Python actor for plotting and pre/post-processing. During all stages of the workflow design, we rely on freely available open-source tools, to make our workflow freely usable by scientists. We present a computational workflow as a proof of concept for integration and implementation of a reusable and reproducible cardiac multi-scale electrophysiology model that is expandable, modular and portable. This framework enables scientists to create intuitive, user-friendly and flexible end-to-end automated scientific workflows using a graphical user interface. Kepler is an advanced open-source platform that supports multiple models of computation. The underlying workflow engine handles the scalability, provenance and reproducibility aspects of the code, performs orchestration of data flow, and automates execution on heterogeneous computing resources. One of the main advantages of workflow utilization is the integration of code written in multiple languages. Standardization occurs at the interfaces of the workflow elements and allows for general applications and easy comparison and integration of code from different research groups or even multiple programmers coding in different languages for various purposes from the same group. A workflow-driven problem-solving approach enables domain scientists to focus on resolving the core science questions, and delegates the computational and process management burden to the underlying workflow. The workflow-driven approach allows scaling the computational experiment with distributed data-parallel execution on multiple computing platforms, such as HPC resources, GPU clusters and the cloud. The workflow framework tracks software version information along with hardware information to allow users an opportunity to trace any variation in workflow outcome to the system configurations.
Affiliation(s)
- Pei-Chi Yang
- Department of Physiology and Membrane Biology, Department of Pharmacology, School of Medicine, University of California Davis, Davis, California, United States of America
- Shweta Purawat
- San Diego Supercomputer Center (SDSC), University of California, San Diego, La Jolla, California, United States of America
- Pek U. Ieong
- Department of Chemistry and Biochemistry, National Biomedical Computation Resource, Drug Design Data Resource (D3R), University of California San Diego, La Jolla, California, United States of America
- Mao-Tsuen Jeng
- Department of Physiology and Membrane Biology, Department of Pharmacology, School of Medicine, University of California Davis, Davis, California, United States of America
- Kevin R. DeMarco
- Department of Physiology and Membrane Biology, Department of Pharmacology, School of Medicine, University of California Davis, Davis, California, United States of America
- Igor Vorobyov
- Department of Physiology and Membrane Biology, Department of Pharmacology, School of Medicine, University of California Davis, Davis, California, United States of America
- Andrew D. McCulloch
- Departments of Bioengineering and Medicine, University of California, San Diego, La Jolla, California, United States of America
- Ilkay Altintas
- San Diego Supercomputer Center (SDSC), University of California, San Diego, La Jolla, California, United States of America
- Rommie E. Amaro
- Department of Chemistry and Biochemistry, National Biomedical Computation Resource, Drug Design Data Resource (D3R), University of California San Diego, La Jolla, California, United States of America
- Colleen E. Clancy
- Department of Physiology and Membrane Biology, Department of Pharmacology, School of Medicine, University of California Davis, Davis, California, United States of America
12
Stanford NJ, Scharm M, Dobson PD, Golebiewski M, Hucka M, Kothamachu VB, Nickerson D, Owen S, Pahle J, Wittig U, Waltemath D, Goble C, Mendes P, Snoep J. Data Management in Computational Systems Biology: Exploring Standards, Tools, Databases, and Packaging Best Practices. Methods Mol Biol 2019; 2049:285-314. [PMID: 31602618 DOI: 10.1007/978-1-4939-9736-7_17] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/14/2022]
Abstract
Computational systems biology involves integrating heterogeneous datasets in order to generate models. These models can assist with understanding and prediction of biological phenomena. Generating datasets and integrating them into models involves a wide range of scientific expertise. As a result, these datasets are often collected by one set of researchers and exchanged with other researchers for constructing the models. For this process to run smoothly, the data and models must be FAIR (findable, accessible, interoperable, and reusable). In order for data and models to be FAIR, they must be structured in consistent and predictable ways, and described sufficiently for other researchers to understand them. Furthermore, these data and models must be shared with other researchers, with appropriately controlled sharing permissions, before and after publication. In this chapter we explore the different data and model standards that assist with structuring, describing, and sharing. We also highlight the popular standards and sharing databases within computational systems biology.
Affiliation(s)
- Martin Scharm
- Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
- Paul D Dobson
- School of Computer Science, University of Manchester, Manchester, UK
- Martin Golebiewski
- Heidelberg Institute for Theoretical Studies (HITS), Heidelberg, Germany
- Michael Hucka
- Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
- David Nickerson
- Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
- Stuart Owen
- School of Computer Science, University of Manchester, Manchester, UK
- Jürgen Pahle
- BIOMS/BioQuant, Heidelberg University, Heidelberg, Germany
- Ulrike Wittig
- Heidelberg Institute for Theoretical Studies (HITS), Heidelberg, Germany
- Dagmar Waltemath
- Medical Informatics, University Medicine Greifswald, Greifswald, Germany
- Carole Goble
- School of Computer Science, University of Manchester, Manchester, UK
- Pedro Mendes
- Centre for Quantitative Medicine, University of Connecticut, Farmington, CT, USA
- Jacky Snoep
- School of Computer Science, University of Manchester, Manchester, UK; Biochemistry, Stellenbosch University, Stellenbosch, South Africa
13
Reactome graph database: Efficient access to complex pathway data. PLoS Comput Biol 2018; 14:e1005968. [PMID: 29377902 PMCID: PMC5805351 DOI: 10.1371/journal.pcbi.1005968] [Citation(s) in RCA: 157] [Impact Index Per Article: 22.4] [Received: 07/19/2017] [Revised: 02/08/2018] [Accepted: 01/10/2018] [Indexed: 11/19/2022] Open
Abstract
Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types.
14
Lambusch F, Waltemath D, Wolkenhauer O, Sandkuhl K, Rosenke C, Henkel R. Identifying frequent patterns in biochemical reaction networks: a workflow. Database (Oxford) 2018; 2018:5048438. [PMID: 29992320 PMCID: PMC6030809 DOI: 10.1093/database/bay051] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 11/10/2017] [Revised: 04/27/2018] [Accepted: 04/29/2018] [Indexed: 11/15/2022]
Abstract
Computational models in biology encode molecular and cell biological processes. Many of these models can be represented as biochemical reaction networks. Studying such networks, one is mostly interested in systems that share similar reactions and mechanisms. Typical goals of an investigation thus include understanding of model parts, identification of reoccurring patterns and recognition of biologically relevant motifs. The large number and size of available models, however, require automated methods to support researchers in achieving their goals. Specifically for the problem of finding patterns in large networks only partial solutions exist. We propose a workflow that identifies frequent structural patterns in biochemical reaction networks encoded in the Systems Biology Markup Language. The workflow utilizes a subgraph mining algorithm to detect the network patterns. Once patterns are identified, the textual pattern description can automatically be converted into a graphical representation. Furthermore, information about the distribution of patterns among a selected set of models can be retrieved. The workflow was validated with 575 models from the curated branch of BioModels. In this paper, we highlight interesting and frequent structural patterns. Furthermore, we provide exemplary patterns that incorporate terms from the Systems Biology Ontology. Our workflow can be applied to a custom set of models or to models already existing in our graph database MaSyMoS. The occurrences of frequent patterns may give insight into the encoding of central biological processes, evaluate postulated biological motifs or serve as a similarity measure for models that share common structures. Database URL: https://github.com/FabienneL/BioNet-Mining.
Affiliation(s)
- Fabienne Lambusch
- Business Information Systems, University of Rostock, Rostock, Mecklenburg-Vorpommern, Germany
- Dagmar Waltemath
- Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Mecklenburg-Vorpommern, Germany
- Olaf Wolkenhauer
- Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Mecklenburg-Vorpommern, Germany
- Stellenbosch Institute for Advanced Study (STIAS), Wallenberg Research Centre, Stellenbosch University, Stellenbosch, South Africa
- Kurt Sandkuhl
- Business Information Systems, University of Rostock, Rostock, Mecklenburg-Vorpommern, Germany
- ITMO University, 49 Kronverksky Pr., St. Petersburg, Russia
- Christian Rosenke
- Visual Computing and Computer Graphics, University of Rostock, Rostock, Mecklenburg-Vorpommern, Germany
- Ron Henkel
- Scientific Databases and Visualization, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
15
Henkel R, Hoehndorf R, Kacprowski T, Knüpfer C, Liebermeister W, Waltemath D. Notions of similarity for systems biology models. Brief Bioinform 2018; 19:77-88. [PMID: 27742665 PMCID: PMC5862271 DOI: 10.1093/bib/bbw090] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 05/27/2016] [Revised: 08/28/2016] [Indexed: 01/23/2023] Open
Abstract
Systems biology models are rapidly increasing in complexity, size and numbers. When building large models, researchers rely on software tools for the retrieval, comparison, combination and merging of models, as well as for version control. These tools need to be able to quantify the differences and similarities between computational models. However, depending on the specific application, the notion of 'similarity' may greatly vary. A general notion of model similarity, applicable to various types of models, is still missing. Here we survey existing methods for the comparison of models, introduce quantitative measures for model similarity, and discuss potential applications of combined similarity measures. To frame model comparison as a general problem, we describe a theoretical approach to defining and computing similarities based on a combination of different model aspects. The six aspects that we define as potentially relevant for similarity are underlying encoding, references to biological entities, quantitative behaviour, qualitative behaviour, mathematical equations and parameters and network structure. We argue that future similarity measures will benefit from combining these model aspects in flexible, problem-specific ways to mimic users' intuition about model similarity, and to support complex model searches in databases.
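The idea of combining per-aspect similarities in a problem-specific way can be sketched as a weighted mean. This is only an illustrative sketch, not the measure defined by the authors: the aspect names are taken from the abstract, while the function names, the weights and the use of a Jaccard index for annotation sets are assumptions made for this example.

```python
from typing import Dict, Set

# The six aspects named in the abstract; the identifiers are hypothetical.
ASPECTS = (
    "encoding",
    "biological_references",
    "quantitative_behaviour",
    "qualitative_behaviour",
    "equations_and_parameters",
    "network_structure",
)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two annotation sets (assumed per-aspect measure)."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


def combined_similarity(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of per-aspect scores, each expected to lie in [0, 1].

    Aspects missing from `weights` get weight 0, so a user can restrict
    the comparison to the aspects relevant for a given task.
    """
    total_weight = sum(weights.get(a, 0.0) for a in ASPECTS)
    if total_weight <= 0:
        raise ValueError("at least one aspect needs a positive weight")
    weighted = sum(weights.get(a, 0.0) * scores.get(a, 0.0) for a in ASPECTS)
    return weighted / total_weight
```

A search that cares only about shared biological entities and network structure would pass non-zero weights for those two aspects and leave the rest out.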
Affiliation(s)
- Dagmar Waltemath
- Department of Systems Biology and Bioinformatics, Institute of Computer Science, University of Rostock, Rostock, Germany
16
Kazantsev F, Akberdin I, Lashin S, Ree N, Timonov V, Ratushny A, Khlebodarova T, Likhoshvai V. MAMMOTh: A new database for curated mathematical models of biomolecular systems. J Bioinform Comput Biol 2017; 16:1740010. [PMID: 29172865 DOI: 10.1142/s0219720017400108] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 11/18/2022]
Abstract
MOTIVATION: Living systems have a complex hierarchical organization that can be viewed as a set of dynamically interacting subsystems. Thus, to simulate the internal nature and dynamics of an entire biological system, a model should be reconstructed iteratively, as a consistent composition and combination of its elementary subsystems. In accordance with this bottom-up approach, we have developed the MAthematical Models of bioMOlecular sysTems (MAMMOTh) tool, which consists of a database containing manually curated MAMMOTh models fitted to experimental data and a software tool that supports their further integration. RESULTS: The MAMMOTh database entries are organized as building blocks so that the model parts can be used in different combinations to describe systems at a higher organizational level (metabolic pathways and/or transcription regulatory networks). The tool supports export of a single model or combinations of models in SBML or Mathematica formats. The database currently contains 110 mathematical sub-models for Escherichia coli elementary subsystems (enzymatic reactions and gene expression regulatory processes) that can be combined into at least 5100 complex/sophisticated models describing more complex biological processes such as de novo nucleotide biosynthesis, aerobic/anaerobic respiration and nitrate/nitrite utilization in E. coli. All models are functionally interconnected and sufficiently complement public model resources. AVAILABILITY: http://mammoth.biomodelsgroup.ru.
Affiliation(s)
- Fedor Kazantsev
- Institute of Cytology and Genetics SB RAS, Lavrentyev Avenue 10, Novosibirsk 630090, Russia; Novosibirsk State University, Pirogova str. 2, Novosibirsk 630090, Russia
- Ilya Akberdin
- Institute of Cytology and Genetics SB RAS, Lavrentyev Avenue 10, Novosibirsk 630090, Russia; Novosibirsk State University, Pirogova str. 2, Novosibirsk 630090, Russia; Biology Department, San Diego State University, San Diego, CA 92182-4614, USA
- Sergey Lashin
- Institute of Cytology and Genetics SB RAS, Lavrentyev Avenue 10, Novosibirsk 630090, Russia; Novosibirsk State University, Pirogova str. 2, Novosibirsk 630090, Russia
- Natalia Ree
- Institute of Cytology and Genetics SB RAS, Lavrentyev Avenue 10, Novosibirsk 630090, Russia
- Vladimir Timonov
- Novosibirsk State University, Pirogova str. 2, Novosibirsk 630090, Russia
- Alexander Ratushny
- Center for Infectious Disease Research (formerly Seattle Biomedical Research Institute), Seattle, WA 98109, USA; Institute for Systems Biology, Seattle, WA 98109-5234, USA
- Tamara Khlebodarova
- Institute of Cytology and Genetics SB RAS, Lavrentyev Avenue 10, Novosibirsk 630090, Russia
- Vitaly Likhoshvai
- Institute of Cytology and Genetics SB RAS, Lavrentyev Avenue 10, Novosibirsk 630090, Russia; Novosibirsk State University, Pirogova str. 2, Novosibirsk 630090, Russia
17
biochem4j: Integrated and extensible biochemical knowledge through graph databases. PLoS One 2017; 12:e0179130. [PMID: 28708831 PMCID: PMC5510799 DOI: 10.1371/journal.pone.0179130] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 01/23/2017] [Accepted: 05/24/2017] [Indexed: 01/17/2023] Open
Abstract
Biologists and biochemists have at their disposal a number of excellent, publicly available data resources such as UniProt, KEGG, and NCBI Taxonomy, which catalogue biological entities. Despite the usefulness of these resources, they remain fundamentally unconnected. While links may appear between entries across these databases, users are typically only able to follow such links by manual browsing or through specialised workflows. Although many of the resources provide web-service interfaces for computational access, performing federated queries across databases remains a non-trivial but essential activity in interdisciplinary systems and synthetic biology programmes. What is needed are integrated repositories to catalogue both biological entities and, crucially, the relationships between them. Such a resource should be extensible, such that newly discovered relationships (for example, those between novel, synthetic enzymes and non-natural products) can be added over time. With the introduction of graph databases, the barrier to the rapid generation, extension and querying of such a resource has been lowered considerably. With a particular focus on metabolic engineering as an illustrative application domain, biochem4j, freely available at http://biochem4j.org, is introduced to provide an integrated, queryable database that warehouses chemical, reaction, enzyme and taxonomic data from a range of reliable resources. The biochem4j framework establishes a starting point for the flexible integration and exploitation of an ever-wider range of biological data sources, from public databases to laboratory-specific experimental datasets, for the benefit of systems biologists, biosystems engineers and the wider community of molecular biologists and biological chemists.
18
Costa RL, Gadelha L, Ribeiro-Alves M, Porto F. GeNNet: an integrated platform for unifying scientific workflows and graph databases for transcriptome data analysis. PeerJ 2017; 5:e3509. [PMID: 28695067 PMCID: PMC5501156 DOI: 10.7717/peerj.3509] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 03/26/2017] [Accepted: 06/06/2017] [Indexed: 12/28/2022] Open
Abstract
There are many steps in analyzing transcriptome data, from the acquisition of raw data to the selection of a subset of representative genes that explain a scientific hypothesis. The data produced can be represented as networks of interactions among genes and these may additionally be integrated with other biological databases, such as Protein-Protein Interactions, transcription factors and gene annotation. However, the results of these analyses remain fragmented, imposing difficulties, either for posterior inspection of results, or for meta-analysis by the incorporation of new related data. Integrating databases and tools into scientific workflows, orchestrating their execution, and managing the resulting data and its respective metadata are challenging tasks. Additionally, a great amount of effort is equally required to run in-silico experiments to structure and compose the information as needed for analysis. Different programs may need to be applied and different files are produced during the experiment cycle. In this context, the availability of a platform supporting experiment execution is paramount. We present GeNNet, an integrated transcriptome analysis platform that unifies scientific workflows with graph databases for selecting relevant genes according to the evaluated biological systems. It includes GeNNet-Wf, a scientific workflow that pre-loads biological data, pre-processes raw microarray data and conducts a series of analyses including normalization, differential expression inference, clusterization and gene set enrichment analysis. A user-friendly web interface, GeNNet-Web, allows for setting parameters, executing, and visualizing the results of GeNNet-Wf executions. To demonstrate the features of GeNNet, we performed case studies with data retrieved from GEO, particularly using a single-factor experiment in different analysis scenarios. As a result, we obtained differentially expressed genes for which biological functions were analyzed. 
The results are integrated into GeNNet-DB, a database about genes, clusters, experiments and their properties and relationships. The resulting graph database is explored with queries that demonstrate the expressiveness of this data model for reasoning about gene interaction networks. GeNNet is the first platform to integrate the analytical process of transcriptome data with graph databases. It provides a comprehensive set of tools that would otherwise be challenging for non-expert users to install and use. Developers can add new functionality to components of GeNNet. The derived data allows for testing previous hypotheses about an experiment and exploring new ones through the interactive graph database environment. It enables the analysis of different data on humans, rhesus, mice and rat coming from Affymetrix platforms. GeNNet is available as an open source platform at https://github.com/raquele/GeNNet and can be retrieved as a software container with the command docker pull quelopes/gennet.
Affiliation(s)
- Raquel L. Costa
- DEXL Lab, National Laboratory for Scientific Computing (LNCC), Petrópolis, Rio de Janeiro, Brazil
- National Institute of Cancer (INCA), Rio de Janeiro, RJ, Brazil
- Luiz Gadelha
- DEXL Lab, National Laboratory for Scientific Computing (LNCC), Petrópolis, Rio de Janeiro, Brazil
- Marcelo Ribeiro-Alves
- Laboratory of Clinical Research in DST-AIDS, National Institute of Infectology Evandro Chagas, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil
- Fábio Porto
- DEXL Lab, National Laboratory for Scientific Computing (LNCC), Petrópolis, Rio de Janeiro, Brazil
19
Yoon BH, Kim SK, Kim SY. Use of Graph Database for the Integration of Heterogeneous Biological Data. Genomics Inform 2017; 15:19-27. [PMID: 28416946 PMCID: PMC5389944 DOI: 10.5808/gi.2017.15.1.19] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 11/29/2016] [Revised: 02/02/2017] [Accepted: 02/02/2017] [Indexed: 12/15/2022] Open
Abstract
Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.
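The contrast between multiple-join queries and native relationship traversal can be illustrated with a small sketch. It does not use MySQL or Neo4j and is not the benchmark from the study: the entity names and the `shortest_path` helper are hypothetical, and the adjacency dictionary merely stands in for relationships stored directly with each node, so that a multi-hop question becomes a walk along edges rather than one table join per hop.

```python
from collections import deque
from typing import Dict, List, Optional


def shortest_path(adjacency: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest chain of relationships.

    Each hop follows edges stored with the node itself, which is the
    access pattern a graph store is built around.
    """
    if start == goal:
        return [start]
    visited = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        for neighbour in adjacency.get(path[-1], ()):
            if neighbour in visited:
                continue
            if neighbour == goal:
                return path + [neighbour]
            visited.add(neighbour)
            queue.append(path + [neighbour])
    return None  # no chain of relationships connects the two entities


# Hypothetical toy data: drug -> target protein -> gene -> disease.
toy_graph = {
    "drugA": ["proteinX"],
    "proteinX": ["geneY"],
    "geneY": ["diseaseZ"],
}
print(shortest_path(toy_graph, "drugA", "diseaseZ"))
```

In a relational layout the same three-hop question would typically need one join per relationship table, which is the kind of query the abstract reports as slow or unfinished in MySQL.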
Affiliation(s)
- Byoung-Ha Yoon
- Personalized Genomic Medicine Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Korea; Department of Functional Genomics, University of Science and Technology (UST), Daejeon 34113, Korea
- Seon-Kyu Kim
- Personalized Genomic Medicine Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Korea
- Seon-Young Kim
- Personalized Genomic Medicine Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Korea; Department of Functional Genomics, University of Science and Technology (UST), Daejeon 34113, Korea
20
Touré V, Mazein A, Waltemath D, Balaur I, Saqi M, Henkel R, Pellet J, Auffray C. STON: exploring biological pathways using the SBGN standard and graph databases. BMC Bioinformatics 2016; 17:494. [PMID: 27919219 PMCID: PMC5139139 DOI: 10.1186/s12859-016-1394-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 05/04/2016] [Accepted: 11/29/2016] [Indexed: 01/16/2023] Open
Abstract
Background: When modeling in Systems Biology and Systems Medicine, the data is often extensive, complex and heterogeneous. Graphs are a natural way of representing biological networks. Graph databases enable efficient storage and processing of the encoded biological relationships. They furthermore support queries on the structure of biological networks. Results: We present the Java-based framework STON (SBGN TO Neo4j). STON imports and translates metabolic, signalling and gene regulatory pathways represented in the Systems Biology Graphical Notation into a graph-oriented format compatible with the Neo4j graph database. Conclusion: STON exploits the power of graph databases to store and query complex biological pathways. This advances the possibility of: i) identifying subnetworks in a given pathway; ii) linking networks across different levels of granularity to address difficulties related to incomplete knowledge representation at a single level; and iii) identifying common patterns between pathways in the database. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1394-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Vasundra Touré
- Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, 18051, Germany; European Institute for Systems Biology and Medicine (EISBM), CIRI UMR 5308, CNRS-ENS-UCBL-INSERM, Université de Lyon, 50 Avenue Tony Garnier, Lyon, 69007, France
- Alexander Mazein
- European Institute for Systems Biology and Medicine (EISBM), CIRI UMR 5308, CNRS-ENS-UCBL-INSERM, Université de Lyon, 50 Avenue Tony Garnier, Lyon, 69007, France
- Dagmar Waltemath
- Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, 18051, Germany
- Irina Balaur
- European Institute for Systems Biology and Medicine (EISBM), CIRI UMR 5308, CNRS-ENS-UCBL-INSERM, Université de Lyon, 50 Avenue Tony Garnier, Lyon, 69007, France
- Mansoor Saqi
- European Institute for Systems Biology and Medicine (EISBM), CIRI UMR 5308, CNRS-ENS-UCBL-INSERM, Université de Lyon, 50 Avenue Tony Garnier, Lyon, 69007, France
- Ron Henkel
- Scientific Databases and Visualization, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany; Department of Business Information Systems, University of Rostock, Rostock, 18051, Germany
- Johann Pellet
- European Institute for Systems Biology and Medicine (EISBM), CIRI UMR 5308, CNRS-ENS-UCBL-INSERM, Université de Lyon, 50 Avenue Tony Garnier, Lyon, 69007, France
- Charles Auffray
- European Institute for Systems Biology and Medicine (EISBM), CIRI UMR 5308, CNRS-ENS-UCBL-INSERM, Université de Lyon, 50 Avenue Tony Garnier, Lyon, 69007, France
21
Balaur I, Saqi M, Barat A, Lysenko A, Mazein A, Rawlings CJ, Ruskin HJ, Auffray C. EpiGeNet: A Graph Database of Interdependencies Between Genetic and Epigenetic Events in Colorectal Cancer. J Comput Biol 2016; 24:969-980. [PMID: 27627442 DOI: 10.1089/cmb.2016.0095] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 12/11/2022] Open
Abstract
The development of colorectal cancer (CRC), the third most common cancer type, has been associated with deregulations of cellular mechanisms stimulated by both genetic and epigenetic events. StatEpigen is a manually curated and annotated database, containing information on interdependencies between genetic and epigenetic signals, and specialized currently for CRC research. Although StatEpigen provides a well-developed graphical user interface for information retrieval, advanced queries involving associations between multiple concepts can benefit from more detailed graph representation of the integrated data. This can be achieved by using a graph database (NoSQL) approach. Data were extracted from StatEpigen and imported to our newly developed EpiGeNet, a graph database for storage and querying of conditional relationships between molecular (genetic and epigenetic) events observed at different stages of colorectal oncogenesis. We illustrate the enhanced capability of EpiGeNet for exploration of different queries related to colorectal tumor progression; specifically, we demonstrate the query process for (i) stage-specific molecular events, (ii) most frequently observed genetic and epigenetic interdependencies in colon adenoma, and (iii) paths connecting key genes reported in CRC and associated events. The EpiGeNet framework offers improved capability for management and visualization of data on molecular events specific to CRC initiation and progression.
Affiliation(s)
- Irina Balaur
- European Institute for Systems Biology and Medicine (EISBM), CIRI UMR CNRS 5308, CNRS-ENS-UCBL-INSERM, Université Claude Bernard, Lyon, France
- Mansoor Saqi
- European Institute for Systems Biology and Medicine (EISBM), CIRI UMR CNRS 5308, CNRS-ENS-UCBL-INSERM, Université Claude Bernard, Lyon, France
- Ana Barat
- Department of Physiology and Medical Physics, Centre for Systems Medicine, Royal College of Surgeons in Ireland, Dublin, Ireland
- Artem Lysenko
- Rothamsted Research, Hertfordshire, United Kingdom
- Alexander Mazein
- European Institute for Systems Biology and Medicine (EISBM), CIRI UMR CNRS 5308, CNRS-ENS-UCBL-INSERM, Université Claude Bernard, Lyon, France
- Heather J Ruskin
- Centre for Scientific Computing and Complex Systems Modelling, School of Computing, Dublin City University, Dublin, Ireland
- Charles Auffray
- European Institute for Systems Biology and Medicine (EISBM), CIRI UMR CNRS 5308, CNRS-ENS-UCBL-INSERM, Université Claude Bernard, Lyon, France
22
Waltemath D, Wolkenhauer O. How Modeling Standards, Software, and Initiatives Support Reproducibility in Systems Biology and Systems Medicine. IEEE Trans Biomed Eng 2016; 63:1999-2006. [PMID: 27295645 DOI: 10.1109/tbme.2016.2555481] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Indexed: 11/08/2022]
Abstract
OBJECTIVE: Only reproducible results are of significance to science. The lack of suitable standards and appropriate support of standards in software tools has led to numerous publications with irreproducible results. Our objectives are to identify the key challenges of reproducible research and to highlight existing solutions. RESULTS: In this paper, we summarize problems concerning reproducibility in systems biology and systems medicine. We focus on initiatives, standards, and software tools that aim to improve the reproducibility of simulation studies. CONCLUSIONS: The long-term success of systems biology and systems medicine depends on trustworthy models and simulations. This requires openness to ensure reusability and transparency to enable reproducibility of results in these fields.
|
23
|
Alm R, Waltemath D, Wolfien M, Wolkenhauer O, Henkel R. Annotation-based feature extraction from sets of SBML models. J Biomed Semantics 2015; 6:20. [PMID: 25904997 PMCID: PMC4405863 DOI: 10.1186/s13326-015-0014-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 09/27/2014] [Accepted: 03/20/2015] [Indexed: 01/09/2023] Open
Abstract
Background: Model repositories such as BioModels Database provide computational models of biological systems for the scientific community. These models contain rich semantic annotations that link model entities to concepts in well-established bio-ontologies such as Gene Ontology. Consequently, thematically similar models are likely to share similar annotations. Based on this assumption, we argue that semantic annotations are a suitable tool to characterize sets of models. These characteristics improve model classification, allow the identification of additional features for model retrieval tasks, and enable the comparison of sets of models. Results: In this paper we discuss four methods for annotation-based feature extraction from model sets. We tested all methods on sets of models in SBML format which were composed from BioModels Database. To characterize each of these sets, we analyzed and extracted concepts from three frequently used ontologies, namely Gene Ontology, ChEBI and SBO. We find that three of the four methods are suitable to determine characteristic features for arbitrary sets of models: the selected features vary depending on the underlying model set, and they are also specific to the chosen model set. We show that the identified features map to concepts that are higher up in the hierarchy of the ontologies than the concepts used for model annotations. Our analysis also reveals that the information content of concepts in ontologies and their usage for model annotation do not correlate. Conclusions: Annotation-based feature extraction enables the comparison of model sets, as opposed to existing methods for model-to-keyword comparison or model-to-model comparison. Electronic supplementary material: The online version of this article (doi:10.1186/s13326-015-0014-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Rebekka Alm
- Department of Multimedia Communication, University of Rostock, Joachim-Jungius-Str. 11, Rostock, 18051 Germany; Fraunhofer Institute for Computer Graphics Research IGD, Joachim-Jungius-Str. 11, Rostock, 18059 Germany
| | - Dagmar Waltemath
- Department of Systems Biology and Bioinformatics, University of Rostock, Ulmenstr. 69, Rostock, 18051 Germany
| | - Markus Wolfien
- Department of Systems Biology and Bioinformatics, University of Rostock, Ulmenstr. 69, Rostock, 18051 Germany
| | - Olaf Wolkenhauer
- Department of Systems Biology and Bioinformatics, University of Rostock, Ulmenstr. 69, Rostock, 18051 Germany; Stellenbosch Institute for Advanced Study (STIAS), Wallenberg Research Centre at Stellenbosch University, Stellenbosch, South Africa
| | - Ron Henkel
- Department of Mobile Multimedia Information Systems, University of Rostock, Albert-Einstein-Str. 22, Rostock, 18051 Germany
|
24
|
Hucka M, Nickerson DP, Bader GD, Bergmann FT, Cooper J, Demir E, Garny A, Golebiewski M, Myers CJ, Schreiber F, Waltemath D, Le Novère N. Promoting Coordinated Development of Community-Based Information Standards for Modeling in Biology: The COMBINE Initiative. Front Bioeng Biotechnol 2015; 3:19. [PMID: 25759811 PMCID: PMC4338824 DOI: 10.3389/fbioe.2015.00019] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.8] [Received: 11/10/2014] [Accepted: 02/08/2015] [Indexed: 12/19/2022] Open
Abstract
The Computational Modeling in Biology Network (COMBINE) is a consortium of groups involved in the development of open community standards and formats used in computational modeling in biology. COMBINE's aim is to act as a coordinator, facilitator, and resource for different standardization efforts whose domains of use cover related areas of the computational biology space. In this perspective article, we summarize COMBINE, its general organization, and the community standards and other efforts involved in it. Our goals are to help guide readers toward standards that may be suitable for their research activities, as well as to direct interested readers to relevant communities where they can best expect to receive assistance in how to develop interoperable computational models.
Affiliation(s)
- Michael Hucka
- Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
| | - David P. Nickerson
- Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
| | - Gary D. Bader
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
| | - Frank T. Bergmann
- Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
- BioQuant/Centre for Organismal Studies (COS), University of Heidelberg, Heidelberg, Germany
| | - Jonathan Cooper
- Department of Computer Science, University of Oxford, Oxford, UK
| | - Emek Demir
- Computational Biology, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
| | - Alan Garny
- Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
| | - Martin Golebiewski
- Scientific Databases and Visualization, Heidelberg Institute for Theoretical Studies (HITS), Heidelberg, Germany
| | - Chris J. Myers
- Department of Electrical and Computer Engineering, University of Utah, Salt Lake City, UT, USA
| | - Falk Schreiber
- Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
- Institute of Computer Science, University Halle-Wittenberg, Halle, Germany
| | - Dagmar Waltemath
- Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
| | - Nicolas Le Novère
- Babraham Institute, Cambridge, UK
- European Molecular Biology Laboratory-European Bioinformatics Institute, Cambridge, UK
|