1
Korir PK, Iudin A, Somasundharam S, Weyand S, Salih O, Hartley M, Sarkans U, Patwardhan A, Kleywegt GJ. Ten recommendations for organising bioimaging data for archival. F1000Res 2024; 12:ELIXIR-1391. [PMID: 38486614; PMCID: PMC10938051; DOI: 10.12688/f1000research.129720.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/12/2024] [Indexed: 03/17/2024]
Abstract
Organised data is easy to use, but rapid developments in the field of bioimaging, with improvements in instrumentation, detectors, software and experimental techniques, have resulted in an explosion in the volume of data being generated, making well-organised data an elusive goal. This guide offers a handful of recommendations for bioimage depositors, analysts, and microscope and software developers, whose implementation would contribute towards better-organised data in preparation for archival. Based on our experience archiving large image datasets in EMPIAR, the BioImage Archive and BioStudies, we propose a number of strategies that we believe would improve the usability (clarity, orderliness, learnability, navigability, self-documentation, coherence and consistency of identifiers, accessibility, succinctness) of future data depositions, making them more useful to the bioimaging community (data authors and analysts, researchers, clinicians, funders, collaborators, industry partners, hardware/software producers, journals, archive developers, as well as interested but non-specialist users of bioimaging data). These recommendations may also find use in other data-intensive disciplines. To facilitate the analysis of data organisation, we present bandbox, a Python package that assesses users' data by flagging potential issues, such as redundant directories or invalid characters in file or folder names, that should be addressed before archival. We offer these recommendations as a starting point and hope to engender more substantial conversations across and between the various data-rich communities.
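The bandbox abstract mentions flagging redundant directories and invalid characters in file or folder names before archival. The sketch below shows one plain-Python way to run that kind of pre-archival check. It is not bandbox's API or rule set. Several details are our own assumptions, labelled here and in the code comments:

- the allowed-character pattern `SAFE_NAME`;
- the rule that a directory counts as "redundant" when it holds exactly one subdirectory and no files;
- the function name `find_name_issues`, which is hypothetical.

```python
import os
import re
import tempfile
from pathlib import Path

# Assumed policy (hypothetical, not bandbox's): only ASCII letters, digits,
# dot, underscore and hyphen are treated as safe in file/folder names.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def find_name_issues(root):
    """Walk `root` and return (kind, path) tuples worth reviewing.

    kind is "invalid_chars" for a name outside SAFE_NAME, or "redundant_dir"
    for a directory holding exactly one subdirectory and no files
    (our illustrative definition of redundancy).
    """
    issues = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place sort gives a deterministic walk order
        for name in sorted(dirnames + filenames):
            if not SAFE_NAME.fullmatch(name):
                issues.append(("invalid_chars", os.path.join(dirpath, name)))
        if len(dirnames) == 1 and not filenames:
            issues.append(("redundant_dir", dirpath))
    return issues


if __name__ == "__main__":
    # Build a throwaway tree: data/raw/"img 01.tif" plus a top-level notes.txt.
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp, "data", "raw")
        raw.mkdir(parents=True)
        Path(raw, "img 01.tif").touch()
        Path(tmp, "notes.txt").touch()
        for kind, path in find_name_issues(tmp):
            print(kind, os.path.relpath(path, tmp))
```

On that tree the walk reports two issues. `data` is flagged as a redundant directory, because it contains only `raw`. `img 01.tif` is flagged for its space. An archive with a different naming policy would mainly need a different `SAFE_NAME` pattern.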
Affiliation(s)
- Paul K. Korir
  - EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Andrii Iudin
  - EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Simone Weyand
  - EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Osman Salih
  - EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Matthew Hartley
  - EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Ugis Sarkans
  - EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Ardan Patwardhan
  - EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
2
Nault R, Cave MC, Ludewig G, Moseley HN, Pennell KG, Zacharewski T. A Case for Accelerating Standards to Achieve the FAIR Principles of Environmental Health Research Experimental Data. Environ Health Perspect 2023; 131:65001. [PMID: 37352010; PMCID: PMC10289218; DOI: 10.1289/ehp11484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Revised: 06/05/2023] [Accepted: 06/07/2023] [Indexed: 06/25/2023]
Abstract
BACKGROUND: Funding agencies, publishers, and other stakeholders are pushing environmental health science investigators to improve data sharing; to promote the findable, accessible, interoperable, and reusable (FAIR) principles; and to increase the rigor and reproducibility of the data collected. Accomplishing these goals will require significant cultural shifts surrounding data management and strategies to develop robust and reliable resources that bridge the technical challenges and gaps in expertise.
OBJECTIVE: In this commentary, we examine the current state of managing data and metadata, referred to collectively as (meta)data, in the experimental environmental health sciences. We introduce new tools and resources based on in vivo experiments to serve as examples for the broader field.
METHODS: We discuss previous and ongoing efforts to improve (meta)data collection and curation. These include global efforts by the Functional Genomics Data Society to develop metadata collection tools such as the Investigation, Study, Assay (ISA) framework, and the Center for Expanded Data Annotation and Retrieval. We also conduct a case study of in vivo data deposited in the Gene Expression Omnibus that demonstrates the current state of in vivo environmental health data and highlights the value of using the tools we propose to support data deposition.
DISCUSSION: The environmental health science community has played a key role in efforts to achieve the goals of the FAIR guiding principles and is well positioned to advance them further. We present a proposed framework to further promote these objectives and minimize the obstacles between data producers and data scientists to maximize the return on research investments. https://doi.org/10.1289/EHP11484.
Affiliation(s)
- Rance Nault
  - Biochemistry & Molecular Biology Department, Institute for Integrative Toxicology, Michigan State University, East Lansing, Michigan, USA
- Matthew C. Cave
  - Division of Gastroenterology, Hepatology, and Nutrition, University of Louisville, Louisville, Kentucky, USA
- Gabriele Ludewig
  - Department of Occupational and Environmental Health, University of Iowa, Iowa City, Iowa, USA
- Hunter N.B. Moseley
  - Molecular and Cellular Biochemistry Department, University of Kentucky, Lexington, Kentucky, USA
- Kelly G. Pennell
  - Department of Civil Engineering, University of Kentucky, Lexington, Kentucky, USA
- Tim Zacharewski
  - Biochemistry & Molecular Biology Department, Institute for Integrative Toxicology, Michigan State University, East Lansing, Michigan, USA
3
Hoffmann N, Mayer G, Has C, Kopczynski D, Al Machot F, Schwudke D, Ahrends R, Marcus K, Eisenacher M, Turewicz M. A Current Encyclopedia of Bioinformatics Tools, Data Formats and Resources for Mass Spectrometry Lipidomics. Metabolites 2022; 12:metabo12070584. [PMID: 35888710; PMCID: PMC9319858; DOI: 10.3390/metabo12070584] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/25/2022] [Revised: 06/17/2022] [Accepted: 06/19/2022] [Indexed: 12/13/2022]
Abstract
Mass spectrometry is a widely used technology to identify and quantify biomolecules such as lipids, metabolites and proteins necessary for biomedical research. In this study, we catalogued freely available software tools, libraries, databases, repositories and resources that support lipidomics data analysis and determined the scope of currently used analytical technologies. Because of the tremendous importance of data interoperability, we assessed the support of standardized data formats in mass spectrometric (MS)-based lipidomics workflows. We included tools in our comparison that support targeted as well as untargeted analysis using direct infusion/shotgun (DI-MS), liquid chromatography–mass spectrometry, ion mobility or MS imaging approaches on MS1 and potentially higher MS levels. As a result, we determined that the Human Proteome Organization-Proteomics Standards Initiative standard data formats, mzML and mzTab-M, are already supported by a substantial number of recent software tools. We further discuss how mzTab-M can serve as a bridge between data acquisition and lipid bioinformatics tools for interpretation, capturing their output and transmitting rich annotated data for downstream processing. However, we identified several challenges of currently available tools and standards. Potential areas for improvement were: adaptation of common nomenclature and standardized reporting to enable high throughput lipidomics and improve its data handling. Finally, we suggest specific areas where tools and repositories need to improve to become FAIRer.
Affiliation(s)
- Nils Hoffmann
  - Forschungszentrum Jülich GmbH, Institute for Bio- and Geosciences (IBG-5), 52425 Jülich, Germany
  - Correspondence: N.H. and M.T.; Tel.: +49-(0)521-106-86780 (N.H.)
- Gerhard Mayer
  - Institute of Medical Systems Biology, Ulm University, 89081 Ulm, Germany
- Canan Has
  - Biological Mass Spectrometry, Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
  - University Hospital Carl Gustav Carus, 01307 Dresden, Germany
  - CENTOGENE GmbH, 18055 Rostock, Germany
- Dominik Kopczynski
  - Department of Analytical Chemistry, University of Vienna, 1090 Vienna, Austria
- Fadi Al Machot
  - Faculty of Science and Technology, Norwegian University for Life Science (NMBU), 1433 Ås, Norway
- Dominik Schwudke
  - Bioanalytical Chemistry, Forschungszentrum Borstel, Leibniz Lung Center, 23845 Borstel, Germany
  - Airway Research Center North, German Center for Lung Research (DZL), 23845 Borstel, Germany
  - German Center for Infection Research (DZIF), TTU Tuberculosis, 23845 Borstel, Germany
- Robert Ahrends
  - Department of Analytical Chemistry, University of Vienna, 1090 Vienna, Austria
- Katrin Marcus
  - Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Ruhr University Bochum, 44801 Bochum, Germany
- Martin Eisenacher
  - Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Ruhr University Bochum, 44801 Bochum, Germany
  - Faculty of Medicine, Medizinisches Proteom-Center, Ruhr University Bochum, 44801 Bochum, Germany
- Michael Turewicz
  - Institute for Clinical Biochemistry and Pathobiochemistry, German Diabetes Center (DDZ), Leibniz Center for Diabetes Research at Heinrich-Heine-University Düsseldorf, 40225 Düsseldorf, Germany
  - German Center for Diabetes Research (DZD), Partner Düsseldorf, 85764 Neuherberg, Germany
4
Hammer M, Huisman M, Rigano A, Boehm U, Chambers JJ, Gaudreault N, North AJ, Pimentel JA, Sudar D, Bajcsy P, Brown CM, Corbett AD, Faklaris O, Lacoste J, Laude A, Nelson G, Nitschke R, Farzam F, Smith CS, Grunwald D, Strambio-De-Castillia C. Towards community-driven metadata standards for light microscopy: tiered specifications extending the OME model. Nat Methods 2021; 18:1427-1440. [PMID: 34862501; PMCID: PMC9271325; DOI: 10.1038/s41592-021-01327-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 11/08/2022]
Abstract
Rigorous record-keeping and quality control are required to ensure the quality, reproducibility and value of imaging data. The 4DN Initiative and BINA here propose light Microscopy Metadata specifications that extend the OME data model, scale with experimental intent and complexity, and make it possible for scientists to create comprehensive records of imaging experiments.
Affiliation(s)
- Mathias Hammer
  - RNA Therapeutics Institute, UMass Chan Medical School, Worcester, MA, USA
  - Department of Biology, Technical University of Darmstadt, Darmstadt, Germany
- Alessandro Rigano
  - Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA, USA
- Ulrike Boehm
  - Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, VA, USA
- James J Chambers
  - Institute for Applied Life Sciences, University of Massachusetts, Amherst, MA, USA
- Jaime A Pimentel
  - Laboratorio Nacional de Microscopía Avanzada, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Damir Sudar
  - Quantitative Imaging Systems LLC, Portland, OR, USA
- Peter Bajcsy
  - National Institute of Standards and Technology, Gaithersburg, MD, USA
- Claire M Brown
  - Advanced BioImaging Facility (ABIF), McGill University, Montreal, Quebec, Canada
- Orestis Faklaris
  - MRI, BCM, University of Montpellier, CNRS, INSERM, Montpellier, France
- Alex Laude
  - Bioimaging Unit, Newcastle University, Newcastle upon Tyne, UK
- Glyn Nelson
  - Bioimaging Unit, Newcastle University, Newcastle upon Tyne, UK
- Roland Nitschke
  - Life Imaging Center and Signalling Research Centres CIBSS and BIOSS, University of Freiburg, Freiburg, Germany
- Farzin Farzam
  - RNA Therapeutics Institute, UMass Chan Medical School, Worcester, MA, USA
- Carlas S Smith
  - Delft Center for Systems and Control and Department of Imaging Physics, Delft University of Technology, Delft, the Netherlands
- David Grunwald
  - RNA Therapeutics Institute, UMass Chan Medical School, Worcester, MA, USA
5
Rigano A, Ehmsen S, Öztürk SU, Ryan J, Balashov A, Hammer M, Kirli K, Boehm U, Brown CM, Bellve K, Chambers JJ, Cosolo A, Coleman RA, Faklaris O, Fogarty KE, Guilbert T, Hamacher AB, Itano MS, Keeley DP, Kunis S, Lacoste J, Laude A, Ma WY, Marcello M, Montero-Llopis P, Nelson G, Nitschke R, Pimentel JA, Weidtkamp-Peters S, Park PJ, Alver BH, Grunwald D, Strambio-De-Castillia C. Micro-Meta App: an interactive tool for collecting microscopy metadata based on community specifications. Nat Methods 2021; 18:1489-1495. [PMID: 34862503; PMCID: PMC8648560; DOI: 10.1038/s41592-021-01315-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/01/2021] [Accepted: 09/30/2021] [Indexed: 12/31/2022]
Abstract
For quality, interpretation, reproducibility and sharing value, microscopy images should be accompanied by detailed descriptions of the conditions that were used to produce them. Micro-Meta App is an intuitive, highly interoperable, open-source software tool that was developed in the context of the 4D Nucleome (4DN) consortium and is designed to facilitate the extraction and collection of relevant microscopy metadata as specified by the recent 4DN-BINA-OME tiered system of Microscopy Metadata specifications. In addition to substantially lowering the burden of quality assurance, the visual nature of Micro-Meta App makes it particularly suited for training purposes.
Affiliation(s)
- Alessandro Rigano
  - Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA, USA
- Shannon Ehmsen
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Serkan Utku Öztürk
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Joel Ryan
  - Advanced BioImaging Facility (ABIF), McGill University, Montreal, Quebec, Canada
- Alexander Balashov
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Mathias Hammer
  - RNA Therapeutics Institute, UMass Chan Medical School, Worcester, MA, USA
- Koray Kirli
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Ulrike Boehm
  - Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, VA, USA
- Claire M. Brown
  - Advanced BioImaging Facility (ABIF), McGill University, Montreal, Quebec, Canada
- Karl Bellve
  - Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA, USA
- James J. Chambers
  - Institute for Applied Life Sciences, University of Massachusetts, Amherst, MA, USA
- Andrea Cosolo
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Robert A. Coleman
  - Department of Anatomy and Structural Biology, Gruss-Lipper Biophotonics Center, Albert Einstein College of Medicine, Bronx, NY, USA
- Orestis Faklaris
  - BioCampus Montpellier (BCM), University of Montpellier, CNRS, INSERM, Montpellier, France
- Kevin E. Fogarty
  - Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA, USA
- Thomas Guilbert
  - Institut Cochin, Inserm U1016-CNRS UMR8104-Université de Paris, Paris, France
- Anna B. Hamacher
  - Center for Advanced Imaging, Heinrich-Heine University Duesseldorf, Düsseldorf, Germany
- Michelle S. Itano
  - UNC Neuroscience Microscopy Core Facility, Department of Cell Biology and Physiology, Carolina Institute for Developmental Disabilities, and UNC Neuroscience Center, University of North Carolina, Chapel Hill, NC, USA
- Daniel P. Keeley
  - UNC Neuroscience Microscopy Core Facility, Department of Cell Biology and Physiology, Carolina Institute for Developmental Disabilities, and UNC Neuroscience Center, University of North Carolina, Chapel Hill, NC, USA
- Susanne Kunis
  - Department of Biology/Chemistry and Center for Cellular Nanoanalytics, University Osnabrück, Osnabrück, Germany
- Alex Laude
  - Bioimaging Unit, Newcastle University, Newcastle upon Tyne, UK
- Willa Y. Ma
  - UNC Neuroscience Microscopy Core Facility, Carolina Institute for Developmental Disabilities, and UNC Neuroscience Center, University of North Carolina, Chapel Hill, NC, USA
- Marco Marcello
  - Center for Cell Imaging, University of Liverpool, Liverpool, UK
- Paula Montero-Llopis
  - Microscopy Resources of the North Quad, University of Harvard Medical School, Boston, MA, USA
- Glyn Nelson
  - Bioimaging Unit, Newcastle University, Newcastle upon Tyne, UK
- Roland Nitschke
  - Life Imaging Center and Signalling Research Centres CIBSS and BIOSS, University of Freiburg, Freiburg, Germany
- Jaime A. Pimentel
  - Laboratorio Nacional de Microscopía Avanzada, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
- Stefanie Weidtkamp-Peters
  - Center for Advanced Imaging, Heinrich-Heine University Duesseldorf, Düsseldorf, Germany
- Peter J. Park
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Burak H. Alver
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- David Grunwald
  - RNA Therapeutics Institute, UMass Chan Medical School, Worcester, MA, USA
6
7
Schuler R, Czajkowski K, D'Arcy M, Tangmunarunkit H, Kesselman C. Towards Co-Evolution of Data-Centric Ecosystems. Sci Stat Database Manag 2020; 2020:4. [PMID: 37614739; PMCID: PMC10445529; DOI: 10.1145/3400903.3400908] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 08/25/2023]
Abstract
Database evolution is a notoriously difficult task, and it is exacerbated by the necessity to evolve database-dependent applications. As science becomes increasingly dependent on sophisticated data management, the need to evolve an array of database-driven systems will only intensify. In this paper, we present an architecture for data-centric ecosystems that allows the components to seamlessly co-evolve by centralizing the models and mappings at the data service and pushing model-adaptive interactions to the database clients. Boundary objects fill the gap where applications are unable to adapt and need a stable interface to interact with the components of the ecosystem. Finally, evolution of the ecosystem is enabled via integrated schema modification and model management operations. We present use cases from actual experiences that demonstrate the utility of our approach.
Affiliation(s)
- Robert Schuler
  - USC Information Sciences Institute, Marina del Rey, California
- Karl Czajkowski
  - USC Information Sciences Institute, Marina del Rey, California
- Mike D'Arcy
  - USC Information Sciences Institute, Marina del Rey, California
- Carl Kesselman
  - USC Information Sciences Institute, Marina del Rey, California
8
Bernasconi A, Canakoglu A, Masseroli M, Ceri S. The road towards data integration in human genomics: players, steps and interactions. Brief Bioinform 2020; 22:30-44. [PMID: 32496509; DOI: 10.1093/bib/bbaa080] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 11/21/2019] [Revised: 03/09/2020] [Accepted: 04/18/2020] [Indexed: 12/15/2022]
Abstract
Thousands of new experimental datasets are becoming available every day; in many cases, they are produced within the scope of large cooperative efforts, involving a variety of laboratories spread all over the world, and typically open for public use. Although the potential collective amount of available information is huge, the effective combination of such public sources is hindered by data heterogeneity, as the datasets exhibit a wide variety of notations and formats, concerning both experimental values and metadata. Thus, data integration is becoming a fundamental activity, to be performed prior to data analysis and biological knowledge discovery, consisting of subsequent steps of data extraction, normalization, matching and enrichment; once applied to heterogeneous data sources, it builds multiple perspectives over the genome, leading to the identification of meaningful relationships that could not be perceived by using incompatible data formats. In this paper, we first describe a technological pipeline from data production to data integration; we then propose a taxonomy of genomic data players (based on the distinction between contributors, repository hosts, consortia, integrators and consumers) and apply the taxonomy to describe about 30 important players in genomic data management. We specifically focus on the integrator players and analyse the issues in solving the genomic data integration challenges, as well as evaluate the computational environments that they provide to follow up data integration by means of visualization and analysis tools.
9
Karakitsou E, Foguet C, de Atauri P, Kultima K, Khoonsari PE, Martins dos Santos VA, Saccenti E, Rosato A, Cascante M. Metabolomics in systems medicine: an overview of methods and applications. Curr Opin Syst Biol 2019. [DOI: 10.1016/j.coisb.2019.03.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/13/2022]
10
Stanford NJ, Scharm M, Dobson PD, Golebiewski M, Hucka M, Kothamachu VB, Nickerson D, Owen S, Pahle J, Wittig U, Waltemath D, Goble C, Mendes P, Snoep J. Data Management in Computational Systems Biology: Exploring Standards, Tools, Databases, and Packaging Best Practices. Methods Mol Biol 2019; 2049:285-314. [PMID: 31602618; DOI: 10.1007/978-1-4939-9736-7_17] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/14/2022]
Abstract
Computational systems biology involves integrating heterogeneous datasets in order to generate models. These models can assist with understanding and prediction of biological phenomena. Generating datasets and integrating them into models involves a wide range of scientific expertise. As a result, these datasets are often collected by one set of researchers and exchanged with other researchers for constructing the models. For this process to run smoothly, the data and models must be FAIR: findable, accessible, interoperable, and reusable. In order for data and models to be FAIR, they must be structured in consistent and predictable ways and described sufficiently for other researchers to understand them. Furthermore, these data and models must be shared with other researchers, with appropriately controlled sharing permissions, before and after publication. In this chapter, we explore the different data and model standards that assist with structuring, describing, and sharing. We also highlight the popular standards and sharing databases within computational systems biology.
Affiliation(s)
- Martin Scharm
  - Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
- Paul D Dobson
  - School of Computer Science, University of Manchester, Manchester, UK
- Martin Golebiewski
  - Heidelberg Institute for Theoretical Studies (HITS), Heidelberg, Germany
- Michael Hucka
  - Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
- David Nickerson
  - Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
- Stuart Owen
  - School of Computer Science, University of Manchester, Manchester, UK
- Jürgen Pahle
  - BIOMS/BioQuant, Heidelberg University, Heidelberg, Germany
- Ulrike Wittig
  - Heidelberg Institute for Theoretical Studies (HITS), Heidelberg, Germany
- Dagmar Waltemath
  - Medical Informatics, University Medicine Greifswald, Greifswald, Germany
- Carole Goble
  - School of Computer Science, University of Manchester, Manchester, UK
- Pedro Mendes
  - Centre for Quantitative Medicine, University of Connecticut, Farmington, CT, USA
- Jacky Snoep
  - School of Computer Science, University of Manchester, Manchester, UK
  - Biochemistry, Stellenbosch University, Stellenbosch, South Africa
11
Karcher S, Willighagen EL, Rumble J, Ehrhart F, Evelo CT, Fritts M, Gaheen S, Harper SL, Hoover MD, Jeliazkova N, Lewinski N, Marchese Robinson RL, Mills KC, Mustad AP, Thomas DG, Tsiliki G, Ogilvie Hendren C. Integration among databases and data sets to support productive nanotechnology: Challenges and recommendations. NanoImpact 2018; 9:85-101. [PMID: 30246165; PMCID: PMC6145474; DOI: 10.1016/j.impact.2017.11.002] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Indexed: 05/23/2023]
Abstract
Many groups within the broad field of nanoinformatics are already developing data repositories and analytical tools driven by their individual organizational goals. Integrating these data resources across disciplines and with non-nanotechnology resources can support multiple objectives by enabling the reuse of the same information. Integration can also serve as the impetus for novel scientific discoveries by providing the framework to support deeper data analyses. This article discusses current data integration practices in nanoinformatics and in comparable mature fields, and nanotechnology-specific challenges impacting data integration. Based on results from a nanoinformatics-community-wide survey, recommendations for achieving integration of existing operational nanotechnology resources are presented. Nanotechnology-specific data integration challenges, if effectively resolved, can foster the application and validation of nanotechnology within and across disciplines. This paper is one of a series of articles by the Nanomaterial Data Curation Initiative that address data issues such as data curation workflows, data completeness and quality, curator responsibilities, and metadata.
Affiliation(s)
- Sandra Karcher
  - Civil and Environmental Engineering, Carnegie Mellon University, Pittsburgh, PA 15213-3890, USA
  - Center for the Environmental Implications of Nano Technology (CEINT), Duke University, Box 90287, 121 Hudson Hall, Durham, NC 27708-0287, USA
- Egon L. Willighagen
  - Department of Bioinformatics - BiGCaT, Maastricht University, P.O. Box 616, UNS50, Box 19, NL-6200 MD Maastricht, The Netherlands
- John Rumble
  - R&R Data Services, 11 Montgomery Avenue, Gaithersburg, MD 20877, USA
  - CODATA-VAMAS Working Group on Nanomaterials, Paris, France
- Friederike Ehrhart
  - Department of Bioinformatics - BiGCaT, Maastricht University, P.O. Box 616, UNS50, Box 19, NL-6200 MD Maastricht, The Netherlands
- Chris T. Evelo
  - Department of Bioinformatics - BiGCaT, Maastricht University, P.O. Box 616, UNS50, Box 19, NL-6200 MD Maastricht, The Netherlands
- Martin Fritts
  - Clinical Research Directorate/Clinical Monitoring Research Program, Leidos Biomedical Research, Inc., NCI Campus at Frederick, Frederick, MD 21702, USA
- Sharon Gaheen
  - Clinical Research Directorate/Clinical Monitoring Research Program, Leidos Biomedical Research, Inc., NCI Campus at Frederick, Frederick, MD 21702, USA
- Stacey L. Harper
  - Environmental and Molecular Toxicology and School of Chemical, Biological and Environmental Engineering, Oregon State University, Corvallis, OR 97331, USA
- Mark D. Hoover
  - National Institute for Occupational Safety and Health, 1095 Willowdale Road, Morgantown, WV 26505-2888, USA
- Nastassja Lewinski
  - Chemical and Life Science Engineering, Virginia Commonwealth University, Richmond, VA 23284, USA
- Richard L. Marchese Robinson
  - School of Chemical and Process Engineering, University of Leeds, Leeds LS2 9JT, United Kingdom
  - School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, James Parsons Building, Byrom Street, Liverpool L3 3AF, United Kingdom
- Karmann C. Mills
  - RTI International, 3040 Cornwallis Rd., Research Triangle Park, NC 27709, USA
- Axel P. Mustad
  - Nordic Quantum Computing Group AS, Oslo Science Park, P.O. Box 1892, Vika, N-0124 Oslo, Norway
- Dennis G. Thomas
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Georgia Tsiliki
  - School of Chemical Engineering, National Technical University of Athens, 9 Heroon Polytechneiou Street, Zografou, 15780, Athens, Greece
  - Institute for the Management of Information Systems, ATHENA Research and Innovation Centre, Artemidos 6 & Epidavrou, Marousi, 15125 Athens, Greece
- Christine Ogilvie Hendren
  - Center for the Environmental Implications of Nano Technology (CEINT), Duke University, Box 90287, 121 Hudson Hall, Durham, NC 27708-0287, USA
12
Larralde M, Lawson TN, Weber RJM, Moreno P, Haug K, Rocca-Serra P, Viant MR, Steinbeck C, Salek RM. mzML2ISA & nmrML2ISA: generating enriched ISA-Tab metadata files from metabolomics XML data. Bioinformatics 2017; 33:2598-2600. [PMID: 28402395; PMCID: PMC5870861; DOI: 10.1093/bioinformatics/btx169] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 12/16/2016] [Revised: 02/14/2017] [Accepted: 04/05/2017] [Indexed: 02/03/2023]
Abstract
SUMMARY: Submission to the MetaboLights repository for metabolomics data currently places the burden of reporting instrument and acquisition parameters in ISA-Tab format on users, who have to do it manually, a process that is time-consuming and prone to user input error. Since the large majority of these parameters are embedded in instrument raw data files, an opportunity exists to capture this metadata more accurately. Here we report a set of Python packages that can automatically generate ISA-Tab metadata file stubs from raw XML metabolomics data files. The parsing packages are separated into mzML2ISA (encompassing mzML and imzML formats) and nmrML2ISA (nmrML format only). Overall, the use of mzML2ISA & nmrML2ISA reduces the time needed to capture metadata substantially (capturing 90% of metadata on assay and sample levels), is much less prone to user input errors, improves compliance with minimum information reporting guidelines and facilitates more finely grained data exploration and querying of datasets.
AVAILABILITY AND IMPLEMENTATION: mzML2ISA & nmrML2ISA are available under version 3 of the GNU General Public Licence at https://github.com/ISA-tools. Documentation is available from http://2isa.readthedocs.io/en/latest/.
CONTACT: reza.salek@ebi.ac.uk or isatools@googlegroups.com.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
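The idea behind mzML2ISA is to harvest metadata that already sits inside mzML files. The sketch below gives a rough illustration of that idea. It is not mzML2ISA's code or output format. It uses the standard library's `xml.etree.ElementTree` to read instrument-component `cvParam` terms from a small, hand-written mzML-like fragment, then writes them out as a tab-separated stub. Several pieces are hypothetical:

- the fragment `SAMPLE`;
- the helper names `instrument_params` and `to_tsv`;
- the three-column layout (real ISA-Tab files use a richer column scheme).

```python
import csv
import io
import xml.etree.ElementTree as ET

# Default namespace used by mzML documents.
MZML_NS = {"mz": "http://psi.hupo.org/ms/mzml"}

# Hypothetical, heavily trimmed mzML-like fragment for illustration only.
SAMPLE = """<mzML xmlns="http://psi.hupo.org/ms/mzml" version="1.1.0">
  <instrumentConfigurationList count="1">
    <instrumentConfiguration id="IC1">
      <componentList count="2">
        <source order="1">
          <cvParam cvRef="MS" accession="MS:1000073" name="electrospray ionization" value=""/>
        </source>
        <analyzer order="2">
          <cvParam cvRef="MS" accession="MS:1000484" name="orbitrap" value=""/>
        </analyzer>
      </componentList>
    </instrumentConfiguration>
  </instrumentConfigurationList>
</mzML>"""


def instrument_params(xml_text):
    """Return (component, accession, name) for each cvParam in a componentList."""
    root = ET.fromstring(xml_text)
    rows = []
    for component in root.iterfind(".//mz:componentList/*", MZML_NS):
        # Tags look like "{namespace}source"; keep only the local name.
        role = component.tag.split("}", 1)[1]
        for cv in component.iterfind("mz:cvParam", MZML_NS):
            rows.append((role, cv.get("accession"), cv.get("name")))
    return rows


def to_tsv(rows):
    """Serialise rows as a simple tab-separated stub (not real ISA-Tab layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["component", "accession", "name"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_tsv(instrument_params(SAMPLE)), end="")
```

Parsing the XML, rather than asking the depositor to retype instrument details, is what removes the manual-entry errors the abstract describes. A fuller tool would also walk the sample, software and data-processing sections.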
Affiliation(s)
- Thomas N Lawson
- School of Biosciences, University of Birmingham, Birmingham, UK
- Ralf J M Weber
- School of Biosciences, University of Birmingham, Birmingham, UK
- Phenome Centre Birmingham, University of Birmingham, Birmingham, UK
- Pablo Moreno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, UK
- Kenneth Haug
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, UK
- Mark R Viant
- School of Biosciences, University of Birmingham, Birmingham, UK
- Phenome Centre Birmingham, University of Birmingham, Birmingham, UK
- Christoph Steinbeck
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, UK
- Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University, Lessingstr. 8, Jena, Germany
- Reza M Salek
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, UK
13
Supporting metabolomics with adaptable software: design architectures for the end-user. Curr Opin Biotechnol 2017; 43:110-117. [DOI: 10.1016/j.copbio.2016.11.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 07/22/2016] [Revised: 10/31/2016] [Accepted: 11/01/2016] [Indexed: 02/07/2023]
14
Boué S, Exner T, Ghosh S, Belcastro V, Dokler J, Page D, Boda A, Bonjour F, Hardy B, Vanscheeuwijck P, Hoeng J, Peitsch M. Supporting evidence-based analysis for modified risk tobacco products through a toxicology data-sharing infrastructure. F1000Res 2017; 6:12. [PMID: 29123642 PMCID: PMC5657032 DOI: 10.12688/f1000research.10493.2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Accepted: 10/25/2017] [Indexed: 01/24/2023] [Open access]
Abstract
The US FDA defines modified risk tobacco products (MRTPs) as products that aim to reduce harm or the risk of tobacco-related disease associated with commercially marketed tobacco products. Establishing a product’s potential as an MRTP requires scientific substantiation including toxicity studies and measures of disease risk relative to those of cigarette smoking. Best practices encourage verification of the data from such studies through sharing and open standards. Building on the experience gained from the OpenTox project, a proof-of-concept database and website (INTERVALS) has been developed to share results from both in vivo inhalation studies and in vitro studies conducted by Philip Morris International R&D to assess candidate MRTPs. As datasets are often generated by diverse methods and standards, they need to be traceable, curated, and the methods used well described so that knowledge can be gained using data science principles and tools. The data-management framework described here accounts for the latest standards of data sharing and research reproducibility. Curated data and methods descriptions have been prepared in ISA-Tab format and stored in a database accessible via a search portal on the INTERVALS website. The portal allows users to browse the data by study or mechanism (e.g., inflammation, oxidative stress) and obtain information relevant to study design, methods, and the most important results. Given the successful development of the initial infrastructure, the goal is to grow this initiative and establish a public repository for 21st-century preclinical systems toxicology MRTP assessment data and results that supports open data principles.
Affiliation(s)
- Stéphanie Boué
- PMI R&D, Philip Morris Products S.A., Neuchâtel, Switzerland
- Joh Dokler
- Douglas Connect GmbH, Zeiningen, Switzerland
- David Page
- PMI R&D, Philip Morris Products S.A., Neuchâtel, Switzerland
- Akash Boda
- PMI R&D, Philip Morris Products S.A., Neuchâtel, Switzerland
- Filipe Bonjour
- PMI R&D, Philip Morris Products S.A., Neuchâtel, Switzerland
- Barry Hardy
- Douglas Connect GmbH, Zeiningen, Switzerland
- Julia Hoeng
- PMI R&D, Philip Morris Products S.A., Neuchâtel, Switzerland
- Manuel Peitsch
- PMI R&D, Philip Morris Products S.A., Neuchâtel, Switzerland
15
Boué S, Exner T, Ghosh S, Belcastro V, Dokler J, Page D, Boda A, Bonjour F, Hardy B, Vanscheeuwijck P, Hoeng J, Peitsch M. Supporting evidence-based analysis for modified risk tobacco products through a toxicology data-sharing infrastructure. F1000Res 2017; 6:12. [PMID: 29123642 DOI: 10.12688/f1000research.10493.1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Accepted: 01/03/2017] [Indexed: 12/11/2022] [Open access]
Abstract
The US FDA defines modified risk tobacco products (MRTPs) as products that aim to reduce harm or the risk of tobacco-related disease associated with commercially marketed tobacco products. Establishing a product's potential as an MRTP requires scientific substantiation including toxicity studies and measures of disease risk relative to those of cigarette smoking. Best practices encourage verification of the data from such studies through sharing and open standards. Building on the experience gained from the OpenTox project, a proof-of-concept database and website (INTERVALS) has been developed to share results from both in vivo inhalation studies and in vitro studies conducted by Philip Morris International R&D to assess candidate MRTPs. As datasets are often generated by diverse methods and standards, they need to be traceable, curated, and the methods used well described so that knowledge can be gained using data science principles and tools. The data-management framework described here accounts for the latest standards of data sharing and research reproducibility. Curated data and methods descriptions have been prepared in ISA-Tab format and stored in a database accessible via a search portal on the INTERVALS website. The portal allows users to browse the data by study or mechanism (e.g., inflammation, oxidative stress) and obtain information relevant to study design, methods, and the most important results. Given the successful development of the initial infrastructure, the goal is to grow this initiative and establish a public repository for 21st-century preclinical systems toxicology MRTP assessment data and results that supports open data principles.
Affiliation(s)
- Stéphanie Boué
- PMI R&D, Philip Morris Products S.A., Neuchâtel, Switzerland
- Joh Dokler
- Douglas Connect GmbH, Zeiningen, Switzerland
- David Page
- PMI R&D, Philip Morris Products S.A., Neuchâtel, Switzerland
- Akash Boda
- PMI R&D, Philip Morris Products S.A., Neuchâtel, Switzerland
- Filipe Bonjour
- PMI R&D, Philip Morris Products S.A., Neuchâtel, Switzerland
- Barry Hardy
- Douglas Connect GmbH, Zeiningen, Switzerland
- Julia Hoeng
- PMI R&D, Philip Morris Products S.A., Neuchâtel, Switzerland
- Manuel Peitsch
- PMI R&D, Philip Morris Products S.A., Neuchâtel, Switzerland
16
Eghbalnia HR, Romero PR, Westler WM, Baskaran K, Ulrich EL, Markley JL. Increasing rigor in NMR-based metabolomics through validated and open source tools. Curr Opin Biotechnol 2016; 43:56-61. [PMID: 27643760 DOI: 10.1016/j.copbio.2016.08.005] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 07/11/2016] [Revised: 08/15/2016] [Accepted: 08/30/2016] [Indexed: 01/18/2023]
Abstract
The metabolome, the collection of small molecules associated with an organism, is a growing subject of inquiry, with the data utilized for data-intensive systems biology, disease diagnostics, biomarker discovery, and the broader characterization of small molecules in mixtures. Owing to their close proximity to the functional endpoints that govern an organism's phenotype, metabolites are highly informative about functional states. The field of metabolomics identifies and quantifies endogenous and exogenous metabolites in biological samples. Information acquired from nuclear magnetic resonance (NMR) spectroscopy, mass spectrometry (MS), and the published literature, as processed by statistical approaches, is driving increasingly wide applications of metabolomics. This review focuses on the role of databases and software tools in advancing the rigor, robustness, reproducibility, and validation of metabolomics studies.
Affiliation(s)
- Hamid R Eghbalnia
- Biochemistry Department, University of Wisconsin-Madison, 433 Babcock Drive, Madison, WI 53706, USA
- Pedro R Romero
- Biochemistry Department, University of Wisconsin-Madison, 433 Babcock Drive, Madison, WI 53706, USA
- William M Westler
- Biochemistry Department, University of Wisconsin-Madison, 433 Babcock Drive, Madison, WI 53706, USA
- Kumaran Baskaran
- Biochemistry Department, University of Wisconsin-Madison, 433 Babcock Drive, Madison, WI 53706, USA
- Eldon L Ulrich
- Biochemistry Department, University of Wisconsin-Madison, 433 Babcock Drive, Madison, WI 53706, USA
- John L Markley
- Biochemistry Department, University of Wisconsin-Madison, 433 Babcock Drive, Madison, WI 53706, USA
17
Marchese Robinson RL, Lynch I, Peijnenburg W, Rumble J, Klaessig F, Marquardt C, Rauscher H, Puzyn T, Purian R, Åberg C, Karcher S, Vriens H, Hoet P, Hoover MD, Hendren CO, Harper SL. How should the completeness and quality of curated nanomaterial data be evaluated? Nanoscale 2016; 8:9919-43. [PMID: 27143028 PMCID: PMC4899944 DOI: 10.1039/c5nr08944a] [Citation(s) in RCA: 58] [Impact Index Per Article: 7.3] [Indexed: 05/19/2023]
Abstract
Nanotechnology is of increasing significance. Curation of nanomaterial data into electronic databases offers opportunities to better understand and predict nanomaterials' behaviour. This supports innovation in, and regulation of, nanotechnology. It is commonly understood that curated data need to be sufficiently complete and of sufficient quality to serve their intended purpose. However, assessing data completeness and quality is non-trivial in general and is arguably especially difficult in the nanoscience area, given its highly multidisciplinary nature. The current article, part of the Nanomaterial Data Curation Initiative series, addresses how to assess the completeness and quality of (curated) nanomaterial data. In order to address this key challenge, a variety of related issues are discussed: the meaning and importance of data completeness and quality, existing approaches to their assessment and the key challenges associated with evaluating the completeness and quality of curated nanomaterial data. Considerations specific to the nanoscience area, as well as lessons that can be learned from other relevant scientific disciplines, are examined. Hence, the scope of this discussion ranges from physicochemical characterisation requirements for nanomaterials and interference of nanomaterials with nanotoxicology assays to broader issues such as minimum information checklists, toxicology data quality schemes and computational approaches that facilitate evaluation of the completeness and quality of (curated) data. This discussion is informed by a literature review and a survey of key nanomaterial data curation stakeholders. Finally, drawing upon this discussion, recommendations are presented concerning the central question: how should the completeness and quality of curated nanomaterial data be evaluated?
Affiliation(s)
- Richard L. Marchese Robinson
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, James Parsons Building, Byrom Street, Liverpool, L3 3AF, United Kingdom
- Iseult Lynch
- School of Geography, Earth and Environmental Sciences, University of Birmingham, Edgbaston, B15 2TT Birmingham, United Kingdom
- Willie Peijnenburg
- National Institute of Public Health and the Environment (RIVM), Bilthoven, The Netherlands
- Institute of Environmental Sciences, Leiden University, Leiden, The Netherlands
- John Rumble
- R&R Data Services, 11 Montgomery Avenue, Gaithersburg MD 20877 USA
- Fred Klaessig
- Pennsylvania Bio Nano Systems LLC, 3805 Old Easton Road, Doylestown, PA 18902
- Clarissa Marquardt
- Institute of Applied Computer Sciences (IAI), Karlsruhe Institute of Technology (KIT), Hermann v. Helmholtz Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Hubert Rauscher
- European Commission, Joint Research Centre, Institute for Health and Consumer Protection, Via Fermi 2749, 21027 Ispra (VA), Italy
- Tomasz Puzyn
- Laboratory of Environmental Chemistry, University of Gdansk, Wita Stwosza 63, 80-308 Gdansk, Poland
- Ronit Purian
- Faculty of Engineering, Tel Aviv University, Tel Aviv 69978 Israel
- Christoffer Åberg
- Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
- Sandra Karcher
- Civil and Environmental Engineering, Carnegie Mellon University, Pittsburgh, PA 15213-3890
- Hanne Vriens
- Department of Public Health and Primary Care, K.U.Leuven, Faculty of Medicine, Unit Environment & Health – Toxicology, Herestraat 49 (O&N 706), Leuven, Belgium
- Peter Hoet
- Department of Public Health and Primary Care, K.U.Leuven, Faculty of Medicine, Unit Environment & Health – Toxicology, Herestraat 49 (O&N 706), Leuven, Belgium
- Mark D. Hoover
- National Institute for Occupational Safety and Health, 1095 Willowdale Road, Morgantown, WV 26505-2888
- Christine Ogilvie Hendren
- Center for the Environmental Implications of NanoTechnology, Duke University, PO Box 90287 121 Hudson Hall, Durham NC 27708
- Stacey L. Harper
- Department of Environmental and Molecular Toxicology, School of Chemical, Biological and Environmental Engineering, Oregon State University, 1007 ALS, Corvallis, OR 97331
18
Bandrowski A, Brinkman R, Brochhausen M, Brush MH, Bug B, Chibucos MC, Clancy K, Courtot M, Derom D, Dumontier M, Fan L, Fostel J, Fragoso G, Gibson F, Gonzalez-Beltran A, Haendel MA, He Y, Heiskanen M, Hernandez-Boussard T, Jensen M, Lin Y, Lister AL, Lord P, Malone J, Manduchi E, McGee M, Morrison N, Overton JA, Parkinson H, Peters B, Rocca-Serra P, Ruttenberg A, Sansone SA, Scheuermann RH, Schober D, Smith B, Soldatova LN, Stoeckert CJ, Taylor CF, Torniai C, Turner JA, Vita R, Whetzel PL, Zheng J. The Ontology for Biomedical Investigations. PLoS One 2016; 11:e0154556. [PMID: 27128319 PMCID: PMC4851331 DOI: 10.1371/journal.pone.0154556] [Citation(s) in RCA: 142] [Impact Index Per Article: 17.8] [Received: 01/05/2016] [Accepted: 04/17/2016] [Indexed: 12/18/2022] [Open access]
Abstract
The Ontology for Biomedical Investigations (OBI) is an ontology that provides terms with precisely defined meanings to describe all aspects of how investigations in the biological and medical domains are conducted. OBI re-uses ontologies that provide a representation of biomedical knowledge from the Open Biological and Biomedical Ontologies (OBO) project and adds the ability to describe how this knowledge was derived. We here describe the state of OBI and several applications that are using it, such as adding semantic expressivity to existing databases, building data entry forms, and enabling interoperability between knowledge resources. OBI covers all phases of the investigation process, such as planning, execution and reporting. It represents information and material entities that participate in these processes, as well as roles and functions. Prior to OBI, it was not possible to use a single internally consistent resource that could be applied to multiple types of experiments for these applications. OBI has made this possible by creating terms for entities involved in biological and medical investigations and by importing parts of other biomedical ontologies such as GO, Chemical Entities of Biological Interest (ChEBI) and Phenotype Attribute and Trait Ontology (PATO) without altering their meaning. OBI is being used in a wide range of projects covering genomics, multi-omics, immunology, and catalogs of services. OBI has also spawned other ontologies (Information Artifact Ontology) and methods for importing parts of ontologies (Minimum information to reference an external ontology term (MIREOT)). The OBI project is an open cross-disciplinary collaborative effort, encompassing multiple research communities from around the globe. To date, OBI has created 2366 classes and 40 relations along with textual and formal definitions. 
The OBI Consortium maintains a web resource (http://obi-ontology.org) providing details on the people, policies, and issues being addressed in association with OBI. The current release of OBI is available at http://purl.obolibrary.org/obo/obi.owl.
Affiliation(s)
- Anita Bandrowski
- University of California San Diego, La Jolla, California, United States of America
- Ryan Brinkman
- British Columbia Cancer Research Centre, Vancouver, British Columbia, Canada
- Mathias Brochhausen
- University of Arkansas for Medical Sciences, Little Rock, Arkansas, United States of America
- Matthew H. Brush
- Oregon Health and Science University, Portland, Oregon, United States of America
- Bill Bug
- Drexel University College of Medicine, Philadelphia, Pennsylvania, United States of America
- Marcus C. Chibucos
- University of Maryland School of Medicine, Baltimore, Maryland, United States of America
- Kevin Clancy
- Thermo Fisher Scientific, Carlsbad, California, United States of America
- Dirk Derom
- The Vrije Universiteit Brussel, Ixelles, Brussels, Belgium
- Michel Dumontier
- Stanford University, Stanford, California, United States of America
- Liju Fan
- Ontology Workshop, LLC, Columbia, Maryland, United States of America
- Jennifer Fostel
- National Toxicology Program, NIEHS, National Institutes of Health, Research Triangle Park, North Carolina, United States of America
- Gilberto Fragoso
- Center for Biomedical Informatics and Information Technology, National Institutes of Health, Rockville, Maryland, United States of America
- Frank Gibson
- Royal Society of Chemistry, Cambridge, Cambridgeshire, United Kingdom
- Melissa A. Haendel
- Oregon Health and Science University, Portland, Oregon, United States of America
- Yongqun He
- University of Michigan Medical School, Ann Arbor, Michigan, United States of America
- Mervi Heiskanen
- National Cancer Institute, Rockville, Maryland, United States of America
- Mark Jensen
- University at Buffalo, Buffalo, New York, United States of America
- Yu Lin
- University of Michigan Medical School, Ann Arbor, Michigan, United States of America
- Phillip Lord
- Newcastle University, Newcastle-upon-Tyne, Tyne and Wear, United Kingdom
- James Malone
- European Molecular Biology Laboratory-European Bioinformatics Institute, Hinxton, Cambridgeshire, United Kingdom
- Elisabetta Manduchi
- University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Monnie McGee
- Southern Methodist University, Dallas, Texas, United States of America
- Norman Morrison
- The University of Manchester, Manchester, Greater Manchester, United Kingdom
- James A. Overton
- La Jolla Institute for Allergy and Immunology, La Jolla, California, United States of America
- Helen Parkinson
- European Molecular Biology Laboratory-European Bioinformatics Institute, Hinxton, Cambridgeshire, United Kingdom
- Bjoern Peters
- La Jolla Institute for Allergy and Immunology, La Jolla, California, United States of America
- Alan Ruttenberg
- University at Buffalo, Buffalo, New York, United States of America
- Daniel Schober
- Leibniz Institute of Plant Biochemistry, Halle, Saxony-Anhalt, Germany
- Barry Smith
- University at Buffalo, Buffalo, New York, United States of America
- Chris F. Taylor
- European Molecular Biology Laboratory-European Bioinformatics Institute, Hinxton, Cambridgeshire, United Kingdom
- Carlo Torniai
- Oregon Health and Science University, Portland, Oregon, United States of America
- Jessica A. Turner
- Georgia State University, Atlanta, Georgia, United States of America
- Randi Vita
- La Jolla Institute for Allergy and Immunology, La Jolla, California, United States of America
- Patricia L. Whetzel
- University of California San Diego, La Jolla, California, United States of America
- Jie Zheng
- University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
19
Lubitz T, Hahn J, Bergmann FT, Noor E, Klipp E, Liebermeister W. SBtab: a flexible table format for data exchange in systems biology. Bioinformatics 2016; 32:2559-61. [PMID: 27153616 PMCID: PMC4978929 DOI: 10.1093/bioinformatics/btw179] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 10/06/2015] [Accepted: 03/31/2016] [Indexed: 02/06/2023] [Open access]
Abstract
Summary: SBtab is a table-based data format for Systems Biology, designed to support automated data integration and model building. It uses the structure of spreadsheets and defines conventions for table structure, controlled vocabularies and semantic annotations. The format comes with predefined table types for experimental data and SBML-compliant model structures and can easily be customized to cover new types of data. Availability and Implementation: SBtab documents can be created and edited with any text editor or spreadsheet tool. The website www.sbtab.net provides online tools for syntax validation and conversion to SBML and HTML, as well as software for using SBtab in MS Excel, MATLAB and R. The stand-alone Python code contains functions for file parsing, validation, conversion to SBML and HTML and an interface to SQLite databases, to be integrated into Systems Biology workflows. A detailed specification of SBtab, including examples and descriptions of table types and available tools, can be found at www.sbtab.net. Contact: wolfram.liebermeister@gmail.com
Affiliation(s)
- Timo Lubitz
- Theoretical Biophysics, Institute of Biology, Humboldt-Universität zu Berlin, Berlin, Germany
- Jens Hahn
- Theoretical Biophysics, Institute of Biology, Humboldt-Universität zu Berlin, Berlin, Germany
- Frank T Bergmann
- COS/Bioquant, Ruprecht-Karls-Universität Heidelberg, Heidelberg, Germany
- Elad Noor
- Institute of Molecular Systems Biology, Eidgenössische Technische Hochschule Zürich, Zurich, Switzerland
- Edda Klipp
- Theoretical Biophysics, Institute of Biology, Humboldt-Universität zu Berlin, Berlin, Germany
20
Misra BB, van der Hooft JJJ. Updates in metabolomics tools and resources: 2014-2015. Electrophoresis 2015; 37:86-110. [DOI: 10.1002/elps.201500417] [Citation(s) in RCA: 100] [Impact Index Per Article: 11.1] [Received: 09/12/2015] [Revised: 10/04/2015] [Accepted: 10/05/2015] [Indexed: 12/12/2022]
Affiliation(s)
- Biswapriya B. Misra
- Department of Biology, Genetics Institute; University of Florida; Gainesville FL USA
21
Cserhati MF, Pandey S, Beaudoin JJ, Baccaglini L, Guda C, Fox HS. The National NeuroAIDS Tissue Consortium (NNTC) Database: an integrated database for HIV-related studies. Database (Oxford) 2015; 2015:bav074. [PMID: 26228431 PMCID: PMC4520230 DOI: 10.1093/database/bav074] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 05/09/2015] [Accepted: 06/30/2015] [Indexed: 11/13/2022]
Abstract
We herein present the National NeuroAIDS Tissue Consortium-Data Coordinating Center (NNTC-DCC) database, which is the only available database for neuroAIDS studies that contains data in an integrated, standardized form. This database has been created in conjunction with the NNTC, which provides human tissue and biofluid samples to individual researchers to conduct studies focused on neuroAIDS. The database contains experimental datasets from 1206 subjects for the following categories (which are further broken down into subcategories): gene expression, genotype, proteins, endo-exo-chemicals, morphometrics and other (miscellaneous) data. The database also contains a wide variety of downloadable data and metadata for 95 HIV-related studies covering 170 assays from 61 principal investigators. The data represent 76 tissue types, 25 measurement types, and 38 technology types, and reach a total of 33 017 407 data points. We used the ISA platform to create the database and develop a searchable web interface for querying the data. A gene search tool is also available, which searches for NCBI GEO datasets associated with selected genes. The database is manually curated with many user-friendly features, and is cross-linked to the NCBI, HUGO and PubMed databases. A free registration is required for qualified users to access the database. Database URL: http://nntc-dcc.unmc.edu
Affiliation(s)
- Matyas F Cserhati
- Department of Genetics, Cell Biology and Anatomy, Bioinformatics and Systems Biology Core
- Sanjit Pandey
- Department of Genetics, Cell Biology and Anatomy, Bioinformatics and Systems Biology Core
- James J Beaudoin
- Department of Pharmacology and Experimental Neuroscience, College of Medicine
- Chittibabu Guda
- Department of Genetics, Cell Biology and Anatomy, Bioinformatics and Systems Biology Core, Fred and Pamela Buffett Cancer Center, Eppley Institute for Cancer Research, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Howard S Fox
- Department of Pharmacology and Experimental Neuroscience, College of Medicine
22
Ara T, Enomoto M, Arita M, Ikeda C, Kera K, Yamada M, Nishioka T, Ikeda T, Nihei Y, Shibata D, Kanaya S, Sakurai N. Metabolonote: a wiki-based database for managing hierarchical metadata of metabolome analyses. Front Bioeng Biotechnol 2015; 3:38. [PMID: 25905099 PMCID: PMC4388006 DOI: 10.3389/fbioe.2015.00038] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 11/12/2014] [Accepted: 03/13/2015] [Indexed: 01/04/2023] [Open access]
Abstract
Metabolomics, the technology for comprehensive detection of small molecules in an organism, lags behind the other “omics” in terms of publication and dissemination of experimental data. Among the reasons for this are the difficulty of precisely recording information about complicated analytical experiments (metadata), the existence of various databases with their own metadata descriptions, and the low reusability of the published data, which leave submitters (the researchers who generate the data) insufficiently motivated. To tackle these issues, we developed Metabolonote, a Semantic MediaWiki-based database designed specifically for managing metabolomic metadata. We also defined a metadata and data description format, called “Togo Metabolome Data” (TogoMD), with an ID system that is required for unique access to each level of the tree-structured metadata, such as study purpose, sample, analytical method, and data analysis. Separating the management of metadata from that of data, and permitting related information to be attached to the metadata, provide advantages for submitters, readers, and database developers. The metadata are enriched with information such as links to comparable data, thereby functioning as a hub of related data resources. They also enhance not only readers’ understanding and use of data but also submitters’ motivation to publish the data. The metadata are computationally shared among other systems via APIs, which facilitate the construction of novel databases by database developers. A permission system that allows publication of immature metadata and feedback from readers also helps submitters to improve their metadata. Hence, this aspect of Metabolonote, as a metadata preparation tool, is complementary to high-quality and persistent data repositories such as MetaboLights. A total of 808 metadata records for analyzed data obtained from 35 biological species are currently published.
Metabolonote and related tools are available free of cost at http://metabolonote.kazusa.or.jp/.
Affiliation(s)
- Takeshi Ara
- Department of Technology Development, Kazusa DNA Research Institute, Kisarazu, Japan; National Bioscience Database Center (NBDC), Japan Science and Technology Agency (JST), Tokyo, Japan
- Mitsuo Enomoto
- Department of Technology Development, Kazusa DNA Research Institute, Kisarazu, Japan; National Bioscience Database Center (NBDC), Japan Science and Technology Agency (JST), Tokyo, Japan
- Masanori Arita
- National Bioscience Database Center (NBDC), Japan Science and Technology Agency (JST), Tokyo, Japan; RIKEN Center for Sustainable Resource Science, Yokohama, Japan
- Chiaki Ikeda
- Department of Technology Development, Kazusa DNA Research Institute, Kisarazu, Japan; National Bioscience Database Center (NBDC), Japan Science and Technology Agency (JST), Tokyo, Japan
- Kota Kera
- Department of Research & Development, Kazusa DNA Research Institute, Kisarazu, Japan
- Manabu Yamada
- Department of Technology Development, Kazusa DNA Research Institute, Kisarazu, Japan; National Bioscience Database Center (NBDC), Japan Science and Technology Agency (JST), Tokyo, Japan
- Takaaki Nishioka
- National Bioscience Database Center (NBDC), Japan Science and Technology Agency (JST), Tokyo, Japan; Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma, Japan
- Tasuku Ikeda
- National Bioscience Database Center (NBDC), Japan Science and Technology Agency (JST), Tokyo, Japan; Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma, Japan
- Yoshito Nihei
- National Bioscience Database Center (NBDC), Japan Science and Technology Agency (JST), Tokyo, Japan; Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma, Japan
- Daisuke Shibata
- Department of Technology Development, Kazusa DNA Research Institute, Kisarazu, Japan
- Shigehiko Kanaya
- National Bioscience Database Center (NBDC), Japan Science and Technology Agency (JST), Tokyo, Japan; Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma, Japan
- Nozomu Sakurai
- Department of Technology Development, Kazusa DNA Research Institute, Kisarazu, Japan; National Bioscience Database Center (NBDC), Japan Science and Technology Agency (JST), Tokyo, Japan
23
Marchese Robinson RL, Cronin MTD, Richarz AN, Rallo R. An ISA-TAB-Nano based data collection framework to support data-driven modelling of nanotoxicology. Beilstein J Nanotechnol 2015; 6:1978-99. [PMID: 26665069 PMCID: PMC4660926 DOI: 10.3762/bjnano.6.202] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 03/31/2015] [Accepted: 08/27/2015] [Indexed: 05/20/2023]
Abstract
Analysis of trends in nanotoxicology data and the development of data driven models for nanotoxicity is facilitated by the reporting of data using a standardised electronic format. ISA-TAB-Nano has been proposed as such a format. However, in order to build useful datasets according to this format, a variety of issues has to be addressed. These issues include questions regarding exactly which (meta)data to report and how to report them. The current article discusses some of the challenges associated with the use of ISA-TAB-Nano and presents a set of resources designed to facilitate the manual creation of ISA-TAB-Nano datasets from the nanotoxicology literature. These resources were developed within the context of the NanoPUZZLES EU project and include data collection templates, corresponding business rules that extend the generic ISA-TAB-Nano specification as well as Python code to facilitate parsing and integration of these datasets within other nanoinformatics resources. The use of these resources is illustrated by a "Toy Dataset" presented in the Supporting Information. The strengths and weaknesses of the resources are discussed along with possible future developments.
Affiliation(s)
- Richard L Marchese Robinson
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, James Parsons Building, Byrom Street, Liverpool, L3 3AF, United Kingdom
- Mark T D Cronin
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, James Parsons Building, Byrom Street, Liverpool, L3 3AF, United Kingdom
- Andrea-Nicole Richarz
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, James Parsons Building, Byrom Street, Liverpool, L3 3AF, United Kingdom
- Robert Rallo
- Departament d'Enginyeria Informatica i Matematiques, Universitat Rovira i Virgili, Av. Paisos Catalans 26, 43007 Tarragona, Catalunya, Spain
24
Hettne KM, Dharuri H, Zhao J, Wolstencroft K, Belhajjame K, Soiland-Reyes S, Mina E, Thompson M, Cruickshank D, Verdes-Montenegro L, Garrido J, de Roure D, Corcho O, Klyne G, van Schouwen R, 't Hoen PAC, Bechhofer S, Goble C, Roos M. Structuring research methods and data with the research object model: genomics workflows as a case study. J Biomed Semantics 2014; 5:41. [PMID: 25276335 PMCID: PMC4177597 DOI: 10.1186/2041-1480-5-41] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 05/13/2013] [Accepted: 07/29/2014] [Indexed: 12/24/2022]
Abstract
BACKGROUND One of the main challenges for biomedical research lies in the computer-assisted integrative study of large and increasingly complex combinations of data in order to understand molecular mechanisms. The preservation of the materials and methods of such computational experiments with clear annotations is essential for understanding an experiment, and this is increasingly recognized in the bioinformatics community. Our assumption is that offering means of digital, structured aggregation and annotation of the objects of an experiment will provide necessary meta-data for a scientist to understand and recreate the results of an experiment. To support this we explored a model for the semantic description of a workflow-centric Research Object (RO), where an RO is defined as a resource that aggregates other resources, e.g., datasets, software, spreadsheets, text, etc. We applied this model to a case study where we analysed human metabolite variation by workflows. RESULTS We present the application of the workflow-centric RO model for our bioinformatics case study. Three workflows were produced following recently defined Best Practices for workflow design. By modelling the experiment as an RO, we were able to automatically query the experiment and answer questions such as "which particular data was input to a particular workflow to test a particular hypothesis?", and "which particular conclusions were drawn from a particular workflow?". CONCLUSIONS Applying a workflow-centric RO model to aggregate and annotate the resources used in a bioinformatics experiment allowed us to retrieve the conclusions of the experiment in the context of the driving hypothesis, the executed workflows and their input data. The RO model is an extendable reference model that can be used by other systems as well. AVAILABILITY The Research Object is available at http://www.myexperiment.org/packs/428. The Wf4Ever Research Object Model is available at http://wf4ever.github.io/ro.
Affiliation(s)
- Kristina M Hettne
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Harish Dharuri
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Jun Zhao
- Department of Zoology, University of Oxford, Oxford, UK
- Katherine Wolstencroft
- School of Computer Science, University of Manchester, Manchester, UK
- Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
- Khalid Belhajjame
- School of Computer Science, University of Manchester, Manchester, UK
- Eleni Mina
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Mark Thompson
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- David de Roure
- Department of Zoology, University of Oxford, Oxford, UK
- Oscar Corcho
- Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain
- Graham Klyne
- Department of Zoology, University of Oxford, Oxford, UK
- Reinout van Schouwen
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Peter A C 't Hoen
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Sean Bechhofer
- School of Computer Science, University of Manchester, Manchester, UK
- Carole Goble
- School of Computer Science, University of Manchester, Manchester, UK
- Marco Roos
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
25
Buckler AJ, Paik D, Ouellette M, Danagoulian J, Wernsing G, Suzek BE. A novel knowledge representation framework for the statistical validation of quantitative imaging biomarkers. J Digit Imaging 2014; 26:614-29. [PMID: 23546775 DOI: 10.1007/s10278-013-9598-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/06/2023]
Abstract
Quantitative imaging biomarkers are of particular interest in drug development for their potential to accelerate the drug development pipeline. The lack of consensus methods and carefully characterized performance hampers the widespread availability of these quantitative measures. A framework to support collaborative work on quantitative imaging biomarkers would entail advanced statistical techniques, the development of controlled vocabularies, and a service-oriented architecture for processing large image archives. Until now, this framework has not been developed. With the availability of tools for automatic ontology-based annotation of datasets, coupled with image archives, and a means for batch selection and processing of image and clinical data, imaging will go through a similar increase in capability analogous to what advanced genetic profiling techniques have brought to molecular biology. We report on our current progress on developing an informatics infrastructure to store, query, and retrieve imaging biomarker data across a wide range of resources in a semantically meaningful way that facilitates the collaborative development and validation of potential imaging biomarkers by many stakeholders. Specifically, we describe the semantic components of our system, QI-Bench, that are used to specify and support experimental activities for statistical validation in quantitative imaging.
26
Ruggeri B, Sarkans U, Schumann G, Persico AM. Biomarkers in autism spectrum disorder: the old and the new. Psychopharmacology (Berl) 2014; 231:1201-16. [PMID: 24096533 DOI: 10.1007/s00213-013-3290-7] [Citation(s) in RCA: 118] [Impact Index Per Article: 11.8] [Received: 04/15/2013] [Accepted: 09/07/2013] [Indexed: 12/21/2022]
Abstract
RATIONALE Autism spectrum disorder (ASD) is a complex heterogeneous neurodevelopmental disorder with onset during early childhood and typically a life-long course. The majority of ASD cases stems from complex, 'multiple-hit', oligogenic/polygenic underpinnings involving several loci and possibly gene-environment interactions. These multiple layers of complexity spur interest into the identification of biomarkers able to define biologically homogeneous subgroups, predict autism risk prior to the onset of behavioural abnormalities, aid early diagnoses, predict the developmental trajectory of ASD children, predict response to treatment and identify children at risk for severe adverse reactions to psychoactive drugs. OBJECTIVES The present paper reviews (a) similarities and differences between the concepts of 'biomarker' and 'endophenotype', (b) established biomarkers and endophenotypes in autism research (biochemical, morphological, hormonal, immunological, neurophysiological and neuroanatomical, neuropsychological, behavioural), (c) -omics approaches towards the discovery of novel biomarker panels for ASD, (d) bioresource infrastructures and (e) data management for biomarker research in autism. RESULTS Known biomarkers, such as abnormal blood levels of serotonin, oxytocin, melatonin, immune cytokines and lymphocyte subtypes, multiple neuropsychological, electrophysiological and brain imaging parameters, will eventually merge with novel biomarkers identified using unbiased genomic, epigenomic, transcriptomic, proteomic and metabolomic methods, to generate multimarker panels. Bioresource infrastructures, data management and data analysis using artificial intelligence networks will be instrumental in supporting efforts to identify these biomarker panels. CONCLUSIONS Biomarker research has great heuristic potential in targeting autism diagnosis and treatment.
Affiliation(s)
- Barbara Ruggeri
- MRC Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, King's College London, De Crespigny Park, London, SE5 8AF, UK
27
Tomlinson CD, Barton GR, Woodbridge M, Butcher SA. XperimentR: painless annotation of a biological experiment for the laboratory scientist. BMC Bioinformatics 2013; 14:8. [PMID: 23323856 PMCID: PMC3571946 DOI: 10.1186/1471-2105-14-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 07/16/2012] [Accepted: 12/29/2012] [Indexed: 11/10/2022]
Abstract
Background Today’s biological experiments often involve the collaboration of multidisciplinary researchers utilising several high throughput ‘omics platforms. There is a requirement for the details of the experiment to be adequately described using standardised ontologies to enable data preservation, the analysis of the data and to facilitate the export of the data to public repositories. However there are a bewildering number of ontologies, controlled vocabularies, and minimum standards available for use to describe experiments. There is a need for user-friendly software tools to aid laboratory scientists in capturing the experimental information. Results A web application called XperimentR has been developed for use by laboratory scientists, consisting of a browser-based interface and server-side components which provide an intuitive platform for capturing and sharing experimental metadata. Information recorded includes details about the biological samples, procedures, protocols, and experimental technologies, all of which can be easily annotated using the appropriate ontologies. Files and raw data can be imported and associated with the biological samples via the interface, from either users’ computers, or commonly used open-source data repositories. Experiments can be shared with other users, and experiments can be exported in the standard ISA-Tab format for deposition in public databases. XperimentR is freely available and can be installed natively or by using a provided pre-configured Virtual Machine. A guest system is also available for trial purposes. Conclusion We present a web based software application to aid the laboratory scientist to capture, describe and share details about their experiments.
Affiliation(s)
- Chris D Tomlinson
- Centre for Integrated Systems Biology and Bioinformatics, Imperial College London, Sir Ernst Chain Building, South Kensington Campus, London SW7 2AZ, UK.
28
Thomas DG, Gaheen S, Harper SL, Fritts M, Klaessig F, Hahn-Dantona E, Paik D, Pan S, Stafford GA, Freund ET, Klemm JD, Baker NA. ISA-TAB-Nano: a specification for sharing nanomaterial research data in spreadsheet-based format. BMC Biotechnol 2013; 13:2. [PMID: 23311978 PMCID: PMC3598649 DOI: 10.1186/1472-6750-13-2] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Received: 04/12/2012] [Accepted: 12/11/2012] [Indexed: 12/22/2022]
Abstract
BACKGROUND AND MOTIVATION The high-throughput genomics communities have been successfully using standardized spreadsheet-based formats to capture and share data within labs and among public repositories. The nanomedicine community has yet to adopt similar standards to share the diverse and multi-dimensional types of data (including metadata) pertaining to the description and characterization of nanomaterials. Owing to the lack of standardization in representing and sharing nanomaterial data, most of the data currently shared via publications and data resources are incomplete, poorly-integrated, and not suitable for meaningful interpretation and re-use of the data. Specifically, in its current state, data cannot be effectively utilized for the development of predictive models that will inform the rational design of nanomaterials. RESULTS We have developed a specification called ISA-TAB-Nano, which comprises four spreadsheet-based file formats for representing and integrating various types of nanomaterial data. Three file formats (Investigation, Study, and Assay files) have been adapted from the established ISA-TAB specification; while the Material file format was developed de novo to more readily describe the complexity of nanomaterials and associated small molecules. In this paper, we have discussed the main features of each file format and how to use them for sharing nanomaterial descriptions and assay metadata. CONCLUSION The ISA-TAB-Nano file formats provide a general and flexible framework to record and integrate nanomaterial descriptions, assay data (metadata and endpoint measurements) and protocol information. Like ISA-TAB, ISA-TAB-Nano supports the use of ontology terms to promote standardized descriptions and to facilitate search and integration of the data. The ISA-TAB-Nano specification has been submitted as an ASTM work item to obtain community feedback and to provide a nanotechnology data-sharing standard for public development and adoption.
Affiliation(s)
- Dennis G Thomas
- Knowledge Discovery and Informatics, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Sharon Gaheen
- SAIC-Frederick, Inc, Frederick National Laboratory for Cancer Research, Information Systems Program, Rockville, MD 20852, USA
- Stacey L Harper
- Department of Environmental and Molecular Toxicology, School of Chemical, Biological and Environmental Engineering, Oregon State University, Corvallis, OR 97331, USA
- Martin Fritts
- SAIC-Frederick, Inc, Frederick National Laboratory for Cancer Research, Information Systems Program, Rockville, MD 20852, USA
- Fred Klaessig
- Pennsylvania Bio Nano Systems, LLC, Doylestown, PA, USA
- David Paik
- Department of Radiology, Stanford University, Stanford, CA 94305, USA
- Sue Pan
- SAIC-Frederick, Inc, Frederick National Laboratory for Cancer Research, Information Systems Program, Rockville, MD 20852, USA
- Juli D Klemm
- Center for Biomedical Informatics and Information Technology, National Cancer Institute, Rockville, MD 20852, USA
- Nathan A Baker
- Knowledge Discovery and Informatics, Pacific Northwest National Laboratory, PO Box 999, MSID K7-28, Richland, WA 99352, USA
29
Brandizi M, Kurbatova N, Sarkans U, Rocca-Serra P. graph2tab, a library to convert experimental workflow graphs into tabular formats. Bioinformatics 2012; 28:1665-7. [PMID: 22556367 PMCID: PMC3371871 DOI: 10.1093/bioinformatics/bts258] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Abstract
Motivations: Spreadsheet-like tabular formats are ever more popular in the biomedical field as a means of experimental reporting. The problem of converting the graph of an experimental workflow into a table-based representation occurs in many such formats and is not easy to solve. Results: We describe graph2tab, a library that implements methods to realise such a conversion in a size-optimised way. Our solution is generic and can be adapted to specific cases of data exporters or data converters that need to be implemented. Availability and Implementation: The library source code and documentation are available at http://github.com/ISA-tools/graph2tab. Contact: brandizi@ebi.ac.uk. Supplementary Information: A supplementary document describes the theoretical and technical details about the library implementation.
Affiliation(s)
- Marco Brandizi
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK.
30
Dreher F, Kreitler T, Hardt C, Kamburov A, Yildirimman R, Schellander K, Lehrach H, Lange BMH, Herwig R. DIPSBC--data integration platform for systems biology collaborations. BMC Bioinformatics 2012; 13:85. [PMID: 22568834 PMCID: PMC3424966 DOI: 10.1186/1471-2105-13-85] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 12/20/2011] [Accepted: 05/01/2012] [Indexed: 11/17/2022]
Abstract
Background Modern biomedical research is often organized in collaborations involving labs worldwide. In particular in systems biology, complex molecular systems are analyzed that require the generation and interpretation of heterogeneous data for their explanation, for example ranging from gene expression studies and mass spectrometry measurements to experimental techniques for detecting molecular interactions and functional assays. XML has become the most prominent format for representing and exchanging these data. However, besides the development of standards there is still a fundamental lack of data integration systems that are able to utilize these exchange formats, organize the data in an integrative way and link it with applications for data interpretation and analysis. Results We have developed DIPSBC, an interactive data integration platform supporting collaborative research projects, based on Foswiki, Solr/Lucene, and specific helper applications. We describe the main features of the implementation and highlight the performance of the system with several use cases. All components of the system are platform independent and open-source developments and thus can be easily adopted by researchers. An exemplary installation of the platform which also provides several helper applications and detailed instructions for system usage and setup is available at http://dipsbc.molgen.mpg.de. Conclusions DIPSBC is a data integration platform for medium-scale collaboration projects that has been tested already within several research collaborations. Because of its modular design and the incorporation of XML data formats it is highly flexible and easy to use.
Affiliation(s)
- Felix Dreher
- Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, 14195 Berlin, Germany.
31
Wolstencroft K, Owen S, du Preez F, Krebs O, Mueller W, Goble C, Snoep JL. The SEEK: a platform for sharing data and models in systems biology. Methods Enzymol 2012; 500:629-55. [PMID: 21943917 DOI: 10.1016/b978-0-12-385118-5.00029-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 02/20/2023]
Abstract
Systems biology research is typically performed by multidisciplinary groups of scientists, often in large consortia and in distributed locations. The data generated in these projects tend to be heterogeneous and often involve high-throughput "omics" analyses. Models are developed iteratively from data generated in the projects and from the literature. Consequently, there is a growing requirement for exchanging experimental data, mathematical models, and scientific protocols between consortium members and a necessity to record and share the outcomes of experiments and the links between data and models. The overall output of a research consortium is also a valuable commodity in its own right. The research and associated data and models should eventually be available to the whole community for reuse and future analysis. The SEEK is an open-source, Web-based platform designed for the management and exchange of systems biology data and models. The SEEK was originally developed for the SysMO (systems biology of microorganisms) consortia, but the principles and objectives are applicable to any systems biology project. The SEEK provides an index of consortium resources and acts as gateway to other tools and services commonly used in the community. For example, the model simulation tool, JWS Online, has been integrated into the SEEK, and a plug-in to PubMed allows publications to be linked to supporting data and author profiles in the SEEK. The SEEK is a pragmatic solution to data management which encourages, but does not force, researchers to share and disseminate their data to community standard formats. It provides tools to assist with management and annotation as well as incentives and added value for following these recommendations. Data exchange and reuse rely on sufficient annotation, consistent metadata descriptions, and the use of standard exchange formats for models, data, and the experiments they are derived from. 
In this chapter, we present the SEEK platform, its functionalities, and the methods employed for lowering the barriers to adoption of standard formats. As the production of biological data continues to grow, in systems biology and in the life sciences in general, the need to record, manage, and exploit this wealth of information in the future is increasing. We promote the SEEK as a data and model management tool that can be adapted to the specific needs of a particular systems biology project.
Affiliation(s)
- Katy Wolstencroft
- School of Computer Science, University of Manchester, Manchester, United Kingdom
32
Ho Sui SJ, Begley K, Reilly D, Chapman B, McGovern R, Rocca-Serra P, Maguire E, Altschuler GM, Hansen TAA, Sompallae R, Krivtsov A, Shivdasani RA, Armstrong SA, Culhane AC, Correll M, Sansone SA, Hofmann O, Hide W. The Stem Cell Discovery Engine: an integrated repository and analysis system for cancer stem cell comparisons. Nucleic Acids Res 2012; 40:D984-91. [PMID: 22121217 PMCID: PMC3245064 DOI: 10.1093/nar/gkr1051] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 08/22/2011] [Revised: 10/13/2011] [Accepted: 10/25/2011] [Indexed: 11/24/2022]
Abstract
Mounting evidence suggests that malignant tumors are initiated and maintained by a subpopulation of cancerous cells with biological properties similar to those of normal stem cells. However, descriptions of stem-like gene and pathway signatures in cancers are inconsistent across experimental systems. Driven by a need to improve our understanding of molecular processes that are common and unique across cancer stem cells (CSCs), we have developed the Stem Cell Discovery Engine (SCDE), an online database of curated CSC experiments coupled to the Galaxy analytical framework. The SCDE allows users to consistently describe, share and compare CSC data at the gene and pathway level. Our initial focus has been on carefully curating tissue and cancer stem cell-related experiments from blood, intestine and brain to create a high quality resource containing 53 public studies and 1098 assays. The experimental information is captured and stored in the multi-omics Investigation/Study/Assay (ISA-Tab) format and can be queried in the data repository. A linked Galaxy framework provides a comprehensive, flexible environment populated with novel tools for gene list comparisons against molecular signatures in GeneSigDB and MSigDB, curated experiments in the SCDE and pathways in WikiPathways. The SCDE is available at http://discovery.hsci.harvard.edu.
Affiliation(s)
- Shannan J. Ho Sui
- Department of Biostatistics, HSPH Bioinformatics Core, Harvard School of Public Health, Boston, MA, Developmental and Molecular Pathways, Novartis Institutes for BioMedical Research, Cambridge, MA, USA, Oxford e-Research Centre, University of Oxford, UK, Department of Pediatric Oncology, Children's Hospital, Dana Farber Cancer Institute, and Harvard Medical School, Boston, Harvard Stem Cell Institute, Cambridge, Department of Biostatistics, Dana Farber Cancer Institute and Center for Cancer Computational Biology, Dana Farber Cancer Institute, Boston, MA, USA
- Kimberly Begley
- Department of Biostatistics, HSPH Bioinformatics Core, Harvard School of Public Health, Boston, MA, Developmental and Molecular Pathways, Novartis Institutes for BioMedical Research, Cambridge, MA, USA, Oxford e-Research Centre, University of Oxford, UK, Department of Pediatric Oncology, Children's Hospital, Dana Farber Cancer Institute, and Harvard Medical School, Boston, Harvard Stem Cell Institute, Cambridge, Department of Biostatistics, Dana Farber Cancer Institute and Center for Cancer Computational Biology, Dana Farber Cancer Institute, Boston, MA, USA
- Dorothy Reilly
- Department of Biostatistics, HSPH Bioinformatics Core, Harvard School of Public Health, Boston, MA, Developmental and Molecular Pathways, Novartis Institutes for BioMedical Research, Cambridge, MA, USA, Oxford e-Research Centre, University of Oxford, UK, Department of Pediatric Oncology, Children's Hospital, Dana Farber Cancer Institute, and Harvard Medical School, Boston, Harvard Stem Cell Institute, Cambridge, Department of Biostatistics, Dana Farber Cancer Institute and Center for Cancer Computational Biology, Dana Farber Cancer Institute, Boston, MA, USA
- Brad Chapman
- Department of Biostatistics, HSPH Bioinformatics Core, Harvard School of Public Health, Boston, MA, Developmental and Molecular Pathways, Novartis Institutes for BioMedical Research, Cambridge, MA, USA, Oxford e-Research Centre, University of Oxford, UK, Department of Pediatric Oncology, Children's Hospital, Dana Farber Cancer Institute, and Harvard Medical School, Boston, Harvard Stem Cell Institute, Cambridge, Department of Biostatistics, Dana Farber Cancer Institute and Center for Cancer Computational Biology, Dana Farber Cancer Institute, Boston, MA, USA
- Ray McGovern
- Department of Biostatistics, HSPH Bioinformatics Core, Harvard School of Public Health, Boston, MA, Developmental and Molecular Pathways, Novartis Institutes for BioMedical Research, Cambridge, MA, USA, Oxford e-Research Centre, University of Oxford, UK, Department of Pediatric Oncology, Children's Hospital, Dana Farber Cancer Institute, and Harvard Medical School, Boston, Harvard Stem Cell Institute, Cambridge, Department of Biostatistics, Dana Farber Cancer Institute and Center for Cancer Computational Biology, Dana Farber Cancer Institute, Boston, MA, USA
- Philippe Rocca-Serra
- Department of Biostatistics, HSPH Bioinformatics Core, Harvard School of Public Health, Boston, MA, Developmental and Molecular Pathways, Novartis Institutes for BioMedical Research, Cambridge, MA, USA, Oxford e-Research Centre, University of Oxford, UK, Department of Pediatric Oncology, Children's Hospital, Dana Farber Cancer Institute, and Harvard Medical School, Boston, Harvard Stem Cell Institute, Cambridge, Department of Biostatistics, Dana Farber Cancer Institute and Center for Cancer Computational Biology, Dana Farber Cancer Institute, Boston, MA, USA
- Eamonn Maguire
- Department of Biostatistics, HSPH Bioinformatics Core, Harvard School of Public Health, Boston, MA, Developmental and Molecular Pathways, Novartis Institutes for BioMedical Research, Cambridge, MA, USA, Oxford e-Research Centre, University of Oxford, UK, Department of Pediatric Oncology, Children's Hospital, Dana Farber Cancer Institute, and Harvard Medical School, Boston, Harvard Stem Cell Institute, Cambridge, Department of Biostatistics, Dana Farber Cancer Institute and Center for Cancer Computational Biology, Dana Farber Cancer Institute, Boston, MA, USA
- Gabriel M. Altschuler
- Department of Biostatistics, HSPH Bioinformatics Core, Harvard School of Public Health, Boston, MA, Developmental and Molecular Pathways, Novartis Institutes for BioMedical Research, Cambridge, MA, USA, Oxford e-Research Centre, University of Oxford, UK, Department of Pediatric Oncology, Children's Hospital, Dana Farber Cancer Institute, and Harvard Medical School, Boston, Harvard Stem Cell Institute, Cambridge, Department of Biostatistics, Dana Farber Cancer Institute and Center for Cancer Computational Biology, Dana Farber Cancer Institute, Boston, MA, USA
- Terah A. A. Hansen
- Department of Biostatistics, HSPH Bioinformatics Core, Harvard School of Public Health, Boston, MA, Developmental and Molecular Pathways, Novartis Institutes for BioMedical Research, Cambridge, MA, USA, Oxford e-Research Centre, University of Oxford, UK, Department of Pediatric Oncology, Children's Hospital, Dana Farber Cancer Institute, and Harvard Medical School, Boston, Harvard Stem Cell Institute, Cambridge, Department of Biostatistics, Dana Farber Cancer Institute and Center for Cancer Computational Biology, Dana Farber Cancer Institute, Boston, MA, USA
- Ramakrishna Sompallae
- Department of Biostatistics, HSPH Bioinformatics Core, Harvard School of Public Health, Boston, MA, Developmental and Molecular Pathways, Novartis Institutes for BioMedical Research, Cambridge, MA, USA, Oxford e-Research Centre, University of Oxford, UK, Department of Pediatric Oncology, Children's Hospital, Dana Farber Cancer Institute, and Harvard Medical School, Boston, Harvard Stem Cell Institute, Cambridge, Department of Biostatistics, Dana Farber Cancer Institute and Center for Cancer Computational Biology, Dana Farber Cancer Institute, Boston, MA, USA
| | - Andrei Krivtsov
- Department of Biostatistics, HSPH Bioinformatics Core, Harvard School of Public Health, Boston, MA, Developmental and Molecular Pathways, Novartis Institutes for BioMedical Research, Cambridge, MA, USA, Oxford e-Research Centre, University of Oxford, UK, Department of Pediatric Oncology, Children's Hospital, Dana Farber Cancer Institute, and Harvard Medical School, Boston, Harvard Stem Cell Institute, Cambridge, Department of Biostatistics, Dana Farber Cancer Institute and Center for Cancer Computational Biology, Dana Farber Cancer Institute, Boston, MA, USA
| | - Ramesh A. Shivdasani
- Department of Biostatistics, HSPH Bioinformatics Core, Harvard School of Public Health, Boston, MA, Developmental and Molecular Pathways, Novartis Institutes for BioMedical Research, Cambridge, MA, USA, Oxford e-Research Centre, University of Oxford, UK, Department of Pediatric Oncology, Children's Hospital, Dana Farber Cancer Institute, and Harvard Medical School, Boston, Harvard Stem Cell Institute, Cambridge, Department of Biostatistics, Dana Farber Cancer Institute and Center for Cancer Computational Biology, Dana Farber Cancer Institute, Boston, MA, USA
| | - Scott A. Armstrong
- Department of Biostatistics, HSPH Bioinformatics Core, Harvard School of Public Health, Boston, MA, Developmental and Molecular Pathways, Novartis Institutes for BioMedical Research, Cambridge, MA, USA, Oxford e-Research Centre, University of Oxford, UK, Department of Pediatric Oncology, Children's Hospital, Dana Farber Cancer Institute, and Harvard Medical School, Boston, Harvard Stem Cell Institute, Cambridge, Department of Biostatistics, Dana Farber Cancer Institute and Center for Cancer Computational Biology, Dana Farber Cancer Institute, Boston, MA, USA
| | - Aedín C. Culhane
- Department of Biostatistics, HSPH Bioinformatics Core, Harvard School of Public Health, Boston, MA, Developmental and Molecular Pathways, Novartis Institutes for BioMedical Research, Cambridge, MA, USA, Oxford e-Research Centre, University of Oxford, UK, Department of Pediatric Oncology, Children's Hospital, Dana Farber Cancer Institute, and Harvard Medical School, Boston, Harvard Stem Cell Institute, Cambridge, Department of Biostatistics, Dana Farber Cancer Institute and Center for Cancer Computational Biology, Dana Farber Cancer Institute, Boston, MA, USA
| | - Mick Correll
- Department of Biostatistics, HSPH Bioinformatics Core, Harvard School of Public Health, Boston, MA, Developmental and Molecular Pathways, Novartis Institutes for BioMedical Research, Cambridge, MA, USA, Oxford e-Research Centre, University of Oxford, UK, Department of Pediatric Oncology, Children's Hospital, Dana Farber Cancer Institute, and Harvard Medical School, Boston, Harvard Stem Cell Institute, Cambridge, Department of Biostatistics, Dana Farber Cancer Institute and Center for Cancer Computational Biology, Dana Farber Cancer Institute, Boston, MA, USA
| | - Susanna-Assunta Sansone
- Department of Biostatistics, HSPH Bioinformatics Core, Harvard School of Public Health, Boston, MA, Developmental and Molecular Pathways, Novartis Institutes for BioMedical Research, Cambridge, MA, USA, Oxford e-Research Centre, University of Oxford, UK, Department of Pediatric Oncology, Children's Hospital, Dana Farber Cancer Institute, and Harvard Medical School, Boston, Harvard Stem Cell Institute, Cambridge, Department of Biostatistics, Dana Farber Cancer Institute and Center for Cancer Computational Biology, Dana Farber Cancer Institute, Boston, MA, USA
| | - Oliver Hofmann
- Department of Biostatistics, HSPH Bioinformatics Core, Harvard School of Public Health, Boston, MA, Developmental and Molecular Pathways, Novartis Institutes for BioMedical Research, Cambridge, MA, USA, Oxford e-Research Centre, University of Oxford, UK, Department of Pediatric Oncology, Children's Hospital, Dana Farber Cancer Institute, and Harvard Medical School, Boston, Harvard Stem Cell Institute, Cambridge, Department of Biostatistics, Dana Farber Cancer Institute and Center for Cancer Computational Biology, Dana Farber Cancer Institute, Boston, MA, USA
| | - Winston Hide
- Department of Biostatistics, HSPH Bioinformatics Core, Harvard School of Public Health, Boston, MA, Developmental and Molecular Pathways, Novartis Institutes for BioMedical Research, Cambridge, MA, USA, Oxford e-Research Centre, University of Oxford, UK, Department of Pediatric Oncology, Children's Hospital, Dana Farber Cancer Institute, and Harvard Medical School, Boston, Harvard Stem Cell Institute, Cambridge, Department of Biostatistics, Dana Farber Cancer Institute and Center for Cancer Computational Biology, Dana Farber Cancer Institute, Boston, MA, USA
| |
33
Abstract
There is general agreement that the development of metabolomics depends not only on advances in chemical analysis techniques but also on advances in computing and data analysis methods. Metabolomics data usually requires intensive pre-processing, analysis, and mining procedures. Selecting and applying such procedures requires attention to issues including justification, traceability, and reproducibility. We describe a strategy for selecting data mining techniques which takes into consideration the goals of data mining techniques on the one hand, and the goals of metabolomics investigations and the nature of the data on the other. The strategy aims to ensure the validity and soundness of results and promote the achievement of the investigation goals.
34
Plant AL, Elliott JT, Bhat TN. New concepts for building vocabulary for cell image ontologies. BMC Bioinformatics 2011; 12:487. [PMID: 22188658 PMCID: PMC3293096 DOI: 10.1186/1471-2105-12-487] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 06/11/2011] [Accepted: 12/21/2011] [Indexed: 11/10/2022]
Abstract
Background: There are significant challenges associated with building ontologies for cell biology experiments, including the large numbers of terms and their synonyms. These challenges make it difficult to simultaneously query data from multiple experiments or ontologies. If vocabulary terms were consistently used and reused across and within ontologies, queries would be possible through shared terms. One approach to achieving this is to strictly control the terms used in ontologies in the form of a pre-defined schema, but this approach limits the individual researcher's ability to create new terms when needed to describe new experiments. Results: Here, we propose the use of a limited number of highly reusable common root terms, and rules for an experimentalist to locally expand terms by adding more specific terms under more general root terms to form specific new vocabulary hierarchies that can be used to build ontologies. We illustrate the application of the method to build vocabularies and a prototype database for cell images that uses a visual data-tree of terms to facilitate sophisticated queries based on experimental parameters. We demonstrate how the terminology might be extended by adding new vocabulary terms into the hierarchy of terms in an evolving process. In this approach, image data and metadata are handled separately, so we also describe a robust file-naming scheme to unambiguously identify image and other files associated with each metadata value. The prototype database http://sbd.nist.gov/ consists of more than 2000 images of cells and benchmark materials, and 163 metadata terms that describe experimental details, including many details about cell culture and handling. Image files of interest can be retrieved, and their data can be compared, by choosing one or more relevant metadata values as search terms. Metadata values for any dataset can be compared with corresponding values of another dataset through logical operations.
Conclusions: Organizing metadata for cell imaging experiments under a framework of rules that include highly reused root terms will facilitate the addition of new terms into a vocabulary hierarchy and encourage the reuse of terms. These vocabulary hierarchies can be converted into XML schema or RDF graphs for displaying and querying, but this conversion is not necessary for using them to annotate cell images. Vocabulary data trees from multiple experiments or laboratories can be aligned at the root terms to facilitate query development. This approach to developing vocabularies is compatible with the major advances in database technology and could be used for building the Semantic Web.
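The root-term approach described in this abstract can be illustrated with a small in-memory tree. The following Python sketch is not the NIST prototype's implementation; the root terms (`Material`, `Procedure`, `Measurement`), the example terms and the helper names (`Vocabulary`, `add_term`, `align_at_roots`) are hypothetical. It only shows local expansion of terms under shared roots and the alignment of two laboratories' vocabularies at those roots.

```python
from collections import defaultdict

# Hypothetical root terms chosen for illustration; the abstract does not list
# the actual root terms used in the NIST prototype database.
ROOT_TERMS = ("Material", "Procedure", "Measurement")


class Vocabulary:
    """A term hierarchy in which every term descends from one of a few root terms."""

    def __init__(self, roots):
        self.parent = {root: None for root in roots}  # term -> parent term
        self.children = defaultdict(set)               # term -> child terms

    @property
    def roots(self):
        return {term for term, parent in self.parent.items() if parent is None}

    def add_term(self, term, parent):
        # Local expansion: a more specific term may only be attached under an
        # existing term, so every new term stays anchored to a shared root.
        if parent not in self.parent:
            raise KeyError(f"unknown parent term: {parent!r}")
        self.parent[term] = parent
        self.children[parent].add(term)

    def root_of(self, term):
        # Walk up the hierarchy until reaching a term that has no parent.
        while self.parent[term] is not None:
            term = self.parent[term]
        return term

    def terms_under(self, root):
        return {term for term in self.parent if self.root_of(term) == root}


def align_at_roots(a, b):
    """For each root shared by two vocabularies, return the terms both use under it."""
    return {root: a.terms_under(root) & b.terms_under(root) for root in a.roots & b.roots}


# Two hypothetical laboratories extend the same roots independently.
lab_a = Vocabulary(ROOT_TERMS)
lab_a.add_term("Cell line", "Material")
lab_a.add_term("A10", "Cell line")
lab_b = Vocabulary(ROOT_TERMS)
lab_b.add_term("Cell line", "Material")
lab_b.add_term("NIH-3T3", "Cell line")

shared = align_at_roots(lab_a, lab_b)
print(sorted(shared["Material"]))  # ['Cell line', 'Material']
```

Because both laboratories attached "Cell line" under the same root, it becomes a shared term that a cross-laboratory query could use, while each laboratory's more specific terms remain local.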
Affiliation(s)
- Anne L Plant
- Biochemical Science Division, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA.
35
Gostev M, Faulconbridge A, Brandizi M, Fernandez-Banet J, Sarkans U, Brazma A, Parkinson H. The BioSample Database (BioSD) at the European Bioinformatics Institute. Nucleic Acids Res 2011; 40:D64-70. [PMID: 22096232 PMCID: PMC3245134 DOI: 10.1093/nar/gkr937] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.7] [Indexed: 11/13/2022]
Abstract
The BioSample Database (http://www.ebi.ac.uk/biosamples) is a new database at EBI that stores information about biological samples used in molecular experiments, such as sequencing, gene expression or proteomics. The goals of the BioSample Database include: (i) recording and linking of sample information consistently within EBI databases such as ENA, ArrayExpress and PRIDE; (ii) minimizing data entry efforts for EBI database submitters by enabling submitting sample descriptions once and referencing them later in data submissions to assay databases and (iii) supporting cross database queries by sample characteristics. Each sample in the database is assigned an accession number. The database includes a growing set of reference samples, such as cell lines, which are repeatedly used in experiments and can be easily referenced from any database by their accession numbers. Accession numbers for the reference samples will be exchanged with a similar database at NCBI. The samples in the database can be queried by their attributes, such as sample types, disease names or sample providers. A simple tab-delimited format facilitates submissions of sample information to the database, initially via email to biosamples@ebi.ac.uk.
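The abstract describes three mechanisms: sample descriptions submitted in a simple tab-delimited format, an accession number assigned to each sample, and queries by sample attributes. The Python sketch below illustrates those ideas with the standard `csv` module. The column names, sample rows and `SAMPLE00001`-style accession scheme are hypothetical and do not reproduce the BioSample Database's actual submission format or accession numbers.

```python
import csv
import io

# Hypothetical tab-delimited sample sheet; the real BioSD submission format is
# not specified in the abstract, so these column names are illustrative only.
SAMPLE_SHEET = (
    "sample_name\tsample_type\tdisease\tprovider\n"
    "HeLa-1\tcell line\tcervical adenocarcinoma\tLab A\n"
    "Liver-7\ttissue\tnormal\tLab B\n"
    "HeLa-2\tcell line\tcervical adenocarcinoma\tLab C\n"
)


def load_samples(text):
    """Parse a tab-delimited sample sheet and assign each row a local accession."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    samples = {}
    for i, row in enumerate(reader, start=1):
        accession = f"SAMPLE{i:05d}"  # hypothetical accession scheme
        samples[accession] = row
    return samples


def query(samples, **criteria):
    """Return sorted accessions of samples whose attributes match every criterion."""
    return sorted(
        acc for acc, attrs in samples.items()
        if all(attrs.get(key) == value for key, value in criteria.items())
    )


samples = load_samples(SAMPLE_SHEET)
print(query(samples, sample_type="cell line"))  # ['SAMPLE00001', 'SAMPLE00003']
```

Once a sample has an accession, later assay submissions can refer to it by that identifier instead of repeating its description, which is the data-entry saving the abstract points to.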
Affiliation(s)
- Mikhail Gostev
- EMBL-EBI, the European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
36
Thomas DG, Klaessig F, Harper SL, Fritts M, Hoover MD, Gaheen S, Stokes TH, Reznik-Zellen R, Freund ET, Klemm JD, Paik DS, Baker NA. Informatics and standards for nanomedicine technology. Wiley Interdiscip Rev Nanomed Nanobiotechnol 2011; 3:511-532. [PMID: 21721140 PMCID: PMC3189420 DOI: 10.1002/wnan.152] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Indexed: 12/30/2022]
Abstract
There are several issues to be addressed concerning the management and effective use of information (or data) generated from nanotechnology studies in biomedical research and medicine. These data are large in volume, diverse in content, and beset with gaps and ambiguities in the description and characterization of nanomaterials. In this work, we have reviewed three areas of nanomedicine informatics: information resources; taxonomies, controlled vocabularies, and ontologies; and information standards. Informatics methods and standards in each of these areas are critical for enabling collaboration; data sharing; unambiguous representation and interpretation of data; semantic (meaningful) search and integration of data; and for ensuring data quality, reliability, and reproducibility. In particular, we have considered four types of information standards in this article, which are standard characterization protocols, common terminology standards, minimum information standards, and standard data communication (exchange) formats. Currently, because of gaps and ambiguities in the data, it is also difficult to apply computational methods and machine learning techniques to analyze, interpret, and recognize patterns in data that are high dimensional in nature, and also to relate variations in nanomaterial properties to variations in their chemical composition, synthesis, characterization protocols, and so on. Progress toward resolving the issues of information management in nanomedicine using informatics methods and standards discussed in this article will be essential to the rapidly growing field of nanomedicine informatics.
Affiliation(s)
- Dennis G. Thomas
- Knowledge Discovery and Informatics Group, Pacific Northwest National Laboratory.
- Stacey L. Harper
- Environmental and Molecular Toxicology & School of Chemical, Biological and Environmental Engineering, Oregon State University.
- Todd H. Stokes
- Department of Biomedical Engineering, Emory University and Georgia Tech.
- Juli D. Klemm
- Center for Biomedical Informatics and Information Technology, National Cancer Institute.
- David S. Paik
- Radiological Sciences Laboratory, Stanford University.
- Nathan A. Baker
- Pacific Northwest National Laboratory, 902 Battelle Blvd. P.O. Box 999, MSIN K7-28, Richland, WA 99352 USA
37
Abstract
During the development cycle of a new antibody therapy, the therapeutic agent is tested on progressively more biologically complex models. The designs of new experiments are based on data gathered from prior models. New researchers who inherit the data, and researchers from groups with different cultures or expertise, are often called upon to interpret these data. Experiments that are not recorded consistently or that employ ambiguous terminology can make these results difficult to interpret. The researcher who originally collected the data may not be at hand to correct any misunderstanding or offer clarification, and data can be unknowingly misused. This introduces an element of risk into the therapy development process. We have developed a reporting guideline for recording therapy experiments. This guideline consists of a checklist of data to be recorded from antibody therapy experiments performed in molecular, cellular, animal and clinical models.
38
Webb AJ, Thorisson GA, Brookes AJ. An informatics project and online "Knowledge Centre" supporting modern genotype-to-phenotype research. Hum Mutat 2011; 32:543-50. [PMID: 21438073 DOI: 10.1002/humu.21469] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Received: 01/20/2011] [Accepted: 01/28/2011] [Indexed: 11/06/2022]
Abstract
Explosive growth in the generation of genotype-to-phenotype (G2P) data necessitates a concerted effort to tackle the logistical and informatics challenges this presents. The GEN2PHEN Project represents one such effort, with a broad strategy of uniting disparate G2P resources into a hybrid centralized-federated network. This is achieved through a holistic strategy focussed on three overlapping areas: data input standards and pipelines through which to submit and collect data (data in); federated, independent, extendable, yet interoperable database platforms on which to store and curate widely diverse datasets (data storage); and data formats and mechanisms with which to exchange, combine, and extract data (data exchange and output). To fully leverage this data network, we have constructed the "G2P Knowledge Centre" (http://www.gen2phen.org). This central platform provides holistic searching of the G2P data domain allied with facilities for data annotation and user feedback, access to extensive G2P and informatics resources, and tools for constructing online working communities centered on the G2P domain. Through the efforts of GEN2PHEN, and through combining data with broader community-derived knowledge, the Knowledge Centre opens up exciting possibilities for organizing, integrating, sharing, and interpreting new waves of G2P data in a collaborative fashion.
Affiliation(s)
- Adam J Webb
- Department of Genetics, University of Leicester, University Road, Leicester, United Kingdom.
39
Chervitz SA, Deutsch EW, Field D, Parkinson H, Quackenbush J, Rocca-Serra P, Sansone SA, Stoeckert CJ, Taylor CF, Taylor R, Ball CA. Data standards for Omics data: the basis of data sharing and reuse. Methods Mol Biol 2011; 719:31-69. [PMID: 21370078 PMCID: PMC4152841 DOI: 10.1007/978-1-61779-027-0_2] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Indexed: 12/04/2022]
Abstract
To facilitate sharing of Omics data, many groups of scientists have been working to establish the relevant data standards. The main components of data sharing standards are experiment description standards, data exchange standards, terminology standards, and experiment execution standards. Here we provide a survey of existing and emerging standards that are intended to assist the free and open exchange of large-format data.
40
Abstract
Technological Omics breakthroughs, including next-generation sequencing, bring avalanches of data that need to undergo effective data management to ensure integrity, security, and maximal knowledge-gleaning. Data management system requirements include flexible input formats, diverse data entry mechanisms and views, user friendliness, attention to standards, hardware and software platform definition, as well as robustness. Relevant solutions elaborated by the scientific community include Laboratory Information Management Systems (LIMS) and standardization protocols facilitating data sharing and managing. In project planning, special consideration has to be made when choosing relevant Omics annotation sources, since many of them overlap and require sophisticated integration heuristics. The data modeling step defines and categorizes the data into objects (e.g., genes, articles, disorders) and creates an application flow. A data storage/warehouse mechanism must be selected, such as a file-based system or a relational database, the latter typically being used for larger projects. Omics project life cycle considerations must include the definition and deployment of new versions, incorporating either full or partial updates. Finally, quality assurance (QA) procedures must validate data and feature integrity, as well as system performance expectations. We illustrate these data management principles with examples from the life cycle of the GeneCards Omics project (http://www.genecards.org), a comprehensive, widely used compendium of annotative information about human genes. For example, the GeneCards infrastructure has recently been changed from text files to a relational database, enabling better organization and views of the growing data. Omics data handling benefits from the wealth of Web-based information, the vast amount of public domain software, increasingly affordable hardware, and effective use of data management and annotation principles as outlined in this chapter.
Affiliation(s)
- Arye Harel
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
41
Martens L, Chambers M, Sturm M, Kessner D, Levander F, Shofstahl J, Tang WH, Römpp A, Neumann S, Pizarro AD, Montecchi-Palazzi L, Tasman N, Coleman M, Reisinger F, Souda P, Hermjakob H, Binz PA, Deutsch EW. mzML--a community standard for mass spectrometry data. Mol Cell Proteomics 2011; 10:R110.000133. [PMID: 20716697 PMCID: PMC3013463 DOI: 10.1074/mcp.r110.000133] [Citation(s) in RCA: 451] [Impact Index Per Article: 34.7] [Received: 04/29/2010] [Revised: 07/26/2010] [Indexed: 12/27/2022]
Abstract
Mass spectrometry is a fundamental tool for discovery and analysis in the life sciences. With the rapid advances in mass spectrometry technology and methods, it has become imperative to provide a standard output format for mass spectrometry data that will facilitate data sharing and analysis. Initially, the efforts to develop a standard format for mass spectrometry data resulted in multiple formats, each designed with a different underlying philosophy. To resolve the issues associated with having multiple formats, vendors, researchers, and software developers convened under the banner of the HUPO PSI to develop a single standard. The new data format incorporated many of the desirable technical attributes from the previous data formats, while adding a number of improvements, including features such as a controlled vocabulary with validation tools to ensure consistent usage of the format, improved support for selected reaction monitoring data, and immediately available implementations to facilitate rapid adoption by the community. The resulting standard data format, mzML, is a well tested open-source format for mass spectrometer output files that can be readily utilized by the community and easily adapted for incremental advances in mass spectrometry technology.
Affiliation(s)
- Lennart Martens
- Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Marc Sturm
- Eberhard Karls University, 72074, Tübingen, Germany
- Darren Kessner
- University of Southern California, Los Angeles, CA, 90089, USA
- Fredrik Levander
- Department of Immunotechnology and CREATE Health, Lund University, 22362, Lund, Sweden
- Jim Shofstahl
- Thermo Fisher Scientific, San Jose, CA, 95134, USA
- Steffen Neumann
- Leibniz Institute of Plant Biochemistry, 06120 Halle, Germany
- Luisa Montecchi-Palazzi
- EMBL Outstation, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Florian Reisinger
- EMBL Outstation, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Puneet Souda
- University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Henning Hermjakob
- EMBL Outstation, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Pierre-Alain Binz
- Geneva Bioinformatics (GeneBio) SA, 1206 Geneva, Switzerland and Swiss Institute of Bioinformatics, Geneva, Switzerland
42
Kawaji H, Severin J, Lizio M, Forrest ARR, van Nimwegen E, Rehli M, Schroder K, Irvine K, Suzuki H, Carninci P, Hayashizaki Y, Daub CO. Update of the FANTOM web resource: from mammalian transcriptional landscape to its dynamic regulation. Nucleic Acids Res 2010; 39:D856-60. [PMID: 21075797 PMCID: PMC3013704 DOI: 10.1093/nar/gkq1112] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Indexed: 01/26/2023]
Abstract
The international Functional Annotation Of the Mammalian Genomes 4 (FANTOM4) research collaboration set out to better understand the transcriptional network that regulates macrophage differentiation and to uncover novel components of the transcriptome employing a series of high-throughput experiments. The primary and unique technique is cap analysis of gene expression (CAGE), sequencing mRNA 5′-ends with a second-generation sequencer to quantify promoter activities even in the absence of gene annotation. Additional genome-wide experiments complement the setup including short RNA sequencing, microarray gene expression profiling on large-scale perturbation experiments and ChIP–chip for epigenetic marks and transcription factors. All the experiments are performed in a differentiation time course of the THP-1 human leukemic cell line. Furthermore, we performed a large-scale mammalian two-hybrid (M2H) assay between transcription factors and monitored their expression profile across human and mouse tissues with qRT-PCR to address combinatorial effects of regulation by transcription factors. These interdependent data have been analyzed individually and in combination with each other and are published in related but distinct papers. We provide all data together with systematic annotation in an integrated view as resource for the scientific community (http://fantom.gsc.riken.jp/4/). Additionally, we assembled a rich set of derived analysis results including published predicted and validated regulatory interactions. Here we introduce the resource and its update after the initial release.
Affiliation(s)
- Hideya Kawaji
- RIKEN Omics Science Center, RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa 230-0045, Japan.
43
Rocca-Serra P, Brandizi M, Maguire E, Sklyar N, Taylor C, Begley K, Field D, Harris S, Hide W, Hofmann O, Neumann S, Sterk P, Tong W, Sansone SA. ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level. Bioinformatics 2010; 26:2354-6. [PMID: 20679334 PMCID: PMC2935443 DOI: 10.1093/bioinformatics/btq415] [Citation(s) in RCA: 186] [Impact Index Per Article: 13.3] [Received: 02/23/2010] [Revised: 07/07/2010] [Accepted: 07/08/2010] [Indexed: 11/14/2022]
Abstract
ISA is the first open source software suite for experimentalists and curators that (i) assists in the annotation and local management of experimental metadata from high-throughput studies employing one or a combination of omics and other technologies; (ii) empowers users to take up community-defined checklists and ontologies; and (iii) facilitates submission to international public repositories. Availability and implementation: Software, documentation, case studies and implementations at http://www.isa-tools.org.
Affiliation(s)
- Philippe Rocca-Serra
- The European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK
44
van Ommen B, Bouwman J, Dragsted LO, Drevon CA, Elliott R, de Groot P, Kaput J, Mathers JC, Müller M, Pepping F, Saito J, Scalbert A, Radonjic M, Rocca-Serra P, Travis A, Wopereis S, Evelo CT. Challenges of molecular nutrition research 6: the nutritional phenotype database to store, share and evaluate nutritional systems biology studies. Genes Nutr 2010; 5:189-203. [PMID: 21052526 PMCID: PMC2935528 DOI: 10.1007/s12263-010-0167-9] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.4] [Received: 10/12/2009] [Accepted: 01/03/2010] [Indexed: 11/25/2022]
Abstract
The challenge of modern nutrition and health research is to identify food-based strategies promoting life-long optimal health and well-being. This research is complex because it exploits a multitude of bioactive compounds acting on an extensive network of interacting processes. Whereas nutrition research can profit enormously from the revolution in ‘omics’ technologies, it has discipline-specific requirements for analytical and bioinformatic procedures. In addition to measurements of the parameters of interest (measures of health), extensive description of the subjects of study and foods or diets consumed is central for describing the nutritional phenotype. We propose and pursue an infrastructural activity of constructing the “Nutritional Phenotype database” (dbNP). When fully developed, dbNP will be a research and collaboration tool and a publicly available data and knowledge repository. Creation and implementation of the dbNP will maximize benefits to the research community by enabling integration and interrogation of data from multiple studies, from different research groups, different countries and different omics levels. The dbNP is designed to facilitate storage of biologically relevant, pre-processed omics data, as well as study descriptive and study participant phenotype data. It is also important to enable the combination of this information at different levels (e.g. to facilitate linkage of data describing participant phenotype, genotype and food intake with information on study design and omics measurements, and to combine all of this with existing knowledge). The biological information stored in the database (i.e. genetics, transcriptomics, proteomics, biomarkers, metabolomics, functional assays, food intake and food composition) is tailored to nutrition research and embedded in an environment of standard procedures and protocols, annotations, modular data-basing, networking and integrated bioinformatics.
The dbNP is an evolving enterprise, which is only sustainable if it is accepted and adopted by the wider nutrition and health research community as an open source, pre-competitive and publicly available resource where many partners can both contribute to and profit from its developments. We introduce the Nutrigenomics Organisation (NuGO, http://www.nugo.org) as a membership association responsible for establishing and curating the dbNP. Within NuGO, all efforts related to dbNP (i.e. usage, coordination, integration, facilitation and maintenance) will be directed towards a sustainable and federated infrastructure.
Affiliation(s)
- Ben van Ommen
- TNO Quality of Life, PO Box 360, 6700 AJ Zeist, The Netherlands
- Jildau Bouwman
- TNO Quality of Life, PO Box 360, 6700 AJ Zeist, The Netherlands
- Lars O. Dragsted
- Institute of Human Nutrition, University of Copenhagen, 30 Rolighedsvej, 1958 Frederiksberg C, Denmark
- Christian A. Drevon
- Department of Nutrition, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, Oslo, Norway
- Ruan Elliott
- Institute of Food Research, Norwich Research Park, Norwich, Norfolk NR4 7UA UK
- Philip de Groot
- Nutrigenomics Consortium, TI Food and Nutrition, P.O. Box 557, 6700AN Wageningen, The Netherlands
- Division of Human Nutrition, Wageningen University, PO Box 8129, 6700 EV Wageningen, The Netherlands
- Jim Kaput
- Division of Personalized Nutrition and Medicine, Food and Drug Administration/National Center for Toxicological Research, Jefferson, AR USA
- John C. Mathers
- Human Nutrition Research Centre, Institute for Ageing and Health, Newcastle University, William Leech Building, Framlington Place, Newcastle, NE44 6HE UK
- Michael Müller
- Nutrigenomics Consortium, TI Food and Nutrition, P.O. Box 557, 6700AN Wageningen, The Netherlands
- Division of Human Nutrition, Wageningen University, PO Box 8129, 6700 EV Wageningen, The Netherlands
- Fre Pepping
- Division of Human Nutrition, Wageningen University, PO Box 8129, 6700 EV Wageningen, The Netherlands
- Jahn Saito
- Department of Bioinformatics (BiGCaT) and Department of Knowledge Engineering (DKE), Maastricht University, Maastricht, The Netherlands
- Augustin Scalbert
- INRA, UMR 1019, Unité de Nutrition Humaine, Centre de Recherche de Clermont-Ferrand/Theix, 63122 Saint-Genès-Champanelle, France
- Anthony Travis
- The Rowett Institute of Nutrition and Health, University of Aberdeen, Greenburn Road, Bucksburn Aberdeen, Scotland, AB21 9SB UK
- Suzan Wopereis
- TNO Quality of Life, PO Box 360, 6700 AJ Zeist, The Netherlands
- Chris T. Evelo
- Department of Bioinformatics (BiGCaT), Maastricht University, Maastricht, The Netherlands
45
Field D, Friedberg I, Sterk P, Kottmann R, Glöckner FO, Hirschman L, Garrity GM, Cochrane G, Wooley J, Gilbert J. Meeting Report: "Metagenomics, Metadata and Meta-analysis" (M3) Special Interest Group at ISMB 2009. Stand Genomic Sci 2009; 1:278-82. [PMID: 21304668 PMCID: PMC3035241 DOI: 10.4056/sigs.641096]
Abstract
This report summarizes the proceedings of the “Metagenomics, Metadata and Meta-analysis” (M3) Special Interest Group (SIG) meeting held at the Intelligent Systems for Molecular Biology 2009 conference. The Genomic Standards Consortium (GSC) hosted this meeting to explore the bottlenecks and emerging solutions for obtaining biological insights through large-scale comparative analysis of metagenomic datasets. The M3 SIG included 16 talks, half of which were selected from submitted abstracts, a poster session and a panel discussion involving members of the GSC Board. This report summarizes this one-day SIG, attempts to identify shared themes and recapitulates community recommendations for the future of this field. The GSC will also host an M3 workshop at the Pacific Symposium on Biocomputing (PSB) in January 2010. Further information about the GSC and its range of activities can be found at http://gensc.org/.
46
Scalbert A, Brennan L, Fiehn O, Hankemeier T, Kristal BS, van Ommen B, Pujos-Guillot E, Verheij E, Wishart D, Wopereis S. Mass-spectrometry-based metabolomics: limitations and recommendations for future progress with particular focus on nutrition research. Metabolomics 2009; 5:435-458. [PMID: 20046865 PMCID: PMC2794347 DOI: 10.1007/s11306-009-0168-0] [Received: 01/09/2009] [Accepted: 05/26/2009]
Abstract
Mass spectrometry (MS) techniques, because of their sensitivity and selectivity, have become methods of choice to characterize the human metabolome, and MS-based metabolomics is increasingly used to characterize the complex metabolic effects of nutrients or foods. However, progress is still hampered by many unsolved problems, most notably the lack of well-established and standardized methods or procedures, and the difficulties still met in identifying the metabolites influenced by a given nutritional intervention. The purpose of this paper is to review the main obstacles limiting progress and to make recommendations to overcome them. Propositions are made to improve the mode of collection and preparation of biological samples, the coverage and quality of mass spectrometry analyses, the extraction and exploitation of the raw data, the identification of the metabolites and the biological interpretation of the results.
Affiliation(s)
- Augustin Scalbert
- INRA, UMR 1019, Unité de Nutrition Humaine, Centre de Recherche de Clermont-Ferrand/Theix, 63122 Saint-Genes-Champanelle, France
- Lorraine Brennan
- UCD School of Agriculture Food Science and Veterinary Medicine, UCD Conway Institute, University College Dublin, Dublin, Ireland
- Oliver Fiehn
- Genome Center, University of California, Davis, Davis, CA 95616 USA
- Thomas Hankemeier
- Analytical Biosciences, Leiden/Amsterdam Center for Drug Research, Leiden University, Einsteinweg 55, 2333 CC Leiden, The Netherlands
- Bruce S. Kristal
- Department of Neurosurgery, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115 USA
- Department of Surgery, Harvard Medical School, Boston, MA 02115 USA
- Ben van Ommen
- TNO Quality of Life, PO Box 360, 3700 AJ Zeist, The Netherlands
- Estelle Pujos-Guillot
- INRA, UMR 1019, Unité de Nutrition Humaine, Centre de Recherche de Clermont-Ferrand/Theix, 63122 Saint-Genes-Champanelle, France
- Elwin Verheij
- TNO Quality of Life, PO Box 360, 3700 AJ Zeist, The Netherlands
- David Wishart
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8 Canada
- Suzan Wopereis
- TNO Quality of Life, PO Box 360, 3700 AJ Zeist, The Netherlands
47
Song YS, Lee HW, Park YR, Kim DK, Sim J, Kang HP, Kim JH. TMA-TAB: a spreadsheet-based document for exchange of tissue microarray data based on the tissue microarray-object model. J Biomed Inform 2009; 43:435-41. [PMID: 19835983 DOI: 10.1016/j.jbi.2009.10.001] [Received: 06/10/2009] [Revised: 10/05/2009] [Accepted: 10/07/2009]
Abstract
The importance of tissue microarrays (TMA) as clinical validation tools for cDNA microarray results is increasing, yet researchers still struggle with TMA data management. After developing TMA-OM, a comprehensive data model for TMA data storage, exchange and analysis, we focused on developing a user-friendly, highly expressive exchange format to promote the communication of TMA results and to support TMA-OM-compatible database applications. We developed TMA-TAB, a spreadsheet-based data format for TMA data submission to TMA-OM-compatible database systems. TMA-TAB was developed by simplifying, modifying and reorganizing the classes, attributes and templates of TMA-OM into five entities: experiment, block, slide, core_in_block, and core_in_slide. Five tab-delimited formats (investigation design format, block description format, slide description format, core clinicohistopathological data format, and core result data format) were created, representing the experiment, block, slide, core_in_block, and core_in_slide entities, respectively. We implemented TMA-TAB import and export modules in Xperanto-TMA, a TMA-OM-compatible database application, to facilitate data submission. Development and implementation of TMA-TAB and TMA-OM provide a strong infrastructure for powerful and user-friendly TMA data management.
Affiliation(s)
- Young Soo Song
- Seoul National University Biomedical Informatics (SNUBI), Seoul National University College of Medicine, Seoul, Republic of Korea
48
Jones AR, Lister AL, Hermida L, Wilkinson P, Eisenacher M, Belhajjame K, Gibson F, Lord P, Pocock M, Rosenfelder H, Santoyo-Lopez J, Wipat A, Paton NW. Modeling and managing experimental data using FuGE. OMICS 2009; 13:239-51. [PMID: 19441879 DOI: 10.1089/omi.2008.0080]
Abstract
The Functional Genomics Experiment data model (FuGE) has been developed to increase the consistency and efficiency of experimental data modeling in the life sciences, and it has been adopted by a number of high-profile standardization organizations. FuGE can be used: (1) directly, whereby generic modeling constructs are used to represent concepts from specific experimental activities; or (2) as a framework within which method-specific models can be developed. FuGE is both rich and flexible, providing a considerable number of modeling constructs, which can be used in a range of different ways. However, such richness and flexibility also mean that modelers and application developers have choices to make when applying FuGE in a given context. This paper captures emerging best practice in the use of FuGE in the light of the experience of several groups by: (1) proposing guidelines for the use and extension of the FuGE data model; (2) presenting design patterns that reflect recurring requirements in experimental data modeling; and (3) describing a community software tool kit (STK) that supports application development using FuGE. We anticipate that these guidelines will encourage consistent usage of FuGE, and as such, will contribute to the development of convergent data standards in omics research.
Affiliation(s)
- Andrew R Jones
- Department of Preclinical Veterinary Science, Faculty of Veterinary Science, University of Liverpool, Liverpool, United Kingdom.
49
Field D, Sansone SA, Collis A, Booth T, Dukes P, Gregurick SK, Kennedy K, Kolar P, Kolker E, Maxon M, Millard S, Mugabushaka AM, Perrin N, Remacle JE, Remington K, Rocca-Serra P, Taylor CF, Thorley M, Tiwari B, Wilbanks J. Megascience. 'Omics data sharing. Science 2009; 326:234-6. [PMID: 19815759 PMCID: PMC2770171 DOI: 10.1126/science.1180598]
Abstract
Data sharing, and the good annotation practices it depends on, must become part of the fabric of daily research for researchers and funders.
Affiliation(s)
- Dawn Field
- U.K. Natural Environment Research Council (NERC), Environmental Bioinformatics Centre, NERC Centre for Ecology and Hydrology, Oxford, OX1 3SR, UK.
50
Krestyaninova M, Zarins A, Viksna J, Kurbatova N, Rucevskis P, Neogi SG, Gostev M, Perheentupa T, Knuuttila J, Barrett A, Lappalainen I, Rung J, Podnieks K, Sarkans U, McCarthy MI, Brazma A. A System for Information Management in BioMedical Studies--SIMBioMS. Bioinformatics 2009; 25:2768-9. [PMID: 19633095 PMCID: PMC2759553 DOI: 10.1093/bioinformatics/btp420]
Abstract
Summary: SIMBioMS is a web-based open source software system for managing data and information in biomedical studies. It provides a solution for the collection, storage, management and retrieval of information about research subjects and biomedical samples, as well as experimental data obtained using a range of high-throughput technologies, including gene expression, genotyping, proteomics and metabonomics. The system can easily be customized and has proven to be successful in several large-scale multi-site collaborative projects. It is compatible with emerging functional genomics data standards and provides data import and export in accepted standard formats. Protocols for transferring data to durable archives at the European Bioinformatics Institute have been implemented. Availability: The source code, documentation and initialization scripts are available at http://simbioms.org. Contact: support@simbioms.org; mariak@ebi.ac.uk
Affiliation(s)
- Maria Krestyaninova
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.