1
Nault R, Cave MC, Ludewig G, Moseley HN, Pennell KG, Zacharewski T. A Case for Accelerating Standards to Achieve the FAIR Principles of Environmental Health Research Experimental Data. Environmental Health Perspectives 2023; 131:65001. [PMID: 37352010 PMCID: PMC10289218 DOI: 10.1289/ehp11484] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/26/2022] [Revised: 06/05/2023] [Accepted: 06/07/2023] [Indexed: 06/25/2023]
Abstract
BACKGROUND Funding agencies, publishers, and other stakeholders are pushing environmental health science investigators to improve data sharing; to promote the findable, accessible, interoperable, and reusable (FAIR) principles; and to increase the rigor and reproducibility of the data collected. Accomplishing these goals will require significant cultural shifts surrounding data management and strategies to develop robust and reliable resources that bridge the technical challenges and gaps in expertise. OBJECTIVE In this commentary, we examine the current state of managing data and metadata, referred to collectively as (meta)data, in the experimental environmental health sciences. We introduce new tools and resources based on in vivo experiments to serve as examples for the broader field. METHODS We discuss previous and ongoing efforts to improve (meta)data collection and curation. These include global efforts by the Functional Genomics Data Society to develop metadata collection tools such as the Investigation, Study, Assay (ISA) framework, and the Center for Expanded Data Annotation and Retrieval. We also conduct a case study of in vivo data deposited in the Gene Expression Omnibus that demonstrates the current state of in vivo environmental health data and highlights the value of using the tools we propose to support data deposition. DISCUSSION The environmental health science community has played a key role in efforts to achieve the goals of the FAIR guiding principles and is well positioned to advance them further. We present a proposed framework to further promote these objectives and minimize the obstacles between data producers and data scientists to maximize the return on research investments. https://doi.org/10.1289/EHP11484.
Affiliation(s)
- Rance Nault
  - Biochemistry & Molecular Biology Department, Institute for Integrative Toxicology, Michigan State University, East Lansing, Michigan, USA
- Matthew C. Cave
  - Division of Gastroenterology, Hepatology, and Nutrition, University of Louisville, Louisville, Kentucky, USA
- Gabriele Ludewig
  - Department of Occupational and Environmental Health, University of Iowa, Iowa City, Iowa, USA
- Hunter N.B. Moseley
  - Molecular and Cellular Biochemistry Department, University of Kentucky, Lexington, Kentucky, USA
- Kelly G. Pennell
  - Department of Civil Engineering, University of Kentucky, Lexington, Kentucky, USA
- Tim Zacharewski
  - Biochemistry & Molecular Biology Department, Institute for Integrative Toxicology, Michigan State University, East Lansing, Michigan, USA
2
Bunaciu AA, Fleschin S, Aboul-Enein HY. Determination of some edible oils adulteration with paraffin oil using infrared spectroscopy. Pharmacia 2022. [DOI: 10.3897/pharmacia.69.e76175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
Molecular vibrational spectroscopy using mid-infrared or near-infrared techniques has been increasingly used to characterize different compounds, including edible oils, in order to monitor changes and to detect fraudulent modifications. This article presents a new method for quantifying the adulteration of extra virgin olive oil (EVOO) or corn germ oil (CGO) with a mineral oil such as paraffin oil (PO). A Fourier transform infrared (FT-IR) spectrometric method using ATR spectra was developed for the rapid, direct measurement of edible oil adulteration. The results indicate that the proposed method efficiently detects paraffin oil adulteration of EVOO and CGO, with an RSD below 3.0%.
3
Drobne D. Adding Toxicological Context to Nanotoxicity Study Reporting Using the NanoTox Metadata List. Small (Weinheim an der Bergstrasse, Germany) 2021; 17:e2005622. [PMID: 33605049 DOI: 10.1002/smll.202005622] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/09/2020] [Revised: 11/08/2020] [Indexed: 06/12/2023]
Abstract
This paper proposes a list of specifications (NanoTox metadata list) to be reported about nanotoxicity experiments (metadata) together with resultant data to add toxicological context to reported studies. In areas involving nanomaterials (NMs), existing metadata reporting standards include the reporting of experimental conditions and protocols (MIRIBEL) and material characteristics (MINChar and MIAN), as well as reporting focused on specific experiments (MINBE). NanoCRED is a similarly transparent and structured framework; however, it was developed to guide risk assessors in evaluating the reliability and relevance of NM ecotoxicity studies. There is no reporting standard that includes interpretation of the aims and outcomes of nanotoxicity studies beyond regulatory purposes. The proposed NanoTox metadata reporting checklist is elaborated to extend reporting toward describing nanotoxicological context and thus is a logical complement to technology/material-assay focused reporting checklists. It is further designed to allow for NM toxicity data and knowledge integration, reuse, and communication. Its ultimate goal is to adhere to the basic rules of toxicology when taking a stand on the toxicity of NMs and to limit speculation on safety. As nanotoxicology becomes more interdisciplinary with the advent of new tools and new materials to be tested, reporting standards will contribute to cross-disciplinary communication.
Affiliation(s)
- Damjana Drobne
  - Department of Biology, Biotechnical Faculty, University of Ljubljana, Večna pot 111, Ljubljana, 1000, Slovenia
4
Bukhari SAC, Martínez-Romero M, O'Connor MJ, Egyedi AL, Willrett D, Graybeal J, Musen MA, Cheung KH, Kleinstein SH. CEDAR OnDemand: a browser extension to generate ontology-based scientific metadata. BMC Bioinformatics 2018; 19:268. [PMID: 30012108 PMCID: PMC6048706 DOI: 10.1186/s12859-018-2247-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 09/05/2017] [Accepted: 06/14/2018] [Indexed: 12/17/2022] Open
Abstract
Background Public biomedical data repositories often provide web-based interfaces to collect experimental metadata. However, these interfaces typically reflect the ad hoc metadata specification practices of the associated repositories, leading to a lack of standardization in the collected metadata. This lack of standardization limits the ability of the source datasets to be broadly discovered, reused, and integrated with other datasets. To increase reuse, discoverability, and reproducibility of the described experiments, datasets should be appropriately annotated by using agreed-upon terms, ideally from ontologies or other controlled term sources. Results This work presents “CEDAR OnDemand”, a browser extension powered by the NCBO (National Center for Biomedical Ontology) BioPortal that enables users to seamlessly enter ontology-based metadata through existing web forms native to individual repositories. CEDAR OnDemand analyzes the web page contents to identify the text input fields and associate them with relevant ontologies which are recommended automatically based upon input fields’ labels (using the NCBO ontology recommender) and a pre-defined list of ontologies. These field-specific ontologies are used for controlling metadata entry. CEDAR OnDemand works for any web form designed in the HTML format. We demonstrate how CEDAR OnDemand works through the NCBI (National Center for Biotechnology Information) BioSample web-based metadata entry. Conclusion CEDAR OnDemand helps lower the barrier of incorporating ontologies into standardized metadata entry for public data repositories. CEDAR OnDemand is available freely on the Google Chrome store https://chrome.google.com/webstore/search/CEDAROnDemand
Affiliation(s)
- Marcos Martínez-Romero
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
- Martin J O'Connor
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
- Attila L Egyedi
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
- Debra Willrett
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
- John Graybeal
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
- Mark A Musen
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
- Kei-Hoi Cheung
  - Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
  - Department of Emergency Medicine and Yale Center for Medical Informatics, Yale University School of Medicine, New Haven, CT, USA
- Steven H Kleinstein
  - Department of Pathology, Yale School of Medicine, New Haven, CT, USA
  - Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
5
Van Eyk JE, Corrales FJ, Aebersold R, Cerciello F, Deutsch EW, Roncada P, Sanchez JC, Yamamoto T, Yang P, Zhang H, Omenn GS. Highlights of the Biology and Disease-driven Human Proteome Project, 2015-2016. J Proteome Res 2016; 15:3979-3987. [PMID: 27573249 DOI: 10.1021/acs.jproteome.6b00444] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Indexed: 12/12/2022]
Abstract
The Biology and Disease-driven Human Proteome Project (B/D-HPP) is aimed at supporting and enhancing the broad use of state-of-the-art proteomic methods to characterize and quantify proteins for in-depth understanding of the molecular mechanisms of biological processes and human disease. Based on a foundation of the pre-existing HUPO initiatives begun in 2002, the B/D-HPP is designed to provide standardized methods and resources for mass spectrometry and specific protein affinity reagents and facilitate accessibility of these resources to the broader life sciences research and clinical communities. Currently there are 22 B/D-HPP initiatives and 3 closely related HPP resource pillars. The B/D-HPP groups are working to define sets of protein targets that are highly relevant to each particular field, to deliver relevant assays for the measurement of these selected targets, and to disseminate and make publicly accessible the information and tools generated. Major developments are the 2016 publications of the Human SRM Atlas and of "popular protein sets" for six organ systems. Here we present the current activities and plans of the B/D-HPP initiatives as highlighted in numerous B/D-HPP workshops at the 14th annual HUPO 2015 World Congress of Proteomics in Vancouver, Canada.
Affiliation(s)
- Jennifer E Van Eyk
  - Advanced Clinical BioSystems Research Institute, Department of Medicine, Cedars-Sinai Medical Centre, Los Angeles, California 90038, United States
- Fernando J Corrales
  - Department of Hepatology, Proteomics Laboratory, CIMA, University of Navarra; Ciberhed; PRB2, ProteoRed-ISCIII, 31008 Pamplona, Spain
- Ruedi Aebersold
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, 8093 Zürich, Switzerland
- Ferdinando Cerciello
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, 8093 Zürich, Switzerland
- Eric W Deutsch
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Paola Roncada
  - Istituto Sperimentale Italiano L. Spallanzani, 20133 Milano, Italy
- Jean-Charles Sanchez
  - Centre Medicale Universitaire, Human Protein Sciences Department, CH-1211 Geneva, Switzerland
- Tadashi Yamamoto
  - Niigata University, Department of Structural Pathology, Institute of Nephrology, Medical and Dental School, Asachimachi-dori Niigata 951-8510, Japan
- Pengyuan Yang
  - Fudan University, Department of Chemistry, Shanghai 200433, P.R. China
- Hui Zhang
  - Johns Hopkins University, Department of Pathology, Baltimore, Maryland 21287, United States
- Gilbert S Omenn
  - Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, United States
6
Kumuthini J, Mbiyavanga M, Chimusa ER, Pathak J, Somervuo P, Van Schaik RH, Dolzan V, Mizzi C, Kalideen K, Ramesar RS, Macek M, Patrinos GP, Squassina A. Minimum information required for a DMET experiment reporting. Pharmacogenomics 2016; 17:1533-45. [PMID: 27548815 DOI: 10.2217/pgs-2016-0015] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 01/13/2023] Open
Abstract
AIM To provide pharmacogenomics reporting guidelines, the information and tools required for reporting to public omic databases. MATERIAL & METHODS For effective DMET data interpretation, sharing, interoperability, reproducibility and reporting, we propose the Minimum Information required for a DMET Experiment (MIDE) reporting. RESULTS MIDE provides reporting guidelines and describes the information required for reporting, data storage and data sharing in the form of XML. CONCLUSION The MIDE guidelines will benefit the scientific community with pharmacogenomics experiments, including reporting pharmacogenomics data from other technology platforms, with the tools that will ease and automate the generation of such reports using the standardized MIDE XML schema, facilitating the sharing, dissemination, reanalysis of datasets through accessible and transparent pharmacogenomics data reporting.
Affiliation(s)
- Judit Kumuthini
  - Centre for Proteomic & Genomic Research, Cape Town, South Africa
- Emile R Chimusa
  - Centre for Proteomic & Genomic Research, Cape Town, South Africa
  - Computational Biology Group, Institute for Infectious Diseases & Molecular Medicine, University of Cape Town, South Africa
- Jyotishman Pathak
  - Division of Biomedical Statistics & Informatics, Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA
- Panu Somervuo
  - Institute of Biotechnology, University of Helsinki, Helsinki, Finland
- Ron Hn Van Schaik
  - Department of Clinical Chemistry, Erasmus University Medical Center Rotterdam, Room Na-415, Wytemaweg 80, 3015CN Rotterdam, The Netherlands
- Vita Dolzan
  - Pharmacogenetics Laboratory, Institute of Biochemistry, Faculty of Medicine, University of Ljubljana, Vrazov trg 2, SI-1000 Ljubljana, Slovenia
- Clint Mizzi
  - Department of Bioinformatics, Faculty of Medicine & Health Sciences, Erasmus University Medical Center, Rotterdam, The Netherlands
  - Department of Physiology & Biochemistry, Faculty of Medicine and Surgery, University of Malta, Malta
- Kusha Kalideen
  - UCT/SA MRC Human Genetics Research Unit, Division of Human Genetics, Institute for Infectious Diseases & Molecular Medicine, Division of Human Genetics, University of Cape Town, South Africa
- Raj S Ramesar
  - UCT/SA MRC Human Genetics Research Unit, Division of Human Genetics, Institute for Infectious Diseases & Molecular Medicine, Division of Human Genetics, University of Cape Town, South Africa
- Milan Macek
  - Department of Biology & Medical Genetics, Charles University Prague & 2nd Faculty of Medicine, Prague, Czechia
- George P Patrinos
  - Department of Bioinformatics, Faculty of Medicine & Health Sciences, Erasmus University Medical Center, Rotterdam, The Netherlands
  - Department of Pharmacy, School of Health Sciences, University of Patras, Patras, Greece
- Alessio Squassina
  - Laboratory of Pharmacogenomics, Section of Neuroscience & Clinical Pharmacology, Department of Biomedical Sciences, University of Cagliari, sp 8 Sestu-Monserrato, Km 0.700, 09042 Cagliari, Italy
7
Danezis GP, Tsagkaris AS, Brusic V, Georgiou CA. Food authentication: state of the art and prospects. Curr Opin Food Sci 2016. [DOI: 10.1016/j.cofs.2016.07.003] [Citation(s) in RCA: 98] [Impact Index Per Article: 12.3] [Indexed: 12/30/2022]
8
Marchese Robinson RL, Lynch I, Peijnenburg W, Rumble J, Klaessig F, Marquardt C, Rauscher H, Puzyn T, Purian R, Åberg C, Karcher S, Vriens H, Hoet P, Hoover MD, Hendren CO, Harper SL. How should the completeness and quality of curated nanomaterial data be evaluated? Nanoscale 2016; 8:9919-43. [PMID: 27143028 PMCID: PMC4899944 DOI: 10.1039/c5nr08944a] [Citation(s) in RCA: 58] [Impact Index Per Article: 7.3] [Indexed: 05/19/2023]
Abstract
Nanotechnology is of increasing significance. Curation of nanomaterial data into electronic databases offers opportunities to better understand and predict nanomaterials' behaviour. This supports innovation in, and regulation of, nanotechnology. It is commonly understood that curated data need to be sufficiently complete and of sufficient quality to serve their intended purpose. However, assessing data completeness and quality is non-trivial in general and is arguably especially difficult in the nanoscience area, given its highly multidisciplinary nature. The current article, part of the Nanomaterial Data Curation Initiative series, addresses how to assess the completeness and quality of (curated) nanomaterial data. In order to address this key challenge, a variety of related issues are discussed: the meaning and importance of data completeness and quality, existing approaches to their assessment and the key challenges associated with evaluating the completeness and quality of curated nanomaterial data. Considerations which are specific to the nanoscience area and lessons which can be learned from other relevant scientific disciplines are considered. Hence, the scope of this discussion ranges from physicochemical characterisation requirements for nanomaterials and interference of nanomaterials with nanotoxicology assays to broader issues such as minimum information checklists, toxicology data quality schemes and computational approaches that facilitate evaluation of the completeness and quality of (curated) data. This discussion is informed by a literature review and a survey of key nanomaterial data curation stakeholders. Finally, drawing upon this discussion, recommendations are presented concerning the central question: how should the completeness and quality of curated nanomaterial data be evaluated?
Affiliation(s)
- Richard L. Marchese Robinson
  - School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, James Parsons Building, Byrom Street, Liverpool, L3 3AF, United Kingdom
- Iseult Lynch
  - School of Geography, Earth and Environmental Sciences, University of Birmingham, Edgbaston, B15 2TT Birmingham, United Kingdom
- Willie Peijnenburg
  - National Institute of Public Health and the Environment (RIVM), Bilthoven, The Netherlands
  - Institute of Environmental Sciences, Leiden University, Leiden, The Netherlands
- John Rumble
  - R&R Data Services, 11 Montgomery Avenue, Gaithersburg MD 20877 USA
- Fred Klaessig
  - Pennsylvania Bio Nano Systems LLC, 3805 Old Easton Road, Doylestown, PA 18902
- Clarissa Marquardt
  - Institute of Applied Computer Sciences (IAI), Karlsruhe Institute of Technology (KIT), Hermann v. Helmholtz Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Hubert Rauscher
  - European Commission, Joint Research Centre, Institute for Health and Consumer Protection, Via Fermi 2749, 21027 Ispra (VA), Italy
- Tomasz Puzyn
  - Laboratory of Environmental Chemistry, University of Gdansk, Wita Stwosza 63, 80-308 Gdansk, Poland
- Ronit Purian
  - Faculty of Engineering, Tel Aviv University, Tel Aviv 69978 Israel
- Christoffer Åberg
  - Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
- Sandra Karcher
  - Civil and Environmental Engineering, Carnegie Mellon University, Pittsburgh, PA 15213-3890
- Hanne Vriens
  - Department of Public Health and Primary Care, K.U.Leuven, Faculty of Medicine, Unit Environment & Health – Toxicology, Herestraat 49 (O&N 706), Leuven, Belgium
- Peter Hoet
  - Department of Public Health and Primary Care, K.U.Leuven, Faculty of Medicine, Unit Environment & Health – Toxicology, Herestraat 49 (O&N 706), Leuven, Belgium
- Mark D. Hoover
  - National Institute for Occupational Safety and Health, 1095 Willowdale Road, Morgantown, WV 26505-2888
- Christine Ogilvie Hendren
  - Center for the Environmental Implications of NanoTechnology, Duke University, PO Box 90287 121 Hudson Hall, Durham NC 27708
- Stacey L. Harper
  - Department of Environmental and Molecular Toxicology, School of Chemical, Biological and Environmental Engineering, Oregon State University, 1007 ALS, Corvallis, OR 97331
9
Mack SJ, Milius RP, Gifford BD, Sauter J, Hofmann J, Osoegawa K, Robinson J, Groeneweg M, Turenchalk GS, Adai A, Holcomb C, Rozemuller EH, Penning MT, Heuer ML, Wang C, Salit ML, Schmidt AH, Parham PR, Müller C, Hague T, Fischer G, Fernandez-Viňa M, Hollenbach JA, Norman PJ, Maiers M. Minimum information for reporting next generation sequence genotyping (MIRING): Guidelines for reporting HLA and KIR genotyping via next generation sequencing. Hum Immunol 2015; 76:954-62. [PMID: 26407912 DOI: 10.1016/j.humimm.2015.09.011] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 01/05/2015] [Revised: 08/30/2015] [Accepted: 09/22/2015] [Indexed: 11/27/2022]
Abstract
The development of next-generation sequencing (NGS) technologies for HLA and KIR genotyping is rapidly advancing knowledge of genetic variation of these highly polymorphic loci. NGS genotyping is poised to replace older methods for clinical use, but standard methods for reporting and exchanging these new, high quality genotype data are needed. The Immunogenomic NGS Consortium, a broad collaboration of histocompatibility and immunogenetics clinicians, researchers, instrument manufacturers and software developers, has developed the Minimum Information for Reporting Immunogenomic NGS Genotyping (MIRING) reporting guidelines. MIRING is a checklist that specifies the content of NGS genotyping results as well as a set of messaging guidelines for reporting the results. A MIRING message includes five categories of structured information - message annotation, reference context, full genotype, consensus sequence and novel polymorphism - and references to three categories of accessory information - NGS platform documentation, read processing documentation and primary data. These eight categories of information ensure the long-term portability and broad application of this NGS data for all current histocompatibility and immunogenetics use cases. In addition, MIRING can be extended to allow the reporting of genotype data generated using pre-NGS technologies. Because genotyping results reported using MIRING are easily updated in accordance with reference and nomenclature databases, MIRING represents a bold departure from previous methods of reporting HLA and KIR genotyping results, which have provided static and less-portable data. More information about MIRING can be found online at miring.immunogenomics.org.
Affiliation(s)
- Steven J Mack
  - Children's Hospital Oakland Research Institute, Oakland, CA, USA
- Jürgen Sauter
  - DKMS German Bone Marrow Donor Center, Tübingen, Germany
- Jan Hofmann
  - DKMS German Bone Marrow Donor Center, Tübingen, Germany
- James Robinson
  - Anthony Nolan Research Institute, Royal Free Hospital, London, UK; University College London Cancer Institute, University College London, London, UK
- Alex Adai
  - Bioinformatics, Roche Sequencing, Pleasanton, CA, USA
- Chunlin Wang
  - Stanford Genome Technology Center, Stanford University, Stanford, CA, USA
- Marc L Salit
  - National Institute of Standards and Technology, Stanford, CA, USA; Department of Bioengineering, Stanford University, Stanford, CA, USA
- Peter R Parham
  - Department of Structural Biology, Stanford University, Stanford, CA, USA
- Jill A Hollenbach
  - Department of Neurology, University of California, San Francisco, CA, USA
- Paul J Norman
  - Department of Structural Biology, Stanford University, Stanford, CA, USA
10
González-Beltrán A, Li P, Zhao J, Avila-Garcia MS, Roos M, Thompson M, van der Horst E, Kaliyaperumal R, Luo R, Lee TL, Lam TW, Edmunds SC, Sansone SA, Rocca-Serra P. From Peer-Reviewed to Peer-Reproduced in Scholarly Publishing: The Complementary Roles of Data Models and Workflows in Bioinformatics. PLoS One 2015; 10:e0127612. [PMID: 26154165 PMCID: PMC4495984 DOI: 10.1371/journal.pone.0127612] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 12/04/2014] [Accepted: 04/16/2015] [Indexed: 12/20/2022] Open
Abstract
MOTIVATION Reproducing the results from a scientific paper can be challenging due to the absence of data and the computational tools required for their analysis. In addition, details relating to the procedures used to obtain the published results can be difficult to discern due to the use of natural language when reporting how experiments have been performed. The Investigation/Study/Assay (ISA), Nanopublications (NP), and Research Objects (RO) models are conceptual data modelling frameworks that can structure such information from scientific papers. Computational workflow platforms can also be used to reproduce analyses of data in a principled manner. We assessed the extent to which ISA, NP, and RO models, together with the Galaxy workflow system, can capture the experimental processes and reproduce the findings of a previously published paper reporting on the development of SOAPdenovo2, a de novo genome assembler. RESULTS Executable workflows were developed using Galaxy, which reproduced results that were consistent with the published findings. A structured representation of the information in the SOAPdenovo2 paper was produced by combining the use of ISA, NP, and RO models. By structuring the information in the published paper using these data and scientific workflow modelling frameworks, it was possible to explicitly declare elements of experimental design, variables, and findings. The models served as guides in the curation of scientific information and this led to the identification of inconsistencies in the original published paper, thereby allowing its authors to publish corrections in the form of an erratum.
AVAILABILITY SOAPdenovo2 scripts, data, and results are available through the GigaScience Database: http://dx.doi.org/10.5524/100044; the workflows are available from GigaGalaxy: http://galaxy.cbiit.cuhk.edu.hk; and the representations using the ISA, NP, and RO models are available through the SOAPdenovo2 case study website http://isa-tools.github.io/soapdenovo2/. CONTACT philippe.rocca-serra@oerc.ox.ac.uk and susanna-assunta.sansone@oerc.ox.ac.uk.
Affiliation(s)
- Peter Li
  - GigaScience, BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong, People’s Republic of China
- Jun Zhao
  - InfoLab21, Lancaster University, Bailrigg, Lancaster, LA1 4WA, United Kingdom
- Maria Susana Avila-Garcia
  - Nuffield Department of Medicine, Experimental Medicine Division, John Radcliffe Hospital, Headley Way, Headington, Oxford, OX3 9DU, United Kingdom
- Marco Roos
  - Department of Human Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
- Mark Thompson
  - Department of Human Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
- Eelke van der Horst
  - Department of Human Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
- Rajaram Kaliyaperumal
  - Department of Human Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
- Ruibang Luo
  - HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Pokfulam, Hong Kong, People’s Republic of China
- Tin-Lap Lee
  - School of Biomedical Sciences and CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong, Shatin, Hong Kong, People’s Republic of China
- Tak-wah Lam
  - HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Pokfulam, Hong Kong, People’s Republic of China
- Scott C. Edmunds
  - GigaScience, BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong, People’s Republic of China
- Philippe Rocca-Serra
  - Oxford e-Research Centre, University of Oxford, 7 Keble Road, OX1 3QG, United Kingdom
11
González-Beltrán A, Maguire E, Sansone SA, Rocca-Serra P. linkedISA: semantic representation of ISA-Tab experimental metadata. BMC Bioinformatics 2014; 15 Suppl 14:S4. [PMID: 25472428 PMCID: PMC4255742 DOI: 10.1186/1471-2105-15-s14-s4] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND Reporting and sharing experimental metadata, such as the experimental design, characteristics of the samples, and procedures applied, along with the analysis results, in a standardised manner ensures that datasets are comprehensible and, in principle, reproducible, comparable and reusable. Furthermore, sharing datasets in formats designed for consumption by humans and machines will also maximize their use. The Investigation/Study/Assay (ISA) open source metadata tracking framework facilitates standards-compliant collection, curation, visualization, storage and sharing of datasets, leveraging other platforms to enable analysis and publication. The ISA software suite includes several components used in an increasingly diverse set of life science and biomedical domains; it is underpinned by a general-purpose format, ISA-Tab, and conversions exist into formats required by public repositories. While ISA-Tab works well mainly as a human-readable format, we have also implemented a linked data approach to semantically define the ISA-Tab syntax. RESULTS We present a semantic web representation of the ISA-Tab syntax that complements ISA-Tab's syntactic interoperability with semantic interoperability. We introduce the linkedISA conversion tool from ISA-Tab to the Resource Description Framework (RDF), supporting mappings from the ISA syntax to multiple community-defined, open ontologies and capitalising on user-provided ontology annotations in the experimental metadata. We describe insights from the implementation and how annotations can be expanded, driven by the metadata. We applied the conversion tool as part of Bio-GraphIIn, a web-based application supporting integration of the semantically rich experimental descriptions.
Designed in a user-friendly manner, the Bio-GraphIIn interface hides most of the complexities from the users, exposing a familiar tabular view of the experimental description to allow seamless interaction with the RDF representation, and visualising descriptors to drive the query over the semantic representation of the experimental design. In addition, we defined queries over the linkedISA RDF representation and demonstrated its use over the linkedISA conversion of datasets from Nature's Scientific Data online publication. CONCLUSIONS Our linked data approach has allowed us to: 1) make the ISA-Tab semantics explicit and machine-processable, 2) exploit the existing ontology-based annotations in the ISA-Tab experimental descriptions, 3) augment the ISA-Tab syntax with new descriptive elements, 4) visualise and query elements related to the experimental design. Reasoning over ISA-Tab metadata and associated data will facilitate data integration and knowledge discovery.
Affiliation(s)
| | - Eamonn Maguire
- Oxford e-Research Centre, University of Oxford, Oxford, OX1 3QG, UK
| | | | | |
|
12
|
Lebendiker M, Danieli T, de Marco A. The Trip Adviser guide to the protein science world: a proposal to improve the awareness concerning the quality of recombinant proteins. BMC Res Notes 2014; 7:585. [PMID: 25178166 PMCID: PMC4161829 DOI: 10.1186/1756-0500-7-585] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 03/26/2014] [Accepted: 07/31/2014] [Indexed: 11/24/2022] Open
Abstract
In many research articles where protein purification is required for various assays (protein-protein interactions, activity assays, etc.), we always have access to the final results, but seldom have access to the raw data required for an accurate evaluation of the protein quality. These data are extremely important, on the one hand, to critically evaluate the quality of the proteins used in the described research and, on the other hand, to allow other laboratories to safely use the described procedure in a reproducible manner. We hereby propose a standardized methodology that can easily be incorporated in research papers. Moreover, this methodology can be utilized as a “quality control” ladder, where the more information an article provides, the higher it ranks. This “quality control” stamp will allow researchers to retrieve relevant and useful materials and methods in the field of protein research.
Affiliation(s)
| | | | - Ario de Marco
- Department of Biomedical Sciences and Engineering, University of Nova Gorica, Glavni Trg 9, SI-5261 Vipava, Slovenia.
| |
|
13
|
Kolker E, Özdemir V, Martens L, Hancock W, Anderson G, Anderson N, Aynacioglu S, Baranova A, Campagna SR, Chen R, Choiniere J, Dearth SP, Feng WC, Ferguson L, Fox G, Frishman D, Grossman R, Heath A, Higdon R, Hutz MH, Janko I, Jiang L, Joshi S, Kel A, Kemnitz JW, Kohane IS, Kolker N, Lancet D, Lee E, Li W, Lisitsa A, Llerena A, MacNealy-Koch C, Marshall JC, Masuzzo P, May A, Mias G, Monroe M, Montague E, Mooney S, Nesvizhskii A, Noronha S, Omenn G, Rajasimha H, Ramamoorthy P, Sheehan J, Smarr L, Smith CV, Smith T, Snyder M, Rapole S, Srivastava S, Stanberry L, Stewart E, Toppo S, Uetz P, Verheggen K, Voy BH, Warnich L, Wilhelm SW, Yandl G. Toward more transparent and reproducible omics studies through a common metadata checklist and data publications. OMICS: A JOURNAL OF INTEGRATIVE BIOLOGY 2014; 18:10-4. [PMID: 24456465 PMCID: PMC3903324 DOI: 10.1089/omi.2013.0149] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Indexed: 12/13/2022]
Abstract
Biological processes are fundamentally driven by complex interactions between biomolecules. Integrated high-throughput omics studies enable multifaceted views of cells, organisms, or their communities. With the advent of new post-genomics technologies, omics studies are becoming increasingly prevalent; yet the full impact of these studies can only be realized through data harmonization, sharing, meta-analysis, and integrated research. These essential steps require consistent generation, capture, and distribution of metadata. To ensure transparency, facilitate data harmonization, and maximize reproducibility and usability of life sciences studies, we propose a simple common omics metadata checklist. The proposed checklist is built on the rich ontologies and standards already in use by the life sciences community. The checklist will serve as a common denominator to guide experimental design, capture important parameters, and be used as a standard format for stand-alone data publications. The omics metadata checklist and data publications will create efficient linkages between omics data and knowledge-based life sciences innovation and, importantly, allow for appropriate attribution to data generators and infrastructure science builders in the post-genomics era. We ask that the life sciences community test the proposed omics metadata checklist and data publications and provide feedback for their use and improvement.
Affiliation(s)
- Eugene Kolker
- Bioinformatics and High-Throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
| | - Vural Özdemir
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Office of the President, Gaziantep University, International Affairs and Global Development Strategy
- Faculty of Communications, Universite Bulvarı, Kilis Yolu, Turkey
| | - Lennart Martens
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Medical Protein Research, Vlaams Instituut voor Biotechnologie, Ghent, Belgium
- Department of Biochemistry, Ghent University, Ghent, Belgium
| | - William Hancock
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Chemistry, Barnett Institute, Northeastern University, Boston, Massachusetts
| | - Gordon Anderson
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Fundamental and Computational Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington
| | - Nathaniel Anderson
- Bioinformatics and High-Throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
| | - Sukru Aynacioglu
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Pharmacology, Gaziantep University, Gaziantep, Turkey
| | - Ancha Baranova
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- School of Systems Biology, George Mason University, Manassas, Virginia
| | - Shawn R. Campagna
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Chemistry, University of Tennessee Knoxville, Knoxville, Tennessee
| | - Rui Chen
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Genetics, Stanford University, Stanford, California
| | - John Choiniere
- Bioinformatics and High-Throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
| | - Stephen P. Dearth
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Chemistry, University of Tennessee Knoxville, Knoxville, Tennessee
| | - Wu-Chun Feng
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia
- Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, Virginia
- SyNeRGy Laboratory, Virginia Tech, Blacksburg, Virginia
| | - Lynnette Ferguson
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Nutrition, Auckland Cancer Society Research Centre, University of Auckland, Auckland, New Zealand
| | - Geoffrey Fox
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- School of Informatics and Computing, Indiana University, Bloomington, Indiana
| | - Dmitrij Frishman
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
| | - Robert Grossman
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Institute for Genomics and Systems Biology, University of Chicago, Chicago, Illinois
- Department of Medicine, University of Chicago, Chicago, Illinois
| | - Allison Heath
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Institute for Genomics and Systems Biology, University of Chicago, Chicago, Illinois
- Knapp Center for Biomedical Discovery, University of Chicago, Chicago, Illinois
| | - Roger Higdon
- Bioinformatics and High-Throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
| | - Mara H. Hutz
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Departamento de Genetica, Instituto de Biociencias, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
| | - Imre Janko
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
| | - Lihua Jiang
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Genetics, Stanford University, Stanford, California
| | - Sanjay Joshi
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Life Sciences, EMC, Hopkinton, Massachusetts
| | - Alexander Kel
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- GeneXplain GmbH, Wolfenbüttel, Germany
| | - Joseph W. Kemnitz
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin
- Wisconsin National Primate Research Center, University of Wisconsin-Madison, Madison, Wisconsin
| | - Isaac S. Kohane
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Pediatrics and Health Sciences Technology, Children's Hospital and Harvard Medical School, Boston, Massachusetts
- HMS Center for Biomedical Informatics, Countway Library of Medicine, Boston, Massachusetts
| | - Natali Kolker
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
| | - Doron Lancet
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Molecular Genetics, Crown Human Genome Center, Weizmann Institute of Science, Rehovot, Israel
| | - Elaine Lee
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
| | - Weizhong Li
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Center for Research in Biological Systems, University of California, San Diego, La Jolla, California
| | - Andrey Lisitsa
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Russian Human Proteome Organization (RHUPO), Moscow, Russia
- Institute of Biomedical Chemistry, Moscow, Russia
| | - Adrian Llerena
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Clinical Research Center, Extremadura University Hospital and Medical School, Badajoz, Spain
| | - Courtney MacNealy-Koch
- Bioinformatics and High-Throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
| | - Jean-Claude Marshall
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Center for Translational Research, Catholic Health Initiatives, Towson, Maryland
| | - Paola Masuzzo
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Medical Protein Research, Vlaams Instituut voor Biotechnologie, Ghent, Belgium
- Department of Biochemistry, Ghent University, Ghent, Belgium
| | - Amanda May
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Chemistry, University of Tennessee Knoxville, Knoxville, Tennessee
| | - George Mias
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Genetics, Stanford University, Stanford, California
| | - Matthew Monroe
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington
| | - Elizabeth Montague
- Bioinformatics and High-Throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
| | - Sean Mooney
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- The Buck Institute for Research on Aging, Novato, California
| | - Alexey Nesvizhskii
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Pathology, University of Michigan, Ann Arbor, Michigan
- Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
| | - Santosh Noronha
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Chemical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai, India
| | - Gilbert Omenn
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
- Department of Molecular Medicine & Genetics and Human Genetics, University of Michigan, Ann Arbor, Michigan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
- School of Public Health, University of Michigan, Ann Arbor, Michigan
| | - Harsha Rajasimha
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Jeeva Informatics Solutions LLC, Derwood, Maryland
| | - Preveen Ramamoorthy
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Molecular Diagnostics Department, National Jewish Health, Denver, Colorado
| | - Jerry Sheehan
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- California Institute for Telecommunications and Information Technology, University of California-San Diego, La Jolla, California
| | - Larry Smarr
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- California Institute for Telecommunications and Information Technology, University of California-San Diego, La Jolla, California
| | - Charles V. Smith
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
| | - Todd Smith
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Digital World Biology, Seattle, Washington
| | - Michael Snyder
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Genetics, Stanford University, Stanford, California
- Stanford Center for Genomics and Personalized Medicine, Stanford University, Stanford, California
| | - Srikanth Rapole
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Proteomics Laboratory, National Centre for Cell Science, University of Pune, Pune, India
| | - Sanjeeva Srivastava
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Proteomics Laboratory, Indian Institute of Technology Bombay, Mumbai, India
| | - Larissa Stanberry
- Bioinformatics and High-Throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
| | - Elizabeth Stewart
- Bioinformatics and High-Throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
| | - Stefano Toppo
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Molecular Medicine, University of Padova, Padova, Italy
| | - Peter Uetz
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Center for the Study of Biological Complexity (CSBC), Virginia Commonwealth University, Richmond, Virginia
| | - Kenneth Verheggen
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Medical Protein Research, Vlaams Instituut voor Biotechnologie, Ghent, Belgium
- Department of Biochemistry, Ghent University, Ghent, Belgium
| | - Brynn H. Voy
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Animal Science, University of Tennessee Institute of Agriculture, Knoxville, Tennessee
| | - Louise Warnich
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Genetics, Faculty of AgriSciences, University of Stellenbosch, Stellenbosch, South Africa
| | - Steven W. Wilhelm
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Microbiology, University of Tennessee-Knoxville, Knoxville, Tennessee
| | - Gregory Yandl
- Bioinformatics and High-Throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
| |
|
14
|
González-Beltrán A, Maguire E, Sansone SA, Rocca-Serra P. linkedISA: semantic representation of ISA-Tab experimental metadata. BMC Bioinformatics 2014; 15. [PMID: 25472428 PMCID: PMC4255742 DOI: 10.1186/1471-2105-15-s14-s4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/10/2023] Open
Abstract
BACKGROUND Reporting and sharing experimental metadata, such as the experimental design, characteristics of the samples, and procedures applied, along with the analysis results, in a standardised manner ensures that datasets are comprehensible and, in principle, reproducible, comparable and reusable. Furthermore, sharing datasets in formats designed for consumption by humans and machines will also maximize their use. The Investigation/Study/Assay (ISA) open source metadata tracking framework facilitates standards-compliant collection, curation, visualization, storage and sharing of datasets, leveraging other platforms to enable analysis and publication. The ISA software suite includes several components used in an increasingly diverse set of life science and biomedical domains; it is underpinned by a general-purpose format, ISA-Tab, and conversions exist into formats required by public repositories. While ISA-Tab works well mainly as a human readable format, we have also implemented a linked data approach to semantically define the ISA-Tab syntax. RESULTS We present a semantic web representation of the ISA-Tab syntax that complements ISA-Tab's syntactic interoperability with semantic interoperability. We introduce the linkedISA conversion tool from ISA-Tab to the Resource Description Framework (RDF), supporting mappings from the ISA syntax to multiple community-defined, open ontologies and capitalising on user-provided ontology annotations in the experimental metadata. We describe insights into the implementation and how annotations can be expanded, driven by the metadata. We applied the conversion tool as part of Bio-GraphIIn, a web-based application supporting integration of the semantically-rich experimental descriptions.
Designed in a user-friendly manner, the Bio-GraphIIn interface hides most of the complexities from the users, exposing a familiar tabular view of the experimental description to allow seamless interaction with the RDF representation, and visualising descriptors to drive the query over the semantic representation of the experimental design. In addition, we defined queries over the linkedISA RDF representation and demonstrated their use over the linkedISA conversion of datasets from Nature's Scientific Data online publication. CONCLUSIONS Our linked data approach has allowed us to: 1) make the ISA-Tab semantics explicit and machine-processable, 2) exploit the existing ontology-based annotations in the ISA-Tab experimental descriptions, 3) augment the ISA-Tab syntax with new descriptive elements, 4) visualise and query elements related to the experimental design. Reasoning over ISA-Tab metadata and associated data will facilitate data integration and knowledge discovery.
Affiliation(s)
| | - Eamonn Maguire
- Oxford e-Research Centre, University of Oxford, Oxford, OX1 3QG, UK
| | | | | |
|
15
|
Kolker E, Özdemir V, Martens L, Hancock W, Anderson G, Anderson N, Aynacioglu S, Baranova A, Campagna SR, Chen R, Choiniere J, Dearth SP, Feng WC, Ferguson L, Fox G, Frishman D, Grossman R, Heath A, Higdon R, Hutz MH, Janko I, Jiang L, Joshi S, Kel A, Kemnitz JW, Kohane IS, Kolker N, Lancet D, Lee E, Li W, Lisitsa A, Llerena A, MacNealy-Koch C, Marshall JC, Masuzzo P, May A, Mias G, Monroe M, Montague E, Mooney S, Nesvizhskii A, Noronha S, Omenn G, Rajasimha H, Ramamoorthy P, Sheehan J, Smarr L, Smith CV, Smith T, Snyder M, Rapole S, Srivastava S, Stanberry L, Stewart E, Toppo S, Uetz P, Verheggen K, Voy BH, Warnich L, Wilhelm SW, Yandl G. Toward More Transparent and Reproducible Omics Studies Through a Common Metadata Checklist and Data Publications. BIG DATA 2013; 1:196-201. [PMID: 27447251 DOI: 10.1089/big.2013.0039] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 06/06/2023]
Abstract
Biological processes are fundamentally driven by complex interactions between biomolecules. Integrated high-throughput omics studies enable multifaceted views of cells, organisms, or their communities. With the advent of new post-genomics technologies, omics studies are becoming increasingly prevalent; yet the full impact of these studies can only be realized through data harmonization, sharing, meta-analysis, and integrated research. These essential steps require consistent generation, capture, and distribution of metadata. To ensure transparency, facilitate data harmonization, and maximize reproducibility and usability of life sciences studies, we propose a simple common omics metadata checklist. The proposed checklist is built on the rich ontologies and standards already in use by the life sciences community. The checklist will serve as a common denominator to guide experimental design, capture important parameters, and be used as a standard format for stand-alone data publications. The omics metadata checklist and data publications will create efficient linkages between omics data and knowledge-based life sciences innovation and, importantly, allow for appropriate attribution to data generators and infrastructure science builders in the post-genomics era. We ask that the life sciences community test the proposed omics metadata checklist and data publications and provide feedback for their use and improvement.
Affiliation(s)
- Eugene Kolker
- Bioinformatics and High-Throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
| | - Vural Özdemir
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Office of the President, Gaziantep University, International Affairs and Global Development Strategy
- Faculty of Communications, Universite Bulvarı, Kilis Yolu, Turkey
| | - Lennart Martens
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Medical Protein Research, Vlaams Instituut voor Biotechnologie, Ghent, Belgium
- Department of Biochemistry, Ghent University, Ghent, Belgium
| | - William Hancock
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Chemistry, Barnett Institute, Northeastern University, Boston, Massachusetts
| | - Gordon Anderson
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Fundamental & Computational Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington
| | - Nathaniel Anderson
- Bioinformatics and High-Throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
| | - Sukru Aynacioglu
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Pharmacology, Gaziantep University, Gaziantep, Turkey
| | - Ancha Baranova
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- School of Systems Biology, George Mason University, Manassas, Virginia
| | - Shawn R Campagna
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Chemistry, University of Tennessee Knoxville, Knoxville, Tennessee
| | - Rui Chen
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Genetics, Stanford University, Stanford, California
| | - John Choiniere
- Bioinformatics and High-Throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
| | - Stephen P Dearth
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Chemistry, University of Tennessee Knoxville, Knoxville, Tennessee
| | - Wu-Chun Feng
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia
- Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, Virginia
- SyNeRGy Laboratory, Virginia Tech, Blacksburg, Virginia
| | - Lynnette Ferguson
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Nutrition, Auckland Cancer Society Research Centre, University of Auckland, Auckland, New Zealand
| | - Geoffrey Fox
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- School of Informatics and Computing, Indiana University, Bloomington, Indiana
| | - Dmitrij Frishman
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
| | - Robert Grossman
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Institute for Genomics and Systems Biology, University of Chicago, Chicago, Illinois
- Department of Medicine, University of Chicago, Chicago, Illinois
| | - Allison Heath
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Institute for Genomics and Systems Biology, University of Chicago, Chicago, Illinois
- Knapp Center for Biomedical Discovery, University of Chicago, Chicago, Illinois
| | - Roger Higdon
- Bioinformatics and High-Throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
| | - Mara H Hutz
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Departamento de Genetica, Instituto de Biociencias, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
| | - Imre Janko
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
| | - Lihua Jiang
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Genetics, Stanford University, Stanford, California
| | - Sanjay Joshi
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Life Sciences, EMC, Hopkinton, Massachusetts
| | - Alexander Kel
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- GeneXplain GmbH, Wolfenbüttel, Germany
| | - Joseph W Kemnitz
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin
- Wisconsin National Primate Research Center, University of Wisconsin-Madison, Madison, Wisconsin
| | - Isaac S Kohane
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Pediatrics and Health Sciences Technology, Children's Hospital and Harvard Medical School, Boston, Massachusetts
- HMS Center for Biomedical Informatics, Countway Library of Medicine, Boston, Massachusetts
| | - Natali Kolker
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
| | - Doron Lancet
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Department of Molecular Genetics, Crown Human Genome Center, Weizmann Institute of Science, Rehovot, Israel
| | - Elaine Lee
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
| | - Weizhong Li
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Center for Research in Biological Systems, University of California, San Diego, La Jolla, California
| | - Andrey Lisitsa
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
- 33 Russian Human Proteome Organization (RHUPO) , Moscow, Russia
- 34 Institute of Biomedical Chemistry , Moscow, Russia
| | - Adrian Llerena
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
- 35 Clinical Research Center, Extremadura University Hospital and Medical School , Badajoz, Spain
| | - Courtney MacNealy-Koch
- 1 Bioinformatics and High-Throughput Analysis Laboratory, Seattle Children's Research Institute , Seattle, Washington
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
| | - Jean-Claude Marshall
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
- 36 Center for Translational Research, Catholic Health Initiatives , Towson, Maryland
| | - Paola Masuzzo
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
- 6 Department of Medical Protein Research, Vlaams Instituut voor Biotechnologie , Ghent, Belgium
- 7 Department of Biochemistry, Ghent University, Ghent , Belgium
| | - Amanda May
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
- 12 Department of Chemistry, University of Tennessee Knoxville , Knoxville, Tennessee
| | - George Mias
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
- 13 Department of Genetics, Stanford University , Stanford, California
| | - Matthew Monroe
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
- 37 Biological Sciences Division, Pacific Northwest National Laboratory , Richland, Washington
| | - Elizabeth Montague
- 1 Bioinformatics and High-Throughput Analysis Laboratory, Seattle Children's Research Institute , Seattle, Washington
- 2 Predictive Analytics , Seattle Children's, Seattle, Washington
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
| | - Sean Mooney
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
- 38 The Buck Institute for Research on Aging , Novato, California
| | - Alexey Nesvizhskii
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
- 39 Department of Pathology, University of Michigan , Ann Arbor, Michigan
- 40 Computational Medicine and Bioinformatics, University of Michigan , Ann Arbor, Michigan
| | - Santosh Noronha
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
- 41 Department of Chemical Engineering, Indian Institute of Technology Bombay , Powai, Mumbai, India
| | - Gilbert Omenn
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
- 42 Center for Computational Medicine and Bioinformatics, University of Michigan , Ann Arbor, Michigan
- 43 Departments of Molecular Medicine & Genetics and Human Genetics, University of Michigan , Ann Arbor, Michigan
- 44 Department of Computational Medicine and Bioinformatics, University of Michigan , Ann Arbor, Michigan
- 45 School of Public Health, University of Michigan , Ann Arbor, Michigan
| | - Harsha Rajasimha
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
- 46 Jeeva Informatics Solutions LLC , Derwood, Maryland
| | - Preveen Ramamoorthy
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
- 47 Molecular Diagnostics Department, National Jewish Health , Denver, Colorado
| | - Jerry Sheehan
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
- 48 California Institute for Telecommunications and Information Technology, University of California-San Diego , La Jolla, California
| | - Larry Smarr
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
- 48 California Institute for Telecommunications and Information Technology, University of California-San Diego , La Jolla, California
| | - Charles V Smith
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
- 49 Center for Developmental Therapeutics, Seattle Children's Research Institute , Seattle, Washington
| | - Todd Smith
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
- 50 Digital World Biology , Seattle, Washington
| | - Michael Snyder
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
- 13 Department of Genetics, Stanford University , Stanford, California
- 51 Stanford Center for Genomics and Personalized Medicine, Stanford University , Stanford, California
| | - Srikanth Rapole
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
- 52 Proteomics Laboratory, National Centre for Cell Science, University of Pune , Pune, India
| | - Sanjeeva Srivastava
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
- 53 Proteomics Laboratory, Indian Institute of Technology Bombay , Mumbai, India
| | - Larissa Stanberry
- 1 Bioinformatics and High-Throughput Analysis Laboratory, Seattle Children's Research Institute , Seattle, Washington
- 2 Predictive Analytics , Seattle Children's, Seattle, Washington
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
| | - Elizabeth Stewart
- 1 Bioinformatics and High-Throughput Analysis Laboratory, Seattle Children's Research Institute , Seattle, Washington
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
| | - Stefano Toppo
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
- 54 Department of Molecular Medicine, University of Padova , Padova, Italy
| | - Peter Uetz
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
- 55 Center for the Study of Biological Complexity (CSBC), Virginia Commonwealth University , Richmond, Virginia
| | - Kenneth Verheggen
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
- 6 Department of Medical Protein Research, Vlaams Instituut voor Biotechnologie , Ghent, Belgium
- 7 Department of Biochemistry, Ghent University, Ghent , Belgium
| | - Brynn H Voy
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
- 56 Department of Animal Science, University of Tennessee Institute of Agriculture , Knoxville, Tennessee
| | - Louise Warnich
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
- 57 Department of Genetics, Faculty of AgriSciences, University of Stellenbosch , Stellenbosch, South Africa
| | - Steven W Wilhelm
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
- 58 Department of Microbiology, University of Tennessee-Knoxville , Knoxville, Tennessee
| | - Gregory Yandl
- 1 Bioinformatics and High-Throughput Analysis Laboratory, Seattle Children's Research Institute , Seattle, Washington
- 3 Data-Enabled Life Sciences Alliance (DELSA Global) , Seattle, Washington
|
16
|
Adamusiak T, Parkinson H, Muilu J, Roos E, van der Velde KJ, Thorisson GA, Byrne M, Pang C, Gollapudi S, Ferretti V, Hillege H, Brookes AJ, Swertz MA. Observ-OM and Observ-TAB: Universal syntax solutions for the integration, search, and exchange of phenotype and genotype information. Hum Mutat 2012; 33:867-73. [DOI: 10.1002/humu.22070] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 12/21/2011] [Accepted: 02/22/2012] [Indexed: 11/12/2022]
|
17
|
González-Beltrán AN, Yong MY, Dancey G, Begent R. Guidelines for information about therapy experiments: a proposal on best practice for recording experimental data on cancer therapy. BMC Res Notes 2012; 5:10. [PMID: 22226027 PMCID: PMC3285520 DOI: 10.1186/1756-0500-5-10] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/06/2011] [Accepted: 01/06/2012] [Indexed: 12/03/2022]
Abstract
BACKGROUND Biology, biomedicine and healthcare have become data-driven enterprises, where scientists and clinicians need to generate, access, validate, interpret and integrate different kinds of experimental and patient-related data. Thus, recording and reporting of data in a systematic and unambiguous fashion is crucial to allow aggregation and re-use of data. This paper reviews the benefits of existing biomedical data standards and focuses on key elements to record experiments for therapy development. Specifically, we describe the experiments performed in molecular, cellular, animal and clinical models. We also provide an example set of elements for a therapy tested in a phase I clinical trial. FINDINGS We introduce the Guidelines for Information About Therapy Experiments (GIATE), a minimum information checklist creating a consistent framework to transparently report the purpose, methods and results of the therapeutic experiments. A discussion on the scope, design and structure of the guidelines is presented, together with a description of the intended audience. We also present complementary resources such as a classification scheme, and two alternative ways of creating GIATE information: an electronic lab notebook and a simple spreadsheet-based format. Finally, we use GIATE to record the details of the phase I clinical trial of CHT-25 for patients with refractory lymphomas. The benefits of using GIATE for this experiment are discussed. CONCLUSIONS While data standards are being developed to facilitate data sharing and integration in various aspects of experimental medicine, such as genomics and clinical data, no previous work focused on therapy development. We propose a checklist for therapy experiments and demonstrate its use in the 131Iodine-labeled CHT-25 chimeric antibody cancer therapy. As future work, we will expand the set of GIATE tools to continue to encourage its use by cancer researchers, and we will engineer an ontology to annotate GIATE elements and facilitate unambiguous interpretation and data integration.
|
18
|
Abstract
Synthetic Biology is founded on the idea that complex biological systems are built most effectively when the task is divided in abstracted layers and all required components are readily available and well-described. This requires interdisciplinary collaboration at several levels and a common understanding of the functioning of each component. Standardization of the physical composition and the description of each part is required as well as a controlled vocabulary to aid design and ensure interoperability. Here, we describe standardization initiatives from several disciplines, which can contribute to Synthetic Biology. We provide examples of the concerted standardization efforts of the BioBricks Foundation comprising the request for comments (RFC) and the Registry of Standardized Biological parts as well as the international Genetically Engineered Machine (iGEM) competition.
Affiliation(s)
- Kristian M Müller
- Institute for Biochemistry and Biology, University of Potsdam, Potsdam, Germany.
|
19
|
Huang J, Mirel D, Pugh E, Xing C, Robinson PN, Pertsemlidis A, Ding L, Kozlitina J, Maher J, Rios J, Story M, Marthandan N, Scheuermann RH. Minimum Information about a Genotyping Experiment (MIGEN). Stand Genomic Sci 2011; 5:224-9. [PMID: 22180825 PMCID: PMC3235517 DOI: 10.4056/sigs.1994602] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022]
Abstract
Genotyping experiments are widely used in clinical and basic research laboratories to identify associations between genetic variations and normal/abnormal phenotypes. Genotyping assay techniques vary from single genomic regions that are interrogated using PCR reactions to high throughput assays examining genome-wide sequence and structural variation. The resulting genotype data may include millions of markers of thousands of individuals, requiring various statistical, modeling or other data analysis methodologies to interpret the results. To date, there are no standards for reporting genotyping experiments. Here we present the Minimum Information about a Genotyping Experiment (MIGen) standard, defining the minimum information required for reporting genotyping experiments. MIGen standard covers experimental design, subject description, genotyping procedure, quality control and data analysis. MIGen is a registered project under MIBBI (Minimum Information for Biological and Biomedical Investigations) and is being developed by an interdisciplinary group of experts in basic biomedical science, clinical science, biostatistics and bioinformatics. To accommodate the wide variety of techniques and methodologies applied in current and future genotyping experiment, MIGen leverages foundational concepts from the Ontology for Biomedical Investigations (OBI) for the description of the various types of planned processes and implements a hierarchical document structure. The adoption of MIGen by the research community will facilitate consistent genotyping data interpretation and independent data validation. MIGen can also serve as a framework for the development of data models for capturing and storing genotyping results and experiment metadata in a structured way, to facilitate the exchange of metadata.
|
20
|
Buckle AM, Bate MA, Androulakis S, Cinquanta M, Basquin J, Bonneau F, Chatterjee DK, Cittaro D, Gräslund S, Gruszka A, Page R, Suppmann S, Wheeler JX, Agostini D, Taussig M, Taylor CF, Bottomley SP, Villaverde A, de Marco A. Recombinant protein quality evaluation: proposal for a minimal information standard. Stand Genomic Sci 2011; 5:195-7. [PMID: 22180821 PMCID: PMC3235516 DOI: 10.4056/sigs.1834511] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 12/04/2022]
Affiliation(s)
- Ashley M. Buckle
- The Department of Biochemistry and Molecular Biology, School of Biomedical Sciences, Faculty of Medicine, Nursing and Health Sciences, Monash University, Australia
- Corresponding authors: Ario de Marco, University of Nova Gorica (UNG), Rožna Dolina (Nova Gorica), Slovenia. Tel. 0039.3493542056;
| | - Mark A. Bate
- The Department of Biochemistry and Molecular Biology, School of Biomedical Sciences, Faculty of Medicine, Nursing and Health Sciences, Monash University, Australia
| | - Steve Androulakis
- Monash eResearch Centre, Monash University, Clayton, Victoria, Australia
| | | | - Jerome Basquin
- Max Planck Institute of Biochemistry, Department of Structural Cell Biology, Martinsried, Germany
| | - Fabien Bonneau
- Max Planck Institute of Biochemistry, Department of Structural Cell Biology, Martinsried, Germany
| | - Deb K. Chatterjee
- Protein Expression Laboratory, SAIC-Frederick Inc., National Cancer Institute, Frederick, MD, USA
| | | | - Susanne Gräslund
- Structural Genomics Consortium, Karolinska Institutet, Department of Medical Biophysics and Biochemistry, Stockholm, Sweden
| | | | - Rebecca Page
- Brown University, Department of Molecular Biology, Cell Biology and Biochemistry, Providence, RI, USA
| | - Sabine Suppmann
- Max Planck Institute of Biochemistry, Microchemistry Core Facility, Martinsried, Germany
| | - Jun X. Wheeler
- National Institute for Biological Standards and Control, Health Protection Agency, Hertfordshire, UK
| | | | - Mike Taussig
- Protein Technologies Group, Babraham Bioscience Technologies, Cambridge, UK
| | - Chris F. Taylor
- The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK
| | - Stephen P. Bottomley
- The Department of Biochemistry and Molecular Biology, School of Biomedical Sciences, Faculty of Medicine, Nursing and Health Sciences, Monash University, Australia
| | - Antonio Villaverde
- Institute for Biotechnology and Biomedicine and Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, and CIBER de Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), Barcelona, Spain
| | - Ario de Marco
- Department Environmental Sciences, University of Nova Gorica, Nova Gorica, Slovenia
|
21
|
Quinn TA, Granite S, Allessie MA, Antzelevitch C, Bollensdorff C, Bub G, Burton RAB, Cerbai E, Chen PS, Delmar M, Difrancesco D, Earm YE, Efimov IR, Egger M, Entcheva E, Fink M, Fischmeister R, Franz MR, Garny A, Giles WR, Hannes T, Harding SE, Hunter PJ, Iribe G, Jalife J, Johnson CR, Kass RS, Kodama I, Koren G, Lord P, Markhasin VS, Matsuoka S, McCulloch AD, Mirams GR, Morley GE, Nattel S, Noble D, Olesen SP, Panfilov AV, Trayanova NA, Ravens U, Richard S, Rosenbaum DS, Rudy Y, Sachs F, Sachse FB, Saint DA, Schotten U, Solovyova O, Taggart P, Tung L, Varró A, Volders PG, Wang K, Weiss JN, Wettwer E, White E, Wilders R, Winslow RL, Kohl P. Minimum Information about a Cardiac Electrophysiology Experiment (MICEE): standardised reporting for model reproducibility, interoperability, and data sharing. Prog Biophys Mol Biol 2011; 107:4-10. [PMID: 21745496 PMCID: PMC3190048 DOI: 10.1016/j.pbiomolbio.2011.07.001] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.3] [Received: 07/01/2011] [Accepted: 07/01/2011] [Indexed: 11/21/2022]
Abstract
Cardiac experimental electrophysiology is in need of a well-defined Minimum Information Standard for recording, annotating, and reporting experimental data. As a step towards establishing this, we present a draft standard, called Minimum Information about a Cardiac Electrophysiology Experiment (MICEE). The ultimate goal is to develop a useful tool for cardiac electrophysiologists which facilitates and improves dissemination of the minimum information necessary for reproduction of cardiac electrophysiology research, allowing for easier comparison and utilisation of findings by others. It is hoped that this will enhance the integration of individual results into experimental, computational, and conceptual models. In its present form, this draft is intended for assessment and development by the research community. We invite the reader to join this effort, and, if deemed productive, implement the Minimum Information about a Cardiac Electrophysiology Experiment standard in their own work.
Affiliation(s)
- T A Quinn
- National Heart and Lung Institute, Imperial College London, London, UK.
|
22
|
Jeliazkova N, Jeliazkov V. AMBIT RESTful web services: an implementation of the OpenTox application programming interface. J Cheminform 2011; 3:18. [PMID: 21575202 PMCID: PMC3120779 DOI: 10.1186/1758-2946-3-18] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Received: 12/10/2010] [Accepted: 05/16/2011] [Indexed: 11/10/2022]
Abstract
The AMBIT web services package is one of the several existing independent implementations of the OpenTox Application Programming Interface and is built according to the principles of the Representational State Transfer (REST) architecture. The Open Source Predictive Toxicology Framework, developed by the partners in the EC FP7 OpenTox project, aims at providing a unified access to toxicity data and predictive models, as well as validation procedures. This is achieved by i) an information model, based on a common OWL-DL ontology; ii) links to related ontologies; iii) data and algorithms, available through a standardized REST web services interface, where every compound, data set or predictive method has a unique web address, used to retrieve its Resource Description Framework (RDF) representation, or initiate the associated calculations. The AMBIT web services package has been developed as an extension of AMBIT modules, adding the ability to create (Quantitative) Structure-Activity Relationship (QSAR) models and providing an OpenTox API compliant interface. The representation of data and processing resources in W3C Resource Description Framework facilitates integrating the resources as Linked Data. By uploading datasets with chemical structures and arbitrary set of properties, they become automatically available online in several formats. The services provide unified interfaces to several descriptor calculation, machine learning and similarity searching algorithms, as well as to applicability domain and toxicity prediction models. All Toxtree modules for predicting the toxicological hazard of chemical compounds are also integrated within this package. The complexity and diversity of the processing is reduced to the simple paradigm "read data from a web address, perform processing, write to a web address". The online service allows users to easily run predictions without installing any software, as well as to share datasets and models online. The downloadable web application allows researchers to set up an arbitrary number of service instances for specific purposes and at suitable locations. These services could be used as a distributed framework for processing of resource-intensive tasks and data sharing or in a fully independent way, according to the specific needs. The advantage of exposing the functionality via the OpenTox API is seamless interoperability, not only within a single web application, but also in a network of distributed services. Last, but not least, the services provide a basis for building web mashups, end user applications with friendly GUIs, as well as embedding the functionalities in existing workflow systems.
|
23
|
Field D, Kottmann R, Sterk P. The first special issue of Standards in Genomic Sciences from the Genomic Standards Consortium. Stand Genomic Sci 2010; 3:214-5. [PMID: 21304721 PMCID: PMC3035305 DOI: 10.4056/sigs.1493697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2022]
|
24
|
Tolopko AN, Sullivan JP, Erickson SD, Wrobel D, Chiang SL, Rudnicki K, Rudnicki S, Nale J, Selfors LM, Greenhouse D, Muhlich JL, Shamu CE. Screensaver: an open source lab information management system (LIMS) for high throughput screening facilities. BMC Bioinformatics 2010; 11:260. [PMID: 20482787 PMCID: PMC3001403 DOI: 10.1186/1471-2105-11-260] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Received: 02/08/2010] [Accepted: 05/18/2010] [Indexed: 01/22/2023]
Abstract
BACKGROUND Shared-usage high throughput screening (HTS) facilities are becoming more common in academe as large-scale small molecule and genome-scale RNAi screening strategies are adopted for basic research purposes. These shared facilities require a unique informatics infrastructure that must not only provide access to and analysis of screening data, but must also manage the administrative and technical challenges associated with conducting numerous, interleaved screening efforts run by multiple independent research groups. RESULTS We have developed Screensaver, a free, open source, web-based lab information management system (LIMS), to address the informatics needs of our small molecule and RNAi screening facility. Screensaver supports the storage and comparison of screening data sets, as well as the management of information about screens, screeners, libraries, and laboratory work requests. To our knowledge, Screensaver is one of the first applications to support the storage and analysis of data from both genome-scale RNAi screening projects and small molecule screening projects. CONCLUSIONS The informatics and administrative needs of an HTS facility may be best managed by a single, integrated, web-accessible application such as Screensaver. Screensaver has proven useful in meeting the requirements of the ICCB-Longwood/NSRB Screening Facility at Harvard Medical School, and has provided similar benefits to other HTS facilities.
Affiliation(s)
- Andrew N Tolopko
- ICCB-Longwood/NSRB Screening Facility, Harvard Medical School, 250 Longwood Avenue, Boston, MA 02115, USA
| | - John P Sullivan
- ICCB-Longwood/NSRB Screening Facility, Harvard Medical School, 250 Longwood Avenue, Boston, MA 02115, USA
- Broad Institute, 7 Cambridge Center, Cambridge, MA 02142, USA
| | - Sean D Erickson
- ICCB-Longwood/NSRB Screening Facility, Harvard Medical School, 250 Longwood Avenue, Boston, MA 02115, USA
| | - David Wrobel
- ICCB-Longwood/NSRB Screening Facility, Harvard Medical School, 250 Longwood Avenue, Boston, MA 02115, USA
| | - Su L Chiang
- ICCB-Longwood/NSRB Screening Facility, Harvard Medical School, 250 Longwood Avenue, Boston, MA 02115, USA
| | - Katrina Rudnicki
- ICCB-Longwood/NSRB Screening Facility, Harvard Medical School, 250 Longwood Avenue, Boston, MA 02115, USA
| | - Stewart Rudnicki
- ICCB-Longwood/NSRB Screening Facility, Harvard Medical School, 250 Longwood Avenue, Boston, MA 02115, USA
| | - Jennifer Nale
- ICCB-Longwood/NSRB Screening Facility, Harvard Medical School, 250 Longwood Avenue, Boston, MA 02115, USA
| | - Laura M Selfors
- ICCB-Longwood/NSRB Screening Facility, Harvard Medical School, 250 Longwood Avenue, Boston, MA 02115, USA
| | - Dara Greenhouse
- ICCB-Longwood/NSRB Screening Facility, Harvard Medical School, 250 Longwood Avenue, Boston, MA 02115, USA
| | - Jeremy L Muhlich
- Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115, USA
| | - Caroline E Shamu
- ICCB-Longwood/NSRB Screening Facility, Harvard Medical School, 250 Longwood Avenue, Boston, MA 02115, USA
- Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115, USA
|
25
|
Kind T, Scholz M, Fiehn O. How large is the metabolome? A critical analysis of data exchange practices in chemistry. PLoS One 2009; 4:e5440. [PMID: 19415114 PMCID: PMC2673031 DOI: 10.1371/journal.pone.0005440] [Citation(s) in RCA: 97] [Impact Index Per Article: 6.5] [Received: 12/16/2008] [Accepted: 04/15/2009] [Indexed: 11/18/2022]
Abstract
BACKGROUND Calculating the metabolome size of species by genome-guided reconstruction of metabolic pathways misses all products from orphan genes and from enzymes lacking annotated genes. Hence, metabolomes need to be determined experimentally. Annotations by mass spectrometry would greatly benefit if peer-reviewed public databases could be queried to compile target lists of structures that already have been reported for a given species. We detail current obstacles to compile such a knowledge base of metabolites. RESULTS As an example, results are presented for rice. Two rice (Oryza sativa) subspecies have been fully sequenced, Oryza japonica and Oryza indica. Several major small molecule databases were compared for listing known rice metabolites comprising PubChem, Chemical Abstracts, Beilstein, Patent databases, Dictionary of Natural Products, SetupX/BinBase, KNApSAcK DB, and finally those databases which were obtained by computational approaches, i.e. RiceCyc, KEGG, and Reactome. More than 5,000 small molecules were retrieved when searching these databases. Unfortunately, most often, genuine rice metabolites were retrieved together with non-metabolite database entries such as pesticides. Overlaps from database compound lists were very difficult to compare because structures were either not encoded in machine-readable format or because compound identifiers were not cross-referenced between databases. CONCLUSIONS We conclude that present databases are not capable of comprehensively retrieving all known metabolites. Metabolome lists are yet mostly restricted to genome-reconstructed pathways. We suggest that providers of (bio)chemical databases enrich their database identifiers to PubChem IDs and InChIKeys to enable cross-database queries. In addition, peer-reviewed journal repositories need to mandate submission of structures and spectra in machine-readable format to allow automated semantic annotation of articles containing chemical structures. Such changes in publication standards and database architectures will enable researchers to compile current knowledge about the metabolome of species, which may extend to derived information such as spectral libraries, organ-specific metabolites, and cross-study comparisons.
Affiliation(s)
- Tobias Kind
- University of California Davis, Genome Center – Metabolomics, Davis, California, United States of America
| | - Martin Scholz
- University of California Davis, Genome Center – Metabolomics, Davis, California, United States of America
| | - Oliver Fiehn
- University of California Davis, Genome Center – Metabolomics, Davis, California, United States of America
|
26
|
Genotype-phenotype databases: challenges and solutions for the post-genomic era. Nat Rev Genet 2009; 10:9-18. [PMID: 19065136 DOI: 10.1038/nrg2483] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.3] [Indexed: 01/23/2023]
Abstract
The flow of research data concerning the genetic basis of health and disease is rapidly increasing in speed and complexity. In response, many projects are seeking to ensure that there are appropriate informatics tools, systems and databases available to manage and exploit this flood of information. Previous solutions, such as central databases, journal-based publication and manually intensive data curation, are now being enhanced with new systems for federated databases, database publication, and more automated management of data flows and quality control. Along with emerging technologies that enhance connectivity and data retrieval, these advances should help to create a powerful knowledge environment for genotype-phenotype information.
|