1. Schmidt C, Boissonnet T, Dohle J, Bernhardt K, Ferrando-May E, Wernet T, Nitschke R, Kunis S, Weidtkamp-Peters S. A practical guide to bioimaging research data management in core facilities. J Microsc 2024; 294:350-371. [PMID: 38752662 DOI: 10.1111/jmi.13317] [Received: 04/09/2024] [Revised: 04/29/2024] [Accepted: 04/30/2024] [Indexed: 05/21/2024]
Abstract
Bioimage data are generated in diverse research fields throughout the life and biomedical sciences. Their potential for advancing scientific progress via modern, data-driven discovery approaches reaches beyond disciplinary borders. To fully exploit this potential, it is necessary to make bioimaging data (in general, multidimensional microscopy images and image series) FAIR, that is, findable, accessible, interoperable and reusable. These FAIR principles for research data management are now widely accepted in the scientific community and have been adopted by funding agencies, policymakers and publishers. To remain competitive and at the forefront of research, implementing the FAIR principles into daily routines is an essential but challenging task for researchers and research infrastructures. Imaging core facilities, well-established providers of access to imaging equipment and expertise, are in an excellent position to lead this transformation in bioimaging research data management. They are positioned at the intersection of research groups, IT infrastructure providers, the institution's administration, and microscope vendors. Within the framework of German BioImaging - Society for Microscopy and Image Analysis (GerBI-GMB), cross-institutional working groups and third-party funded projects were initiated in recent years to advance the bioimaging community's capability and capacity for FAIR bioimage data management. Here, we provide an imaging-core-facility-centric perspective outlining the experience and current strategies in Germany to facilitate the practical adoption of the FAIR principles closely aligned with the international bioimaging community. We highlight which tools and services are ready to be implemented and what the future directions for FAIR bioimage data have to offer.
Affiliation(s)
- Christian Schmidt
  - Enabling Technology Department, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Tom Boissonnet
  - Center for Advanced Imaging, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Julia Dohle
  - Center of Cellular Nanoanalytics, Integrated Bioimaging Facility iBiOs, University of Osnabrück, Osnabrück, Germany
- Karen Bernhardt
  - Center of Cellular Nanoanalytics, Integrated Bioimaging Facility iBiOs, University of Osnabrück, Osnabrück, Germany
- Elisa Ferrando-May
  - Enabling Technology Department, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Department of Biology, University of Konstanz, Konstanz, Germany
- Tobias Wernet
  - Life Imaging Center, University of Freiburg, Freiburg, Germany
- Roland Nitschke
  - Life Imaging Center, University of Freiburg, Freiburg, Germany
  - CIBSS and BIOSS - Centres for Biological Signalling Studies, University of Freiburg, Freiburg, Germany
- Susanne Kunis
  - Center of Cellular Nanoanalytics, Integrated Bioimaging Facility iBiOs, University of Osnabrück, Osnabrück, Germany
2. Seep L, Grein S, Splichalova I, Ran D, Mikhael M, Hildebrand S, Lauterbach M, Hiller K, Ribeiro DJS, Sieckmann K, Kardinal R, Huang H, Yu J, Kallabis S, Behrens J, Till A, Peeva V, Strohmeyer A, Bruder J, Blum T, Soriano-Arroquia A, Tischer D, Kuellmer K, Li Y, Beyer M, Gellner AK, Fromme T, Wackerhage H, Klingenspor M, Fenske WK, Scheja L, Meissner F, Schlitzer A, Mass E, Wachten D, Latz E, Pfeifer A, Hasenauer J. From Planning Stage Towards FAIR Data: A Practical Metadatasheet For Biomedical Scientists. Sci Data 2024; 11:524. [PMID: 38778016 PMCID: PMC11111677 DOI: 10.1038/s41597-024-03349-2] [Received: 12/07/2023] [Accepted: 05/08/2024] [Indexed: 05/25/2024] Open
Abstract
Datasets consist of measurement data and metadata. Metadata provides context, essential for understanding and (re-)using data. Various metadata standards exist for different methods, systems and contexts. However, relevant information resides at differing stages across the data-lifecycle. Often, this information is defined and standardized only at publication stage, which can lead to data loss and workload increase. In this study, we developed Metadatasheet, a metadata standard based on interviews with members of two biomedical consortia and systematic screening of data repositories. It aligns with the data-lifecycle allowing synchronous metadata recording within Microsoft Excel, a widespread data recording software. Additionally, we provide an implementation, the Metadata Workbook, that offers user-friendly features like automation, dynamic adaption, metadata integrity checks, and export options for various metadata standards. By design and due to its extensive documentation, the proposed metadata standard simplifies recording and structuring of metadata for biomedical scientists, promoting practicality and convenience in data management. This framework can accelerate scientific progress by enhancing collaboration and knowledge transfer throughout the intermediate steps of data creation.
Affiliation(s)
- Lea Seep
  - Computational Biology, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Stephan Grein
  - Computational Biology, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Iva Splichalova
  - Developmental Biology of the Immune System, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Danli Ran
  - Institute of Pharmacology and Toxicology, University Hospital, University of Bonn, Bonn, Germany
- Mickel Mikhael
  - Institute of Pharmacology and Toxicology, University Hospital, University of Bonn, Bonn, Germany
- Staffan Hildebrand
  - Institute of Pharmacology and Toxicology, University Hospital, University of Bonn, Bonn, Germany
- Mario Lauterbach
  - Department of Bioinformatics and Biochemistry, Technical University Braunschweig, Braunschweig, Germany
- Karsten Hiller
  - Department of Bioinformatics and Biochemistry, Technical University Braunschweig, Braunschweig, Germany
- Katharina Sieckmann
  - Institute of Innate Immunity, University Hospital Bonn, University of Bonn, Bonn, Germany
- Ronja Kardinal
  - Institute of Innate Immunity, University Hospital Bonn, University of Bonn, Bonn, Germany
- Hao Huang
  - Developmental Biology of the Immune System, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Jiangyan Yu
  - Computational Biology, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
  - Quantitative Systems Biology, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Sebastian Kallabis
  - Systems Immunology and Proteomics, Institute of Innate Immunity, Medical Faculty, University of Bonn, Bonn, Germany
- Janina Behrens
  - Department of Biochemistry and Molecular Cell Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Andreas Till
  - Department of Internal Medicine I, Division of Endocrinology, Diabetes and Metabolism, University Medical Center Bonn, Bonn, Germany
- Viktoriya Peeva
  - Department of Internal Medicine I, Division of Endocrinology, Diabetes and Metabolism, University Medical Center Bonn, Bonn, Germany
- Akim Strohmeyer
  - Chair of Molecular Nutritional Medicine, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Johanna Bruder
  - Chair of Molecular Nutritional Medicine, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Tobias Blum
  - Immunology and Environment, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Ana Soriano-Arroquia
  - Institute of Pharmacology and Toxicology, University Hospital, University of Bonn, Bonn, Germany
- Dominik Tischer
  - Institute of Pharmacology and Toxicology, University Hospital, University of Bonn, Bonn, Germany
- Katharina Kuellmer
  - Chair of Molecular Nutritional Medicine, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Yuanfang Li
  - Immunogenomics & Neurodegeneration, German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
- Marc Beyer
  - Immunogenomics & Neurodegeneration, German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
  - PRECISE, Platform for Single Cell Genomics and Epigenomics at the German Center for Neurodegenerative Diseases and the University of Bonn, Bonn, Germany
- Anne-Kathrin Gellner
  - Department of Psychiatry and Psychotherapy, University Hospital Bonn, Bonn, Germany
  - Institute of Physiology II, Medical Faculty, University of Bonn, Bonn, Germany
- Tobias Fromme
  - Chair of Molecular Nutritional Medicine, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Henning Wackerhage
  - School for Medicine and Health, Faculty of Sport and Health Sciences, Technical University of Munich, Munich, Germany
- Martin Klingenspor
  - Chair of Molecular Nutritional Medicine, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
  - EKFZ-Else Kröner-Fresenius Center for Nutritional Medicine, Technical University of Munich, Freising, Germany
  - ZIEL Institute for Food & Health, Technical University of Munich, Freising, Germany
- Wiebke K Fenske
  - Department of Internal Medicine I, Division of Endocrinology, Diabetes and Metabolism, University Medical Center Bonn, Bonn, Germany
  - Department of Internal Medicine I - Endocrinology, Diabetology and Metabolism, Gastroenterology and Hepatology, University Hospital Bergmannsheil, Bochum, Germany
- Ludger Scheja
  - Department of Biochemistry and Molecular Cell Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Felix Meissner
  - Systems Immunology and Proteomics, Institute of Innate Immunity, Medical Faculty, University of Bonn, Bonn, Germany
  - Experimental Systems Immunology, Max Planck Institute of Biochemistry, Martinsried, Germany
- Andreas Schlitzer
  - Quantitative Systems Biology, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Elvira Mass
  - Developmental Biology of the Immune System, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Dagmar Wachten
  - Institute of Innate Immunity, University Hospital Bonn, University of Bonn, Bonn, Germany
- Eicke Latz
  - Institute of Innate Immunity, University Hospital Bonn, University of Bonn, Bonn, Germany
- Alexander Pfeifer
  - Institute of Pharmacology and Toxicology, University Hospital, University of Bonn, Bonn, Germany
  - PharmaCenter Bonn, University of Bonn, Bonn, Germany
- Jan Hasenauer
  - Computational Biology, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
  - Helmholtz Center Munich, German Research Center for Environmental Health, Computational Health Center, Munich, Germany
3. Amos JD, Zhang Z, Tian Y, Lowry GV, Wiesner MR, Hendren CO. Knowledge and Instance Mapping: architecture for premeditated interoperability of disparate data for materials. Sci Data 2024; 11:173. [PMID: 38321063 PMCID: PMC10847415 DOI: 10.1038/s41597-024-03006-8] [Received: 09/21/2022] [Accepted: 01/26/2024] [Indexed: 02/08/2024] Open
Abstract
Predicting and elucidating the impacts of materials on human health and the environment is an unending task that has taken on special significance in the context of nanomaterials research over the last two decades. The properties of materials in environmental and physiological media are dynamic, reflecting the complex interactions between materials and these media. This dynamic behavior requires special consideration in the design of databases and data curation that allow for subsequent comparability and interrogation of the data from potentially diverse sources. We present two data processing methods that can be integrated into the experimental process to encourage premeditated interoperability of disparate material data: Knowledge Mapping and Instance Mapping. Originally developed as a framework for the NanoInformatics Knowledge Commons (NIKC) database, this architecture and associated methods can be used independently of the NIKC and applied across multiple subfields of nanotechnology and material science.
Affiliation(s)
- Jaleesia D Amos
  - Center for the Environmental Implications of Nano Technology (CEINT), Durham, USA
  - Civil & Environmental Engineering, Duke University, Durham, North Carolina, 27708, USA
- Zhao Zhang
  - Center for the Environmental Implications of Nano Technology (CEINT), Durham, USA
  - Civil & Environmental Engineering, Duke University, Durham, North Carolina, 27708, USA
  - Lucideon M+P, Morrisville, North Carolina, 27560, USA
- Yuan Tian
  - Center for the Environmental Implications of Nano Technology (CEINT), Durham, USA
  - Civil & Environmental Engineering, Duke University, Durham, North Carolina, 27708, USA
- Gregory V Lowry
  - Center for the Environmental Implications of Nano Technology (CEINT), Durham, USA
  - Civil & Environmental Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania, 15213, USA
- Mark R Wiesner
  - Center for the Environmental Implications of Nano Technology (CEINT), Durham, USA
  - Civil & Environmental Engineering, Duke University, Durham, North Carolina, 27708, USA
- Christine Ogilvie Hendren
  - Center for the Environmental Implications of Nano Technology (CEINT), Durham, USA
  - Civil & Environmental Engineering, Duke University, Durham, North Carolina, 27708, USA
  - Department of Geological and Environmental Sciences, Appalachian State University, Boone, North Carolina, 28608, USA
4. Mukhin AM, Kazantsev FV, Lashin SA. Laboratory information systems for research management in biology. Vavilovskii Zhurnal Genet Selektsii 2023; 27:898-905. [PMID: 38213703 PMCID: PMC10777299 DOI: 10.18699/vjgb-23-104] [Received: 07/13/2023] [Revised: 09/28/2023] [Accepted: 09/29/2023] [Indexed: 01/13/2024] Open
Abstract
Modern investigations in biology often require the efforts of one or more groups of researchers. Often these are groups of specialists from various scientific fields who generate and share data of different formats and sizes. Without modern approaches to work automation and data versioning (where data from different collaborators are stored at different points in time), teamwork quickly devolves into unmanageable confusion. In this review, we present a number of information systems designed to solve these problems. Their application to the organization of scientific activity helps to manage the flow of actions and data, allowing all participants to work with relevant information and solving the issue of reproducibility of both experimental and computational results. The article describes methods for organizing data flows within a team, principles for organizing metadata and ontologies. The information systems Trello, Git, Redmine, SEEK, OpenBIS and Galaxy are considered. Their functionality and scope of use are described. Before using any tools, it is important to understand the purpose of implementation, to define the set of tasks they should solve, and, based on this, to formulate requirements and finally to monitor the application of recommendations in the field. The tasks of creating a framework of ontologies, metadata, data warehousing schemas and software systems are key for a team that has decided to undertake work to automate data circulation. It is not always possible to implement such systems in their entirety, but one should still strive to do so through a step-by-step introduction of principles for organizing data and tasks with the mastery of individual software tools. It is worth noting that Trello, Git, and Redmine are easier to use, customize, and support for small research groups. At the same time, SEEK, OpenBIS, and Galaxy are more specific and their use is advisable if the capabilities of simple systems are no longer sufficient.
Affiliation(s)
- A M Mukhin
  - Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Kurchatov Genomic Center of ICG SB RAS, Novosibirsk, Russia
  - Novosibirsk State University, Novosibirsk, Russia
- F V Kazantsev
  - Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Kurchatov Genomic Center of ICG SB RAS, Novosibirsk, Russia
  - Novosibirsk State University, Novosibirsk, Russia
- S A Lashin
  - Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Kurchatov Genomic Center of ICG SB RAS, Novosibirsk, Russia
  - Novosibirsk State University, Novosibirsk, Russia
5. Hosseini R, Vlasveld M, Willemse J, van de Water B, Le Dévédec SE, Wolstencroft KJ. FAIR High Content Screening in Bioimaging. Sci Data 2023; 10:462. [PMID: 37460560 PMCID: PMC10352356 DOI: 10.1038/s41597-023-02367-w] [Received: 02/22/2023] [Accepted: 07/05/2023] [Indexed: 07/20/2023] Open
Affiliation(s)
- Rohola Hosseini
  - Life Science Semantics, Leiden Institute of Advanced Computer Science, Leiden, The Netherlands
- Matthijs Vlasveld
  - Drug Discovery and Safety, Cell Observatory, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Joost Willemse
  - Cell Observatory, Institute of Biology Leiden, Leiden, The Netherlands
- Bob van de Water
  - Drug Discovery and Safety, Cell Observatory, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Sylvia E Le Dévédec
  - Drug Discovery and Safety, Cell Observatory, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
6. Papoutsoglou EA, Athanasiadis IN, Visser RGF, Finkers R. The benefits and struggles of FAIR data: the case of reusing plant phenotyping data. Sci Data 2023; 10:457. [PMID: 37443110 PMCID: PMC10345100 DOI: 10.1038/s41597-023-02364-z] [Received: 04/25/2022] [Accepted: 07/03/2023] [Indexed: 07/15/2023] Open
Abstract
Plant phenotyping experiments are conducted under a variety of experimental parameters and settings for diverse purposes. The data they produce is heterogeneous, complicated, often poorly documented and, as a result, difficult to reuse. Meeting societal needs (nutrition, crop adaptation and stability) requires more efficient methods toward data integration and reuse. In this work, we examine what "making data FAIR" entails, and investigate the benefits and the struggles not only of reusing FAIR data, but also making data FAIR using genotype by environment and QTL by environment interactions for developmental traits in potato as a case study. We assume the role of a scientist discovering a phenotypic dataset on a FAIR data point, verifying the existence of related datasets with environmental data, acquiring both and integrating them. We report and discuss the challenges and the potential for reusability and reproducibility of FAIRifying existing datasets, using metadata standards such as MIAPPE, that were encountered in this process.
Affiliation(s)
- Evangelia A Papoutsoglou
  - Plant Breeding, Wageningen University and Research, Wageningen, The Netherlands
  - Taxonic B.V., De Meern, The Netherlands
- Ioannis N Athanasiadis
  - Wageningen Data Competence Center and Geo-Information Science & Remote Sensing Lab, Wageningen University and Research, Wageningen, The Netherlands
- Richard G F Visser
  - Plant Breeding, Wageningen University and Research, Wageningen, The Netherlands
- Richard Finkers
  - Plant Breeding, Wageningen University and Research, Wageningen, The Netherlands
  - GenNovation B.V., Wageningen, The Netherlands
7. Nault R, Cave MC, Ludewig G, Moseley HN, Pennell KG, Zacharewski T. A Case for Accelerating Standards to Achieve the FAIR Principles of Environmental Health Research Experimental Data. Environmental Health Perspectives 2023; 131:65001. [PMID: 37352010 PMCID: PMC10289218 DOI: 10.1289/ehp11484] [Received: 04/26/2022] [Revised: 06/05/2023] [Accepted: 06/07/2023] [Indexed: 06/25/2023]
Abstract
BACKGROUND: Funding agencies, publishers, and other stakeholders are pushing environmental health science investigators to improve data sharing; to promote the findable, accessible, interoperable, and reusable (FAIR) principles; and to increase the rigor and reproducibility of the data collected. Accomplishing these goals will require significant cultural shifts surrounding data management and strategies to develop robust and reliable resources that bridge the technical challenges and gaps in expertise. OBJECTIVE: In this commentary, we examine the current state of managing data and metadata, referred to collectively as (meta)data, in the experimental environmental health sciences. We introduce new tools and resources based on in vivo experiments to serve as examples for the broader field. METHODS: We discuss previous and ongoing efforts to improve (meta)data collection and curation. These include global efforts by the Functional Genomics Data Society to develop metadata collection tools such as the Investigation, Study, Assay (ISA) framework, and the Center for Expanded Data Annotation and Retrieval. We also conduct a case study of in vivo data deposited in the Gene Expression Omnibus that demonstrates the current state of in vivo environmental health data and highlights the value of using the tools we propose to support data deposition. DISCUSSION: The environmental health science community has played a key role in efforts to achieve the goals of the FAIR guiding principles and is well positioned to advance them further. We present a proposed framework to further promote these objectives and minimize the obstacles between data producers and data scientists to maximize the return on research investments. https://doi.org/10.1289/EHP11484.
Affiliation(s)
- Rance Nault
  - Biochemistry & Molecular Biology Department, Institute for Integrative Toxicology, Michigan State University, East Lansing, Michigan, USA
- Matthew C. Cave
  - Division of Gastroenterology, Hepatology, and Nutrition, University of Louisville, Louisville, Kentucky, USA
- Gabriele Ludewig
  - Department of Occupational and Environmental Health, University of Iowa, Iowa City, Iowa, USA
- Hunter N.B. Moseley
  - Molecular and Cellular Biochemistry Department, University of Kentucky, Lexington, Kentucky, USA
- Kelly G. Pennell
  - Department of Civil Engineering, University of Kentucky, Lexington, Kentucky, USA
- Tim Zacharewski
  - Biochemistry & Molecular Biology Department, Institute for Integrative Toxicology, Michigan State University, East Lansing, Michigan, USA
8. Fahlgren N, Kapoor M, Yordanova G, Papatheodorou I, Waese J, Cole B, Harrison P, Ware D, Tickle T, Paten B, Burdett T, Elsik CG, Tuggle CK, Provart NJ. Toward a data infrastructure for the Plant Cell Atlas. Plant Physiology 2023; 191:35-46. [PMID: 36200899 PMCID: PMC9806565 DOI: 10.1093/plphys/kiac468] [Received: 07/14/2022] [Accepted: 09/18/2022] [Indexed: 06/16/2023]
Abstract
We review how a data infrastructure for the Plant Cell Atlas might be built using existing infrastructure and platforms. The Human Cell Atlas has developed an extensive infrastructure for human and mouse single cell data, while the European Bioinformatics Institute has developed a Single Cell Expression Atlas that currently houses several plant data sets. We discuss issues related to appropriate ontologies for describing a plant single cell experiment. We imagine how such an infrastructure will enable biologists and data scientists to glean new insights into plant biology in the coming decades, as long as such data are made accessible to the community in an open manner.
Affiliation(s)
- Noah Fahlgren
  - Donald Danforth Plant Science Center, Saint Louis, Missouri 63132, USA
- Muskan Kapoor
  - Bioinformatics and Computational Biology Program, Department of Animal Science, Iowa State University, Ames, Iowa 50011, USA
- Jamie Waese
  - Department of Cell and Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario M5S 3B2, Canada
- Benjamin Cole
  - DOE-Joint Genome Institute, Lawrence Berkeley National Laboratory, 1, Cyclotron Road, Berkeley, California 94720, USA
- Peter Harrison
  - EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Doreen Ware
  - Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, New York 11724, USA
  - USDA ARS NAA Robert W. Holley Center for Agriculture and Health, Ithaca, New York 14853, USA
- Timothy Tickle
  - Data Sciences Platform, The Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, Massachusetts 02142, USA
- Benedict Paten
  - UC Santa Cruz Genomics Institute, Baskin School of Engineering, 1156 High Street, Santa Cruz, California 95064, USA
- Tony Burdett
  - EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Christine G Elsik
  - Division of Animal Sciences/Division of Plant Science & Technology/Institute for Data Science & Informatics, University of Missouri, Columbia, Missouri 65211, USA
- Christopher K Tuggle
  - Bioinformatics and Computational Biology Program, Department of Animal Science, Iowa State University, Ames, Iowa 50011, USA
- Nicholas J Provart
  - Department of Cell and Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario M5S 3B2, Canada
9. Röckel F, Schreiber T, Schüler D, Braun U, Krukenberg I, Schwander F, Peil A, Brandt C, Willner E, Gransow D, Scholz U, Kecke S, Maul E, Lange M, Töpfer R. PhenoApp: A mobile tool for plant phenotyping to record field and greenhouse observations. F1000Res 2022; 11:12. [PMID: 36636476 PMCID: PMC9813448 DOI: 10.12688/f1000research.74239.1] [Accepted: 12/20/2021] [Indexed: 01/21/2023] Open
Abstract
With the ongoing cost decrease of genotyping and sequencing technologies, accurate and fast phenotyping remains the bottleneck in the utilization of plant genetic resources for breeding and breeding research. Although cost-efficient high-throughput phenotyping platforms are emerging for specific traits and/or species, manual phenotyping is still widely used and is a time- and money-consuming step. Approaches that improve data recording, processing or handling are pivotal steps towards the efficient use of genetic resources and are demanded by the research community. Therefore, we developed PhenoApp, an open-source Android app for tablets and smartphones to facilitate the digital recording of phenotypical data in the field and in greenhouses. It is a versatile tool that offers the possibility to fully customize the descriptors/scales for any possible scenario, also in accordance with international information standards such as MIAPPE (Minimum Information About a Plant Phenotyping Experiment) and FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. Furthermore, PhenoApp enables the use of pre-integrated ready-to-use BBCH (Biologische Bundesanstalt für Land- und Forstwirtschaft, Bundessortenamt und CHemische Industrie) scales for apple, cereals, grapevine, maize, potato, rapeseed and rice. Additional BBCH scales can easily be added. The simple and adaptable structure of input and output files enables easy data handling by spreadsheet software or even integration into the workflow of laboratory information management systems (LIMS). PhenoApp is therefore a decisive contribution to increasing the efficiency of digital data acquisition in genebank management, and it also contributes to breeding and breeding research by accelerating the labour-intensive and time-consuming acquisition of phenotyping data.
Collapse
Affiliation(s)
- Franco Röckel
- Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Grapevine Breeding Geilweilerhof, Siebeldingen, 76833, Germany,
| | - Toni Schreiber
- Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Data Processing Department, Erwin-Baur-Straße 27, Quedlinburg, 06484, Germany
| | - Danuta Schüler
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, Seeland, 06466, Germany
| | - Ulrike Braun
- Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Grapevine Breeding Geilweilerhof, Siebeldingen, 76833, Germany
| | - Ina Krukenberg
- Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Data Processing Department, Königin-Luise-Strasse 19, Berlin, 14195, Germany
| | - Florian Schwander
- Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Grapevine Breeding Geilweilerhof, Siebeldingen, 76833, Germany
| | - Andreas Peil
- Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Breeding Research on Fruit Crops, Pillnitzer Platz 3a, Dresden/Pillnitz, 01326, Germany
| | - Christine Brandt
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), The Satellite Collections North, Parkweg 3a, Sanitz, 18190, Germany
| | - Evelin Willner
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), The Satellite Collections North, Inselstraße 9, Malchow/Poel, 23999, Germany
| | - Daniel Gransow
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), The Satellite Collections North, Inselstraße 9, Malchow/Poel, 23999, Germany
| | - Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, Seeland, 06466, Germany
| | - Steffen Kecke
- Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Data Processing Department, Erwin-Baur-Straße 27, Quedlinburg, 06484, Germany
| | - Erika Maul
- Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Grapevine Breeding Geilweilerhof, Siebeldingen, 76833, Germany
| | - Matthias Lange
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, Seeland, 06466, Germany
| | - Reinhard Töpfer
- Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Grapevine Breeding Geilweilerhof, Siebeldingen, 76833, Germany
|
10
|
Röckel F, Schreiber T, Schüler D, Braun U, Krukenberg I, Schwander F, Peil A, Brandt C, Willner E, Gransow D, Scholz U, Kecke S, Maul E, Lange M, Töpfer R. PhenoApp: A mobile tool for plant phenotyping to record field and greenhouse observations. F1000Res 2022; 11:12. [PMID: 36636476 PMCID: PMC9813448 DOI: 10.12688/f1000research.74239.2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Accepted: 11/25/2022] [Indexed: 11/29/2022] Open
Abstract
With the ongoing cost decrease of genotyping and sequencing technologies, accurate and fast phenotyping remains the bottleneck in the utilization of plant genetic resources for breeding and breeding research. Although cost-efficient high-throughput phenotyping platforms are emerging for specific traits and/or species, manual phenotyping is still widely used and is a time- and money-consuming step. Approaches that improve data recording, processing or handling are pivotal steps towards the efficient use of genetic resources and are demanded by the research community. Therefore, we developed PhenoApp, an open-source Android app for tablets and smartphones to facilitate the digital recording of phenotypical data in the field and in greenhouses. It is a versatile tool that offers the possibility to fully customize the descriptors/scales for any possible scenario, also in accordance with international information standards such as MIAPPE (Minimum Information About a Plant Phenotyping Experiment) and FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. Furthermore, PhenoApp enables the use of pre-integrated ready-to-use BBCH (Biologische Bundesanstalt für Land- und Forstwirtschaft, Bundessortenamt und CHemische Industrie) scales for apple, cereals, grapevine, maize, potato, rapeseed and rice. Additional BBCH scales can easily be added. The simple and adaptable structure of input and output files enables easy data handling with spreadsheet software, or integration into the workflows of laboratory information management systems (LIMS). PhenoApp therefore contributes decisively to increasing the efficiency of digital data acquisition in genebank management, and it also supports breeding and breeding research by accelerating the labour-intensive and time-consuming acquisition of phenotyping data.
Affiliation(s)
- Franco Röckel
- Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Grapevine Breeding Geilweilerhof, Siebeldingen, 76833, Germany
| | - Toni Schreiber
- Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Data Processing Department, Erwin-Baur-Straße 27, Quedlinburg, 06484, Germany
| | - Danuta Schüler
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, Seeland, 06466, Germany
| | - Ulrike Braun
- Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Grapevine Breeding Geilweilerhof, Siebeldingen, 76833, Germany
| | - Ina Krukenberg
- Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Data Processing Department, Königin-Luise-Strasse 19, Berlin, 14195, Germany
| | - Florian Schwander
- Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Grapevine Breeding Geilweilerhof, Siebeldingen, 76833, Germany
| | - Andreas Peil
- Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Breeding Research on Fruit Crops, Pillnitzer Platz 3a, Dresden/Pillnitz, 01326, Germany
| | - Christine Brandt
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), The Satellite Collections North, Parkweg 3a, Sanitz, 18190, Germany
| | - Evelin Willner
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), The Satellite Collections North, Inselstraße 9, Malchow/Poel, 23999, Germany
| | - Daniel Gransow
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), The Satellite Collections North, Inselstraße 9, Malchow/Poel, 23999, Germany
| | - Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, Seeland, 06466, Germany
| | - Steffen Kecke
- Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Data Processing Department, Erwin-Baur-Straße 27, Quedlinburg, 06484, Germany
| | - Erika Maul
- Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Grapevine Breeding Geilweilerhof, Siebeldingen, 76833, Germany
| | - Matthias Lange
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, Seeland, 06466, Germany
| | - Reinhard Töpfer
- Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Grapevine Breeding Geilweilerhof, Siebeldingen, 76833, Germany
|
11
|
Modeling community standards for metadata as templates makes data FAIR. Sci Data 2022; 9:696. [DOI: 10.1038/s41597-022-01815-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2022] [Accepted: 11/01/2022] [Indexed: 11/13/2022] Open
Abstract
It is challenging to determine whether datasets are findable, accessible, interoperable, and reusable (FAIR) because the FAIR Guiding Principles refer to highly idiosyncratic criteria regarding the metadata used to annotate datasets. Specifically, the FAIR principles require metadata to be “rich” and to adhere to “domain-relevant” community standards. Scientific communities should be able to define their own machine-actionable templates for metadata that encode these “rich,” discipline-specific elements. We have explored this template-based approach in the context of two software systems. One system is the CEDAR Workbench, which investigators use to author new metadata. The other is the FAIRware Workbench, which evaluates the metadata of archived datasets for their adherence to community standards. Benefits accrue when templates for metadata become central elements in an ecosystem of tools to manage online datasets, both because the templates serve as a community reference for what constitutes FAIR data, and because they embody that perspective in a form that can be distributed among a variety of software applications to assist with data stewardship and data sharing.
|
12
|
pISA-tree - a data management framework for life science research projects using a standardised directory tree. Sci Data 2022; 9:685. [DOI: 10.1038/s41597-022-01805-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2022] [Accepted: 10/24/2022] [Indexed: 11/12/2022] Open
Abstract
We developed pISA-tree, a straightforward and flexible data management solution for organising life science project-associated research data and metadata. pISA-tree was initiated by end-user requirements; its strong points are therefore practicality and low maintenance cost. It enables on-the-fly creation of an enriched directory tree structure (project/Investigation/Study/Assay) based on the ISA model, in a standardised manner via consecutive batch files. Template-based metadata is generated in parallel at each level, enabling guided submission of experiment metadata. pISA-tree is complemented by two R packages, pisar and seekr. pisar facilitates integration of pISA-tree datasets into bioinformatic pipelines and generation of ISA-Tab exports. seekr enables synchronisation with the FAIRDOMHub repository. The applicability of pISA-tree was demonstrated in several national and international multi-partner projects. The system thus supports findable, accessible, interoperable and reusable (FAIR) research and is in accordance with the Open Science initiative. Source code and documentation of pISA-tree are available at https://github.com/NIB-SI/pISA-tree.
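The nested project/Investigation/Study/Assay layout described in this abstract can be illustrated with a minimal Python sketch. It is not pISA-tree itself, which the abstract says is driven by batch files; the directory prefixes, the `metadata.txt` file name, the tab-separated stub format, and the function `create_tree` are all hypothetical choices made for illustration.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

# The four levels named in the abstract; the naming scheme below is an assumption.
LEVELS = ("project", "Investigation", "Study", "Assay")


def create_tree(root: Path, names: Dict[str, str]) -> List[Path]:
    """Create one nested directory per level, each with a metadata stub file."""
    created = []
    current = root
    for level in LEVELS:
        # Hypothetical convention: "<level>_<name>" as the directory name.
        current = current / f"{level}_{names[level]}"
        current.mkdir(parents=True, exist_ok=True)
        # Hypothetical metadata template: two tab-separated key/value rows.
        stub = f"level\t{level}\nname\t{names[level]}\n"
        (current / "metadata.txt").write_text(stub, encoding="utf-8")
        created.append(current)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dirs = create_tree(
            Path(tmp),
            {"project": "demo", "Investigation": "inv1",
             "Study": "st1", "Assay": "rnaseq"},
        )
        for d in dirs:
            print(d.relative_to(tmp))
```

Writing the metadata stub at the moment each directory is created mirrors the abstract's point that metadata is generated in parallel at each level, so no level exists without an accompanying description.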
|
13
|
Abstract
Community-developed minimum information checklists are designed to drive the rich and consistent reporting of metadata, underpinning the reproducibility and reuse of the data. These reporting guidelines, however, are usually in the form of narratives intended for human consumption. Modular and reusable machine-readable versions are also needed. Firstly, to provide the necessary quantitative and verifiable measures of the degree to which the metadata descriptors meet these community requirements, a requirement of the FAIR Principles. Secondly, to encourage the creation of standards-driven templates for metadata authoring, especially when describing complex experiments that require multiple reporting guidelines to be used in combination or extended. We present new functionalities to support the creation and improvements of machine-readable models. We apply the approach to an exemplar set of reporting guidelines in Life Science and discuss the challenges. Our work, targeted to developers of standards and those familiar with standards, promotes the concept of compositional metadata elements and encourages the creation of community-standards which are modular and interoperable from the onset.
|
14
|
van Rijn J, Afantitis A, Culha M, Dusinska M, Exner TE, Jeliazkova N, Longhin EM, Lynch I, Melagraki G, Nymark P, Papadiamantis AG, Winkler DA, Yilmaz H, Willighagen E. European Registry of Materials: global, unique identifiers for (undisclosed) nanomaterials. J Cheminform 2022; 14:57. [PMID: 36002868 PMCID: PMC9400299 DOI: 10.1186/s13321-022-00614-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2021] [Accepted: 05/21/2022] [Indexed: 11/25/2022] Open
Abstract
Management of nanomaterials and nanosafety data needs to operate under the FAIR (findability, accessibility, interoperability, and reusability) principles and this requires a unique, global identifier for each nanomaterial. Existing identifiers may not always be applicable or sufficient to definitively identify the specific nanomaterial used in a particular study, resulting in the use of textual descriptions in research project communications and reporting. To ensure that internal project documentation can later be linked to publicly released data and knowledge for the specific nanomaterials, or even to specific batches and variants of nanomaterials utilised in that project, a new identifier is proposed: the European Registry of Materials Identifier. We here describe the background to this new identifier, including FAIR interoperability as defined by FAIRSharing, identifiers.org, Bioregistry, and the CHEMINF ontology, and show how it complements other identifiers such as CAS numbers and the ongoing efforts to extend the InChI identifier to cover nanomaterials. We provide examples of its use in various H2020-funded nanosafety projects.
Affiliation(s)
- Jeaphianne van Rijn
- Department of Bioinformatics-BiGCaT, NUTRIM, FHML, Maastricht University, Maastricht, The Netherlands
| | | | - Mustafa Culha
- Sabanci University Nanotechnology Research and Application Center (SUNUM), Tuzla, 34956, Istanbul, Turkey
| | - Maria Dusinska
- Health Effects Laboratory, Department of Environmental Chemistry, Norwegian Institute for Air Research, 2007, Kjeller, Norway
| | | | | | - Eleonora Marta Longhin
- Health Effects Laboratory, Department of Environmental Chemistry, Norwegian Institute for Air Research, 2007, Kjeller, Norway
| | - Iseult Lynch
- School of Geography, Earth and Environmental Sciences, University of Birmingham, Edgbaston, B15 2TT, UK
| | | | - Penny Nymark
- Institute of Environmental Medicine, Karolinska Institute, Stockholm, Sweden
| | - Anastasios G Papadiamantis
- NovaMechanics Ltd., 1070, Nicosia, Cyprus; School of Geography, Earth and Environmental Sciences, University of Birmingham, Edgbaston, B15 2TT, UK
| | - David A Winkler
- School of Biochemistry and Chemistry, La Trobe Institute for Molecular Science, La Trobe University, Bundoora, Australia; Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Australia; School of Pharmacy, University of Nottingham, Nottingham, UK
| | - Hulya Yilmaz
- Sabanci University Nanotechnology Research and Application Center (SUNUM), Tuzla, 34956, Istanbul, Turkey
| | - Egon Willighagen
- Department of Bioinformatics-BiGCaT, NUTRIM, FHML, Maastricht University, Maastricht, The Netherlands
|
15
|
Possamai T, Wiedemann-Merdinoglu S. Phenotyping for QTL identification: A case study of resistance to Plasmopara viticola and Erysiphe necator in grapevine. Frontiers in Plant Science 2022; 13:930954. [PMID: 36035702 PMCID: PMC9403010 DOI: 10.3389/fpls.2022.930954] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/28/2022] [Accepted: 06/27/2022] [Indexed: 06/01/2023]
Abstract
Vitis vinifera is the most widely cultivated grapevine species. It is highly susceptible to Plasmopara viticola and Erysiphe necator, the causal agents of downy mildew (DM) and powdery mildew (PM), respectively. Current strategies to control DM and PM mainly rely on agrochemical applications that are potentially harmful to humans and the environment. Breeding for resistance to DM and PM in wine grape cultivars by introgressing resistance loci from wild Vitis spp. is a complementary and more sustainable solution to manage these two diseases. During the last two decades, 33 loci of resistance to P. viticola (Rpv) and 15 loci of resistance to E. necator (Ren and Run) have been identified. Phenotyping is salient for QTL characterization and understanding the genetic basis of resistant traits. However, phenotyping remains a major bottleneck for research on Rpv and Ren/Run loci and disease resistance evaluation. A thorough analysis of the literature on phenotyping methods used for DM and PM resistance evaluation highlighted phenotyping performed in the vineyard, greenhouse or laboratory with major sources of variation, such as environmental conditions, plant material (organ physiology and age), pathogen inoculum (genetic and origin), pathogen inoculation (natural or controlled), and disease assessment method (date, frequency, and method of scoring). All these factors affect resistance assessment and the quality of phenotyping data. We argue that the use of new technologies for disease symptom assessment, and the production and adoption of standardized experimental guidelines should enhance the accuracy and reliability of phenotyping data. This should contribute to a better replicability of resistance evaluation outputs, facilitate QTL identification, and contribute to streamline disease resistance breeding programs.
Affiliation(s)
- Tyrone Possamai
- CREA—Research Centre for Viticulture and Enology, Conegliano, Italy
|
16
|
Hoffmann N, Mayer G, Has C, Kopczynski D, Al Machot F, Schwudke D, Ahrends R, Marcus K, Eisenacher M, Turewicz M. A Current Encyclopedia of Bioinformatics Tools, Data Formats and Resources for Mass Spectrometry Lipidomics. Metabolites 2022; 12:metabo12070584. [PMID: 35888710 PMCID: PMC9319858 DOI: 10.3390/metabo12070584] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/25/2022] [Revised: 06/17/2022] [Accepted: 06/19/2022] [Indexed: 12/13/2022] Open
Abstract
Mass spectrometry is a widely used technology to identify and quantify biomolecules such as lipids, metabolites and proteins necessary for biomedical research. In this study, we catalogued freely available software tools, libraries, databases, repositories and resources that support lipidomics data analysis and determined the scope of currently used analytical technologies. Because of the tremendous importance of data interoperability, we assessed the support of standardized data formats in mass spectrometric (MS)-based lipidomics workflows. We included tools in our comparison that support targeted as well as untargeted analysis using direct infusion/shotgun (DI-MS), liquid chromatography−mass spectrometry, ion mobility or MS imaging approaches on MS1 and potentially higher MS levels. As a result, we determined that the Human Proteome Organization-Proteomics Standards Initiative standard data formats, mzML and mzTab-M, are already supported by a substantial number of recent software tools. We further discuss how mzTab-M can serve as a bridge between data acquisition and lipid bioinformatics tools for interpretation, capturing their output and transmitting rich annotated data for downstream processing. However, we identified several challenges of currently available tools and standards. Potential areas for improvement were: adaptation of common nomenclature and standardized reporting to enable high throughput lipidomics and improve its data handling. Finally, we suggest specific areas where tools and repositories need to improve to become FAIRer.
Affiliation(s)
- Nils Hoffmann
- Forschungszentrum Jülich GmbH, Institute for Bio- and Geosciences (IBG-5), 52425 Jülich, Germany
- Correspondence: N.H., M.T.; Tel.: +49-(0)521-106-86780 (N.H.)
| | - Gerhard Mayer
- Institute of Medical Systems Biology, Ulm University, 89081 Ulm, Germany
| | - Canan Has
- Biological Mass Spectrometry, Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- University Hospital Carl Gustav Carus, 01307 Dresden, Germany
- CENTOGENE GmbH, 18055 Rostock, Germany
| | - Dominik Kopczynski
- Department of Analytical Chemistry, University of Vienna, 1090 Vienna, Austria
| | - Fadi Al Machot
- Faculty of Science and Technology, Norwegian University for Life Science (NMBU), 1433 Ås, Norway
| | - Dominik Schwudke
- Bioanalytical Chemistry, Forschungszentrum Borstel, Leibniz Lung Center, 23845 Borstel, Germany
- Airway Research Center North, German Center for Lung Research (DZL), 23845 Borstel, Germany
- German Center for Infection Research (DZIF), TTU Tuberculosis, 23845 Borstel, Germany
| | - Robert Ahrends
- Department of Analytical Chemistry, University of Vienna, 1090 Vienna, Austria
| | - Katrin Marcus
- Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Ruhr University Bochum, 44801 Bochum, Germany
| | - Martin Eisenacher
- Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Ruhr University Bochum, 44801 Bochum, Germany
- Faculty of Medicine, Medizinisches Proteom-Center, Ruhr University Bochum, 44801 Bochum, Germany
| | - Michael Turewicz
- Institute for Clinical Biochemistry and Pathobiochemistry, German Diabetes Center (DDZ), Leibniz Center for Diabetes Research at Heinrich-Heine-University Düsseldorf, 40225 Düsseldorf, Germany
- German Center for Diabetes Research (DZD), Partner Düsseldorf, 85764 Neuherberg, Germany
- Correspondence: N.H., M.T.; Tel.: +49-(0)521-106-86780 (N.H.)
|
17
|
Beier S, Fiebig A, Pommier C, Liyanage I, Lange M, Kersey PJ, Weise S, Finkers R, Koylass B, Cezard T, Courtot M, Contreras-Moreira B, Naamati G, Dyer S, Scholz U. Recommendations for the formatting of Variant Call Format (VCF) files to make plant genotyping data FAIR. F1000Res 2022; 11. [PMID: 35811804 PMCID: PMC9218589 DOI: 10.12688/f1000research.109080.2] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Accepted: 05/17/2022] [Indexed: 11/20/2022] Open
Abstract
In this opinion article, we discuss the formatting of files from (plant) genotyping studies, in particular the formatting of metadata in Variant Call Format (VCF) files. The flexibility of the VCF format specification facilitates its use as a generic interchange format across domains but can lead to inconsistency between files in the presentation of metadata. To enable fully autonomous machine actionable data flow, generic elements need to be further specified. We strongly support the merits of the FAIR principles and see the need to facilitate them also through technical implementation specifications. They form a basis for the proposed VCF extensions here. We have learned from the existing application of VCF that the definition of relevant metadata using controlled standards, vocabulary and the consistent use of cross-references via resolvable identifiers (machine-readable) are particularly necessary and propose their encoding. VCF is an established standard for the exchange and publication of genotyping data. Other data formats are also used to capture variant data (for example, the HapMap and the gVCF formats), but none currently have the reach of VCF. For the sake of simplicity, we will only discuss VCF and our recommendations for its use, but these recommendations could also be applied to gVCF. However, the part of the VCF standard relating to metadata (as opposed to the actual variant calls) defines a syntactic format but no vocabulary, unique identifier or recommended content. In practice, often only sparse descriptive metadata is included. When descriptive metadata is provided, proprietary metadata fields are frequently added that have not been agreed upon within the community which may limit long-term and comprehensive interoperability. To address this, we propose recommendations for supplying and encoding metadata, focusing on use cases from plant sciences. We expect there to be overlap, but also divergence, with the needs of other domains.
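The point that the VCF header fixes a syntax for metadata but not its content can be made concrete with a small Python sketch. It reads the `##key=value` meta-information lines that precede the `#CHROM` column header and reports which keys from a caller-supplied list are absent. The function names and the list of "required" keys are assumptions for illustration only; they are not the fields recommended in the article.

```python
from typing import Dict, Iterable, List


def parse_vcf_meta(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect '##key=value' meta-information lines from the top of a VCF file."""
    meta: Dict[str, List[str]] = {}
    for line in lines:
        if not line.startswith("##"):
            break  # '#CHROM' header or a data row: the meta section has ended
        key, sep, value = line[2:].rstrip("\n").partition("=")
        if sep:
            # A key such as INFO or contig may occur many times, so keep a list.
            meta.setdefault(key, []).append(value)
    return meta


def missing_keys(meta: Dict[str, List[str]], required: Iterable[str]) -> List[str]:
    """Return the required keys that do not appear in the parsed header."""
    return [key for key in required if key not in meta]


if __name__ == "__main__":
    header = [
        "##fileformat=VCFv4.3\n",
        "##reference=file:///path/to/reference.fa\n",  # illustrative value
        '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">\n',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    ]
    meta = parse_vcf_meta(header)
    # Hypothetical checklist; a real one would come from a community agreement.
    print(missing_keys(meta, ["fileformat", "reference", "contig"]))
```

A check like this only confirms that a key is present. Whether its value uses a controlled vocabulary or a resolvable identifier, which is the article's main concern, needs a separate validation step.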
Affiliation(s)
- Sebastian Beier
- Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
- Institute of Bio- and Geosciences, Bioinformatics (IBG-4), Forschungszentrum Jülich GmbH, Jülich, 52425, Germany
| | - Anne Fiebig
- Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
| | - Cyril Pommier
- BioinfOmics, Plant bioinformatics facility, Université Paris-Saclay, INRAE, Versailles, France
| | - Isuru Liyanage
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Matthias Lange
- Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
| | | | - Stephan Weise
- Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
| | - Richard Finkers
- Plant Breeding, Wageningen University & Research, Wageningen, The Netherlands
- Gennovation B.V., Wageningen, The Netherlands
| | - Baron Koylass
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Timothee Cezard
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Mélanie Courtot
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Ontario Institute for Cancer Research, Toronto, Canada
| | - Bruno Contreras-Moreira
- Laboratorio de Biología Computacional y Estructural, Estación Experimental Aula Dei-CSIC, Zaragoza, 50059, Spain
| | - Guy Naamati
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Sarah Dyer
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Uwe Scholz
- Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
|
18
|
Pradhan D, Ding H, Zhu J, Engelward BP, Levine SS. NExtSEEK: Extending SEEK for Active Management of Interoperable Metadata. J Biomol Tech 2022; 33:3fc1f5fe.db404124. [PMID: 35836998 PMCID: PMC9258913 DOI: 10.7171/3fc1f5fe.db404124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Data management is a critical challenge required to improve the rigor and reproducibility of large projects. Adhering to Findable, Accessible, Interoperable, and Reusable (FAIR) standards provides a baseline for meeting these requirements. Although many existing repositories handle data in a FAIR-compliant manner, there are limited tools in the public domain to handle the metadata burden required to connect data from multi-omic projects that span multiple institutions and are deposited in diverse repositories. One promising approach is the SEEK platform, which allows for diverse metadata and provides an established repository. SEEK is challenged by the assumption of single deposition events where a sample is immutable once entered in the database. This is structured for published data but presents a limitation for ongoing studies where multiple sequential events may occur in a single sample at different sites. To address this issue, we have created a modified wrapper around the SEEK platform that allows for active data management by establishing more discrete sample types that are mutable to permit the expansion of the types of metadata, allowing researchers to track additional information. The use of discrete nodes also converts assays from nodes to edges, creating a network model of the study and more accurately representing the experimental process. With these changes to SEEK, users are able to collect and organize the information that researchers need to improve reusability and reproducibility as well as make data and metadata available to the scientific community through public repositories.
Affiliation(s)
- Dikshant Pradhan
- MIT BioMicro Center, Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Huiming Ding
- MIT BioMicro Center, Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Jingzhi Zhu
- MIT BioMicro Center, Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Bevin P. Engelward
- MIT BioMicro Center, Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Stuart S. Levine
- MIT BioMicro Center, Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Address correspondence to: Stuart Levine, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Building 68-304D, Cambridge, MA 02139, USA (Phone: 617-452-2949)
|
19
|
Raboudi A, Allanic M, Balvay D, Hervé PY, Viel T, Yoganathan T, Certain A, Hilbey J, Charlet J, Durupt A, Boutinaud P, Eynard B, Tavitian B. The BMS-LM ontology for biomedical data reporting throughout the lifecycle of a research study: From data model to ontology. J Biomed Inform 2022; 127:104007. [DOI: 10.1016/j.jbi.2022.104007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2021] [Revised: 12/24/2021] [Accepted: 01/28/2022] [Indexed: 11/16/2022]
|
20
|
Samuel S, König-Ries B. End-to-End provenance representation for the understandability and reproducibility of scientific experiments using a semantic approach. J Biomed Semantics 2022; 13:1. [PMID: 34991705 PMCID: PMC8734275 DOI: 10.1186/s13326-021-00253-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/29/2020] [Accepted: 10/18/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The advancement of science and technologies plays an immense role in the way scientific experiments are being conducted. Understanding how experiments are performed and how results are derived has become significantly more complex with the recent explosive growth of heterogeneous research data and methods. Therefore, it is important that the provenance of results is tracked, described, and managed throughout the research lifecycle, from the beginning of an experiment to its end, to ensure reproducibility of results described in publications. However, there is a lack of interoperable representation of end-to-end provenance of scientific experiments that interlinks data, processing steps, and results from an experiment's computational and non-computational processes. RESULTS: We present the "REPRODUCE-ME" data model and ontology to describe the end-to-end provenance of scientific experiments by extending existing standards in the semantic web. The ontology brings together different aspects of the provenance of scientific studies by interlinking non-computational data and steps with computational data and steps to achieve understandability and reproducibility. We explain the important classes and properties of the ontology and how they are mapped to existing ontologies like PROV-O and P-Plan. The ontology is evaluated by answering competency questions over the knowledge base of scientific experiments consisting of computational and non-computational data and steps. CONCLUSION: We have designed and developed an interoperable way to represent the complete path of a scientific experiment consisting of computational and non-computational steps. We have applied and evaluated our approach to a set of scientific experiments in different subject domains like computational science, biological imaging, and microscopy.
Affiliation(s)
- Sheeba Samuel
- Heinz-Nixdorf Chair for Distributed Information Systems, Friedrich Schiller University, Jena, Germany; Michael Stifel Center Jena, Jena, Germany
| | - Birgitta König-Ries
- Heinz-Nixdorf Chair for Distributed Information Systems, Friedrich Schiller University, Jena, Germany; Michael Stifel Center Jena, Jena, Germany
|
21
|
Langstroff A, Heuermann MC, Stahl A, Junker A. Opportunities and limits of controlled-environment plant phenotyping for climate response traits. Theor Appl Genet 2022; 135:1-16. [PMID: 34302493 PMCID: PMC8741719 DOI: 10.1007/s00122-021-03892-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 09/19/2020] [Accepted: 06/17/2021] [Indexed: 05/19/2023]
Abstract
Rising temperatures and changing precipitation patterns will affect agricultural production substantially, exposing crops to extended and more intense periods of stress. Therefore, breeding of varieties adapted to the constantly changing conditions is pivotal to enable a quantitatively and qualitatively adequate crop production despite the negative effects of climate change. As it is not yet possible to select for adaptation to future climate scenarios in the field, simulations of future conditions in controlled-environment (CE) phenotyping facilities contribute to the understanding of the plant response to special stress conditions and help breeders to select ideal genotypes which cope with future conditions. CE phenotyping facilities enable the collection of traits that are not easy to measure under field conditions and the assessment of a plant's phenotype under repeatable, clearly defined environmental conditions using automated, non-invasive, high-throughput methods. However, extrapolation and translation of results obtained under controlled environments to field environments is ambiguous. This review outlines the opportunities and challenges of phenotyping approaches under controlled environments complementary to conventional field trials. It gives an overview on general principles and introduces existing phenotyping facilities that take up the challenge of obtaining reliable and robust phenotypic data on climate response traits to support breeding of climate-adapted crops.
Affiliation(s)
- Anna Langstroff
  - Department of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University Giessen, Heinrich Buff-Ring 26, 35392, Giessen, Germany
- Marc C Heuermann
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstr. 3, OT Gatersleben, 06466, Seeland, Germany
- Andreas Stahl
  - Department of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University Giessen, Heinrich Buff-Ring 26, 35392, Giessen, Germany
  - Institute for Resistance Research and Stress Tolerance, Federal Research Centre for Cultivated Plants, Julius Kühn-Institut (JKI), Erwin-Baur-Strasse 27, 06484, Quedlinburg, Germany
- Astrid Junker
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstr. 3, OT Gatersleben, 06466, Seeland, Germany
22
Sheffield NC, Stolarczyk M, Reuter VP, Rendeiro AF. Linking big biomedical datasets to modular analysis with Portable Encapsulated Projects. Gigascience 2021; 10:6454632. [PMID: 34890448 PMCID: PMC8673555 DOI: 10.1093/gigascience/giab077] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/02/2020] [Revised: 04/20/2021] [Accepted: 11/04/2021] [Indexed: 12/26/2022]
Abstract
Background: Organizing and annotating biological sample data is critical in data-intensive bioinformatics. Unfortunately, metadata formats from a data provider are often incompatible with requirements of a processing tool. There is no broadly accepted standard to organize metadata across biological projects and bioinformatics tools, restricting the portability and reusability of both annotated datasets and analysis software. Results: To address this, we present the Portable Encapsulated Project (PEP) specification, a formal specification for biological sample metadata structure. The PEP specification accommodates typical features of data-intensive bioinformatics projects with many biological samples. In addition to standardization, the PEP specification provides descriptors and modifiers for project-level and sample-level metadata, which improve portability across both computing environments and data processing tools. PEPs include a schema validator framework, allowing formal definition of required metadata attributes for data analysis broadly. We have implemented packages for reading PEPs in both Python and R to provide a language-agnostic interface for organizing project metadata. Conclusions: The PEP specification is an important step toward unifying data annotation and processing tools in data-intensive biological research projects. Links to tools and documentation are available at http://pep.databio.org/.
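Editorial illustration: the idea of a schema that formally defines the metadata attributes a tool requires can be sketched in a few lines of plain Python. This is only a minimal sketch under stated assumptions. It does not use the PEP specification's own schema format or the Python and R packages mentioned in the abstract, and the attribute names (`sample_name`, `protocol`, `read1`) and function names are hypothetical.

```python
# Hypothetical schema: attribute names a processing tool is assumed to require.
REQUIRED_SAMPLE_ATTRIBUTES = ("sample_name", "protocol", "read1")


def missing_attributes(sample, required=REQUIRED_SAMPLE_ATTRIBUTES):
    """Return the required attribute names that are absent or empty in one sample."""
    return [name for name in required if not sample.get(name)]


def validate_project(samples):
    """Map each failing sample's name (or positional label) to its missing attributes."""
    problems = {}
    for index, sample in enumerate(samples):
        missing = missing_attributes(sample)
        if missing:
            label = sample.get("sample_name") or f"sample_{index}"
            problems[label] = missing
    return problems


# Illustrative sample table; the second sample lacks the assumed "read1" attribute.
samples = [
    {"sample_name": "s1", "protocol": "RNA-seq", "read1": "s1_R1.fastq.gz"},
    {"sample_name": "s2", "protocol": "RNA-seq"},
]
print(validate_project(samples))
```

A tool could run such a check before processing and refuse samples whose metadata do not meet its declared requirements, which is the portability benefit the abstract describes.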
Affiliation(s)
- Nathan C Sheffield
  - Center for Public Health Genomics, University of Virginia, VA 22908, USA
  - Department of Public Health Sciences, University of Virginia, VA 22908, USA
  - Department of Biomedical Engineering, University of Virginia, VA 22908, USA
  - Department of Biochemistry and Molecular Genetics, University of Virginia, VA 22908, USA
- Michał Stolarczyk
  - Center for Public Health Genomics, University of Virginia, VA 22908, USA
- Vincent P Reuter
  - Center for Public Health Genomics, University of Virginia, VA 22908, USA
  - Genomics and Computational Biology Graduate Group, University of Pennsylvania, PA 19087, USA
- André F Rendeiro
  - Institute for Computational Biomedicine, Weill Cornell Medical College, NY 10021, USA
  - Caryl and Israel Englander Institute for Precision Medicine, Weill Cornell Medical College, NY 10021, USA
23
Johnson D, Batista D, Cochrane K, Davey RP, Etuk A, Gonzalez-Beltran A, Haug K, Izzo M, Larralde M, Lawson TN, Minotto A, Moreno P, Nainala VC, O'Donovan C, Pireddu L, Roger P, Shaw F, Steinbeck C, Weber RJM, Sansone SA, Rocca-Serra P. ISA API: An open platform for interoperable life science experimental metadata. Gigascience 2021; 10:giab060. [PMID: 34528664 PMCID: PMC8444265 DOI: 10.1093/gigascience/giab060] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 11/13/2020] [Revised: 03/19/2021] [Accepted: 08/23/2021] [Indexed: 02/04/2023]
Abstract
Background: The Investigation/Study/Assay (ISA) Metadata Framework is an established and widely used set of open source community specifications and software tools for enabling discovery, exchange, and publication of metadata from experiments in the life sciences. The original ISA software suite provided a set of user-facing Java tools for creating and manipulating the information structured in ISA-Tab, a now widely used tabular format. To make the ISA framework more accessible to machines and enable programmatic manipulation of experiment metadata, the JSON serialization ISA-JSON was developed. Results: In this work, we present the ISA API, a Python library for the creation, editing, parsing, and validating of ISA-Tab and ISA-JSON formats by using a common data model engineered as Python object classes. We describe the ISA API feature set, early adopters, and its growing user community. Conclusions: The ISA API provides users with rich programmatic metadata-handling functionality to support automation, a common interface, and an interoperable medium between the 2 ISA formats, as well as with other life science data formats required for depositing data in public databases.
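Editorial illustration: a common data model built from Python object classes with a JSON serialization can be sketched as below. This is a simplified sketch, not the ISA API itself. The class fields, the JSON layout, and the helper names `to_json` and `from_json` are assumptions made for illustration and do not reproduce the ISA-JSON schema or the library's classes.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Assay:
    filename: str
    measurement_type: str


@dataclass
class Study:
    identifier: str
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    identifier: str
    studies: List[Study] = field(default_factory=list)


def to_json(investigation):
    """Serialize the nested object model to a JSON string (hypothetical layout)."""
    return json.dumps(asdict(investigation), indent=2, sort_keys=True)


def from_json(text):
    """Rebuild the object model from the JSON produced by to_json."""
    raw = json.loads(text)
    studies = [
        Study(s["identifier"], s["title"], [Assay(**a) for a in s["assays"]])
        for s in raw["studies"]
    ]
    return Investigation(raw["identifier"], studies)


# Illustrative values only; identifiers and file names are placeholders.
inv = Investigation(
    "INV1",
    [Study("STU1", "Example study", [Assay("a_assay.txt", "transcription profiling")])],
)
round_tripped = from_json(to_json(inv))
print(round_tripped == inv)
```

Keeping one in-memory model and treating each file format as a serialization of it is what lets a library convert between formats without format-specific business logic.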
Affiliation(s)
- David Johnson
  - Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
  - Department of Informatics and Media, Uppsala University, Box 513, 75120 Uppsala, Sweden
- Dominique Batista
  - Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
- Keeva Cochrane
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Robert P Davey
  - Earlham Institute, Data infrastructure and algorithms, Norwich Research Park, Norwich NR4 7UZ, UK
- Anthony Etuk
  - Earlham Institute, Data infrastructure and algorithms, Norwich Research Park, Norwich NR4 7UZ, UK
- Alejandra Gonzalez-Beltran
  - Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
  - Science and Technology Facilities Council, Scientific Computing Department, Rutherford Appleton Laboratory, Harwell Campus, Didcot, OX11 0QX, UK
- Kenneth Haug
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
  - Genome Research Limited, Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Saffron Walden, CB10 1RQ, UK
- Massimiliano Izzo
  - Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
- Martin Larralde
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), Meyerhofstraße 1, 69117 Heidelberg, Germany
- Thomas N Lawson
  - School of Biosciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
- Alice Minotto
  - Earlham Institute, Data infrastructure and algorithms, Norwich Research Park, Norwich NR4 7UZ, UK
- Pablo Moreno
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Venkata Chandrasekhar Nainala
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Claire O'Donovan
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Luca Pireddu
  - Distributed Computing Group, CRS4: Center for Advanced Studies, Research & Development in Sardinia, Pula 09050, Italy
- Pierrick Roger
  - CEA, LIST, Laboratory for Data Analysis and Systems’ Intelligence, MetaboHUB, Gif-Sur-Yvette F-91191, France
- Felix Shaw
  - Earlham Institute, Data infrastructure and algorithms, Norwich Research Park, Norwich NR4 7UZ, UK
- Christoph Steinbeck
  - Cheminformatics and Computational Metabolomics, Institute for Analytical Chemistry, Lessingstr. 8, 07743 Jena, Germany
- Ralf J M Weber
  - School of Biosciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
  - Phenome Centre Birmingham, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
- Susanna-Assunta Sansone
  - Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
- Philippe Rocca-Serra
  - Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
24
Leipzig J, Nüst D, Hoyt CT, Ram K, Greenberg J. The role of metadata in reproducible computational research. Patterns (N Y) 2021; 2:100322. [PMID: 34553169 PMCID: PMC8441584 DOI: 10.1016/j.patter.2021.100322] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 01/22/2023]
Abstract
Reproducible computational research (RCR) is the keystone of the scientific method for in silico analyses, packaging the transformation of raw data to published results. In addition to its role in research integrity, improving the reproducibility of scientific studies can accelerate evaluation and reuse. This potential and wide support for the FAIR principles have motivated interest in metadata standards supporting reproducibility. Metadata provide context and provenance to raw data and methods and are essential to both discovery and validation. Despite this shared connection with scientific data, few studies have explicitly described how metadata enable reproducible computational research. This review employs a functional content analysis to identify metadata standards that support reproducibility across an analytic stack consisting of input data, tools, notebooks, pipelines, and publications. Our review provides background context, explores gaps, and discovers component trends of embeddedness and methodology weight from which we derive recommendations for future work.
Affiliation(s)
- Jeremy Leipzig
  - Metadata Research Center, College of Computing and Informatics, Drexel University, Philadelphia, PA, USA
- Daniel Nüst
  - Institute for Geoinformatics, University of Münster, Münster, Germany
- Karthik Ram
  - Berkeley Institute for Data Science, University of California, Berkeley, Berkeley, CA, USA
- Jane Greenberg
  - Metadata Research Center, College of Computing and Informatics, Drexel University, Philadelphia, PA, USA
25
Estaki M, Jiang L, Bokulich NA, McDonald D, González A, Kosciolek T, Martino C, Zhu Q, Birmingham A, Vázquez-Baeza Y, Dillon MR, Bolyen E, Caporaso JG, Knight R. QIIME 2 Enables Comprehensive End-to-End Analysis of Diverse Microbiome Data and Comparative Studies with Publicly Available Data. Curr Protoc Bioinformatics 2021; 70:e100. [PMID: 32343490 PMCID: PMC9285460 DOI: 10.1002/cpbi.100] [Citation(s) in RCA: 185] [Impact Index Per Article: 61.7] [Indexed: 12/30/2022]
Abstract
QIIME 2 is a completely re‐engineered microbiome bioinformatics platform based on the popular QIIME platform, which it has replaced. QIIME 2 facilitates comprehensive and fully reproducible microbiome data science, improving accessibility to diverse users by adding multiple user interfaces. QIIME 2 can be combined with Qiita, an open‐source web‐based platform, to re‐use available data for meta‐analysis. The following basic protocol describes how to install QIIME 2 on a single computer and analyze microbiome sequence data, from processing of raw DNA sequence reads through generating publishable interactive figures. These interactive figures allow readers of a study to interact with data with the same ease as its authors, advancing microbiome science transparency and reproducibility. We also show how plug‐ins developed by the community to add analysis capabilities can be installed and used with QIIME 2, enhancing various aspects of microbiome analyses, e.g., improving taxonomic classification accuracy. Finally, we illustrate how users can perform meta‐analyses combining different datasets using readily available public data through Qiita. In this tutorial, we analyze a subset of the Early Childhood Antibiotics and the Microbiome (ECAM) study, which tracked the microbiome composition and development of 43 infants in the United States from birth to 2 years of age, identifying microbiome associations with antibiotic exposure, delivery mode, and diet. For more information about QIIME 2, see https://qiime2.org. To troubleshoot or ask questions about QIIME 2 and microbiome analysis, join the active community at https://forum.qiime2.org. © 2020 The Authors.
Basic Protocol: Using QIIME 2 with microbiome data
Support Protocol: Further microbiome analyses
Affiliation(s)
- Mehrbod Estaki
  - Department of Pediatrics, University of California San Diego, La Jolla, California
- Lingjing Jiang
  - Division of Biostatistics, University of California San Diego, La Jolla, California
- Nicholas A Bokulich
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona
  - Department of Biological Sciences, Northern Arizona University, Flagstaff, Arizona
- Daniel McDonald
  - Department of Pediatrics, University of California San Diego, La Jolla, California
- Antonio González
  - Department of Pediatrics, University of California San Diego, La Jolla, California
- Tomasz Kosciolek
  - Department of Pediatrics, University of California San Diego, La Jolla, California
  - Małopolska Centre of Biotechnology, Jagiellonian University, Kraków, Poland
- Cameron Martino
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, California
  - Center for Microbiome Innovation, University of California San Diego, La Jolla, California
- Qiyun Zhu
  - Department of Pediatrics, University of California San Diego, La Jolla, California
- Amanda Birmingham
  - Center for Computational Biology and Bioinformatics, University of California San Diego, La Jolla, California
- Yoshiki Vázquez-Baeza
  - Center for Microbiome Innovation, University of California San Diego, La Jolla, California
  - Jacobs School of Engineering, University of California San Diego, La Jolla, California
- Matthew R Dillon
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona
- Evan Bolyen
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona
- J Gregory Caporaso
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona
  - Department of Biological Sciences, Northern Arizona University, Flagstaff, Arizona
- Rob Knight
  - Department of Pediatrics, University of California San Diego, La Jolla, California
  - Center for Microbiome Innovation, University of California San Diego, La Jolla, California
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, California
  - Department of Bioengineering, University of California San Diego, La Jolla, California
26
Dimitrova M, Meyer R, Buttigieg PL, Georgiev T, Zhelezov G, Demirov S, Smith V, Penev L. A streamlined workflow for conversion, peer review, and publication of genomics metadata as omics data papers. Gigascience 2021; 10:6275150. [PMID: 33983435 PMCID: PMC8117446 DOI: 10.1093/gigascience/giab034] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/22/2021] [Revised: 11/30/2020] [Accepted: 04/20/2021] [Indexed: 12/31/2022]
Abstract
Background: Data papers have emerged as a powerful instrument for open data publishing, obtaining credit, and establishing priority for datasets generated in scientific experiments. Academic publishing improves data and metadata quality through peer review and increases the impact of datasets by enhancing their visibility, accessibility, and reusability. Objective: We aimed to establish a new type of article structure and template for omics studies: the omics data paper. To improve data interoperability and further incentivize researchers to publish well-described datasets, we created a prototype workflow for streamlined import of genomics metadata from the European Nucleotide Archive directly into a data paper manuscript. Methods: An omics data paper template was designed by defining key article sections that encourage the description of omics datasets and methodologies. A metadata import workflow, based on REpresentational State Transfer services and XPath, was prototyped to extract information from the European Nucleotide Archive, ArrayExpress, and BioSamples databases. Findings: The template and workflow for automatic import of standard-compliant metadata into an omics data paper manuscript provide a mechanism for enhancing existing metadata through publishing. Conclusion: The omics data paper structure and workflow for import of genomics metadata will help to bring genomic and other omics datasets into the spotlight. Promoting enhanced metadata descriptions and enforcing manuscript peer review and data auditing of the underlying datasets brings additional quality to datasets. We hope that streamlined metadata reuse for scholarly publishing encourages authors to create enhanced metadata descriptions in the form of data papers to improve both the quality of their metadata and its findability and accessibility.
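Editorial illustration: extracting manuscript fields from an XML metadata record with path expressions can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' workflow. The XML record is an inline placeholder rather than a response fetched from a REST service, the element names and the accession `PRJ0000` are hypothetical, and Python's standard `xml.etree.ElementTree` supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Placeholder record standing in for a response from a metadata REST service.
SAMPLE_XML = """<STUDY_SET>
  <STUDY accession="PRJ0000">
    <DESCRIPTOR>
      <STUDY_TITLE>Example study</STUDY_TITLE>
      <STUDY_ABSTRACT>Placeholder abstract text.</STUDY_ABSTRACT>
    </DESCRIPTOR>
  </STUDY>
</STUDY_SET>"""

# Hypothetical mapping from manuscript section to a path expression in the record.
FIELD_PATHS = {
    "title": "./STUDY/DESCRIPTOR/STUDY_TITLE",
    "abstract": "./STUDY/DESCRIPTOR/STUDY_ABSTRACT",
}


def extract_fields(xml_text, paths=FIELD_PATHS):
    """Return manuscript fields pulled from the record; missing elements become ''."""
    root = ET.fromstring(xml_text)
    study = root.find("./STUDY")
    result = {"accession": study.get("accession") if study is not None else ""}
    for name, path in paths.items():
        result[name] = root.findtext(path, default="")
    return result


fields = extract_fields(SAMPLE_XML)
print(fields)
```

Keeping the section-to-path mapping in one table means a change in the source schema only requires editing the mapping, not the extraction code.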
Affiliation(s)
- Mariya Dimitrova
  - Pensoft Publishers, Prof. Georgi Zlatarski Street 12, 1700 Sofia, Bulgaria
  - Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Acad. G. Bonchev St., Block 25A, 1113 Sofia, Bulgaria
- Raïssa Meyer
  - Alfred-Wegener-Institut, Helmholtz-Zentrum für Polar- und Meeresforschung, Am Handelshafen 12, 27570 Bremerhaven, Germany
- Pier Luigi Buttigieg
  - Alfred-Wegener-Institut, Helmholtz-Zentrum für Polar- und Meeresforschung, Am Handelshafen 12, 27570 Bremerhaven, Germany
- Teodor Georgiev
  - Pensoft Publishers, Prof. Georgi Zlatarski Street 12, 1700 Sofia, Bulgaria
- Georgi Zhelezov
  - Pensoft Publishers, Prof. Georgi Zlatarski Street 12, 1700 Sofia, Bulgaria
- Vincent Smith
  - The Natural History Museum, Cromwell Rd, South Kensington, SW7 5BD London, UK
- Lyubomir Penev
  - Pensoft Publishers, Prof. Georgi Zlatarski Street 12, 1700 Sofia, Bulgaria
  - Institute of Biodiversity and Ecosystem Research, Bulgarian Academy of Sciences, 2 Gagarin St., 1113 Sofia, Bulgaria
27
Overbey EG, Saravia-Butler AM, Zhang Z, Rathi KS, Fogle H, da Silveira WA, Barker RJ, Bass JJ, Beheshti A, Berrios DC, Blaber EA, Cekanaviciute E, Costa HA, Davin LB, Fisch KM, Gebre SG, Geniza M, Gilbert R, Gilroy S, Hardiman G, Herranz R, Kidane YH, Kruse CPS, Lee MD, Liefeld T, Lewis NG, McDonald JT, Meller R, Mishra T, Perera IY, Ray S, Reinsch SS, Rosenthal SB, Strong M, Szewczyk NJ, Tahimic CGT, Taylor DM, Vandenbrink JP, Villacampa A, Weging S, Wolverton C, Wyatt SE, Zea L, Costes SV, Galazka JM. NASA GeneLab RNA-seq consensus pipeline: standardized processing of short-read RNA-seq data. iScience 2021; 24:102361. [PMID: 33870146 PMCID: PMC8044432 DOI: 10.1016/j.isci.2021.102361] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 09/08/2020] [Revised: 10/30/2020] [Accepted: 03/23/2021] [Indexed: 12/15/2022]
Abstract
With the development of transcriptomic technologies, we are able to quantify precise changes in gene expression profiles from astronauts and other organisms exposed to spaceflight. Members of NASA GeneLab and GeneLab-associated analysis working groups (AWGs) have developed a consensus pipeline for analyzing short-read RNA-sequencing data from spaceflight-associated experiments. The pipeline includes quality control, read trimming, mapping, and gene quantification steps, culminating in the detection of differentially expressed genes. This data analysis pipeline and the results of its execution using data submitted to GeneLab are now all publicly available through the GeneLab database. We present here the full details and rationale for the construction of this pipeline in order to promote transparency, reproducibility, and reusability of pipeline data; to provide a template for data processing of future spaceflight-relevant datasets; and to encourage cross-analysis of data from other databases with the data available in GeneLab.
Highlights:
- Analysis of omics data from different spaceflight studies presents unique challenges
- A standardized pipeline for RNA-seq analysis eliminates data processing variation
- The GeneLab RNA-seq pipeline includes QC, trimming, mapping, quantification, and DGE
- Space-relevant data processed with this pipeline are available at genelab.nasa.gov
Affiliation(s)
- Eliah G Overbey
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Amanda M Saravia-Butler
  - Logyx, LLC, Mountain View, CA 94043, USA
  - Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
- Zhe Zhang
  - Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, University of Pennsylvania, Philadelphia, PA 19104, USA
- Komal S Rathi
  - Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, University of Pennsylvania, Philadelphia, PA 19104, USA
- Homer Fogle
  - The Bionetics Corporation, NASA Ames Research Center, Moffett Field, CA 94035, USA
  - Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
- Willian A da Silveira
  - Institute for Global Food Security (IGFS) & School of Biological Sciences, Queen's University Belfast, Belfast, UK
- Richard J Barker
  - Department of Botany, University of Wisconsin, Madison, WI 53706, USA
- Joseph J Bass
  - MRC Versus Arthritis Centre for Musculoskeletal Ageing Research, Royal Derby Hospital, University of Nottingham & National Institute for Health Research Nottingham Biomedical Research Centre, Derby DE22 3DT, UK
- Afshin Beheshti
  - KBR, NASA Ames Research Center, Moffett Field, CA 94035, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Daniel C Berrios
  - Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
- Elizabeth A Blaber
  - Center for Biotechnology and Interdisciplinary Studies, Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
- Egle Cekanaviciute
  - Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
- Helio A Costa
  - Departments of Pathology, and of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA
- Laurence B Davin
  - Institute of Biological Chemistry, Washington State University, Pullman, WA 99164, USA
- Kathleen M Fisch
  - Center for Computational Biology & Bioinformatics, Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Samrawit G Gebre
  - Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
  - KBR, NASA Ames Research Center, Moffett Field, CA 94035, USA
- Rachel Gilbert
  - NASA Postdoctoral Program, Universities Space Research Association, NASA Ames Research Center, Moffett Field, CA 94035, USA
- Simon Gilroy
  - Department of Botany, University of Wisconsin, Madison, WI 53706, USA
- Gary Hardiman
  - Institute for Global Food Security (IGFS) & School of Biological Sciences, Queen's University Belfast, Belfast, UK
  - Medical University of South Carolina, Charleston, SC, USA
- Raúl Herranz
  - Centro de Investigaciones Biológicas Margarita Salas (CSIC), Ramiro de Maeztu 9, 28040 Madrid, Spain
- Yared H Kidane
  - Center for Pediatric Bone Biology and Translational Research, Texas Scottish Rite Hospital for Children, 2222 Welborn St., Dallas, TX 75219, USA
- Colin P S Kruse
  - Los Alamos National Laboratory, Bioscience Division, Los Alamos, NM 87545, USA
- Michael D Lee
  - Exobiology Branch, NASA Ames Research Center, Mountain View, CA 94035, USA
  - Blue Marble Space Institute of Science, Seattle, WA 98154, USA
- Ted Liefeld
  - Department of Medicine, University of California San Diego, San Diego, CA 92093, USA
- Norman G Lewis
  - Institute of Biological Chemistry, Washington State University, Pullman, WA 99164, USA
- J Tyson McDonald
  - Department of Radiation Medicine, Georgetown University Medical Center, Washington, DC 20007, USA
- Robert Meller
  - Department of Neurobiology and Pharmacology, Morehouse School of Medicine, Atlanta, GA 30310, USA
- Tejaswini Mishra
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Imara Y Perera
  - Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC 27695, USA
- Shayoni Ray
  - NGM Biopharmaceuticals, South San Francisco, CA 94080, USA
- Sigrid S Reinsch
  - Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
- Sara Brin Rosenthal
  - Center for Computational Biology & Bioinformatics, Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Michael Strong
  - National Jewish Health, Center for Genes, Environment, and Health, 1400 Jackson Street, Denver, CO 80206, USA
- Nathaniel J Szewczyk
  - Ohio Musculoskeletal and Neurological Institute and Department of Biomedical Sciences, Ohio University, Athens, OH 43147, USA
- Candice G T Tahimic
  - Department of Biology, University of North Florida, Jacksonville, FL 32224, USA
- Deanne M Taylor
  - Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia and the Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Alicia Villacampa
  - Centro de Investigaciones Biológicas Margarita Salas (CSIC), Ramiro de Maeztu 9, 28040 Madrid, Spain
- Silvio Weging
  - Institute of Computer Science, Martin-Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, Halle 06120, Germany
- Chris Wolverton
  - Department of Botany and Microbiology, Ohio Wesleyan University, Delaware, OH, USA
- Sarah E Wyatt
  - Department of Environmental and Plant Biology, Ohio University, Athens, OH 45701, USA
  - Interdisciplinary Program in Molecular and Cellular Biology, Ohio University, Athens, OH 45701, USA
- Luis Zea
  - BioServe Space Technologies, Aerospace Engineering Sciences Department, University of Colorado Boulder, Boulder 80303 USA
- Sylvain V Costes
  - Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
- Jonathan M Galazka
  - Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
28
Psaroudakis D, Liu F, König P, Scholz U, Junker A, Lange M, Arend D. isa4j: a scalable Java library for creating ISA-Tab metadata. F1000Res 2021; 9. [PMID: 33728038 PMCID: PMC7941097 DOI: 10.12688/f1000research.27188.1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Accepted: 11/17/2020] [Indexed: 11/26/2022]
Abstract
Experimental data is only useful to other researchers if it is findable, accessible, interoperable, and reusable (FAIR). The ISA-Tab framework enables scientists to publish metadata about their experiments in a plain text, machine-readable format that aims to confer that interoperability and reusability. A Python software package (isatools) is currently being developed to programmatically produce these metadata files. For Java-based environments, there is no equivalent solution yet. While the isatools package provides a lot of flexibility and a wealth of different features for the Python ecosystem, a package for JVM-based applications might offer the speed and scalability needed for writing very large ISA-Tab files, making the ISA framework available in an even wider range of situations and environments. Here we present a light-weight and scalable Java library (isa4j) for generating metadata files in the ISA-Tab format, which elegantly integrates into existing JVM applications and especially shines at generating very large files. It is modeled after the ISA core specifications and designed in keeping with isatools conventions, making it consistent and intuitive to use for the community. isa4j is implemented in Java (JDK11+) and freely available under the terms of the MIT license from the Central Maven Repository (https://mvnrepository.com/artifact/de.ipk-gatersleben/isa4j). The source code, detailed documentation, usage examples and performance evaluations can be found at https://github.com/IPK-BIT/isa4j.
Affiliation(s)
- Dennis Psaroudakis
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
  - Hochschule Mittweida, University of Applied Sciences, Mittweida, 09648, Germany
- Feng Liu
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
- Patrick König
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
- Uwe Scholz
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
- Astrid Junker
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
- Matthias Lange
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
- Daniel Arend
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
29
Khan I, Shahaab A. A Peer-To-Peer Publication Model on Blockchain. Front Blockchain 2021. [DOI: 10.3389/fbloc.2021.615726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
In the past few decades, there has been a sharp rise of research irreproducibility and retraction, to a point that now is deemed as a crisis. Addressing this crisis, we present a peer-to-peer (P2P) publication model that utilizes blockchain and smart contract technologies. Focusing primarily on researchers and reviewers, the conceptual P2P publication model addresses the sociocultural and incentivization aspects of the irreproducibility crisis. In the P2P publication model, instead of a complete publication, a preapproved experimental design will be published on an incremental basis (unit-by-unit) and authorship will be shared with reviewers. The concept of the P2P publication model was inspired by the transformational journey the music publishing industry has undertaken as it traverses through vinyl age (complete albums) to the Spotify age (single-by-single), where there is a growing inclination among artists toward building an incremental album, taking account of feedback from fans and utilizing automated revenue collection and sharing systems. The ability to publish incrementally through the P2P publication model will relieve researchers from the burden of publishing complete and “good results” while simultaneously incentivizing reviewers to undertake rigorous review work to gain authorship credit in the research. The proposed P2P publication model aims to transform the century-old publication model and incentivization structure in alignment with open access publication ethos of the 21st century.
|
30
|
Biomedical Repositories for Simulation Studies. SYSTEMS MEDICINE 2021. [DOI: 10.1016/b978-0-12-801238-3.11684-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022] Open
|
31
|
Li M, Hensel G, Melzer M, Junker A, Tschiersch H, Ruwe H, Arend D, Kumlehn J, Börner T, Stein N. Mutation of the ALBOSTRIANS Ohnologous Gene HvCMF3 Impairs Chloroplast Development and Thylakoid Architecture in Barley. FRONTIERS IN PLANT SCIENCE 2021; 12:732608. [PMID: 34659298 PMCID: PMC8517540 DOI: 10.3389/fpls.2021.732608] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/29/2021] [Accepted: 09/10/2021] [Indexed: 05/12/2023]
Abstract
Gene pairs resulting from whole genome duplication (WGD), so-called ohnologous genes, are retained if at least one member of the pair undergoes neo- or sub-functionalization. Phylogenetic analyses of the ohnologous genes ALBOSTRIANS (HvAST/HvCMF7) and ALBOSTRIANS-LIKE (HvASL/HvCMF3) of barley (Hordeum vulgare) revealed them as members of a subfamily of genes coding for CCT motif (CONSTANS, CONSTANS-LIKE and TIMING OF CAB1) proteins characterized by a single CCT domain and a putative N-terminal chloroplast transit peptide. Recently, we showed that HvCMF7 is needed for chloroplast ribosome biogenesis. Here we demonstrate that mutations in HvCMF3 lead to seedlings delayed in development. They exhibit a yellowish/light green - xantha - phenotype and successively develop pale green leaves. Compared to wild type, plastids of mutant seedlings show a decreased PSII efficiency, impaired processing and reduced amounts of ribosomal RNAs; they contain less thylakoids and grana with a higher number of more loosely stacked thylakoid membranes. Site-directed mutagenesis of HvCMF3 identified a previously unknown functional domain, which is highly conserved within this subfamily of CCT domain containing proteins. HvCMF3:GFP fusion constructs were localized to plastids and nucleus. Hvcmf3Hvcmf7 double mutants exhibited a xantha-albino or albino phenotype depending on the strength of molecular lesion of the HvCMF7 allele. The chloroplast ribosome deficiency is discussed as the primary observed defect of the Hvcmf3 mutants. Based on our observations, the genes HvCMF3 and HvCMF7 have similar but not identical functions in chloroplast development of barley supporting our hypothesis of neo-/sub-functionalization between both ohnologous genes.
Affiliation(s)
- Mingjiu Li
- Genomics of Genetic Resources, Department of Genebank, Leibniz Institute of Plant Genetics and Crop Plant Research, Seeland, Germany
| | - Goetz Hensel
- Plant Reproductive Biology, Department of Physiology and Cell Biology, Leibniz Institute of Plant Genetics and Crop Plant Research, Seeland, Germany
| | - Michael Melzer
- Structural Cell Biology, Department of Physiology and Cell Biology, Leibniz Institute of Plant Genetics and Crop Plant Research, Seeland, Germany
| | - Astrid Junker
- Acclimation Dynamics and Phenotyping, Department of Molecular Genetics, Leibniz Institute of Plant Genetics and Crop Plant Research, Seeland, Germany
| | - Henning Tschiersch
- Heterosis Research Group, Department of Molecular Genetics, Leibniz Institute of Plant Genetics and Crop Plant Research, Seeland, Germany
| | - Hannes Ruwe
- Molecular Genetics, Institute of Biology, Humboldt University, Berlin, Germany
| | - Daniel Arend
- Research Group Bioinformatics and Information Technology, Department of Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research, Seeland, Germany
| | - Jochen Kumlehn
- Plant Reproductive Biology, Department of Physiology and Cell Biology, Leibniz Institute of Plant Genetics and Crop Plant Research, Seeland, Germany
| | - Thomas Börner
- Molecular Genetics, Institute of Biology, Humboldt University, Berlin, Germany
- *Correspondence: Thomas Börner
| | - Nils Stein
- Genomics of Genetic Resources, Department of Genebank, Leibniz Institute of Plant Genetics and Crop Plant Research, Seeland, Germany
- Department of Crop Sciences, Center for Integrated Breeding Research, Georg-August-University, Göttingen, Germany
- *Correspondence: Nils Stein
| |
|
32
|
LabPipe: an extensible bioinformatics toolkit to manage experimental data and metadata. BMC Bioinformatics 2020; 21:556. [PMID: 33267792 PMCID: PMC7709404 DOI: 10.1186/s12859-020-03908-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2020] [Accepted: 11/25/2020] [Indexed: 12/03/2022] Open
Abstract
Background: Data handling in clinical bioinformatics is often inadequate. No freely available tools provide straightforward approaches for consistent, flexible metadata collection and linkage of related experimental data generated locally by vendor software. Results: To address this problem, we created LabPipe, a flexible toolkit which is driven through a local client that runs alongside vendor software and connects to a light-weight server. The toolkit allows re-usable configurations to be defined for experiment metadata and local data collection, and handles metadata entry and linkage of data. LabPipe was piloted in a multi-site clinical breathomics study. Conclusions: LabPipe provided a consistent, controlled approach for handling metadata and experimental data collection, collation and linkage in the exemplar study and was flexible enough to deal effectively with different data handling challenges.
|
33
|
Facilitating author-driven, machine-readable descriptions with the new minISA metadata format. Sci Data 2020; 7:304. [PMID: 32934235 PMCID: PMC7493984 DOI: 10.1038/s41597-020-00641-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022] Open
|
34
|
Stevens I, Mukarram AK, Hörtenhuber M, Meehan TF, Rung J, Daub CO. Ten simple rules for annotating sequencing experiments. PLoS Comput Biol 2020; 16:e1008260. [PMID: 33017400 PMCID: PMC7535046 DOI: 10.1371/journal.pcbi.1008260] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/14/2022] Open
Affiliation(s)
- Irene Stevens
- Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden
- Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden
- * E-mail:
| | - Abdul Kadir Mukarram
- Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden
| | - Matthias Hörtenhuber
- Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden
| | - Terrence F. Meehan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Johan Rung
- Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden
- Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
| | - Carsten O. Daub
- Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden
- Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden
| |
|
35
|
Papoutsoglou EA, Faria D, Arend D, Arnaud E, Athanasiadis IN, Chaves I, Coppens F, Cornut G, Costa BV, Ćwiek-Kupczyńska H, Droesbeke B, Finkers R, Gruden K, Junker A, King GJ, Krajewski P, Lange M, Laporte MA, Michotey C, Oppermann M, Ostler R, Poorter H, Ramírez-Gonzalez R, Ramšak Ž, Reif JC, Rocca-Serra P, Sansone SA, Scholz U, Tardieu F, Uauy C, Usadel B, Visser RGF, Weise S, Kersey PJ, Miguel CM, Adam-Blondon AF, Pommier C. Enabling reusability of plant phenomic datasets with MIAPPE 1.1. THE NEW PHYTOLOGIST 2020. [PMID: 32171029 DOI: 10.15454/1yxvzv] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/08/2023]
Abstract
Enabling data reuse and knowledge discovery is increasingly critical in modern science, and requires an effort towards standardising data publication practices. This is particularly challenging in the plant phenotyping domain, due to its complexity and heterogeneity. We have produced the MIAPPE 1.1 release, which enhances the existing MIAPPE standard in coverage, to support perennial plants, in structure, through an explicit data model, and in clarity, through definitions and examples. We evaluated MIAPPE 1.1 by using it to express several heterogeneous phenotyping experiments in a range of different formats, to demonstrate its applicability and the interoperability between the various implementations. Furthermore, the extended coverage is demonstrated by the fact that one of the datasets could not have been described under MIAPPE 1.0. MIAPPE 1.1 marks a major step towards enabling plant phenotyping data reusability, thanks to its extended coverage, and especially the formalisation of its data model, which facilitates its implementation in different formats. Community feedback has been critical to this development, and will be a key part of ensuring adoption of the standard.
Affiliation(s)
- Evangelia A Papoutsoglou
- Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ, the Netherlands
| | - Daniel Faria
- BioData.pt, Instituto Gulbenkian de Ciência, 2780-156, Oeiras, Portugal
- INESC-ID, 1000-029, Lisboa, Portugal
| | - Daniel Arend
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
| | - Elizabeth Arnaud
- Bioversity International, Parc Scientifique Agropolis II, Montpellier Cedex 5, 34397, France
| | - Ioannis N Athanasiadis
- Geo-Information Science and Remote Sensing Laboratory, Wageningen University, Droevendaalsesteeg 3, Wageningen, 6708PB, the Netherlands
| | - Inês Chaves
- Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa (ITQB NOVA) Avenida da República, 2780-157, Oeiras, Portugal
- Instituto de Biologia Experimental e Tecnológica (iBET), 2780-157, Oeiras, Portugal
| | - Frederik Coppens
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, Ghent, 9052, Belgium
- VIB Center for Plant Systems Biology, Technologiepark 71, Ghent, 9052, Belgium
| | | | - Bruno V Costa
- Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa (ITQB NOVA) Avenida da República, 2780-157, Oeiras, Portugal
- BioISI - Biosystems & Integrative Sciences Institute, Faculdade de Ciências, Universidade de Lisboa, Lisboa, 1749-016, Portugal
| | - Hanna Ćwiek-Kupczyńska
- Institute of Plant Genetics, Polish Academy of Sciences, ul. Strzeszyńska 34, 60-479, Poznań, Poland
| | - Bert Droesbeke
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, Ghent, 9052, Belgium
- VIB Center for Plant Systems Biology, Technologiepark 71, Ghent, 9052, Belgium
| | - Richard Finkers
- Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ, the Netherlands
| | - Kristina Gruden
- Department of Biotechnology and Systems Biology, National Institute of Biology, SI1000, Ljubljana, Slovenia
| | - Astrid Junker
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
| | - Graham J King
- Southern Cross Plant Science, Southern Cross University, Lismore, NSW 2577, Australia
| | - Paweł Krajewski
- Institute of Plant Genetics, Polish Academy of Sciences, ul. Strzeszyńska 34, 60-479, Poznań, Poland
| | - Matthias Lange
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
| | - Marie-Angélique Laporte
- Bioversity International, Parc Scientifique Agropolis II, Montpellier Cedex 5, 34397, France
| | - Célia Michotey
- Université Paris-Saclay, INRAE, URGI, Versailles, 78026, France
| | - Markus Oppermann
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
| | - Richard Ostler
- Computational and Analytical Sciences, Rothamsted Research, Harpenden, AL5 2JQ, UK
| | - Hendrik Poorter
- Plant Sciences (IBG-2), Forschungszentrum Jülich GmbH, D-52425, Jülich, Germany
- Department of Biological Sciences, Macquarie University, North Ryde, NSW 2109, Australia
| | | | - Živa Ramšak
- Department of Biotechnology and Systems Biology, National Institute of Biology, SI1000, Ljubljana, Slovenia
| | - Jochen C Reif
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
| | - Philippe Rocca-Serra
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
| | - Susanna-Assunta Sansone
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
| | - Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
| | - François Tardieu
- INRA, Laboratoire d'Ecophysiologie des Plantes sous Stress Environnementaux, UMR759, Montpellier, 34060, France
| | - Cristobal Uauy
- Department of Crop Genetics, John Innes Centre, Norwich Research Park, Colney, Norwich, NR4 7UH, UK
| | - Björn Usadel
- Plant Sciences (IBG-2), Forschungszentrum Jülich GmbH, D-52425, Jülich, Germany
- Institute for Biology I, BioSC, RWTH Aachen University, Worringer Weg 3, 52074, Aachen, Germany
| | - Richard G F Visser
- Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ, the Netherlands
| | - Stephan Weise
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
| | | | - Célia M Miguel
- Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa (ITQB NOVA) Avenida da República, 2780-157, Oeiras, Portugal
- BioISI - Biosystems & Integrative Sciences Institute, Faculdade de Ciências, Universidade de Lisboa, Lisboa, 1749-016, Portugal
| | | | - Cyril Pommier
- Université Paris-Saclay, INRAE, URGI, Versailles, 78026, France
| |
|
36
|
Papoutsoglou EA, Faria D, Arend D, Arnaud E, Athanasiadis IN, Chaves I, Coppens F, Cornut G, Costa BV, Ćwiek‐Kupczyńska H, Droesbeke B, Finkers R, Gruden K, Junker A, King GJ, Krajewski P, Lange M, Laporte M, Michotey C, Oppermann M, Ostler R, Poorter H, Ramírez‐Gonzalez R, Ramšak Ž, Reif JC, Rocca‐Serra P, Sansone S, Scholz U, Tardieu F, Uauy C, Usadel B, Visser RGF, Weise S, Kersey PJ, Miguel CM, Adam‐Blondon A, Pommier C. Enabling reusability of plant phenomic datasets with MIAPPE 1.1. THE NEW PHYTOLOGIST 2020; 227:260-273. [PMID: 32171029 PMCID: PMC7317793 DOI: 10.1111/nph.16544] [Citation(s) in RCA: 59] [Impact Index Per Article: 14.8] [Received: 11/22/2019] [Accepted: 02/24/2020] [Indexed: 05/21/2023]
|
37
|
Papoutsoglou EA, Faria D, Arend D, Arnaud E, Athanasiadis IN, Chaves I, Coppens F, Cornut G, Costa BV, Ćwiek-Kupczyńska H, Droesbeke B, Finkers R, Gruden K, Junker A, King GJ, Krajewski P, Lange M, Laporte MA, Michotey C, Oppermann M, Ostler R, Poorter H, Ramírez-Gonzalez R, Ramšak Ž, Reif JC, Rocca-Serra P, Sansone SA, Scholz U, Tardieu F, Uauy C, Usadel B, Visser RGF, Weise S, Kersey PJ, Miguel CM, Adam-Blondon AF, Pommier C. Enabling reusability of plant phenomic datasets with MIAPPE 1.1. THE NEW PHYTOLOGIST 2020. [PMID: 32171029 DOI: 10.15454/ah6u4a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2023]
|
38
|
König P, Beier S, Basterrechea M, Schüler D, Arend D, Mascher M, Stein N, Scholz U, Lange M. BRIDGE - A Visual Analytics Web Tool for Barley Genebank Genomics. FRONTIERS IN PLANT SCIENCE 2020; 11:701. [PMID: 32595658 PMCID: PMC7300248 DOI: 10.3389/fpls.2020.00701] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/30/2020] [Accepted: 05/04/2020] [Indexed: 05/05/2023]
Abstract
Genebanks harbor a large treasure trove of untapped plant genetic diversity. A growing world population and a changing climate require an increase in the production and development of stress resistant plant cultivars while decreasing the acreage. These requirements for improved plant cultivars can be supported by the broader exploitation of plant genetic resources (PGR) as inputs for genomics-assisted breeding. To support this process we have developed BRIDGE, a data warehouse and exploratory data analysis tool for genebank genomics of barley (Hordeum vulgare L.). Using efficient technologies for data storage, data transfer and web development, we facilitate access to digital genebank resources of barley by prioritizing the interactive and visual analysis of integrated genotypic and phenotypic data. The underlying data resulted from a barley genebank genomics study cataloging sequence and morphological data of 22,626 barley accessions, mainly from the German Federal ex situ genebank. BRIDGE consists of interactively coupled modules to visualize integrated, curated and quality checked data, such as variation data, results of dimensionality reduction and genome wide association studies (GWAS), phenotyping results, passport data as well as the geographic distribution of germplasm samples. The core component is a manager for custom collections of germplasm. A search module to find and select germplasm by passport and phenotypic attributes is included as well as modules to export genotypic data in gzip-compressed variant call format (VCF) files and phenotypic data in MIAPPE-compliant ISA-Tab files. BRIDGE is accessible at the following URL: https://bridge.ipk-gatersleben.de.
Affiliation(s)
- Patrick König
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, Germany
| | - Sebastian Beier
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, Germany
| | - Martin Basterrechea
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, Germany
| | - Danuta Schüler
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, Germany
| | - Daniel Arend
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, Germany
| | - Martin Mascher
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, Germany
- German Centre for Integrative Biodiversity Research (iDiv), Leipzig, Germany
| | - Nils Stein
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, Germany
- Center for Integrated Breeding Research, Georg-August University, Göttingen, Germany
| | - Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, Germany
| | - Matthias Lange
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, Germany
| |
|
39
|
Gonzalez-Beltran AN, Masuzzo P, Ampe C, Bakker GJ, Besson S, Eibl RH, Friedl P, Gunzer M, Kittisopikul M, Dévédec SEL, Leo S, Moore J, Paran Y, Prilusky J, Rocca-Serra P, Roudot P, Schuster M, Sergeant G, Strömblad S, Swedlow JR, van Erp M, Van Troys M, Zaritsky A, Sansone SA, Martens L. Community standards for open cell migration data. Gigascience 2020; 9:giaa041. [PMID: 32396199 PMCID: PMC7317087 DOI: 10.1093/gigascience/giaa041] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/26/2019] [Revised: 04/02/2020] [Accepted: 04/02/2020] [Indexed: 01/08/2023] Open
Abstract
Cell migration research has become a high-content field. However, the quantitative information encapsulated in these complex and high-dimensional datasets is not fully exploited owing to the diversity of experimental protocols and non-standardized output formats. In addition, typically the datasets are not open for reuse. Making the data open and Findable, Accessible, Interoperable, and Reusable (FAIR) will enable meta-analysis, data integration, and data mining. Standardized data formats and controlled vocabularies are essential for building a suitable infrastructure for that purpose but are not available in the cell migration domain. We here present standardization efforts by the Cell Migration Standardisation Organisation (CMSO), an open community-driven organization to facilitate the development of standards for cell migration data. This work will foster the development of improved algorithms and tools and enable secondary analysis of public datasets, ultimately unlocking new knowledge of the complex biological process of cell migration.
Affiliation(s)
- Alejandra N Gonzalez-Beltran
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford OX1 3QG, Oxford, UK
| | - Paola Masuzzo
- VIB-UGent Center for Medical Biotechnology, VIB, A. Baertsoenkaai 3, B-9000, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, A. Baertsoenkaai 3, B-9000, Ghent, Belgium
- Institute for Globally Distributed Open Research and Education (IGDORE), Kabupaten Gianyar, Bali 80571, Indonesia
| | - Christophe Ampe
- Department of Biomolecular Medicine, Ghent University, A. Baertsoenkaai 3, B-9000, Ghent, Belgium
| | - Gert-Jan Bakker
- Department of Cell Biology, Radboud Institute for Molecular Life Sciences, Geert Grooteplein 28 6525 GA Nijmegen, The Netherlands
| | - Sébastien Besson
- Centre for Gene Regulation & Expression & Division of Computational Biology, University of Dundee, School of Life Sciences, Dow St Dundee DD1 5EH, Scotland, UK
| | - Robert H Eibl
- German Cancer Research Center, DKFZ Alumni Association, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
| | - Peter Friedl
- Department of Cell Biology, Radboud Institute for Molecular Life Sciences, Geert Grooteplein 28 6525 GA Nijmegen, The Netherlands
- David H. Koch Center for Applied Genitourinary Medicine, UT MD Anderson Cancer Center, 6767 Bertner Ave, Mitchell Basic Science Research Building, 77030 Houston, TX, USA
- Cancer Genomics Center, Universiteitsweg 100, 3584 CG Utrecht, The Netherlands
| | - Matthias Gunzer
- Institute for Experimental Immunology and Imaging, University Hospital, University Duisburg-Essen, Universitätsstr. 2, 45141 Essen, Germany
- Leibniz Institute for Analytical Sciences, ISAS, Bunsen-Kirchhoff-Straße 11, 44139 Dortmund, Germany
| | - Mark Kittisopikul
- Department of Biophysics, UT Southwestern Medical Center, 5323 Harry Hines Blvd. Dallas, TX 75390, USA
- Department of Cell and Developmental Biology, Feinberg School of Medicine, Northwestern University, 303 E. Chicago Ave, Chicago, IL 60611, USA
| | - Sylvia E Le Dévédec
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, PO box 9502 2300 RA Leiden, The Netherlands
| | - Simone Leo
- Centre for Gene Regulation & Expression & Division of Computational Biology, University of Dundee, School of Life Sciences, Dow St Dundee DD1 5EH, Scotland, UK
- Center for Advanced Studies, Research, and Development in Sardinia (CRS4), Loc. Piscina Manna, Edificio 1, 09050 Pula (CA) , Italy
| | - Josh Moore
- Centre for Gene Regulation & Expression & Division of Computational Biology, University of Dundee, School of Life Sciences, Dow St Dundee DD1 5EH, Scotland, UK
| | - Yael Paran
- IDEA Bio-Medical Ltd, 2 Prof. Bergman St., Rehovot 76705, Israel
| | - Jaime Prilusky
- Life Science Core Facilities, Weizmann Institute of Science, P.O. Box 26 Rehovot 76100, Israel
| | - Philippe Rocca-Serra
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford OX1 3QG, Oxford, UK
| | - Philippe Roudot
- Lyda Hill Department of Bioinformatics, UT Southwestern Medical Center, 5323 Harry Hines Blvd. Dallas, TX 75390, USA
| | - Marc Schuster
- Institute for Experimental Immunology and Imaging, University Hospital, University Duisburg-Essen, Universitätsstr. 2, 45141 Essen, Germany
| | - Gwendolien Sergeant
- Department of Biomolecular Medicine, Ghent University, A. Baertsoenkaai 3, B-9000, Ghent, Belgium
| | - Staffan Strömblad
- Department of Biosciences and Nutrition, Karolinska Institutet, Neo, SE-141 83 Huddinge, Sweden
| | - Jason R Swedlow
- Centre for Gene Regulation & Expression & Division of Computational Biology, University of Dundee, School of Life Sciences, Dow St Dundee DD1 5EH, Scotland, UK
| | - Merijn van Erp
- Department of Cell Biology, Radboud Institute for Molecular Life Sciences, Geert Grooteplein 28 6525 GA Nijmegen, The Netherlands
| | - Marleen Van Troys
- Department of Biomolecular Medicine, Ghent University, A. Baertsoenkaai 3, B-9000, Ghent, Belgium
| | - Assaf Zaritsky
- Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, P.O.B. 653, 8410501 Beer-Sheva, Israel
| | - Susanna-Assunta Sansone
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford OX1 3QG, Oxford, UK
| | - Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, A. Baertsoenkaai 3, B-9000, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, A. Baertsoenkaai 3, B-9000, Ghent, Belgium
|
40
|
Watt M, Fiorani F, Usadel B, Rascher U, Muller O, Schurr U. Phenotyping: New Windows into the Plant for Breeders. Annual Review of Plant Biology 2020; 71:689-712. [PMID: 32097567 DOI: 10.1146/annurev-arplant-042916-041124] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Indexed: 06/10/2023]
Abstract
Plant phenotyping enables noninvasive quantification of plant structure and function and interactions with environments. High-capacity phenotyping reaches hitherto inaccessible phenotypic characteristics. Diverse, challenging, and valuable applications of phenotyping have originated among scientists, prebreeders, and breeders as they study the phenotypic diversity of genetic resources and apply increasingly complex traits to crop improvement. Noninvasive technologies are used to analyze experimental and breeding populations. We cover the most recent research in controlled-environment and field phenotyping for seed, shoot, and root traits. Select field phenotyping technologies have become state of the art and show promise for speeding up the breeding process in early generations. We highlight the technologies behind the rapid advances in proximal and remote sensing of plants in fields. We conclude by discussing the new disciplines working with the phenotyping community: data science, to address the challenge of generating FAIR (findable, accessible, interoperable, and reusable) data, and robotics, to apply phenotyping directly on farms.
Affiliation(s)
- Michelle Watt
- IBG-2: Plant Sciences, Institute of Bio- and Geosciences, Forschungszentrum Jülich, 52425 Jülich, Germany
| | - Fabio Fiorani
- IBG-2: Plant Sciences, Institute of Bio- and Geosciences, Forschungszentrum Jülich, 52425 Jülich, Germany
| | - Björn Usadel
- IBG-2: Plant Sciences, Institute of Bio- and Geosciences, Forschungszentrum Jülich, 52425 Jülich, Germany
- Institute for Botany and Molecular Genetics, BioSC, RWTH Aachen University, 52074 Aachen, Germany
| | - Uwe Rascher
- IBG-2: Plant Sciences, Institute of Bio- and Geosciences, Forschungszentrum Jülich, 52425 Jülich, Germany
| | - Onno Muller
- IBG-2: Plant Sciences, Institute of Bio- and Geosciences, Forschungszentrum Jülich, 52425 Jülich, Germany
| | - Ulrich Schurr
- IBG-2: Plant Sciences, Institute of Bio- and Geosciences, Forschungszentrum Jülich, 52425 Jülich, Germany
|
41
|
Vos RA, Katayama T, Mishima H, Kawano S, Kawashima S, Kim JD, Moriya Y, Tokimatsu T, Yamaguchi A, Yamamoto Y, Wu H, Amstutz P, Antezana E, Aoki NP, Arakawa K, Bolleman JT, Bolton E, Bonnal RJP, Bono H, Burger K, Chiba H, Cohen KB, Deutsch EW, Fernández-Breis JT, Fu G, Fujisawa T, Fukushima A, García A, Goto N, Groza T, Hercus C, Hoehndorf R, Itaya K, Juty N, Kawashima T, Kim JH, Kinjo AR, Kotera M, Kozaki K, Kumagai S, Kushida T, Lütteke T, Matsubara M, Miyamoto J, Mohsen A, Mori H, Naito Y, Nakazato T, Nguyen-Xuan J, Nishida K, Nishida N, Nishide H, Ogishima S, Ohta T, Okuda S, Paten B, Perret JL, Prathipati P, Prins P, Queralt-Rosinach N, Shinmachi D, Suzuki S, Tabata T, Takatsuki T, Taylor K, Thompson M, Uchiyama I, Vieira B, Wei CH, Wilkinson M, Yamada I, Yamanaka R, Yoshitake K, Yoshizawa AC, Dumontier M, Kosaki K, Takagi T. BioHackathon 2015: Semantics of data for life sciences and reproducible research. F1000Res 2020; 9:136. [PMID: 32308977 PMCID: PMC7141167 DOI: 10.12688/f1000research.18236.1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Accepted: 02/05/2020] [Indexed: 01/08/2023] Open
Abstract
We report on the activities of the 2015 edition of the BioHackathon, an annual event that brings together researchers and developers from around the world to develop tools and technologies that promote the reusability of biological data. We discuss issues surrounding the representation, publication, integration, mining and reuse of biological data and metadata across a wide range of biomedical data types of relevance for the life sciences, including chemistry, genotypes and phenotypes, orthology and phylogeny, proteomics, genomics, glycomics, and metabolomics. We describe our progress to address ongoing challenges to the reusability and reproducibility of research results, and identify outstanding issues that continue to impede the progress of bioinformatics research. We share our perspective on the state of the art, continued challenges, and goals for future research and development for the life sciences Semantic Web.
Affiliation(s)
- Rutger A. Vos
- Institute of Biology Leiden, Leiden University, Leiden, The Netherlands
- Naturalis Biodiversity Center, Leiden, The Netherlands
| | | | - Hiroyuki Mishima
- Department of Human Genetics, Nagasaki University Graduate School of Biomedical Sciences, Nagasaki, Japan
| | - Shin Kawano
- Database Center for Life Science, Tokyo, Japan
| | | | | | - Yuki Moriya
- Database Center for Life Science, Tokyo, Japan
| | | | | | | | - Hongyan Wu
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | | | - Erick Antezana
- Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
| | - Nobuyuki P. Aoki
- Faculty of Science and Engineering, SOKA University, Tokyo, Japan
| | - Kazuharu Arakawa
- Institute for Advanced Biosciences, Keio University, Tokyo, Japan
| | - Jerven T. Bolleman
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Lausanne, Switzerland
| | - Evan Bolton
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
| | - Raoul J. P. Bonnal
- Istituto Nazionale Genetica Molecolare, Romeo ed Enrica Invernizzi, Milan, Italy
| | | | - Kees Burger
- Dutch Techcentre for Life Sciences, Utrecht, The Netherlands
| | - Hirokazu Chiba
- National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Japan
| | - Kevin B. Cohen
- Computational Bioscience Program, University of Colorado School of Medicine, Denver, USA
- Université Paris-Saclay, LIMSI, CNRS, Paris, France
| | | | | | - Gang Fu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
| | | | | | | | - Naohisa Goto
- Research Institute for Microbial Diseases, Osaka University, Osaka, Japan
| | - Tudor Groza
- St Vincent's Clinical School, Faculty of Medicine, University of New South Wales, Darlinghurst, Australia
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, Australia
| | - Colin Hercus
- Novocraft Technologies Sdn. Bhd., Selangor, Malaysia
| | - Robert Hoehndorf
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Kotone Itaya
- Institute for Advanced Biosciences, Keio University, Tokyo, Japan
| | - Nick Juty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | | | - Jee-Hyub Kim
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Akira R. Kinjo
- Institute for Protein Research, Osaka University, Osaka, Japan
| | - Masaaki Kotera
- School of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan
| | - Kouji Kozaki
- The Institute of Scientific and Industrial Research, Osaka University, Osaka, Japan
| | | | - Tatsuya Kushida
- National Bioscience Database Center, Japan Science and Technology Agency, Tokyo, Japan
| | - Thomas Lütteke
- Institute of Veterinary Physiology and Biochemistry, Justus-Liebig University Giessen, Giessen, Germany
- Gesellschaft für innovative Personalwirtschaftssysteme mbH (GIP GmbH), Offenbach, Germany
| | | | | | - Attayeb Mohsen
- National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
| | - Hiroshi Mori
- Center for Information Biology, National Institute of Genetics, Mishima, Japan
| | - Yuki Naito
- Database Center for Life Science, Tokyo, Japan
| | | | | | | | - Naoki Nishida
- Department of Systems Science, Osaka University, Osaka, Japan
| | - Hiroyo Nishide
- National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Japan
| | - Soichi Ogishima
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
| | - Tazro Ohta
- Database Center for Life Science, Tokyo, Japan
| | - Shujiro Okuda
- Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, USA
| | | | - Philip Prathipati
- National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
| | - Pjotr Prins
- University Medical Center Utrecht, Utrecht, The Netherlands
- University of Tennessee Health Science Center, Memphis, USA
| | - Núria Queralt-Rosinach
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | | | - Shinya Suzuki
- School of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan
| | - Tsuyosi Tabata
- Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto, Japan
| | | | - Kieron Taylor
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Mark Thompson
- Leiden University Medical Center, Leiden, The Netherlands
| | - Ikuo Uchiyama
- National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Japan
| | - Bruno Vieira
- WurmLab, School of Biological & Chemical Sciences, Queen Mary University of London, London, UK
| | - Chih-Hsuan Wei
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
| | - Mark Wilkinson
- Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid, Madrid, Spain
| | | | | | - Kazutoshi Yoshitake
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
| | | | - Michel Dumontier
- Institute of Data Science, Maastricht University, Maastricht, The Netherlands
| | - Kenjiro Kosaki
- Center for Medical Genetics, Keio University School of Medicine, Tokyo, Japan
| | - Toshihisa Takagi
- National Bioscience Database Center, Japan Science and Technology Agency, Tokyo, Japan
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
|
42
|
Open-Source Software Tools, Databases, and Resources for Single-Cell and Single-Cell-Type Metabolomics. Methods Mol Biol 2020; 2064:191-217. [PMID: 31565776 DOI: 10.1007/978-1-4939-9831-9_15] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 02/08/2023]
Abstract
In this age of -omics data-guided big data revolution, metabolomics has received significant attention as compared to genomics, transcriptomics, and proteomics for its proximity to the phenotype, the promises it makes and the challenges it throws. Although metabolomes of entire organisms, organs, biofluids, and tissues are of immense interest, a cell-specific resolution is deemed critical for biomedical applications where a granular understanding of cellular metabolism at cell-type and subcellular resolution is desirable. Mass spectrometry (MS) is a versatile technique that is used to analyze a broad range of compounds from different species and cell-types, with high accuracy, resolution, sensitivity, selectivity, and fast data acquisition speeds. With recent advances in MS and spectroscopy-based platforms, the research community is able to generate high-throughput data sets from single cells. However, it is challenging to handle, store, process, analyze, and interpret data in a routine manner. In this treatise, I present a workflow of metabolomics data generation from single cells and single-cell types to their analysis, visualization, and interpretation for obtaining biological insights.
|
43
|
Rocca-Serra P, Sansone SA. Experiment design driven FAIRification of omics data matrices, an exemplar. Sci Data 2019; 6:271. [PMID: 31831744 PMCID: PMC6908569 DOI: 10.1038/s41597-019-0286-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/29/2019] [Accepted: 10/31/2019] [Indexed: 01/03/2023] Open
Abstract
We outline a principled approach to data FAIRification rooted in the notions of experimental design, and whose main intent is to clarify the semantics of data matrices. Using two related metabolomics datasets associated to journal articles, we perform retrospective data and metadata curation and re-annotation, using community, open, interoperability standards. The results are semantically-anchored data matrices, deposited in public archives, which are readable by software agents for data-level queries, and which can support the reproducibility and reuse of the data underpinning the publications.
Affiliation(s)
- Philippe Rocca-Serra
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, United Kingdom.
| | - Susanna-Assunta Sansone
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, United Kingdom.
|
44
|
Wist J. HastaLaVista, a web-based user interface for NMR-based untargeted metabolic profiling analysis in biomedical sciences: towards a new publication standard. J Cheminform 2019; 11:75. [PMID: 33430999 PMCID: PMC6896291 DOI: 10.1186/s13321-019-0399-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/13/2019] [Accepted: 11/27/2019] [Indexed: 02/19/2023] Open
Abstract
Metabolic profiling has been shown to be useful to improve our understanding of complex metabolic processes. Shared data are key to the analysis and validation of metabolic profiling and untargeted spectral analysis and may increase the pace of new discovery. Improving the existing portfolio of open software may increase the fraction of shared data by decreasing the amount of effort required to publish them in a manner that is useful to others. However, a weakness of open software, when compared to commercial ones, is the lack of user-friendly graphical interface that may discourage inexperienced researchers. Here, a web-browser-oriented solution is presented and demonstrated for metabolic profiling analysis that combines the power of R for back-end statistical analyses and of JavaScript for front-end visualisations and user interactivity. This unique combination of statistical programming and web-browser visualisation brings enhanced data interoperability and interactivity into the open source realm. It is exemplified by characterizing the extent to which bariatric surgery perturbs the metabolisms of rats, showing the value of the approach in iterative analysis by the end-user to establish a deeper understanding of the system perturbation. HastaLaVista is available at: (https://github.com/jwist/hastaLaVista, 10.5281/zenodo.3544800) under MIT license. The approach described in this manuscript can be extended to connect the interface to other scripting languages such as Python, and to create interfaces for other types of data analysis.
Affiliation(s)
- Julien Wist
- Chemistry Department, Universidad del Valle, Cali, 76001, Valle del Cauca, Colombia.
|
45
|
Bucher E, Claunch CJ, Hee D, Smith RL, Devlin K, Thompson W, Korkola JE, Heiser LM. Annot: a Django-based sample, reagent, and experiment metadata tracking system. BMC Bioinformatics 2019; 20:542. [PMID: 31675914 PMCID: PMC6824123 DOI: 10.1186/s12859-019-3147-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2019] [Accepted: 10/02/2019] [Indexed: 11/11/2022] Open
Abstract
Background: In biological experiments, comprehensive experimental metadata tracking – which comprises experiment, reagent, and protocol annotation with controlled vocabulary from established ontologies – remains a challenge, especially when the experiment involves multiple laboratory scientists who execute different steps of the protocol. Here we describe Annot, a novel web application designed to provide a flexible solution for this task. Results: Annot enforces the use of controlled vocabulary for sample and reagent annotation while enabling robust investigation, study, and protocol tracking. The cornerstone of Annot’s implementation is a json syntax-compatible file format, which can capture detailed metadata for all aspects of complex biological experiments. Data stored in this json file format can easily be ported into spreadsheet or data frame files that can be loaded into R (https://www.r-project.org/) or Pandas, Python’s data analysis library (https://pandas.pydata.org/). Annot is implemented in Python3 and utilizes the Django web framework, Postgresql, Nginx, and Debian. It is deployed via Docker and supports all major browsers. Conclusions: Annot offers a robust solution to annotate samples, reagents, and experimental protocols for established assays where multiple laboratory scientists are involved. Further, it provides a framework to store and retrieve metadata for data analysis and integration, and therefore ensures that data generated in different experiments can be integrated and jointly analyzed. This type of solution to metadata tracking can enhance the utility of large-scale datasets, which we demonstrate here with a large-scale microenvironment microarray study.
|
46
|
Venayak N, Raj K, Mahadevan R. Impact framework: A python package for writing data analysis workflows to interpret microbial physiology. Metab Eng Commun 2019; 9:e00089. [PMID: 31011536 PMCID: PMC6462781 DOI: 10.1016/j.mec.2019.e00089] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 08/01/2018] [Revised: 03/19/2019] [Accepted: 03/19/2019] [Indexed: 12/26/2022] Open
Abstract
Microorganisms can be genetically engineered to solve a range of challenges in diverse areas including health, environmental protection and sustainability. The natural complexity of biological systems makes this an iterative cycle, perturbing metabolism and making stepwise progress toward a desired phenotype through four major stages: design, build, test, and data interpretation. This cycle has been accelerated by advances in molecular biology (e.g. robust DNA synthesis and assembly techniques), liquid handling automation and scale-down characterization platforms, generating large heterogeneous data sets. Here, we present an extensible Python package for scientists and engineers working with large biological data sets to interpret, model, and visualize data: the IMPACT (Integrated Microbial Physiology: Analysis, Characterization and Translation) framework. Impact aims to ease the development of Python-based data analysis workflows for a range of stakeholders in the bioengineering process, offering open-source tools for data analysis, physiology characterization and translation to visualization. Using this framework, biologists and engineers can opt for reproducible and extensible programmatic data analysis workflows, mediating a bottleneck limiting the throughput of microbial engineering. The Impact framework is available at https://github.com/lmse/impact.
Affiliation(s)
- Naveen Venayak
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, 200 College Street, Toronto, ON, M5S 3E5, Canada
| | - Kaushik Raj
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, 200 College Street, Toronto, ON, M5S 3E5, Canada
| | - Radhakrishnan Mahadevan
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, 200 College Street, Toronto, ON, M5S 3E5, Canada
- Institute of Biomaterials and Biomedical Engineering, University of Toronto, 164 College Street, Toronto, ON, M5S 3G9, Canada
|
47
|
The Empusa code generator and its application to GBOL, an extendable ontology for genome annotation. Sci Data 2019; 6:254. [PMID: 31685817 PMCID: PMC6828702 DOI: 10.1038/s41597-019-0263-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/08/2019] [Accepted: 10/11/2019] [Indexed: 11/08/2022] Open
Abstract
The RDF data model facilitates integration of diverse data available in structured and semi-structured formats. To obtain a coherent RDF graph the chosen ontology must be consistently applied. However, addition of new diverse data causes the ontology to evolve, which could lead to accumulation of unintended erroneous composites. Thus, there is a need for a gate keeping system that compares the intended content described in the ontology with the actual content of the resource. The Empusa code generator facilitates creation of composite RDF resources from disparate sources. Empusa can convert a schema into an associated application programming interface (API), that can be used to perform data consistency checks and generates Markdown documentation to make persistent URLs resolvable. Using Empusa, consistency is ensured within and between the ontology and the content of the resource. As an illustration of the potential of Empusa, we present the Genome Biology Ontology Language (GBOL). GBOL uses and extends current ontologies to provide a formal representation of genomic entities, along with their properties, relations and provenance.
|
48
|
Zielinski T, Hay J, Millar AJ. The grant is dead, long live the data - migration as a pragmatic exit strategy for research data preservation. Wellcome Open Res 2019; 4:104. [PMID: 31363499 PMCID: PMC6652102 DOI: 10.12688/wellcomeopenres.15341.2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Accepted: 09/18/2019] [Indexed: 11/24/2022] Open
Abstract
Open research, data sharing and data re-use have become a priority for publicly- and charity-funded research. Efficient data management naturally requires computational resources that assist in data description, preservation and discovery. While it is possible to fund development of data management systems, currently it is more difficult to sustain data resources beyond the original grants. That puts the safety of the data at risk and undermines the very purpose of data gathering. PlaSMo stands for ‘Plant Systems-biology Modelling’ and the PlaSMo model repository was envisioned by the plant systems biology community in 2005 with the initial funding lasting until 2010. We addressed the sustainability of the PlaSMo repository and assured preservation of these data by implementing an exit strategy. For our exit strategy we migrated data to an alternative, public repository with secured funding. We describe details of our decision process and aspects of the implementation. Our experience may serve as an example for other projects in a similar situation. We share our reflections on the sustainability of biological data management and the future outcomes of its funding. We expect it to be a useful input for funding bodies.
Affiliation(s)
- Tomasz Zielinski
- SynthSys and School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3BF, UK
| | - Johnny Hay
- EPCC, University of Edinburgh, Edinburgh, EH9 3FD, UK
| | - Andrew J Millar
- SynthSys and School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3BF, UK
|
49
|
Macklin P. Key challenges facing data-driven multicellular systems biology. Gigascience 2019; 8:giz127. [PMID: 31648301 PMCID: PMC6812467 DOI: 10.1093/gigascience/giz127] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 05/20/2018] [Revised: 09/27/2019] [Accepted: 09/30/2019] [Indexed: 12/17/2022] Open
Abstract
Increasingly sophisticated experiments, coupled with large-scale computational models, have the potential to systematically test biological hypotheses to drive our understanding of multicellular systems. In this short review, we explore key challenges that must be overcome to achieve robust, repeatable data-driven multicellular systems biology. If these challenges can be solved, we can grow beyond the current state of isolated tools and datasets to a community-driven ecosystem of interoperable data, software utilities, and computational modeling platforms. Progress is within our grasp, but it will take community (and financial) commitment.
Affiliation(s)
- Paul Macklin
- Department of Intelligent Systems Engineering, Indiana University, 700 N Woodlawn Ave, Bloomington, IN 47408, USA
|
50
|
Hoffmann N, Hartler J, Ahrends R. jmzTab-M: A Reference Parser, Writer, and Validator for the Proteomics Standards Initiative mzTab 2.0 Metabolomics Standard. Anal Chem 2019; 91:12615-12618. [PMID: 31525911 DOI: 10.1021/acs.analchem.9b01987] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/23/2022]
Abstract
mzTab 2.0 for metabolomics (mzTab-M) is the most recent standard format developed in collaboration by the Proteomics and Metabolomics Standards Initiatives including contributions by the recently founded Lipidomics Standards Initiative. mzTab-M is a redesign of the original mzTab format which was geared toward reporting of proteomics results and, as such, provided only limited support for metabolites. As a tab-delimited, spreadsheet-like format, mzTab-M captures experimental metadata, summary information on small molecules across assays, MS features as a basis for quantitation, and evidence to support the reporting of individual or feature group identifications. Here, we present the Java reference implementation for reading, writing, and validating mzTab-M files. Furthermore, we provide a web application for conveniently validating mzTab-M files by a graphical user interface, and a command line validator that accompanies the library. The jmzTab-M library, version 1.0.4 (https://doi.org/10.5281/zenodo.3362151), is available at https://github.com/lifs-tools/jmzTab-m and from Maven Central at https://search.maven.org/search?q=jmztabm under the terms of the open source Apache License 2.0. The web application as well as the Python and R implementations are available at https://github.com/lifs-tools. The respective Web sites link to additional API documentation, as well as to usage examples.
Affiliation(s)
- Nils Hoffmann
- Leibniz-Institut für Analytische Wissenschaften-ISAS-e.V., Otto-Hahn-Straße 6b, 44227 Dortmund, Germany
| | - Jürgen Hartler
- Department of Pharmacology, University of California San Diego, 9500 Gilman Drive, La Jolla, California 92093, United States
- Institute of Computational Biotechnology, Graz University of Technology, Petersgasse 14, 8010 Graz, Austria
| | - Robert Ahrends
- Leibniz-Institut für Analytische Wissenschaften-ISAS-e.V., Otto-Hahn-Straße 6b, 44227 Dortmund, Germany
- Department of Analytical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Straße 38, 1090 Wien, Austria
|