1. Leigh DM, Vandergast AG, Hunter ME, Crandall ED, Funk WC, Garroway CJ, Hoban S, Oyler-McCance SJ, Rellstab C, Segelbacher G, Schmidt C, Vázquez-Domínguez E, Paz-Vinas I. Best practices for genetic and genomic data archiving. Nat Ecol Evol 2024:10.1038/s41559-024-02423-7. [PMID: 38789640 DOI: 10.1038/s41559-024-02423-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Accepted: 04/25/2024] [Indexed: 05/26/2024]
Abstract
Genetic and genomic data are collected for a vast array of scientific and applied purposes. Despite mandates for public archiving, data are typically used only by the generating authors. The reuse of genetic and genomic datasets remains uncommon because it is difficult, if not impossible, due to non-standard archiving practices and lack of contextual metadata. But as the new field of macrogenetics is demonstrating, if genetic data and their metadata were more accessible and FAIR (findable, accessible, interoperable and reusable) compliant, they could be reused for many additional purposes. We discuss the main challenges with existing genetic and genomic data archives, and suggest best practices for archiving genetic and genomic data. Recognizing that this is a longstanding issue due to little formal data management training within the fields of ecology and evolution, we highlight steps that research institutions and publishers could take to improve data archiving.
Affiliation(s)
- Deborah M Leigh: Swiss Federal Research Institute WSL, Birmensdorf, Switzerland
- Amy G Vandergast: US Geological Survey, Western Ecological Research Center, San Diego, CA, USA
- Margaret E Hunter: US Geological Survey, Wetland & Aquatic Research Center, Gainesville, FL, USA
- Eric D Crandall: Department of Biology, Pennsylvania State University, University Park, PA, USA
- W Chris Funk: Department of Biology, Graduate Degree Program in Ecology, Colorado State University, Fort Collins, CO, USA
- Colin J Garroway: Department of Biological Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
- Sean Hoban: Center for Tree Science, The Morton Arboretum, Lisle, IL, USA
- Chloé Schmidt: German Centre for Integrative Biodiversity Research Halle-Jena-Leipzig, Leipzig, Germany
- Ella Vázquez-Domínguez: Departamento de Ecología de la Biodiversidad, Instituto de Ecología, Universidad Nacional Autónoma de México, Coyoacán, Ciudad de México, México
- Ivan Paz-Vinas: Department of Biology, Graduate Degree Program in Ecology, Colorado State University, Fort Collins, CO, USA; Universite Claude Bernard Lyon 1, LEHNA UMR 5023, CNRS, ENTPE, Villeurbanne, France
2. Seep L, Grein S, Splichalova I, Ran D, Mikhael M, Hildebrand S, Lauterbach M, Hiller K, Ribeiro DJS, Sieckmann K, Kardinal R, Huang H, Yu J, Kallabis S, Behrens J, Till A, Peeva V, Strohmeyer A, Bruder J, Blum T, Soriano-Arroquia A, Tischer D, Kuellmer K, Li Y, Beyer M, Gellner AK, Fromme T, Wackerhage H, Klingenspor M, Fenske WK, Scheja L, Meissner F, Schlitzer A, Mass E, Wachten D, Latz E, Pfeifer A, Hasenauer J. From Planning Stage Towards FAIR Data: A Practical Metadatasheet For Biomedical Scientists. Sci Data 2024; 11:524. [PMID: 38778016 PMCID: PMC11111677 DOI: 10.1038/s41597-024-03349-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 05/08/2024] [Indexed: 05/25/2024]
Abstract
Datasets consist of measurement data and metadata. Metadata provides context, essential for understanding and (re-)using data. Various metadata standards exist for different methods, systems and contexts. However, relevant information resides at differing stages across the data-lifecycle. Often, this information is defined and standardized only at publication stage, which can lead to data loss and workload increase. In this study, we developed Metadatasheet, a metadata standard based on interviews with members of two biomedical consortia and systematic screening of data repositories. It aligns with the data-lifecycle allowing synchronous metadata recording within Microsoft Excel, a widespread data recording software. Additionally, we provide an implementation, the Metadata Workbook, that offers user-friendly features like automation, dynamic adaption, metadata integrity checks, and export options for various metadata standards. By design and due to its extensive documentation, the proposed metadata standard simplifies recording and structuring of metadata for biomedical scientists, promoting practicality and convenience in data management. This framework can accelerate scientific progress by enhancing collaboration and knowledge transfer throughout the intermediate steps of data creation.
Affiliation(s)
- Lea Seep: Computational Biology, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Stephan Grein: Computational Biology, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Iva Splichalova: Developmental Biology of the Immune System, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Danli Ran: Institute of Pharmacology and Toxicology, University Hospital, University of Bonn, Bonn, Germany
- Mickel Mikhael: Institute of Pharmacology and Toxicology, University Hospital, University of Bonn, Bonn, Germany
- Staffan Hildebrand: Institute of Pharmacology and Toxicology, University Hospital, University of Bonn, Bonn, Germany
- Mario Lauterbach: Department of Bioinformatics and Biochemistry, Technical University Braunschweig, Braunschweig, Germany
- Karsten Hiller: Department of Bioinformatics and Biochemistry, Technical University Braunschweig, Braunschweig, Germany
- Katharina Sieckmann: Institute of Innate Immunity, University Hospital Bonn, University of Bonn, Bonn, Germany
- Ronja Kardinal: Institute of Innate Immunity, University Hospital Bonn, University of Bonn, Bonn, Germany
- Hao Huang: Developmental Biology of the Immune System, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Jiangyan Yu: Computational Biology, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany; Quantitative Systems Biology, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Sebastian Kallabis: Systems Immunology and Proteomics, Institute of Innate Immunity, Medical Faculty, University of Bonn, Bonn, Germany
- Janina Behrens: Department of Biochemistry and Molecular Cell Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Andreas Till: Department of Internal Medicine I, Division of Endocrinology, Diabetes and Metabolism, University Medical Center Bonn, Bonn, Germany
- Viktoriya Peeva: Department of Internal Medicine I, Division of Endocrinology, Diabetes and Metabolism, University Medical Center Bonn, Bonn, Germany
- Akim Strohmeyer: Chair of Molecular Nutritional Medicine, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Johanna Bruder: Chair of Molecular Nutritional Medicine, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Tobias Blum: Immunology and Environment, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Ana Soriano-Arroquia: Institute of Pharmacology and Toxicology, University Hospital, University of Bonn, Bonn, Germany
- Dominik Tischer: Institute of Pharmacology and Toxicology, University Hospital, University of Bonn, Bonn, Germany
- Katharina Kuellmer: Chair of Molecular Nutritional Medicine, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Yuanfang Li: Immunogenomics & Neurodegeneration, German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
- Marc Beyer: Immunogenomics & Neurodegeneration, German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany; PRECISE, Platform for Single Cell Genomics and Epigenomics at the German Center for Neurodegenerative Diseases and the University of Bonn, Bonn, Germany
- Anne-Kathrin Gellner: Department of Psychiatry and Psychotherapy, University Hospital Bonn, Bonn, Germany; Institute of Physiology II, Medical Faculty, University of Bonn, Bonn, Germany
- Tobias Fromme: Chair of Molecular Nutritional Medicine, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Henning Wackerhage: School for Medicine and Health, Faculty of Sport and Health Sciences, Technical University of Munich, Munich, Germany
- Martin Klingenspor: Chair of Molecular Nutritional Medicine, TUM School of Life Sciences, Technical University of Munich, Freising, Germany; EKFZ-Else Kröner-Fresenius Center for Nutritional Medicine, Technical University of Munich, Freising, Germany; ZIEL Institute for Food & Health, Technical University of Munich, Freising, Germany
- Wiebke K Fenske: Department of Internal Medicine I, Division of Endocrinology, Diabetes and Metabolism, University Medical Center Bonn, Bonn, Germany; Department of Internal Medicine I - Endocrinology, Diabetology and Metabolism, Gastroenterology and Hepatology, University Hospital Bergmannsheil, Bochum, Germany
- Ludger Scheja: Department of Biochemistry and Molecular Cell Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Felix Meissner: Systems Immunology and Proteomics, Institute of Innate Immunity, Medical Faculty, University of Bonn, Bonn, Germany; Experimental Systems Immunology, Max Planck Institute of Biochemistry, Martinsried, Germany
- Andreas Schlitzer: Quantitative Systems Biology, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Elvira Mass: Developmental Biology of the Immune System, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Dagmar Wachten: Institute of Innate Immunity, University Hospital Bonn, University of Bonn, Bonn, Germany
- Eicke Latz: Institute of Innate Immunity, University Hospital Bonn, University of Bonn, Bonn, Germany
- Alexander Pfeifer: Institute of Pharmacology and Toxicology, University Hospital, University of Bonn, Bonn, Germany; PharmaCenter Bonn, University of Bonn, Bonn, Germany
- Jan Hasenauer: Computational Biology, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany; Helmholtz Center Munich, German Research Center for Environmental Health, Computational Health Center, Munich, Germany
3. Dumschott K, Dörpholz H, Laporte MA, Brilhaus D, Schrader A, Usadel B, Neumann S, Arnaud E, Kranz A. Ontologies for increasing the FAIRness of plant research data. Front Plant Sci 2023; 14:1279694. [PMID: 38098789 PMCID: PMC10720748 DOI: 10.3389/fpls.2023.1279694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 11/15/2023] [Indexed: 12/17/2023]
Abstract
The importance of improving the FAIRness (findability, accessibility, interoperability, reusability) of research data is undeniable, especially in the face of large, complex datasets currently being produced by omics technologies. Facilitating the integration of a dataset with other types of data increases the likelihood of reuse, and the potential of answering novel research questions. Ontologies are a useful tool for semantically tagging datasets as adding relevant metadata increases the understanding of how data was produced and increases its interoperability. Ontologies provide concepts for a particular domain as well as the relationships between concepts. By tagging data with ontology terms, data becomes both human- and machine- interpretable, allowing for increased reuse and interoperability. However, the task of identifying ontologies relevant to a particular research domain or technology is challenging, especially within the diverse realm of fundamental plant research. In this review, we outline the ontologies most relevant to the fundamental plant sciences and how they can be used to annotate data related to plant-specific experiments within metadata frameworks, such as Investigation-Study-Assay (ISA). We also outline repositories and platforms most useful for identifying applicable ontologies or finding ontology terms.
Affiliation(s)
- Kathryn Dumschott: Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
- Hannah Dörpholz: Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
- Marie-Angélique Laporte: Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
- Dominik Brilhaus: Data Science and Management & Cluster of Excellence on Plant Sciences (CEPLAS), Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Andrea Schrader: Data Science and Management & Cluster of Excellence on Plant Sciences (CEPLAS), University of Cologne, Cologne, Germany
- Björn Usadel: Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany; Institute for Biological Data Science & Cluster of Excellence on Plant Sciences (CEPLAS), Faculty of Mathematics and Life Sciences, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Steffen Neumann: Program Center MetaCom, Leibniz Institute of Plant Biochemistry, Halle, Germany; German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany
- Elizabeth Arnaud: Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
- Angela Kranz: Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
4. Chicco D, Cumbo F, Angione C. Ten quick tips for avoiding pitfalls in multi-omics data integration analyses. PLoS Comput Biol 2023; 19:e1011224. [PMID: 37410704 DOI: 10.1371/journal.pcbi.1011224] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 07/08/2023]
Abstract
Data are the most important elements of bioinformatics: Computational analysis of bioinformatics data, in fact, can help researchers infer new knowledge about biology, chemistry, biophysics, and sometimes even medicine, influencing treatments and therapies for patients. Bioinformatics and high-throughput biological data coming from different sources can even be more helpful, because each of these different data chunks can provide alternative, complementary information about a specific biological phenomenon, similar to multiple photos of the same subject taken from different angles. In this context, the integration of bioinformatics and high-throughput biological data gets a pivotal role in running a successful bioinformatics study. In the last decades, data originating from proteomics, metabolomics, metagenomics, phenomics, transcriptomics, and epigenomics have been labelled -omics data, as a unique name to refer to them, and the integration of these omics data has gained importance in all biological areas. Even if this omics data integration is useful and relevant, due to its heterogeneity, it is not uncommon to make mistakes during the integration phases. We therefore decided to present these ten quick tips to perform an omics data integration correctly, avoiding common mistakes we experienced or noticed in published studies in the past. Even if we designed our ten guidelines for beginners, by using a simple language that (we hope) can be understood by anyone, we believe our ten recommendations should be taken into account by all the bioinformaticians performing omics data integration, including experts.
Affiliation(s)
- Davide Chicco: Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
- Fabio Cumbo: Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, United States of America
- Claudio Angione: School of Computing Engineering and Digital Technologies, Teesside University, Middlesbrough, United Kingdom
5. Richter J, Lange F, Scheper T, Solle D, Beutel S. Digitale Zwillinge in der Bioprozesstechnik – Chancen und Möglichkeiten [Digital twins in bioprocess engineering: opportunities and possibilities]. Chem Ing Tech 2022. [DOI: 10.1002/cite.202200166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]
Affiliation(s)
- Jannik Richter: Leibniz Universität Hannover, Institut für Technische Chemie, Callinstraße 5, 30167 Hannover, Germany
- Ferdinand Lange: Leibniz Universität Hannover, Institut für Technische Chemie, Callinstraße 5, 30167 Hannover, Germany
- Thomas Scheper: Leibniz Universität Hannover, Institut für Technische Chemie, Callinstraße 5, 30167 Hannover, Germany
- Dörte Solle: Leibniz Universität Hannover, Institut für Technische Chemie, Callinstraße 5, 30167 Hannover, Germany
- Sascha Beutel: Leibniz Universität Hannover, Institut für Technische Chemie, Callinstraße 5, 30167 Hannover, Germany
6. Burgin J, Ahamed A, Cummins C, Devraj R, Gueye K, Gupta D, Gupta V, Haseeb M, Ihsan M, Ivanov E, Jayathilaka S, Balavenkataraman Kadhirvelu V, Kumar M, Lathi A, Leinonen R, Mansurova M, McKinnon J, O’Cathail C, Paupério J, Pesant S, Rahman N, Rinck G, Selvakumar S, Suman S, Vijayaraja S, Waheed Z, Woollard P, Yuan D, Zyoud A, Burdett T, Cochrane G. The European Nucleotide Archive in 2022. Nucleic Acids Res 2022; 51:D121-D125. [PMID: 36399492 PMCID: PMC9825583 DOI: 10.1093/nar/gkac1051] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 09/30/2022] [Revised: 10/21/2022] [Accepted: 10/25/2022] [Indexed: 11/19/2022]
Abstract
The European Nucleotide Archive (ENA; https://www.ebi.ac.uk/ena), maintained by the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI), offers those producing data an open and supported platform for the management, archiving, publication, and dissemination of data; and to the scientific community as a whole, it offers a globally comprehensive data set through a host of data discovery and retrieval tools. Here, we describe recent updates to the ENA's submission and retrieval services as well as focused efforts to improve connectivity, reusability, and interoperability of ENA data and metadata.
Affiliation(s)
- Josephine Burgin: To whom correspondence should be addressed. Tel: +44 1223 49 4246; Fax: +44 1223 494 468
- Alisha Ahamed: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Carla Cummins: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Rajkumar Devraj: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Khadim Gueye: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Dipayan Gupta: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Vikas Gupta: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Muhammad Haseeb: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Maira Ihsan: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Eugene Ivanov: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Suran Jayathilaka: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Manish Kumar: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ankur Lathi: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Rasko Leinonen: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Milena Mansurova: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jasmine McKinnon: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Colman O’Cathail: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Joana Paupério: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Stéphane Pesant: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Nadim Rahman: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Gabriele Rinck: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sandeep Selvakumar: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Swati Suman: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Senthilnathan Vijayaraja: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Zahra Waheed: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Peter Woollard: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- David Yuan: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ahmad Zyoud: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Tony Burdett: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Guy Cochrane: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
7. Shaw F, Minotto A, McTaggart S, Providence A, Harrison P, Paupério J, Rajan J, Burgin J, Cochrane G, Kilias E, Lawniczak M, Davey R. Managing sample metadata for biodiversity: considerations from the Darwin Tree of Life project. Wellcome Open Res 2022. [DOI: 10.12688/wellcomeopenres.18499.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
Large-scale reference genome sequencing projects for all of biodiversity are underway and common standards have been in place for some years to enable the understanding and sharing of sequence data. However, the metadata that describes the collection, processing and management of samples, and link to the associated sequencing and genome data, are not yet adequately developed and standardised for these projects. At the time of writing, the Darwin Tree of Life (DToL) Project is over two years into its ten-year ambition to sequence all described eukaryotic species in Britain and Ireland. We have sought consensus from a wide range of scientists across taxonomic domains to determine the minimal set of metadata that we collectively deem as critically important to accompany each sequenced specimen. These metadata are made available throughout the subsequent laboratory processes, and once collected, need to be adequately managed to fulfil the requirements of good data management practice. Due to the size and scale of management required, software tools are needed. These tools need to implement rigorous development pathways and change management procedures to ensure that effective research data management of key project and sample metadata is maintained. Tracking of sample properties through the sequencing process is handled by Lab Information Management Systems (LIMS), so publication of the sequenced data is achieved via technical integration of LIMS and data management tools. Discussions with community members on how metadata standards need to be managed within large-scale programmes is a priority in the planning process. Here we report on the standards we developed with respect to a robust and reusable mechanism of metadata collection, in the hopes that other projects forthcoming or underway will adopt these practices for metadata.
8. pISA-tree - a data management framework for life science research projects using a standardised directory tree. Sci Data 2022; 9:685. [DOI: 10.1038/s41597-022-01805-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2022] [Accepted: 10/24/2022] [Indexed: 11/12/2022]
Abstract
We developed pISA-tree, a straightforward and flexible data management solution for organisation of life science project-associated research data and metadata. pISA-tree was initiated by end-user requirements thus its strong points are practicality and low maintenance cost. It enables on-the-fly creation of enriched directory tree structure (project/Investigation/Study/Assay) based on the ISA model, in a standardised manner via consecutive batch files. Templates-based metadata is generated in parallel at each level enabling guided submission of experiment metadata. pISA-tree is complemented by two R packages, pisar and seekr. pisar facilitates integration of pISA-tree datasets into bioinformatic pipelines and generation of ISA-Tab exports. seekr enables synchronisation with the FAIRDOMHub repository. Applicability of pISA-tree was demonstrated in several national and international multi-partner projects. The system thus supports findable, accessible, interoperable and reusable (FAIR) research and is in accordance with the Open Science initiative. Source code and documentation of pISA-tree are available at https://github.com/NIB-SI/pISA-tree.
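The nested project/Investigation/Study/Assay layout described in this abstract can be illustrated with a minimal Python sketch. It is not pISA-tree's implementation (which the abstract says relies on batch files): the function name `create_tree`, the directory naming pattern, and the `metadata.txt` file with its two fields are all hypothetical choices made for illustration.

```python
from pathlib import Path
import tempfile

# Levels of the hierarchy named in the abstract, outermost first.
LEVELS = ("project", "investigation", "study", "assay")


def create_tree(root: Path, names: dict[str, str]) -> list[Path]:
    """Create one nested directory per level and a metadata stub in each.

    The naming scheme "<level>_<name>" and the file "metadata.txt" are
    assumptions for this sketch, not the layout pISA-tree actually uses.
    """
    created = []
    current = root
    for level in LEVELS:
        current = current / f"{level}_{names[level]}"
        current.mkdir(parents=True, exist_ok=True)
        meta = current / "metadata.txt"
        if not meta.exists():
            # Tab-separated key/value pairs, written once per level.
            meta.write_text(
                f"level\t{level}\nname\t{names[level]}\n", encoding="utf-8"
            )
        created.append(current)
    return created


def demo() -> list[str]:
    """Build a sample tree in a temporary directory and return relative paths."""
    names = {
        "project": "demo",
        "investigation": "growth",
        "study": "drought",
        "assay": "rnaseq",
    }
    with tempfile.TemporaryDirectory() as tmp:
        paths = create_tree(Path(tmp), names)
        return [p.relative_to(tmp).as_posix() for p in paths]


if __name__ == "__main__":
    for path in demo():
        print(path)
```

Writing the metadata stub at the moment each directory is created mirrors the abstract's point that metadata is generated in parallel with the tree instead of being reconstructed at publication time.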
9. Waterhouse RM, Adam-Blondon AF, Agosti D, Baldrian P, Balech B, Corre E, Davey RP, Lantz H, Pesole G, Quast C, Glöckner FO, Raes N, Sandionigi A, Santamaria M, Addink W, Vohradsky J, Nunes-Jorge A, Willassen NP, Lanfear J. Recommendations for connecting molecular sequence and biodiversity research infrastructures through ELIXIR. F1000Res 2022; 10. [PMID: 35999898 PMCID: PMC9360911 DOI: 10.12688/f1000research.73825.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/27/2022] [Indexed: 12/03/2022]
Abstract
Threats to global biodiversity are increasingly recognised by scientists and the public as a critical challenge. Molecular sequencing technologies offer means to catalogue, explore, and monitor the richness and biogeography of life on Earth. However, exploiting their full potential requires tools that connect biodiversity infrastructures and resources. As a research infrastructure developing services and technical solutions that help integrate and coordinate life science resources across Europe, ELIXIR is a key player. To identify opportunities, highlight priorities, and aid strategic thinking, here we survey approaches by which molecular technologies help inform understanding of biodiversity. We detail example use cases to highlight how DNA sequencing is: resolving taxonomic issues; increasing knowledge of marine biodiversity; helping understand how agriculture and biodiversity are critically linked; and playing an essential role in ecological studies. Together with examples of national biodiversity programmes, the use cases show where progress is being made but also highlight common challenges and opportunities for future enhancement of underlying technologies and services that connect molecular and wider biodiversity domains. Based on emerging themes, we propose key recommendations to guide future funding for biodiversity research: biodiversity and bioinformatic infrastructures need to collaborate closely and strategically; taxonomic efforts need to be aligned and harmonised across domains; metadata needs to be standardised and common data management approaches widely adopted; current approaches need to be scaled up dramatically to address the anticipated explosion of molecular data; bioinformatics support for biodiversity research needs to be enabled and sustained; training for end users of biodiversity research infrastructures needs to be prioritised; and community initiatives need to be proactive and focused on enabling solutions.
For sequencing data to deliver their full potential they must be connected to knowledge: together, molecular sequence data collection initiatives and biodiversity research infrastructures can advance global efforts to prevent further decline of Earth’s biodiversity.
Collapse
Affiliation(s)
- Robert M. Waterhouse
- Department of Ecology and Evolution and Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Vaud, 1015, Switzerland
| | | | | | - Petr Baldrian
- Institute of Microbiology of the Czech Academy of Sciences, Praha, 142 20, Czech Republic
| | - Bachir Balech
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, CNR, Bari, 70126, Italy
| | - Erwan Corre
- CNRS/Sorbonne Université, Station Biologique de Roscoff, Roscoff, 29680, France
| | | | - Henrik Lantz
- Department of Medical Biochemistry and Microbiology/NBIS, Uppsala University, Uppsala, Sweden
| | - Graziano Pesole
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, CNR, Bari, 70126, Italy
- Department of Biosciences. Biotechnology and Biopharmaceutics, University of Bari “A. Moro”, Bari, 70126, Italy
| | - Christian Quast
- Life Sciences & Chemistry, Jacobs University Bremen gGmbH, Bremen, Germany
| | - Frank Oliver Glöckner
- MARUM - Center for Marine Environmental Sciences, University of Bremen, Bremerhaven, 27570, Germany
- Alfred Wegener Institute, Helmholtz Center for Polar- and Marine Research, Bremerhaven, 27570, Germany
| | - Niels Raes
- NLBIF - Netherlands Biodiversity Information Facility, Naturalis Biodiversity Center, Leiden, 2300 RA, The Netherlands
| | | | - Monica Santamaria
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, CNR, Bari, 70126, Italy
| | - Wouter Addink
- DiSSCo - Distributed System of Scientific Collections, Naturalis Biodiversity Center, Leiden, 2300 RA, The Netherlands
| | - Jiri Vohradsky
- Laboratory of Bioinformatics, Institute of Microbiology, Prague, 142 20, Czech Republic
| | | | | | - Jerry Lanfear
- ELIXIR Hub, Wellcome Genome Campus, Cambridge, CB10 1SD, UK
|
10
|
Chaerony Siffa I, Schäfer J, Becker MM. Adamant: a JSON schema-based metadata editor for research data management workflows. F1000Res 2022; 11:475. [PMID: 35707001 PMCID: PMC9178528 DOI: 10.12688/f1000research.110875.2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Accepted: 07/18/2022] [Indexed: 11/20/2022] Open
Abstract
The web tool Adamant has been developed to systematically collect research metadata as early as the conception of the experiment. Adamant enables a continuous, consistent, and transparent research data management (RDM) process, which is a key element of good scientific practice ensuring the path to Findable, Accessible, Interoperable, Reusable (FAIR) research data. It simplifies the creation of on-demand metadata schemas and the collection of metadata according to established or new standards. The approach is based on JavaScript Object Notation (JSON) schema, where any valid schema can be presented as an interactive web-form. Furthermore, Adamant eases the integration of numerous available RDM methods and software tools into the everyday research activities of especially small independent laboratories. A programming interface allows programmatic integration with other software tools such as electronic lab books or repositories. The user interface (UI) of Adamant is designed to be as user friendly as possible. Each UI element is self-explanatory and intuitive to use, which makes it accessible for users that have little to no experience with JSON format and programming in general. Several examples of research data management workflows that can be implemented using Adamant are introduced. Adamant (client-only version) is available from: https://plasma-mds.github.io/adamant.
Affiliation(s)
- Ihda Chaerony Siffa
- Leibniz Institute for Plasma Science and Technology (INP), Greifswald, Felix-Hausdorff-Straße 2, 17489, Germany
| | - Jan Schäfer
- Leibniz Institute for Plasma Science and Technology (INP), Greifswald, Felix-Hausdorff-Straße 2, 17489, Germany
| | - Markus M. Becker
- Leibniz Institute for Plasma Science and Technology (INP), Greifswald, Felix-Hausdorff-Straße 2, 17489, Germany
|
11
|
Lawniczak MK, Davey RP, Rajan J, Pereira-da-Conceicoa LL, Kilias E, Hollingsworth PM, Barnes I, Allen H, Blaxter M, Burgin J, Broad GR, Crowley LM, Gaya E, Holroyd N, Lewis OT, McTaggart S, Mieszkowska N, Minotto A, Shaw F, Richards TA, Sivess LA. Specimen and sample metadata standards for biodiversity genomics: a proposal from the Darwin Tree of Life project. Wellcome Open Res 2022. [DOI: 10.12688/wellcomeopenres.17605.1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/20/2022] Open
Abstract
The vision of the Earth BioGenome Project is to complete reference genomes for all of the planet’s ~2M described eukaryotic species in the coming decade. To contribute to this global endeavour, the Darwin Tree of Life Project (DToL) was launched in 2019 with the aim of generating complete genomes for the ~70k described eukaryotic species that can be found in Britain and Ireland. One of the early tasks of the DToL project was to determine, define, and standardise the important metadata that must accompany every sample contributing to this ambitious project. This ensures high-quality contextual information is available for the associated data, enabling a richer set of information upon which to search and filter datasets as well as enabling interoperability between datasets used for downstream analysis. Here we describe some of the key factors we considered in the process of determining, defining, and documenting the metadata required for DToL project samples. The manifest and Standard Operating Procedure that are referred to throughout this paper are likely to be useful for other projects, and we encourage re-use while maintaining the standards and rules set out here.
|
12
|
Vazquez P, Hirayama-Shoji K, Novik S, Krauss S, Rayner S. Globally Accessible Distributed Data Sharing (GADDS): a decentralized FAIR platform to facilitate data sharing in the life sciences. Bioinformatics 2022; 38:3812-3817. [PMID: 35639939 PMCID: PMC9344842 DOI: 10.1093/bioinformatics/btac362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2021] [Revised: 04/12/2022] [Accepted: 05/24/2022] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Technical advances have revolutionized the life sciences and researchers commonly face challenges associated with handling large amounts of heterogeneous digital data. The Findable, Accessible, Interoperable and Reusable (FAIR) principles provide a framework to support effective data management. However, implementing this framework is beyond the means of most researchers in terms of resources and expertise, requiring awareness of metadata, policies, community agreements, and other factors such as vocabularies and ontologies. RESULTS: We have developed the Globally Accessible Distributed Data Sharing (GADDS) platform to facilitate FAIR-like data-sharing in cross-disciplinary research collaborations. The platform consists of (i) a blockchain based metadata quality control system, (ii) a private cloud-like storage system and (iii) a version control system. GADDS is built with containerized technologies, providing minimal hardware standards and easing scalability, and offers decentralized trust via transparency of metadata, facilitating data exchange and collaboration. As a use case, we provide an example implementation in engineered living material technology within the Hybrid Technology Hub at the University of Oslo. AVAILABILITY: Demo version available at https://github.com/pavelvazquez/GADDS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pavel Vazquez
- Hybrid Technology Hub - Centre of Excellence, Institute of Basic Medical Sciences, University of Oslo, P.O. Box 1110 Blindern 0317, Oslo, Norway
| | - Kayoko Hirayama-Shoji
- Hybrid Technology Hub - Centre of Excellence, Institute of Basic Medical Sciences, University of Oslo, P.O. Box 1110 Blindern 0317, Oslo, Norway
| | - Steffen Novik
- Department of Informatics, Faculty of Mathematics and Natural Sciences, University of Oslo, P.O. Box 1032 Blindern N-0315, Oslo, Norway
| | - Stefan Krauss
- Hybrid Technology Hub - Centre of Excellence, Institute of Basic Medical Sciences, University of Oslo, P.O. Box 1110 Blindern 0317, Oslo, Norway; Department of Immunology and Transfusion Medicine, Oslo University Hospital, P.O. Box 4950 Nydalen, 0424, Oslo, Norway
| | - Simon Rayner
- Hybrid Technology Hub - Centre of Excellence, Institute of Basic Medical Sciences, University of Oslo, P.O. Box 1110 Blindern 0317, Oslo, Norway; Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
|
13
|
Formenti G, Theissinger K, Fernandes C, Bista I, Bombarely A, Bleidorn C, Ciofi C, Crottini A, Godoy JA, Höglund J, Malukiewicz J, Mouton A, Oomen RA, Paez S, Palsbøll PJ, Pampoulie C, Ruiz-López MJ, Svardal H, Theofanopoulou C, de Vries J, Waldvogel AM, Zhang G, Mazzoni CJ, Jarvis ED, Bálint M. The era of reference genomes in conservation genomics. Trends Ecol Evol 2022; 37:197-202. [PMID: 35086739 DOI: 10.1016/j.tree.2021.11.008] [Citation(s) in RCA: 87] [Impact Index Per Article: 43.5] [Received: 04/25/2021] [Revised: 11/10/2021] [Accepted: 11/16/2021] [Indexed: 02/08/2023]
Abstract
Progress in genome sequencing now enables the large-scale generation of reference genomes. Various international initiatives aim to generate reference genomes representing global biodiversity. These genomes provide unique insights into genomic diversity and architecture, thereby enabling comprehensive analyses of population and functional genomics, and are expected to revolutionize conservation genomics.
Affiliation(s)
- Giulio Formenti
- The Rockefeller University, 1230 York Ave, New York, NY 10065, USA
| | - Kathrin Theissinger
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Georg-Voigt-Str. 14-16, 60325 Frankfurt/Main, Germany; University of Koblenz-Landau, Institute for Environmental Sciences, Fortstrasse 7, 76829 Landau, Germany; Senckenberg Biodiversity and Climate Research Centre, Georg-Voigt-Str. 14-16, 60325 Frankfurt/Main, Germany
| | - Carlos Fernandes
- CE3C - Centre for Ecology, Evolution and Environmental Changes, Departamento de Biologia Animal, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal; Faculdade de Psicologia, Universidade de Lisboa, Alameda da Universidade, 1649-013 Lisboa, Portugal
| | - Iliana Bista
- University of Cambridge, Department of Genetics, Cambridge CB2 3EH, UK; Wellcome Sanger Institute, CB10 1SA, Hinxton, UK
| | | | - Christoph Bleidorn
- University of Göttingen, Department of Animal Evolution and Biodiversity, Untere Karspüle, 2, 37073, Germany
| | - Claudio Ciofi
- University of Florence, Department of Biology, Via Madonna del Piano 6, Sesto Fiorentino (FI) 50019, Italy
| | - Angelica Crottini
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, 4485-661 Vairão, Portugal
| | - José A Godoy
- Estación Biológica de Doñana, Consejo Superior de Investigaciones Científicas, Av. Américo Vespucio, 26, 41092, Spain
| | - Jacob Höglund
- Dept. of Ecology and Genetics, Uppsala University, Norbyvägen 18D, 75246, Sweden
| | | | - Alice Mouton
- InBios - Conservation Genetics Lab, University of Liege, Chemin de la Vallée 4, 4000, Belgium
| | - Rebekah A Oomen
- Centre for Ecological and Evolutionary Synthesis, University of Oslo, Blindernveien 31, 0371 Oslo, Norway; Centre for Coastal Research, University of Agder, Gimlemoen 25j, 4630 Kristiansand, Norway
| | - Sadye Paez
- The Rockefeller University, 1230 York Ave, New York, NY 10065, USA
| | - Per J Palsbøll
- Groningen Institute of Evolutionary Life Sciences, University of Groningen, Nijenborgh, 9747 AG Groningen, the Netherlands; Center for Coastal Studies, 5 Holway Avenue, Provincetown, MA 02657, USA
| | - Christophe Pampoulie
- Marine and Freshwater Research Institute, Fornubúðir 5, 220 Hafnarfjörður, Iceland
| | - María J Ruiz-López
- Estación Biológica de Doñana, Consejo Superior de Investigaciones Científicas, Av. Américo Vespucio, 26, 41092, Spain
| | - Hannes Svardal
- Department of Biology, University of Antwerp, Groenenborgerlaan 171, 2020, Belgium
| | | | - Jan de Vries
- University of Göttingen, Institute for Microbiology and Genetics, Dept. of Applied Bioinformatics, Goettingen Center for Molecular Biosciences (GZMB), Campus Institute Data Science (CIDAS), Goldschmidtstr. 1, 37077, Germany
| | - Ann-Marie Waldvogel
- Institute of Zoology, University of Cologne, Zülpicherstrasse 47b, D-50674, Germany
| | - Guojie Zhang
- Villum Center for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Denmark, Build 3, Universitetsparken 15, Copenhagen 2100, Denmark; China National Genebank, BGI-Shenzhen, Jinsha Road, Dapeng District, Shenzhen 518083, China
| | - Camila J Mazzoni
- Leibniz Institute for Zoo and Wildlife Research (IZW), Alfred-Kowalke-Str 17, 10315 Berlin, Germany
| | - Erich D Jarvis
- The Rockefeller University, 1230 York Ave, New York, NY 10065, USA
| | - Miklós Bálint
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Georg-Voigt-Str. 14-16, 60325 Frankfurt/Main, Germany; Senckenberg Biodiversity and Climate Research Centre, Georg-Voigt-Str. 14-16, 60325 Frankfurt/Main, Germany; Institute for Insect Biotechnology, Justus-Liebig University Gießen, Heinrich-Buff-Ring 26-32, 35392 Giessen, Germany.
|
14
|
Johnson D, Batista D, Cochrane K, Davey RP, Etuk A, Gonzalez-Beltran A, Haug K, Izzo M, Larralde M, Lawson TN, Minotto A, Moreno P, Nainala VC, O'Donovan C, Pireddu L, Roger P, Shaw F, Steinbeck C, Weber RJM, Sansone SA, Rocca-Serra P. ISA API: An open platform for interoperable life science experimental metadata. Gigascience 2021; 10:giab060. [PMID: 34528664 PMCID: PMC8444265 DOI: 10.1093/gigascience/giab060] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 11/13/2020] [Revised: 03/19/2021] [Accepted: 08/23/2021] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND: The Investigation/Study/Assay (ISA) Metadata Framework is an established and widely used set of open source community specifications and software tools for enabling discovery, exchange, and publication of metadata from experiments in the life sciences. The original ISA software suite provided a set of user-facing Java tools for creating and manipulating the information structured in ISA-Tab, a now widely used tabular format. To make the ISA framework more accessible to machines and enable programmatic manipulation of experiment metadata, the JSON serialization ISA-JSON was developed. RESULTS: In this work, we present the ISA API, a Python library for the creation, editing, parsing, and validating of ISA-Tab and ISA-JSON formats by using a common data model engineered as Python object classes. We describe the ISA API feature set, early adopters, and its growing user community. CONCLUSIONS: The ISA API provides users with rich programmatic metadata-handling functionality to support automation, a common interface, and an interoperable medium between the 2 ISA formats, as well as with other life science data formats required for depositing data in public databases.
Affiliation(s)
- David Johnson
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
- Department of Informatics and Media, Uppsala University, Box 513, 75120 Uppsala, Sweden
| | - Dominique Batista
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
| | - Keeva Cochrane
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Robert P Davey
- Earlham Institute, Data infrastructure and algorithms, Norwich Research Park, Norwich NR4 7UZ, UK
| | - Anthony Etuk
- Earlham Institute, Data infrastructure and algorithms, Norwich Research Park, Norwich NR4 7UZ, UK
| | - Alejandra Gonzalez-Beltran
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
- Science and Technology Facilities Council, Scientific Computing Department, Rutherford Appleton Laboratory, Harwell Campus, Didcot, OX11 0QX, UK
| | - Kenneth Haug
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Genome Research Limited, Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Saffron Walden, CB10 1RQ, UK
| | - Massimiliano Izzo
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
| | - Martin Larralde
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Thomas N Lawson
- School of Biosciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
| | - Alice Minotto
- Earlham Institute, Data infrastructure and algorithms, Norwich Research Park, Norwich NR4 7UZ, UK
| | - Pablo Moreno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Venkata Chandrasekhar Nainala
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Claire O'Donovan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Luca Pireddu
- Distributed Computing Group, CRS4: Center for Advanced Studies, Research & Development in Sardinia, Pula 09050, Italy
| | - Pierrick Roger
- CEA, LIST, Laboratory for Data Analysis and Systems’ Intelligence, MetaboHUB, Gif-Sur-Yvette F-91191, France
| | - Felix Shaw
- Earlham Institute, Data infrastructure and algorithms, Norwich Research Park, Norwich NR4 7UZ, UK
| | - Christoph Steinbeck
- Cheminformatics and Computational Metabolomics, Institute for Analytical Chemistry, Lessingstr. 8, 07743 Jena, Germany
| | - Ralf J M Weber
- School of Biosciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
- Phenome Centre Birmingham, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
| | - Susanna-Assunta Sansone
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
| | - Philippe Rocca-Serra
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
|
15
|
Williamson HF, Brettschneider J, Caccamo M, Davey RP, Goble C, Kersey PJ, May S, Morris RJ, Ostler R, Pridmore T, Rawlings C, Studholme D, Tsaftaris SA, Leonelli S. Data management challenges for artificial intelligence in plant and agricultural research. F1000Res 2021; 10:324. [PMID: 36873457 PMCID: PMC9975417 DOI: 10.12688/f1000research.52204.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/12/2023] [Indexed: 01/19/2023] Open
Abstract
Artificial Intelligence (AI) is increasingly used within plant science, yet it is far from being routinely and effectively implemented in this domain. Particularly relevant to the development of novel food and agricultural technologies is the development of validated, meaningful and usable ways to integrate, compare and visualise large, multi-dimensional datasets from different sources and scientific approaches. After a brief summary of the reasons for the interest in data science and AI within plant science, the paper identifies and discusses eight key challenges in data management that must be addressed to further unlock the potential of AI in crop and agronomic research, and particularly the application of Machine Learning (ML), which holds much promise for this domain.
Affiliation(s)
- Hugh F Williamson
- Exeter Centre for the Study of the Life Sciences & Institute for Data Science and Artificial Intelligence, University of Exeter, Exeter, UK
| | | | - Mario Caccamo
- NIAB, National Research Institute of Brewing, East Malling, UK
| | | | - Carole Goble
- Department of Computer Science, University of Manchester, Manchester, UK
| | | | - Sean May
- School of Biosciences, University of Nottingham, Loughborough, UK
| | | | - Richard Ostler
- Department of Computational and Analytical Sciences, Rothamsted Research, Harpenden, UK
| | - Tony Pridmore
- School of Computer Science, University of Nottingham, Nottingham, UK
| | - Chris Rawlings
- Department of Computational and Analytical Sciences, Rothamsted Research, Harpenden, UK
| | | | - Sotirios A Tsaftaris
- Institute of Digital Communications, University of Edinburgh, Edinburgh, UK; Alan Turing Institute, London, UK
| | - Sabina Leonelli
- Exeter Centre for the Study of the Life Sciences & Institute for Data Science and Artificial Intelligence, University of Exeter, Exeter, UK; Alan Turing Institute, London, UK
|
16
|
Arnaud E, Laporte MA, Kim S, Aubert C, Leonelli S, Miro B, Cooper L, Jaiswal P, Kruseman G, Shrestha R, Buttigieg PL, Mungall CJ, Pietragalla J, Agbona A, Muliro J, Detras J, Hualla V, Rathore A, Das RR, Dieng I, Bauchet G, Menda N, Pommier C, Shaw F, Lyon D, Mwanzia L, Juarez H, Bonaiuti E, Chiputwa B, Obileye O, Auzoux S, Yeumo ED, Mueller LA, Silverstein K, Lafargue A, Antezana E, Devare M, King B. The Ontologies Community of Practice: A CGIAR Initiative for Big Data in Agrifood Systems. Patterns (New York, N.Y.) 2020; 1:100105. [PMID: 33205138 PMCID: PMC7660444 DOI: 10.1016/j.patter.2020.100105] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 03/06/2020] [Revised: 05/28/2020] [Accepted: 08/24/2020] [Indexed: 12/15/2022]
Abstract
Heterogeneous and multidisciplinary data generated by research on sustainable global agriculture and agrifood systems requires quality data labeling or annotation in order to be interoperable. As recommended by the FAIR principles, data, labels, and metadata must use controlled vocabularies and ontologies that are popular in the knowledge domain and commonly used by the community. Despite the existence of robust ontologies in the Life Sciences, there is currently no comprehensive full set of ontologies recommended for data annotation across agricultural research disciplines. In this paper, we discuss the added value of the Ontologies Community of Practice (CoP) of the CGIAR Platform for Big Data in Agriculture for harnessing relevant expertise in ontology development and identifying innovative solutions that support quality data annotation. The Ontologies CoP stimulates knowledge sharing among stakeholders, such as researchers, data managers, domain experts, experts in ontology design, and platform development teams.
Affiliation(s)
- Elizabeth Arnaud
- Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
| | - Marie-Angélique Laporte
- Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
| | - Soonho Kim
- Markets, Trade and Institutions Division (MTID), International Food Policy Research Institute (IFPRI), Washington, DC, USA
| | - Céline Aubert
- Environment and Production Technology Division (EPTD), International Food Policy Research Institute (IFPRI), Washington, DC, USA
| | - Sabina Leonelli
- Department of Sociology, Philosophy and Anthropology & Exeter Centre for the Study of the Life Sciences (Egenis), University of Exeter, Exeter, UK
| | - Berta Miro
- Agrifood Policy Platform, International Rice Research Institute (IRRI), Los Baños, Laguna, Philippines
| | - Laurel Cooper
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
| | - Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
| | - Gideon Kruseman
- Socio-Economics Program, International Maize and Wheat Improvement Center (CIMMYT), Texcoco, State of México, Mexico
| | - Rosemary Shrestha
- Genetic Resources Program, International Maize and Wheat Improvement Center (CIMMYT), Texcoco, State of México, México
| | - Pier Luigi Buttigieg
- Helmholtz Metadata Collaboration, GEOMAR Helmholtz Centre for Ocean Research, Kiel, Germany
| | - Christopher J. Mungall
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | | | - Afolabi Agbona
- Cassava Breeding Program, International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria
| | | | - Jeffrey Detras
- Bioinformatics Cluster, Strategic Innovation Platform, International Rice Research Institute (IRRI), Los Baños, Laguna, Philippines
| | - Vilma Hualla
- Research Informatics Unit (RIU), International Potato Center (CIP), Lima, Peru
| | - Abhishek Rathore
- Statistics, Bioinformatics & Data Management (SBDM) Theme, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, Telangana, India
| | - Roma Rani Das
- Statistics, Bioinformatics & Data Management (SBDM) Theme, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, Telangana, India
| | - Ibnou Dieng
- Biometrics Unit, International Institute of Tropical Agriculture (IITA), Ibadan, Oyo State, Nigeria
| | - Guillaume Bauchet
- Mueller Bioinformatics Laboratory, Boyce Thompson Institute for Plant Research, Ithaca, NY, USA
| | - Naama Menda
- Mueller Bioinformatics Laboratory, Boyce Thompson Institute for Plant Research, Ithaca, NY, USA
| | - Cyril Pommier
- BioinfOmics, Plant Bioinformatics Facility, Université Paris-Saclay, Institut National de la Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Versailles, France
| | - Felix Shaw
- Digital Biology, Earlham Institute, Norwich, Norfolk, UK
| | - David Lyon
- Mueller Bioinformatics Laboratory, Boyce Thompson Institute for Plant Research, Ithaca, NY, USA
| | - Leroy Mwanzia
- Performance, Innovation and Strategic Analysis, International Center for Tropical Agriculture (CIAT), Regional Office for Africa, Nairobi, Kenya
| | - Henry Juarez
- Research Informatics Unit (RIU), International Potato Center (CIP), Lima, Peru
| | - Enrico Bonaiuti
- Monitoring, Evaluation and Learning Team, International Center for Agricultural Research in the Dry Areas (ICARDA), Beirut, Lebanon
| | - Brian Chiputwa
- Research Methods Group (RMG), World Agroforestry (ICRAF), Nairobi, Kenya
| | - Olatunbosun Obileye
- Data Management Section, International Institute of Tropical Agriculture (IITA), Ibadan, Oyo State, Nigeria
| | - Sandrine Auzoux
- UPR AIDA, The French Agricultural Research Centre for International Development (CIRAD), Sainte-Clotilde, Réunion, France
- Université de Montpellier, Montpellier, France
| | - Esther Dzalé Yeumo
- Unité Délégation à l’Information Scientifique et Technique - DIST, Institut National de la Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Versailles, France
| | - Lukas A. Mueller
- Mueller Bioinformatics Laboratory, Boyce Thompson Institute for Plant Research, Ithaca, NY, USA
| | | | | | - Erick Antezana
- Bayer Crop Science SA-NV, Diegem, Belgium
- Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
| | - Medha Devare
- Environment and Production Technology Division (EPTD), International Food Policy Research Institute (IFPRI), Washington, DC, USA
| | - Brian King
- CGIAR Platform for Big Data in Agriculture, International Center for Tropical Agriculture (CIAT), Cali, Colombia
|