1
Leipzig J, Nüst D, Hoyt CT, Ram K, Greenberg J. The role of metadata in reproducible computational research. Patterns (New York, N.Y.) 2021; 2:100322. [PMID: 34553169 PMCID: PMC8441584 DOI: 10.1016/j.patter.2021.100322] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 01/22/2023]
Abstract
Reproducible computational research (RCR) is the keystone of the scientific method for in silico analyses, packaging the transformation of raw data to published results. In addition to its role in research integrity, improving the reproducibility of scientific studies can accelerate evaluation and reuse. This potential and wide support for the FAIR principles have motivated interest in metadata standards supporting reproducibility. Metadata provide context and provenance to raw data and methods and are essential to both discovery and validation. Despite this shared connection with scientific data, few studies have explicitly described how metadata enable reproducible computational research. This review employs a functional content analysis to identify metadata standards that support reproducibility across an analytic stack consisting of input data, tools, notebooks, pipelines, and publications. Our review provides background context, explores gaps, and discovers component trends of embeddedness and methodology weight from which we derive recommendations for future work.
Affiliation(s)
- Jeremy Leipzig
  - Metadata Research Center, College of Computing and Informatics, Drexel University, Philadelphia, PA, USA
- Daniel Nüst
  - Institute for Geoinformatics, University of Münster, Münster, Germany
- Karthik Ram
  - Berkeley Institute for Data Science, University of California, Berkeley, Berkeley, CA, USA
- Jane Greenberg
  - Metadata Research Center, College of Computing and Informatics, Drexel University, Philadelphia, PA, USA
2
Ison J, Ienasescu H, Rydza E, Chmura P, Rapacki K, Gaignard A, Schwämmle V, van Helden J, Kalaš M, Ménager H. biotoolsSchema: a formalized schema for bioinformatics software description. Gigascience 2021; 10:giaa157. [PMID: 33506265 PMCID: PMC7842104 DOI: 10.1093/gigascience/giaa157] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/08/2020] [Revised: 11/10/2020] [Accepted: 12/07/2020] [Indexed: 11/19/2022]
Abstract
Background: Life scientists routinely face massive and heterogeneous data analysis tasks and must find and access the most suitable databases or software in a jungle of web-accessible resources. The diversity of information used to describe life-scientific digital resources presents an obstacle to their utilization. Although several standardization efforts are emerging, no information schema has been sufficiently detailed to enable uniform semantic and syntactic description, and cataloguing, of bioinformatics resources. Findings: Here we describe biotoolsSchema, a formalized information model that balances the needs of conciseness for rapid adoption against the provision of rich technical information and scientific context. biotoolsSchema results from a series of community-driven workshops and is deployed in the bio.tools registry, providing the scientific community with >17,000 machine-readable and human-understandable descriptions of software and other digital life-science resources. We compare our approach to related initiatives and provide alignments to foster interoperability and reusability. Conclusions: biotoolsSchema supports the formalized, rigorous, and consistent specification of the syntax and semantics of bioinformatics resources, and enables cataloguing efforts such as bio.tools that help scientists to find, comprehend, and compare resources. The use of biotoolsSchema in bio.tools promotes the FAIRness of research software, a key element of open and reproducible developments for data-intensive sciences.
Affiliation(s)
- Jon Ison
  - CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, 2 rue Gaston Crémieux, F-91000 Evry, France
- Hans Ienasescu
  - National Life Science Supercomputing Center, Technical University of Denmark, Building 208, DK-2800 Kongens Lyngby, Denmark
- Emil Rydza
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3B, 2200 København, Denmark
- Piotr Chmura
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3B, 2200 København, Denmark
- Kristoffer Rapacki
  - Department of Health Technology, Ørsteds Plads, Building 345C, DK-2800 Kongens, Lyngby, Denmark
- Alban Gaignard
  - CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, 2 rue Gaston Crémieux, F-91000 Evry, France
  - L'institut du Thorax, INSERM, CNRS, University of Nantes, 44007 Nantes, France
- Veit Schwämmle
  - Department of Biochemistry and Molecular Biology and VILLUM Center for Bioanalytical Sciences, University of Southern Denmark, Campusvej 55, 5230 Odense, Denmark
- Jacques van Helden
  - CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, 2 rue Gaston Crémieux, F-91000 Evry, France
  - Département de Biologie, Aix-Marseille Université (AMU), 3 place Victor Hugo, 13003 Marseille, France
- Matúš Kalaš
  - Computational Biology Unit, Department of Informatics, University of Bergen, N-5008 Bergen, Norway
- Hervé Ménager
  - CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, 2 rue Gaston Crémieux, F-91000 Evry, France
  - Hub de Bioinformatique et Biostatistique–Département Biologie Computationnelle, Institut Pasteur, USR 3756, CNRS, Paris 75015, France
3
Milton M, Thorne N. aCLImatise: Automated generation of tool definitions for bioinformatics workflows. Bioinformatics 2020; 36:5556-5557. [PMID: 33325479 PMCID: PMC8016486 DOI: 10.1093/bioinformatics/btaa1033] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/02/2020] [Revised: 11/25/2020] [Accepted: 12/03/2020] [Indexed: 11/23/2022]
Abstract
Summary: aCLImatise is a utility for automatically generating tool definitions compatible with bioinformatics workflow languages, by parsing command-line help output. aCLImatise also has an associated database called the aCLImatise Base Camp, which provides thousands of pre-computed tool definitions. Availability and implementation: The latest aCLImatise source code is available within a GitHub organisation, under the GPL-3.0 license: https://github.com/aCLImatise. In particular, documentation for the aCLImatise Python package is available at https://aclimatise.github.io/CliHelpParser/, and the aCLImatise Base Camp is available at https://aclimatise.github.io/BaseCamp/. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Michael Milton
  - Innovation and Technology, Melbourne Genomics Health Alliance, Parkville, 3052, Australia
  - Walter and Eliza Hall Institute of Medical Research, Parkville, 3052, Australia
- Natalie Thorne
  - Innovation and Technology, Melbourne Genomics Health Alliance, Parkville, 3052, Australia
  - Walter and Eliza Hall Institute of Medical Research, Parkville, 3052, Australia
  - Murdoch Children's Research Institute, Parkville, 3052, Australia
  - The University of Melbourne, Parkville, 3052, Australia
4
Joppich M, Zimmer R. From command-line bioinformatics to bioGUI. PeerJ 2019; 7:e8111. [PMID: 31772845 PMCID: PMC6875409 DOI: 10.7717/peerj.8111] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/14/2019] [Accepted: 10/28/2019] [Indexed: 12/02/2022]
Abstract
Bioinformatics is a highly interdisciplinary field providing (bioinformatics) applications for scientists from many disciplines. Installing and starting applications on the command-line (CL) is inconvenient and/or inefficient for many scientists. Nonetheless, most methods are implemented with a command-line interface only. Providing a graphical user interface (GUI) for bioinformatics applications is one step toward routinely making CL-only applications available to more scientists and, thus, toward more effective interdisciplinary work. With our bioGUI framework we address two main problems of using CL bioinformatics applications: First, many tools work on UNIX systems only, while many scientists use Microsoft Windows. Second, scientists refrain from using CL tools that could well support them in their research. With bioGUI install modules and templates, installing and using CL tools is made possible for most scientists, even on Windows, due to bioGUI's support for Windows Subsystem for Linux. In addition, bioGUI templates can easily be created, making the bioGUI framework highly rewarding for developers. From the bioGUI repository it is possible to download, install and use bioinformatics tools with just a few clicks.
Affiliation(s)
- Markus Joppich
  - Department of Informatics, Ludwig-Maximilians-Universität München, Munich, Germany
- Ralf Zimmer
  - Department of Informatics, Ludwig-Maximilians-Universität München, Munich, Germany
5
Ison J, Ménager H, Brancotte B, Jaaniso E, Salumets A, Raček T, Lamprecht AL, Palmblad M, Kalaš M, Chmura P, Hancock JM, Schwämmle V, Ienasescu HI. Community curation of bioinformatics software and data resources. Brief Bioinform 2019; 21:1697-1705. [PMID: 31624831 PMCID: PMC7947956 DOI: 10.1093/bib/bbz075] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 04/01/2019] [Revised: 05/13/2019] [Accepted: 05/30/2019] [Indexed: 11/13/2022]
Abstract
The corpus of bioinformatics resources is huge and expanding rapidly, presenting life scientists with a growing challenge in selecting tools that fit the desired purpose. To address this, the European Infrastructure for Biological Information is supporting a systematic approach towards a comprehensive registry of tools and databases for all domains of bioinformatics, provided under a single portal (https://bio.tools). We describe here the practical means by which scientific communities, including individual developers and projects, through major service providers and research infrastructures, can describe their own bioinformatics resources and share these via bio.tools.
Affiliation(s)
- Jon Ison
  - National Life Science Supercomputing Center, Technical University of Denmark, Building 208, DK-2800 Kongens Lyngby, Denmark
- Hervé Ménager
  - Hub de Bioinformatique et Biostatistique - Département Biologie Computationnelle, Institut Pasteur, USR 3756 CNRS, Paris, France
- Bryan Brancotte
  - Hub de Bioinformatique et Biostatistique - Département Biologie Computationnelle, Institut Pasteur, USR 3756 CNRS, Paris, France
- Erik Jaaniso
  - ELIXIR-EE, Institute of Computer Science, University of Tartu. J Liivi 2, Tartu, Estonia
- Ahto Salumets
  - ELIXIR-EE, Institute of Computer Science, University of Tartu. J Liivi 2, Tartu, Estonia
- Tomáš Raček
  - CEITEC - Central European Institute of Technology, Masaryk University Brno, Kamenice 5, 625 00 Brno-Bohunice, Czech Republic
  - Faculty of Informatics, Masaryk University, Botanická 68a, 602 00 Brno, Czech Republic
- Anna-Lena Lamprecht
  - Department of Information and Computing Sciences, Utrecht University, Utrecht, Netherlands
- Magnus Palmblad
  - Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, Netherlands
- Matúš Kalaš
  - Computational Biology Unit, Department of Informatics, University of Bergen, N-5020 Bergen, Norway
- Piotr Chmura
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen
- John M Hancock
  - ELIXIR-Hub, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Veit Schwämmle
  - Department of Biochemistry and Molecular Biology and VILLUM Center for Bioanalytical Sciences, University of Southern Denmark, Campusvej 55, 5230 Odense, Denmark
- Hans-Ioan Ienasescu
  - National Life Science Supercomputing Center, Technical University of Denmark, Building 208, DK-2800 Kongens Lyngby, Denmark
6
Andrio P, Hospital A, Conejero J, Jordá L, Del Pino M, Codo L, Soiland-Reyes S, Goble C, Lezzi D, Badia RM, Orozco M, Gelpi JL. BioExcel Building Blocks, a software library for interoperable biomolecular simulation workflows. Sci Data 2019; 6:169. [PMID: 31506435 PMCID: PMC6736963 DOI: 10.1038/s41597-019-0177-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 03/25/2019] [Accepted: 08/16/2019] [Indexed: 12/26/2022]
Abstract
In recent years, improvements in software and hardware performance have made biomolecular simulations a mature tool for the study of biological processes. Simulation length and the size and complexity of the analyzed systems make simulations both complementary and compatible with other bioinformatics disciplines. However, the characteristics of the software packages used for simulation have prevented the adoption of technologies accepted in other bioinformatics fields, such as automated deployment systems, workflow orchestration, or the use of software containers. We present here a comprehensive exercise to bring biomolecular simulations to the “bioinformatics way of working”. The exercise has led to the development of the BioExcel Building Blocks (BioBB) library. BioBBs are built as Python wrappers to provide an interoperable architecture. BioBBs have been integrated in a chain of usual software management tools to generate data ontologies, documentation, installation packages, software containers and ways of integration with workflow managers, which make them usable in most computational environments.
Affiliation(s)
- Pau Andrio
  - Barcelona Supercomputing Center (BSC), Jordi Girona 29, 08034, Barcelona, Spain
- Adam Hospital
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Baldiri Reixac 10, Barcelona, 08028, Spain
- Javier Conejero
  - Barcelona Supercomputing Center (BSC), Jordi Girona 29, 08034, Barcelona, Spain
- Luis Jordá
  - Barcelona Supercomputing Center (BSC), Jordi Girona 29, 08034, Barcelona, Spain
- Marc Del Pino
  - Barcelona Supercomputing Center (BSC), Jordi Girona 29, 08034, Barcelona, Spain
- Laia Codo
  - Barcelona Supercomputing Center (BSC), Jordi Girona 29, 08034, Barcelona, Spain
- Stian Soiland-Reyes
  - School of Computer Science, The University of Manchester, Manchester, United Kingdom
- Carole Goble
  - School of Computer Science, The University of Manchester, Manchester, United Kingdom
- Daniele Lezzi
  - Barcelona Supercomputing Center (BSC), Jordi Girona 29, 08034, Barcelona, Spain
- Rosa M Badia
  - Barcelona Supercomputing Center (BSC), Jordi Girona 29, 08034, Barcelona, Spain
- Modesto Orozco
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Baldiri Reixac 10, Barcelona, 08028, Spain
  - Department Biochemistry and Molecular Biomedicine, University of Barcelona, Barcelona, Spain
- Josep Ll Gelpi
  - Barcelona Supercomputing Center (BSC), Jordi Girona 29, 08034, Barcelona, Spain
  - Department Biochemistry and Molecular Biomedicine, University of Barcelona, Barcelona, Spain
7
Ison J, Ienasescu H, Chmura P, Rydza E, Ménager H, Kalaš M, Schwämmle V, Grüning B, Beard N, Lopez R, Duvaud S, Stockinger H, Persson B, Vařeková RS, Raček T, Vondrášek J, Peterson H, Salumets A, Jonassen I, Hooft R, Nyrönen T, Valencia A, Capella S, Gelpí J, Zambelli F, Savakis B, Leskošek B, Rapacki K, Blanchet C, Jimenez R, Oliveira A, Vriend G, Collin O, van Helden J, Løngreen P, Brunak S. The bio.tools registry of software tools and data resources for the life sciences. Genome Biol 2019; 20:164. [PMID: 31405382 PMCID: PMC6691543 DOI: 10.1186/s13059-019-1772-6] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 04/12/2019] [Accepted: 07/22/2019] [Indexed: 11/28/2022]
Abstract
Bioinformaticians and biologists rely increasingly upon workflows for the flexible utilization of the many life science tools that are needed to optimally convert data into knowledge. We outline a pan-European enterprise to provide a catalogue (https://bio.tools) of tools and databases that can be used in these workflows. bio.tools not only lists where to find resources, but also provides a wide variety of practical information.
Affiliation(s)
- Jon Ison
  - National Life Science Supercomputing Center, Technical University of Denmark, Building 208, DK-2800, Kongens Lyngby, Denmark
- Hans Ienasescu
  - National Life Science Supercomputing Center, Technical University of Denmark, Building 208, DK-2800, Kongens Lyngby, Denmark
- Piotr Chmura
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen, Denmark
- Emil Rydza
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen, Denmark
- Hervé Ménager
  - Hub de Bioinformatique et de Biostatistiques, Institut Pasteur, C3BI USR, 3756 IP CNRS, Paris, France
- Matúš Kalaš
  - Computational Biology Unit, Department of Informatics, University of Bergen, N-5020, Bergen, Norway
- Veit Schwämmle
  - Department of Biochemistry and Molecular Biology and VILLUM Center for Bioanalytical Sciences, University of Southern Denmark, Campusvej 55, 5230, Odense, Denmark
- Björn Grüning
  - Department of Computer Science, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee 106, 79110, Freiburg, Germany
- Niall Beard
  - School of Computer Science, The University of Manchester, Oxford Road, Manchester, M13 9PL, UK
- Rodrigo Lopez
  - The EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Severine Duvaud
  - SIB Swiss Institute of Bioinformatics, Quartier Sorge - Batiment Amphipole, CH-1015, Lausanne, Switzerland
- Heinz Stockinger
  - SIB Swiss Institute of Bioinformatics, Quartier Sorge - Batiment Amphipole, CH-1015, Lausanne, Switzerland
- Bengt Persson
  - Bioinformatics Infrastructure for Life Sciences, Science for Life Laboratory, Dept of Cell and Molecular Biology, Uppsala University, S-75124, Uppsala, Sweden
- Radka Svobodová Vařeková
  - CEITEC - Central European Institute of Technology, Masaryk University Brno, Kamenice 5, 625 00, Brno-Bohunice, Czech Republic
- Tomáš Raček
  - CEITEC - Central European Institute of Technology, Masaryk University Brno, Kamenice 5, 625 00, Brno-Bohunice, Czech Republic
- Jiří Vondrášek
  - Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Flemingovo namesti 2, 160 00, Prague, Czech Republic
- Hedi Peterson
  - ELIXIR-EE, Institute of Computer Science, University of Tartu. J Liivi 2, Tartu, Estonia
- Ahto Salumets
  - ELIXIR-EE, Institute of Computer Science, University of Tartu. J Liivi 2, Tartu, Estonia
- Inge Jonassen
  - Computational Biology Unit, Department of Informatics, University of Bergen, N-5020, Bergen, Norway
- Rob Hooft
  - Dutch Techcentre for Life Sciences, Jaarbeursplein 6, 3521, AL, Utrecht, The Netherlands
- Tommi Nyrönen
  - CSC - IT Center for Science, PO BOX 405, FI-02101, Espoo, Finland
- Alfonso Valencia
  - Barcelona Supercomputing Centre (BSC), 08034, Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluís Companys 23, 08010, Barcelona, Spain
- Josep Gelpí
  - Barcelona Supercomputing Centre (BSC), 08034, Barcelona, Spain
  - Department of Biochemistry and Molecular Biomedicine, University of Barcelona, INB / BSC-CNS, Barcelona, Spain
- Federico Zambelli
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR), via Amendola 165/A, Bari, Italy
  - Department of Biosciences, University of Milano, Via Celoria 26, Milan, Italy
- Babis Savakis
  - Biomedical Sciences Research Centre, Alexander Fleming 34 Al. Fleming Str, 16672, Vari, Greece
- Brane Leskošek
  - Faculty of Medicine / ELIXIR-SI, University of Ljubljana, Vrazov trg 2, SI-1000, Ljubljana, Slovenia
- Kristoffer Rapacki
  - National Life Science Supercomputing Center, Technical University of Denmark, Building 208, DK-2800, Kongens Lyngby, Denmark
- Christophe Blanchet
  - CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, 2 rue Gaston Crémieux, F-91000, Evry, France
- Rafael Jimenez
  - ELIXIR-Hub, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Arlindo Oliveira
  - INESC-ID / Instituto Superior Técnico, R. Alves Redol 9, Lisbon, Portugal
- Gert Vriend
  - Radboud University Medical Centre, CMBI, Postbus 9101, 6500 HB, Nijmegen, Netherlands
- Olivier Collin
  - Plateforme GenOuest Univ Rennes, Inria, CNRS, IRISA, F-35000, Rennes, France
- Jacques van Helden
  - Aix-Marseille Univ, INSERM, lab. Theory and Approaches of Genome Complexity (TAGC), Marseille, France
- Peter Løngreen
  - National Life Science Supercomputing Center, Technical University of Denmark, Building 208, DK-2800, Kongens Lyngby, Denmark
- Søren Brunak
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen, Denmark
  - Department of Bio and Health Informatics, Technical University of Denmark, Building 208, DK-2800, Kongens Lyngby, Denmark