Reference Citation Analysis

For: Gruening B, Sallou O, Moreno P, da Veiga Leprevost F, Ménager H, Søndergaard D, Röst H, Sachsenberg T, O'Connor B, Madeira F, Dominguez Del Angel V, Crusoe MR, Varma S, Blankenberg D, Jimenez RC, Perez-Riverol Y. Recommendations for the packaging and containerizing of bioinformatics software. F1000Res 2018;7. [PMID: 31543945 PMCID: PMC6738188 DOI: 10.12688/f1000research.15140.2] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Accepted: 03/18/2019] [Indexed: 11/22/2022]

Cited by Other Article(s)

Niehues A, de Visser C, Hagenbeek FA, Kulkarni P, Pool R, Karu N, Kindt ASD, Singh G, Vermeiren RRJM, Boomsma DI, van Dongen J, ’t Hoen PAC, van Gool AJ. A multi-omics data analysis workflow packaged as a FAIR Digital Object. Gigascience 2024;13:giad115. [PMID: 38217405 PMCID: PMC10787363 DOI: 10.1093/gigascience/giad115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 11/14/2023] [Accepted: 12/10/2023] [Indexed: 01/15/2024]

Affiliation(s)

Anna Niehues: Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands; Translational Metabolic Laboratory, Department of Laboratory Medicine, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
Casper de Visser: Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
Fiona A Hagenbeek: Department of Biological Psychology, Vrije Universiteit Amsterdam, 1081 BT Amsterdam, The Netherlands; Amsterdam Public Health Research Institute, 1081 BT Amsterdam, The Netherlands
Purva Kulkarni: Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands; Translational Metabolic Laboratory, Department of Laboratory Medicine, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands; Department of Human Genetics, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
René Pool: Department of Biological Psychology, Vrije Universiteit Amsterdam, 1081 BT Amsterdam, The Netherlands; Amsterdam Public Health Research Institute, 1081 BT Amsterdam, The Netherlands
Naama Karu: Metabolomics and Analytics Centre, Leiden Academic Centre for Drug Research, Leiden University, 2333 AL Leiden, The Netherlands
Alida S D Kindt: Metabolomics and Analytics Centre, Leiden Academic Centre for Drug Research, Leiden University, 2333 AL Leiden, The Netherlands
Gurnoor Singh: Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
Robert R J M Vermeiren: Department of Child and Adolescent Psychiatry, LUMC-Curium, Leiden University Medical Center, 2342 AK Oegstgeest, The Netherlands
Dorret I Boomsma: Department of Biological Psychology, Vrije Universiteit Amsterdam, 1081 BT Amsterdam, The Netherlands; Amsterdam Public Health Research Institute, 1081 BT Amsterdam, The Netherlands; Amsterdam Reproduction & Development (AR&D) Research Institute, 1081 BT Amsterdam, The Netherlands
Jenny van Dongen: Department of Biological Psychology, Vrije Universiteit Amsterdam, 1081 BT Amsterdam, The Netherlands; Amsterdam Public Health Research Institute, 1081 BT Amsterdam, The Netherlands; Amsterdam Reproduction & Development (AR&D) Research Institute, 1081 BT Amsterdam, The Netherlands
Peter A C ’t Hoen: Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
Alain J van Gool: Translational Metabolic Laboratory, Department of Laboratory Medicine, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands

Rodeiro J, Vidaña-Vila E, Navarro J, Mallol R. CloMet: A Novel Open-Source and Modular Software Platform That Connects Established Metabolomics Repositories and Data Analysis Resources. J Proteome Res 2023;22:2540-2547. [PMID: 37428859 PMCID: PMC10857572 DOI: 10.1021/acs.jproteome.2c00602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Indexed: 07/12/2023]

Chen J, Basting PJ, Han S, Garfinkel DJ, Bergman CM. Reproducible evaluation of transposable element detectors with McClintock 2 guides accurate inference of Ty insertion patterns in yeast. Mob DNA 2023;14:8. [PMID: 37452430 PMCID: PMC10347736 DOI: 10.1186/s13100-023-00296-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/21/2023] [Accepted: 06/09/2023] [Indexed: 07/18/2023]

Abstract

BACKGROUND

Many computational methods have been developed to detect non-reference transposable element (TE) insertions using short-read whole genome sequencing data. The diversity and complexity of such methods often present challenges to new users seeking to reproducibly install, execute, or evaluate multiple TE insertion detectors.

RESULTS

We previously developed the McClintock meta-pipeline to facilitate the installation, execution, and evaluation of six first-generation short-read TE detectors. Here, we report a completely re-implemented version of McClintock written in Python using Snakemake and Conda that improves its installation, error handling, speed, stability, and extensibility. McClintock 2 now includes 12 short-read TE detectors, auxiliary pre-processing and analysis modules, interactive HTML reports, and a simulation framework to reproducibly evaluate the accuracy of component TE detectors. When applied to the model microbial eukaryote Saccharomyces cerevisiae, we find substantial variation in the ability of McClintock 2 components to identify the precise locations of non-reference TE insertions, with RelocaTE2 showing the highest recall and precision in simulated data. We find that RelocaTE2, TEMP, TEMP2 and TEBreak provide consistent estimates of approximately 50 non-reference TE insertions per strain and that Ty2 has the highest number of non-reference TE insertions in a species-wide panel of approximately 1000 yeast genomes. Finally, we show that best-in-class predictors for yeast applied to resequencing data have sufficient resolution to reveal a dyad pattern of integration in nucleosome-bound regions upstream of yeast tRNA genes for Ty1, Ty2, and Ty4, allowing us to extend knowledge about fine-scale target preferences revealed previously for experimentally induced Ty1 insertions to spontaneous insertions for other copia-superfamily retrotransposons in yeast.

CONCLUSION

McClintock (https://github.com/bergmanlab/mcclintock/) provides a user-friendly pipeline for the identification of TEs in short-read WGS data using multiple TE detectors, which should benefit researchers studying TE insertion variation in a wide range of different organisms. Application of the improved McClintock system to simulated and empirical yeast genome data reveals best-in-class methods and novel biological insights for one of the most widely studied model eukaryotes and provides a paradigm for evaluating and selecting non-reference TE detectors in other species.

Sonrel A, Luetge A, Soneson C, Mallona I, Germain PL, Knyazev S, Gilis J, Gerber R, Seurinck R, Paul D, Sonder E, Crowell HL, Fanaswala I, Al-Ajami A, Heidari E, Schmeing S, Milosavljevic S, Saeys Y, Mangul S, Robinson MD. Meta-analysis of (single-cell method) benchmarks reveals the need for extensibility and interoperability. Genome Biol 2023;24:119. [PMID: 37198712 DOI: 10.1186/s13059-023-02962-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/27/2022] [Accepted: 05/06/2023] [Indexed: 05/19/2023]

Affiliation(s)

Anthony Sonrel: Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, Switzerland
Almut Luetge: Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, Switzerland
Charlotte Soneson: SIB Swiss Institute of Bioinformatics, Zurich, Switzerland; Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
Izaskun Mallona: Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, Switzerland; Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland
Pierre-Luc Germain: Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, Switzerland; D-HEST Institute for Neuroscience, ETH Zürich, Zurich, Switzerland
Sergey Knyazev: Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, USA; Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA, USA
Jeroen Gilis: Department of Applied Mathematics, Computer Science & Statistics, Ghent University, Ghent, Belgium; Data Mining and Modeling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
Reto Gerber: Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, Switzerland
Ruth Seurinck: Department of Applied Mathematics, Computer Science & Statistics, Ghent University, Ghent, Belgium; Data Mining and Modeling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium
Dominique Paul: Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
Emanuel Sonder: Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, Switzerland; D-HEST Institute for Neuroscience, ETH Zürich, Zurich, Switzerland
Helena L Crowell: Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, Switzerland
Imran Fanaswala: Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, Switzerland
Ahmad Al-Ajami: Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, Switzerland
Elyas Heidari: Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, Switzerland
Stephan Schmeing: Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, Switzerland
Stefan Milosavljevic: Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, Switzerland; Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
Yvan Saeys: Department of Applied Mathematics, Computer Science & Statistics, Ghent University, Ghent, Belgium; Data Mining and Modeling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium
Serghei Mangul: Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA, USA
Mark D Robinson: Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, Switzerland

Player RA, Aguinaldo AM, Merritt BB, Maszkiewicz LN, Adeyemo OE, Forsyth ER, Verratti KJ, Chee BW, Grady SL, Bradburne CE. The META tool optimizes metagenomic analyses across sequencing platforms and classifiers. Front Bioinform 2023;2:969247. [PMID: 36685333 PMCID: PMC9852826 DOI: 10.3389/fbinf.2022.969247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2022] [Accepted: 12/14/2022] [Indexed: 01/09/2023]

Mendes CI, Vila-Cerqueira P, Motro Y, Moran-Gilad J, Carriço JA, Ramirez M. LMAS: evaluating metagenomic short de novo assembly methods through defined communities. Gigascience 2022;12:6963325. [PMID: 36576131 PMCID: PMC9795473 DOI: 10.1093/gigascience/giac122] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/04/2022] [Revised: 09/26/2022] [Accepted: 11/16/2022] [Indexed: 12/29/2022]

König P, Beier S, Mascher M, Stein N, Lange M, Scholz U. DivBrowse-interactive visualization and exploratory data analysis of variant call matrices. Gigascience 2022;12:giad025. [PMID: 37083938 PMCID: PMC10120423 DOI: 10.1093/gigascience/giad025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 01/23/2023] [Accepted: 03/23/2023] [Indexed: 04/22/2023]

Hou Q, Waury K, Gogishvili D, Feenstra KA. Ten quick tips for sequence-based prediction of protein properties using machine learning. PLoS Comput Biol 2022;18:e1010669. [PMID: 36454728 PMCID: PMC9714715 DOI: 10.1371/journal.pcbi.1010669] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/03/2022]

Kadri S, Sboner A, Sigaras A, Roy S. Containers in Bioinformatics: Applications, Practical Considerations, and Best Practices in Molecular Pathology. J Mol Diagn 2022;24:442-454. [PMID: 35189355 DOI: 10.1016/j.jmoldx.2022.01.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/29/2021] [Revised: 11/15/2021] [Accepted: 01/21/2022] [Indexed: 12/19/2022]

van der Putten BCL, Mendes CI, Talbot BM, de Korne-Elenbaas J, Mamede R, Vila-Cerqueira P, Coelho LP, Gulvik CA, Katz LS, The ASM NGS Hackathon Participants. Software testing in microbial bioinformatics: a call to action. Microb Genom 2022;8. [PMID: 35259087 PMCID: PMC9176277 DOI: 10.1099/mgen.0.000790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]

Allain F, Roméjon J, La Rosa P, Jarlier F, Servant N, Hupé P. Geniac: Automatic Configuration GENerator and Installer for nextflow pipelines. Open Res Eur 2022;1:76. [PMID: 37645091 PMCID: PMC10445886 DOI: 10.12688/openreseurope.13861.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/11/2022] [Indexed: 08/31/2023]

Abstract

With the advent of high-throughput biotechnological platforms and their ever-growing capacity, life science has turned into a digitized, computational and data-intensive discipline. As a consequence, running standard analyses with a bioinformatics pipeline in routine production has become a challenge, because the data must be processed in real time and delivered to end users as fast as possible. The use of workflow management systems, together with packaging systems and containerization technologies, offers an opportunity to tackle this challenge. While very powerful, these tools can be used and combined in many ways that may differ from one developer to another. Therefore, promoting the homogeneity of the workflow implementation requires guidelines and protocols that detail how the source code of the bioinformatics pipeline should be written and organized to ensure its usability, maintainability, interoperability, sustainability, portability, reproducibility, scalability and efficiency. Capitalizing on Nextflow, Conda, Docker, Singularity and the nf-core initiative, we propose a set of best practices along the development life cycle of the bioinformatics pipeline and its deployment for production operations, targeting different expert communities: i) bioinformaticians and statisticians, ii) software engineers, and iii) data managers and core facility engineers. We implemented Geniac (Automatic Configuration GENerator and Installer for nextflow pipelines), a toolbox with three components: i) technical documentation, available at https://geniac.readthedocs.io, detailing coding guidelines for the bioinformatics pipeline with Nextflow; ii) a command line interface with a linter to check that the code respects the guidelines; and iii) an add-on to generate configuration files, build the containers and deploy the pipeline.
The Geniac toolbox aims to harmonize development practices across developers and to automate the generation of configuration files and containers by parsing the source code of the Nextflow pipeline.

Piccolo SR, Ence ZE, Anderson EC, Chang JT, Bild AH. Simplifying the development of portable, scalable, and reproducible workflows. eLife 2021;10:71069. [PMID: 34643507 PMCID: PMC8514239 DOI: 10.7554/elife.71069] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/09/2021] [Accepted: 09/27/2021] [Indexed: 12/30/2022]

Combining Multiple RNA-Seq Data Analysis Algorithms Using Machine Learning Improves Differential Isoform Expression Analysis. Methods Protoc 2021;4:mps4040068. [PMID: 34698224 PMCID: PMC8544431 DOI: 10.3390/mps4040068] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/23/2021] [Revised: 08/22/2021] [Accepted: 09/24/2021] [Indexed: 12/13/2022]

Paul-Gilloteaux P, Tosi S, Hériché JK, Gaignard A, Ménager H, Marée R, Baecker V, Klemm A, Kalaš M, Zhang C, Miura K, Colombelli J. Bioimage analysis workflows: community resources to navigate through a complex ecosystem. F1000Res 2021;10:320. [PMID: 34136134 PMCID: PMC8182692 DOI: 10.12688/f1000research.52569.1] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Accepted: 04/14/2021] [Indexed: 11/20/2022]

Nüst D, Sochat V, Marwick B, Eglen SJ, Head T, Hirst T, Evans BD. Ten simple rules for writing Dockerfiles for reproducible data science. PLoS Comput Biol 2020;16:e1008316. [PMID: 33170857 PMCID: PMC7654784 DOI: 10.1371/journal.pcbi.1008316] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 11/18/2022]

Föll MC, Moritz L, Wollmann T, Stillger MN, Vockert N, Werner M, Bronsert P, Rohr K, Grüning BA, Schilling O. Accessible and reproducible mass spectrometry imaging data analysis in Galaxy. Gigascience 2019;8:giz143. [PMID: 31816088 PMCID: PMC6901077 DOI: 10.1093/gigascience/giz143] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 05/05/2019] [Revised: 09/10/2019] [Accepted: 11/10/2019] [Indexed: 02/06/2023]

Affiliation(s)

Melanie Christine Föll: Institute of Surgical Pathology, Medical Center – University of Freiburg, Breisacher Straße 115a, 79106 Freiburg, Germany; Faculty of Biology, University of Freiburg, Schänzlestraße 1, 79104 Freiburg, Germany
Lennart Moritz: Institute of Surgical Pathology, Medical Center – University of Freiburg, Breisacher Straße 115a, 79106 Freiburg, Germany
Thomas Wollmann: Biomedical Computer Vision Group, BioQuant, IPMB, Heidelberg University, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany
Maren Nicole Stillger: Institute of Surgical Pathology, Medical Center – University of Freiburg, Breisacher Straße 115a, 79106 Freiburg, Germany; Faculty of Biology, University of Freiburg, Schänzlestraße 1, 79104 Freiburg, Germany; Institute of Molecular Medicine and Cell Research, Faculty of Medicine, University of Freiburg, Stefan-Meier-Straße 17, 79104 Freiburg, Germany
Niklas Vockert: Biomedical Computer Vision Group, BioQuant, IPMB, Heidelberg University, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany
Martin Werner: Institute of Surgical Pathology, Medical Center – University of Freiburg, Breisacher Straße 115a, 79106 Freiburg, Germany; Faculty of Medicine – University of Freiburg, Breisacher Straße 153, 79110 Freiburg, Germany; Tumorbank Comprehensive Cancer Center Freiburg, Medical Center – University of Freiburg, Breisacher Straße 115a, 79106 Freiburg, Germany; German Cancer Consortium (DKTK) and Cancer Research Center (DKFZ), Hugstetter Straße 55, 79106 Freiburg, Germany
Peter Bronsert: Institute of Surgical Pathology, Medical Center – University of Freiburg, Breisacher Straße 115a, 79106 Freiburg, Germany; Faculty of Medicine – University of Freiburg, Breisacher Straße 153, 79110 Freiburg, Germany; Tumorbank Comprehensive Cancer Center Freiburg, Medical Center – University of Freiburg, Breisacher Straße 115a, 79106 Freiburg, Germany; German Cancer Consortium (DKTK) and Cancer Research Center (DKFZ), Hugstetter Straße 55, 79106 Freiburg, Germany
Karl Rohr: Biomedical Computer Vision Group, BioQuant, IPMB, Heidelberg University, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany
Björn Andreas Grüning: Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
Oliver Schilling: Institute of Surgical Pathology, Medical Center – University of Freiburg, Breisacher Straße 115a, 79106 Freiburg, Germany; Faculty of Medicine – University of Freiburg, Breisacher Straße 153, 79110 Freiburg, Germany; German Cancer Consortium (DKTK) and Cancer Research Center (DKFZ), Hugstetter Straße 55, 79106 Freiburg, Germany

Khan FZ, Soiland-Reyes S, Sinnott RO, Lonie A, Goble C, Crusoe MR. Sharing interoperable workflow provenance: A review of best practices and their practical application in CWLProv. Gigascience 2019;8:giz095. [PMID: 31675414 PMCID: PMC6824458 DOI: 10.1093/gigascience/giz095] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 12/04/2018] [Revised: 05/23/2019] [Accepted: 07/17/2019] [Indexed: 01/22/2023]

Abstract

BACKGROUND

The automation of data analysis in the form of scientific workflows has become a widely adopted practice in many fields of research. Computationally driven data-intensive experiments using workflows enable automation, scaling, adaptation, and provenance support. However, there are still several challenges associated with the effective sharing, publication, and reproducibility of such workflows due to the incomplete capture of provenance and lack of interoperability between different technical (software) platforms.

RESULTS

Based on best-practice recommendations identified from the literature on workflow design, sharing, and publishing, we define a hierarchical provenance framework to achieve uniformity in provenance and support comprehensive and fully re-executable workflows equipped with domain-specific information. To realize this framework, we present CWLProv, a standard-based format to represent any workflow-based computational analysis to produce workflow output artefacts that satisfy the various levels of provenance. We use open source community-driven standards, interoperable workflow definitions in Common Workflow Language (CWL), structured provenance representation using the W3C PROV model, and resource aggregation and sharing as workflow-centric research objects generated along with the final outputs of a given workflow enactment. We demonstrate the utility of this approach through a practical implementation of CWLProv and evaluation using real-life genomic workflows developed by independent groups.

CONCLUSIONS

The underlying principles of the standards utilized by CWLProv enable semantically rich and executable research objects that capture computational workflows with retrospective provenance such that any platform supporting CWL will be able to understand the analysis, reuse the methods for partial reruns, or reproduce the analysis to validate the published findings.

Georgeson P, Syme A, Sloggett C, Chung J, Dashnow H, Milton M, Lonsdale A, Powell D, Seemann T, Pope B. Bionitio: demonstrating and facilitating best practices for bioinformatics command-line software. Gigascience 2019;8:giz109. [PMID: 31544213 PMCID: PMC6755254 DOI: 10.1093/gigascience/giz109] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/29/2019] [Revised: 07/16/2019] [Accepted: 08/13/2019] [Indexed: 11/14/2022]

Abstract

BACKGROUND

Bioinformatics software tools are often created ad hoc, frequently by people without extensive training in software development. In particular, for beginners, the barrier to entry in bioinformatics software development is high, especially if they want to adopt good programming practices. Even experienced developers do not always follow best practices. This results in the proliferation of poorer-quality bioinformatics software, leading to limited scalability and inefficient use of resources; lack of reproducibility, usability, adaptability, and interoperability; and erroneous or inaccurate results.

FINDINGS

We have developed Bionitio, a tool that automates the process of starting new bioinformatics software projects following recommended best practices. With a single command, the user can create a new well-structured project in 1 of 12 programming languages. The resulting software is functional, carrying out a prototypical bioinformatics task, and thus serves as both a working example and a template for building new tools. Key features include command-line argument parsing, error handling, progress logging, defined exit status values, a test suite, a version number, standardized building and packaging, user documentation, code documentation, a standard open source software license, software revision control, and containerization.

CONCLUSIONS

Bionitio serves as a learning aid for beginner-to-intermediate bioinformatics programmers and provides an excellent starting point for new projects. This helps developers adopt good programming practices from the beginning of a project and encourages high-quality tools to be developed more rapidly. This also benefits users because tools are more easily installed and consistent in their usage. Bionitio is released as open source software under the MIT License and is available at https://github.com/bionitio-team/bionitio.

Affiliation(s)

Peter Georgeson: Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053; Colorectal Oncogenomics Group, Department of Clinical Pathology, The University of Melbourne, Victorian Comprehensive Cancer Centre, 305 Grattan Street, Melbourne, Victoria, Australia 3000
Anna Syme: Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053; Royal Botanic Gardens Victoria, Birdwood Avenue, Melbourne, Victoria, Australia 3004
Clare Sloggett: Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053
Jessica Chung: Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053
Harriet Dashnow: Bioinformatics, Murdoch Children's Research Institute, Royal Children's Hospital, Flemington Road, Parkville, Victoria, Australia 3052; School of BioSciences, The University of Melbourne, Royal Parade, Parkville, Victoria, Australia 3052
Michael Milton: Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053; Melbourne Genomics Health Alliance, Walter and Eliza Hall Institute, 1G Royal Parade, Parkville, Victoria, Australia 3052
Andrew Lonsdale: Bioinformatics, Murdoch Children's Research Institute, Royal Children's Hospital, Flemington Road, Parkville, Victoria, Australia 3052; ARC Centre of Excellence in Plant Cell Walls, School of BioSciences, The University of Melbourne, Royal Parade, Parkville, Victoria, Australia 3052
David Powell: Monash Bioinformatics Platform, Biomedicine Discovery Institute, Faculty of Medicine, Nursing and Health Sciences, 15 Innovation Walk, Monash University, Clayton, Victoria, Australia 3800
Torsten Seemann: Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053; Department of Microbiology and Immunology, Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, Victoria, Australia 3000
Bernard Pope: Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053; Colorectal Oncogenomics Group, Department of Clinical Pathology, The University of Melbourne, Victorian Comprehensive Cancer Centre, 305 Grattan Street, Melbourne, Victoria, Australia 3000; Department of Medicine, Central Clinical School, Monash University, Clayton, Victoria, Australia 3800

Sélem-Mojica N, Aguilar C, Gutiérrez-García K, Martínez-Guerrero CE, Barona-Gómez F. EvoMining reveals the origin and fate of natural product biosynthetic enzymes. Microb Genom 2019;5. [PMID: 30946645 PMCID: PMC6939163 DOI: 10.1099/mgen.0.000260] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Indexed: 12/22/2022]

Abstract

Natural products (NPs), or specialized metabolites, are important for medicine and agriculture alike, and for the fitness of the organisms that produce them. NP genome-mining aims at extracting biosynthetic information from the genomes of microbes presumed to produce these compounds. Typically, canonical enzyme sequences from known biosynthetic systems are identified after sequence similarity searches. Despite this being an efficient process, the likelihood of identifying truly novel systems by this approach is low. To overcome this limitation, we previously introduced EvoMining, a genome-mining approach that incorporates evolutionary principles. Here, we release and use our latest EvoMining version, which includes novel visualization features and customizable databases, to analyse 42 central metabolic enzyme families (EFs) conserved throughout Actinobacteria, Cyanobacteria, Pseudomonas and Archaea. We found that expansion-and-recruitment profiles of these 42 families are lineage specific, opening the metabolic space related to ‘shell’ enzymes. These enzymes, which have been overlooked, are EFs with orthologues present in most of the genomes of a taxonomic group, but not in all. As a case study of canonical shell enzymes, we characterized the expansion and recruitment of glutamate dehydrogenase and acetolactate synthase into scytonemin biosynthesis, and into other central metabolic pathways driving Archaea and Bacteria adaptive evolution. By defining the origin and fate of enzymes, EvoMining complements traditional genome-mining approaches as an unbiased strategy and opens the door to gaining insights into the evolution of NP biosynthesis. We anticipate that EvoMining will be broadly used for evolutionary studies, and for generating predictions of unprecedented chemical scaffolds and new antibiotics. This article contains data hosted by Microreact.
