1
Li W, Zhang Z, Xie B, He Y, He K, Qiu H, Lu Z, Jiang C, Pan X, He Y, Hu W, Liu W, Que T, Hu Y. HiOmics: A cloud-based one-stop platform for the comprehensive analysis of large-scale omics data. Comput Struct Biotechnol J 2024; 23:659-668. [PMID: 38292471 PMCID: PMC10824657 DOI: 10.1016/j.csbj.2024.01.002] [Received: 09/13/2023] [Revised: 01/01/2024] [Accepted: 01/02/2024] [Indexed: 02/01/2024]
Abstract
Comprehensively analyzing the vast amount of omics data generated by high-throughput sequencing technology is of utmost importance for scientists. In this context, we propose HiOmics, a cloud-based platform equipped with nearly 300 plugins for the comprehensive analysis and visualization of omics data. HiOmics uses the Element Plus framework to build a user-friendly interface and Docker container technology to ensure the reliability and reproducibility of data analysis results. Furthermore, HiOmics employs the Workflow Description Language (WDL) and the Cromwell engine to construct workflows, ensuring the portability of data analyses and simplifying the examination of complex data. Additionally, HiOmics includes DataCheck, a Golang-based tool that verifies and converts data formats. Finally, by leveraging the object storage and batch computing capabilities of public cloud platforms, HiOmics enables the storage and processing of large-scale data while keeping each user's resources independent.
Affiliation(s)
- Wen Li
- Life Sciences Institute, Guangxi Medical University, Nanning, Guangxi, China
- Department of Biochemistry and Molecular Biology, School of Basic Medicine, Guangxi Medical University, Nanning, Guangxi, China
- Key Laboratory of Biological Molecular Medicine Research (Guangxi Medical University), Education Department of Guangxi Zhuang Autonomous Region, Nanning, Guangxi, China
- Zhining Zhang
- Guangxi Henbio Biotechnology Co., Ltd., Nanning, Guangxi, China
- Bo Xie
- Life Sciences Institute, Guangxi Medical University, Nanning, Guangxi, China
- Yunlin He
- Guangxi Henbio Biotechnology Co., Ltd., Nanning, Guangxi, China
- Kangming He
- Guangxi Henbio Biotechnology Co., Ltd., Nanning, Guangxi, China
- Hong Qiu
- Life Sciences Institute, Guangxi Medical University, Nanning, Guangxi, China
- Guangxi Henbio Biotechnology Co., Ltd., Nanning, Guangxi, China
- Zhiwei Lu
- Guangxi Henbio Biotechnology Co., Ltd., Nanning, Guangxi, China
- Chunlan Jiang
- Guangxi Henbio Biotechnology Co., Ltd., Nanning, Guangxi, China
- Xuanyu Pan
- School of Basic Medicine, Guangxi Medical University, Nanning, Guangxi, China
- Yuxiao He
- Life Sciences Institute, Guangxi Medical University, Nanning, Guangxi, China
- Wenyu Hu
- Guangxi Henbio Biotechnology Co., Ltd., Nanning, Guangxi, China
- Wenjian Liu
- Faculty of Data Science, City University of Macau, Macau, China
- Tengcheng Que
- Faculty of Data Science, City University of Macau, Macau, China
- Youjiang Medical University for Nationalities, Baise, Guangxi, China
- Guangxi Zhuang Autonomous Terrestrial Wildlife Rescue Research and Epidemic Diseases Monitoring Center, Nanning, Guangxi, China
- Yanling Hu
- Life Sciences Institute, Guangxi Medical University, Nanning, Guangxi, China
- Department of Biochemistry and Molecular Biology, School of Basic Medicine, Guangxi Medical University, Nanning, Guangxi, China
- Key Laboratory of Biological Molecular Medicine Research (Guangxi Medical University), Education Department of Guangxi Zhuang Autonomous Region, Nanning, Guangxi, China
- Guangxi Henbio Biotechnology Co., Ltd., Nanning, Guangxi, China
- Faculty of Data Science, City University of Macau, Macau, China
2
Leo S, Crusoe MR, Rodríguez-Navas L, Sirvent R, Kanitz A, De Geest P, Wittner R, Pireddu L, Garijo D, Fernández JM, Colonnelli I, Gallo M, Ohta T, Suetake H, Capella-Gutierrez S, de Wit R, Kinoshita BP, Soiland-Reyes S. Recording provenance of workflow runs with RO-Crate. PLoS One 2024; 19:e0309210. [PMID: 39255315 PMCID: PMC11386446 DOI: 10.1371/journal.pone.0309210] [Received: 03/04/2024] [Accepted: 08/08/2024] [Indexed: 09/12/2024]
Abstract
Recording the provenance of scientific computation results is key to supporting the traceability, reproducibility and quality assessment of data products. Several data models have been explored to address this need, providing representations of workflow plans and their executions as well as means of packaging the resulting information for archiving and sharing. However, existing approaches tend to lack interoperable adoption across workflow management systems. In this work we present Workflow Run RO-Crate, an extension of RO-Crate (Research Object Crate) and Schema.org to capture the provenance of the execution of computational workflows at different levels of granularity and bundle together all their associated objects (inputs, outputs, code, etc.). The model is supported by a diverse, open community that runs regular meetings, discussing development, maintenance and adoption aspects. Workflow Run RO-Crate is already implemented by several workflow management systems, allowing interoperable comparisons between workflow runs from heterogeneous systems. We describe the model, its alignment to standards such as W3C PROV, and its implementation in six workflow systems. Finally, we illustrate the application of Workflow Run RO-Crate in two use cases of machine learning in the digital image analysis domain.
Affiliation(s)
- Simone Leo
- Center for Advanced Studies, Research, and Development in Sardinia (CRS4), Pula (CA), Italy
- Michael R. Crusoe
- Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- DTL Projects, Utrecht, The Netherlands
- Forschungszentrum Jülich, Jülich, Germany
- Raül Sirvent
- Barcelona Supercomputing Center, Barcelona, Spain
- Alexander Kanitz
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Rudolf Wittner
- Faculty of Informatics, Masaryk University, Brno, Czech Republic
- Institute of Computer Science, Masaryk University, Brno, Czech Republic
- BBMRI-ERIC, Graz, Austria
- Luca Pireddu
- Center for Advanced Studies, Research, and Development in Sardinia (CRS4), Pula (CA), Italy
- Daniel Garijo
- Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain
- Iacopo Colonnelli
- Computer Science Department, Università degli Studi di Torino, Torino, Italy
- Matej Gallo
- Faculty of Informatics, Masaryk University, Brno, Czech Republic
- Tazro Ohta
- Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Shizuoka, Japan
- Institute for Advanced Academic Research, Chiba University, Chiba, Japan
- Renske de Wit
- Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Stian Soiland-Reyes
- Department of Computer Science, The University of Manchester, Manchester, United Kingdom
- Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
3
Ziemann M, Poulain P, Bora A. The five pillars of computational reproducibility: bioinformatics and beyond. Brief Bioinform 2023; 24:bbad375. [PMID: 37870287 PMCID: PMC10591307 DOI: 10.1093/bib/bbad375] [Received: 06/16/2023] [Revised: 09/26/2023] [Accepted: 09/30/2023] [Indexed: 10/24/2023]
Abstract
Computational reproducibility is a simple premise in theory, but is difficult to achieve in practice. Building upon past efforts and proposals to maximize reproducibility and rigor in bioinformatics, we present a framework called the five pillars of reproducible computational research. These include (1) literate programming, (2) code version control and sharing, (3) compute environment control, (4) persistent data sharing and (5) documentation. These practices will ensure that computational research work can be reproduced quickly and easily, long into the future. This guide is designed for bioinformatics data analysts and bioinformaticians in training, but should be relevant to other domains of study.
Affiliation(s)
- Mark Ziemann
- Deakin University, School of Life and Environmental Sciences, Geelong, Australia
- Burnet Institute, Melbourne, Australia
- Pierre Poulain
- Université Paris Cité, CNRS, Institut Jacques Monod, Paris, France
- Anusuiya Bora
- Deakin University, School of Life and Environmental Sciences, Geelong, Australia