1
Sachdeva S, Bhatia S, Al Harrasi A, Shah YA, Anwer K, Philip AK, Shah SFA, Khan A, Ahsan Halim S. Unraveling the role of cloud computing in health care system and biomedical sciences. Heliyon 2024; 10:e29044. [PMID: 38601602 PMCID: PMC11004887 DOI: 10.1016/j.heliyon.2024.e29044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2023] [Revised: 03/24/2024] [Accepted: 03/28/2024] [Indexed: 04/12/2024]
Abstract
Cloud computing has emerged as a transformative force in healthcare and biomedical sciences, offering scalable, on-demand resources for managing vast amounts of data. This review explores the integration of cloud computing within these fields, highlighting its pivotal role in enhancing data management, security, and accessibility. We examine the application of cloud computing in various healthcare domains, including electronic medical records, telemedicine, and personalized patient care, as well as its impact on bioinformatics research, particularly in genomics, proteomics, and metabolomics. The review also addresses the challenges and ethical considerations associated with cloud-based healthcare solutions, such as data privacy and cybersecurity. By providing a comprehensive overview, we aim to assist readers in understanding the significance of cloud computing in modern medical applications and its potential to revolutionize both patient care and biomedical research.
Affiliation(s)
- Saurabh Bhatia
  - Natural & Medical Sciences Research Center, University of Nizwa, P.O. Box 33, 616 Birkat Al Mauz, Nizwa, Oman
  - School of Health Science, University of Petroleum and Energy Studies, Prem Nagar, Dehradun, Uttarakhand, 248007, India
- Ahmed Al Harrasi
  - Natural & Medical Sciences Research Center, University of Nizwa, P.O. Box 33, 616 Birkat Al Mauz, Nizwa, Oman
- Yasir Abbas Shah
  - Natural & Medical Sciences Research Center, University of Nizwa, P.O. Box 33, 616 Birkat Al Mauz, Nizwa, Oman
- Khalid Anwer
  - Department of Pharmaceutics, College of Pharmacy, Prince Sattam Bin Abdulaziz University, Al-Kharj, 11942, Saudi Arabia
- Anil K. Philip
  - School of Pharmacy, University of Nizwa, Birkat Al Mouz, Nizwa, 616, Oman
- Syed Faisal Abbas Shah
  - Faculty of Computer Science & Information Technology, Virtual University of Pakistan, Lahore, 54000, Pakistan
- Ajmal Khan
  - Natural & Medical Sciences Research Center, University of Nizwa, P.O. Box 33, 616 Birkat Al Mauz, Nizwa, Oman
- Sobia Ahsan Halim
  - Natural & Medical Sciences Research Center, University of Nizwa, P.O. Box 33, 616 Birkat Al Mauz, Nizwa, Oman
2
Zulfiqar M, Crusoe MR, König-Ries B, Steinbeck C, Peters K, Gadelha L. Implementation of FAIR Practices in Computational Metabolomics Workflows-A Case Study. Metabolites 2024; 14:118. [PMID: 38393009 PMCID: PMC10891576 DOI: 10.3390/metabo14020118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 01/30/2024] [Accepted: 02/07/2024] [Indexed: 02/25/2024]
Abstract
Scientific workflows facilitate the automation of data analysis tasks by integrating various software and tools executed in a particular order. To enable transparency and reusability in workflows, it is essential to implement the FAIR principles. Here, we describe our experiences implementing the FAIR principles for metabolomics workflows using the Metabolome Annotation Workflow (MAW) as a case study. MAW is specified using the Common Workflow Language (CWL), allowing for the subsequent execution of the workflow on different workflow engines. MAW is registered using a CWL description on WorkflowHub. During the submission process on WorkflowHub, a CWL description is used for packaging MAW using the Workflow RO-Crate profile, which includes metadata in Bioschemas. Researchers can use this narrative discussion as a guideline to commence using FAIR practices for their bioinformatics or cheminformatics workflows while incorporating necessary amendments specific to their research area.
Affiliation(s)
- Mahnoor Zulfiqar
  - Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University Jena, 07743 Jena, Germany
  - Cluster of Excellence Balance of the Microverse, Friedrich Schiller University Jena, 07743 Jena, Germany
- Michael R. Crusoe
  - ELIXIR (The European Life-Sciences Infrastructure for Biological Information) Germany, Institute of Bio- and Geosciences (IBG-5), Computational Metagenomics, Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
- Birgitta König-Ries
  - Cluster of Excellence Balance of the Microverse, Friedrich Schiller University Jena, 07743 Jena, Germany
  - Institute for Informatics, Friedrich Schiller University Jena, 07743 Jena, Germany
  - iDiv, German Centre for Integrative Biodiversity Research, Halle-Jena-Leipzig, 04103 Leipzig, Germany
- Christoph Steinbeck
  - Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University Jena, 07743 Jena, Germany
  - Cluster of Excellence Balance of the Microverse, Friedrich Schiller University Jena, 07743 Jena, Germany
- Kristian Peters
  - iDiv, German Centre for Integrative Biodiversity Research, Halle-Jena-Leipzig, 04103 Leipzig, Germany
  - Geobotany and Botanical Gardens, Martin-Luther University of Halle-Wittenberg, 06108 Halle, Germany
  - Leibniz Institute of Plant Biochemistry, 06120 Halle, Germany
- Luiz Gadelha
  - Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University Jena, 07743 Jena, Germany
  - Cluster of Excellence Balance of the Microverse, Friedrich Schiller University Jena, 07743 Jena, Germany
  - Institute for Informatics, Friedrich Schiller University Jena, 07743 Jena, Germany
  - German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
3
Niehues A, de Visser C, Hagenbeek FA, Kulkarni P, Pool R, Karu N, Kindt ASD, Singh G, Vermeiren RRJM, Boomsma DI, van Dongen J, 't Hoen PAC, van Gool AJ. A multi-omics data analysis workflow packaged as a FAIR Digital Object. Gigascience 2024; 13:giad115. [PMID: 38217405 PMCID: PMC10787363 DOI: 10.1093/gigascience/giad115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 11/14/2023] [Accepted: 12/10/2023] [Indexed: 01/15/2024]
Abstract
BACKGROUND: Applying good data management and FAIR (Findable, Accessible, Interoperable, and Reusable) data principles in research projects can help disentangle knowledge discovery, study result reproducibility, and data reuse in future studies. Based on the concepts of the original FAIR principles for research data, FAIR principles for research software were recently proposed. FAIR Digital Objects enable discovery and reuse of Research Objects, including computational workflows for both humans and machines. Practical examples can help promote the adoption of FAIR practices for computational workflows in the research community. We developed a multi-omics data analysis workflow implementing FAIR practices to share it as a FAIR Digital Object. FINDINGS: We conducted a case study investigating shared patterns between multi-omics data and childhood externalizing behavior. The analysis workflow was implemented as a modular pipeline in the workflow manager Nextflow, including containers with software dependencies. We adhered to software development practices like version control, documentation, and licensing. Finally, the workflow was described with rich semantic metadata, packaged as a Research Object Crate, and shared via WorkflowHub. CONCLUSIONS: Along with the packaged multi-omics data analysis workflow, we share our experiences adopting various FAIR practices and creating a FAIR Digital Object. We hope our experiences can help other researchers who develop omics data analysis workflows to turn FAIR principles into practice.
Affiliation(s)
- Anna Niehues
  - Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
  - Translational Metabolic Laboratory, Department of Laboratory Medicine, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- Casper de Visser
  - Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- Fiona A Hagenbeek
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, 1081 BT Amsterdam, The Netherlands
  - Amsterdam Public Health Research Institute, 1081 BT Amsterdam, The Netherlands
- Purva Kulkarni
  - Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
  - Translational Metabolic Laboratory, Department of Laboratory Medicine, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
  - Department of Human Genetics, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- René Pool
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, 1081 BT Amsterdam, The Netherlands
  - Amsterdam Public Health Research Institute, 1081 BT Amsterdam, The Netherlands
- Naama Karu
  - Metabolomics and Analytics Centre, Leiden Academic Centre for Drug Research, Leiden University, 2333 AL Leiden, The Netherlands
- Alida S D Kindt
  - Metabolomics and Analytics Centre, Leiden Academic Centre for Drug Research, Leiden University, 2333 AL Leiden, The Netherlands
- Gurnoor Singh
  - Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- Robert R J M Vermeiren
  - Department of Child and Adolescent Psychiatry, LUMC-Curium, Leiden University Medical Center, 2342 AK Oegstgeest, The Netherlands
- Dorret I Boomsma
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, 1081 BT Amsterdam, The Netherlands
  - Amsterdam Public Health Research Institute, 1081 BT Amsterdam, The Netherlands
  - Amsterdam Reproduction & Development (AR&D) Research Institute, 1081 BT Amsterdam, The Netherlands
- Jenny van Dongen
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, 1081 BT Amsterdam, The Netherlands
  - Amsterdam Public Health Research Institute, 1081 BT Amsterdam, The Netherlands
  - Amsterdam Reproduction & Development (AR&D) Research Institute, 1081 BT Amsterdam, The Netherlands
- Peter A C 't Hoen
  - Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- Alain J van Gool
  - Translational Metabolic Laboratory, Department of Laboratory Medicine, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
4
Dumschott K, Dörpholz H, Laporte MA, Brilhaus D, Schrader A, Usadel B, Neumann S, Arnaud E, Kranz A. Ontologies for increasing the FAIRness of plant research data. Frontiers in Plant Science 2023; 14:1279694. [PMID: 38098789 PMCID: PMC10720748 DOI: 10.3389/fpls.2023.1279694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 11/15/2023] [Indexed: 12/17/2023]
Abstract
The importance of improving the FAIRness (findability, accessibility, interoperability, reusability) of research data is undeniable, especially in the face of large, complex datasets currently being produced by omics technologies. Facilitating the integration of a dataset with other types of data increases the likelihood of reuse and the potential for answering novel research questions. Ontologies are a useful tool for semantically tagging datasets, as adding relevant metadata increases the understanding of how data was produced and increases its interoperability. Ontologies provide concepts for a particular domain as well as the relationships between concepts. By tagging data with ontology terms, data becomes both human- and machine-interpretable, allowing for increased reuse and interoperability. However, the task of identifying ontologies relevant to a particular research domain or technology is challenging, especially within the diverse realm of fundamental plant research. In this review, we outline the ontologies most relevant to the fundamental plant sciences and how they can be used to annotate data related to plant-specific experiments within metadata frameworks, such as Investigation-Study-Assay (ISA). We also outline repositories and platforms most useful for identifying applicable ontologies or finding ontology terms.
Affiliation(s)
- Kathryn Dumschott
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
- Hannah Dörpholz
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
- Marie-Angélique Laporte
  - Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
- Dominik Brilhaus
  - Data Science and Management & Cluster of Excellence on Plant Sciences (CEPLAS), Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Andrea Schrader
  - Data Science and Management & Cluster of Excellence on Plant Sciences (CEPLAS), University of Cologne, Cologne, Germany
- Björn Usadel
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
  - Institute for Biological Data Science & Cluster of Excellence on Plant Sciences (CEPLAS), Faculty of Mathematics and Life Sciences, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Steffen Neumann
  - Program Center MetaCom, Leibniz Institute of Plant Biochemistry, Halle, Germany
  - German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany
- Elizabeth Arnaud
  - Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
- Angela Kranz
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
5
Mehta S, Bernt M, Chambers M, Fahrner M, Föll MC, Gruening B, Horro C, Johnson JE, Loux V, Rajczewski AT, Schilling O, Vandenbrouck Y, Gustafsson OJR, Thang WCM, Hyde C, Price G, Jagtap PD, Griffin TJ. A Galaxy of informatics resources for MS-based proteomics. Expert Rev Proteomics 2023; 20:251-266. [PMID: 37787106 DOI: 10.1080/14789450.2023.2265062] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/05/2023] [Accepted: 09/06/2023] [Indexed: 10/04/2023]
Abstract
INTRODUCTION: Continuous advances in mass spectrometry (MS) technologies have enabled deeper and more reproducible proteome characterization and a better understanding of biological systems when integrated with other 'omics data. Bioinformatic resources meeting the analysis requirements of increasingly complex MS-based proteomic data and associated multi-omic data are critically needed. These requirements included availability of software that would span diverse types of analyses, scalability for large-scale, compute-intensive applications, and mechanisms to ease adoption of the software. AREAS COVERED: The Galaxy ecosystem meets these requirements by offering a multitude of open-source tools for MS-based proteomics analyses and applications, all in an adaptable, scalable, and accessible computing environment. A thriving global community maintains these software and associated training resources to empower researcher-driven analyses. EXPERT OPINION: The community-supported Galaxy ecosystem remains a crucial contributor to basic biological and clinical studies using MS-based proteomics. In addition to the current status of Galaxy-based resources, we describe ongoing developments for meeting emerging challenges in MS-based proteomic informatics. We hope this review will catalyze increased use of Galaxy by researchers employing MS-based proteomics and inspire software developers to join the community and implement new tools, workflows, and associated training content that will add further value to this already rich ecosystem.
Affiliation(s)
- Subina Mehta
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Matthias Bernt
  - Helmholtz Centre for Environmental Research - UFZ, Department Computational Biology, Leipzig, Germany
- Matthias Fahrner
  - Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany
  - German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Melanie Christine Föll
  - Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany
  - German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Bjoern Gruening
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Carlos Horro
  - Proteomics Unit, Department of Biomedicine, University of Bergen, Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- James E Johnson
  - Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN, USA
- Valentin Loux
  - Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France
  - Université Paris-Saclay, INRAE, BioinfOmics, MIGALE bioinformatics facility, Jouy-en-Josas, France
- Andrew T Rajczewski
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Oliver Schilling
  - Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany
  - German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- W C Mike Thang
  - Queensland Cyber Infrastructure Foundation (QCIF), Australia
  - Institute of Molecular Bioscience, University of Queensland, St Lucia, Australia
- Cameron Hyde
  - Queensland Cyber Infrastructure Foundation (QCIF), Australia
  - Sippy Downs, University of the Sunshine Coast, Australia
- Gareth Price
  - Queensland Cyber Infrastructure Foundation (QCIF), Australia
  - Institute of Molecular Bioscience, University of Queensland, St Lucia, Australia
- Pratik D Jagtap
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Timothy J Griffin
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
6
Feraud M, O'Brien JW, Samanipour S, Dewapriya P, van Herwerden D, Kaserzon S, Wood I, Rauert C, Thomas KV. InSpectra - A platform for identifying emerging chemical threats. Journal of Hazardous Materials 2023; 455:131486. [PMID: 37172382 DOI: 10.1016/j.jhazmat.2023.131486] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 11/10/2022] [Revised: 04/20/2023] [Accepted: 04/23/2023] [Indexed: 05/14/2023]
Abstract
Non-target analysis (NTA) employing high-resolution mass spectrometry (HRMS) coupled with liquid chromatography is increasingly being used to identify chemicals of biological relevance. HRMS datasets are large and complex, making the identification of potentially relevant chemicals extremely challenging. As they are recorded in vendor-specific formats, interpreting them is often reliant on vendor-specific software that may not accommodate advancements in data processing. Here we present InSpectra, a vendor-independent automated platform for the systematic detection of newly identified emerging chemical threats. InSpectra is web-based, open-source/access, and modular, providing highly flexible and extensible NTA and suspect screening workflows. As a cloud-based platform, InSpectra exploits parallel computing and big data archiving capabilities, with a focus on sharing and community curation of HRMS data. InSpectra offers a reproducible and transparent approach for the identification, tracking and prioritisation of emerging chemical threats.
Affiliation(s)
- Mathieu Feraud
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Australia
- Jake W O'Brien
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Australia
  - Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Netherlands
- Saer Samanipour
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Australia
  - Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Netherlands
  - UvA Data Science Center, University of Amsterdam, Netherlands
- Pradeep Dewapriya
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Australia
- Denice van Herwerden
  - Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Netherlands
- Sarit Kaserzon
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Australia
- Ian Wood
  - School of Mathematics and Physics, The University of Queensland, Australia
- Cassandra Rauert
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Australia
- Kevin V Thomas
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Australia
7
Fu J, Zhu F, Xu CJ, Li Y. Metabolomics meets systems immunology. EMBO Rep 2023; 24:e55747. [PMID: 36916532 PMCID: PMC10074123 DOI: 10.15252/embr.202255747] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 07/08/2022] [Revised: 12/24/2022] [Accepted: 02/24/2023] [Indexed: 03/16/2023]
Abstract
Metabolic processes play a critical role in immune regulation. Metabolomics is the systematic analysis of small molecules (metabolites) in organisms or biological samples, providing an opportunity to comprehensively study interactions between metabolism and immunity in physiology and disease. Integrating metabolomics into systems immunology allows the exploration of the interactions of multilayered features in the biological system and the molecular regulatory mechanism of these features. Here, we provide an overview on recent technological developments of metabolomic applications in immunological research. To begin, two widely used metabolomics approaches are compared: targeted and untargeted metabolomics. Then, we provide a comprehensive overview of the analysis workflow and the computational tools available, including sample preparation, raw spectra data preprocessing, data processing, statistical analysis, and interpretation. Third, we describe how to integrate metabolomics with other omics approaches in immunological studies using available tools. Finally, we discuss new developments in metabolomics and its prospects for immunology research. This review provides guidance to researchers using metabolomics and multiomics in immunity research, thus facilitating the application of systems immunology to disease research.
Affiliation(s)
- Jianbo Fu
  - Centre for Individualised Infection Medicine (CiiM), a joint venture between the Helmholtz Centre for Infection Research (HZI) and Hannover Medical School (MHH), Hannover, Germany
  - TWINCORE Centre for Experimental and Clinical Infection Research, a joint venture between the Helmholtz Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Feng Zhu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Cheng-Jian Xu
  - Centre for Individualised Infection Medicine (CiiM), a joint venture between the Helmholtz Centre for Infection Research (HZI) and Hannover Medical School (MHH), Hannover, Germany
  - TWINCORE Centre for Experimental and Clinical Infection Research, a joint venture between the Helmholtz Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
  - Department of Internal Medicine and Radboud Center for Infectious Diseases, Radboud University Medical Center, Nijmegen, The Netherlands
- Yang Li
  - Centre for Individualised Infection Medicine (CiiM), a joint venture between the Helmholtz Centre for Infection Research (HZI) and Hannover Medical School (MHH), Hannover, Germany
  - TWINCORE Centre for Experimental and Clinical Infection Research, a joint venture between the Helmholtz Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
  - Department of Internal Medicine and Radboud Center for Infectious Diseases, Radboud University Medical Center, Nijmegen, The Netherlands
8
Zulfiqar M, Gadelha L, Steinbeck C, Sorokina M, Peters K. MAW: the reproducible Metabolome Annotation Workflow for untargeted tandem mass spectrometry. J Cheminform 2023; 15:32. [PMID: 36871033 PMCID: PMC9985203 DOI: 10.1186/s13321-023-00695-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 11/18/2022] [Accepted: 02/06/2023] [Indexed: 03/06/2023]
Abstract
Mapping the chemical space of compounds to chemical structures remains a challenge in metabolomics. Despite the advancements in untargeted liquid chromatography-mass spectrometry (LC-MS) to achieve a high-throughput profile of metabolites from complex biological resources, only a small fraction of these metabolites can be annotated with confidence. Many novel computational methods and tools have been developed to enable chemical structure annotation of known and unknown compounds, such as in silico generated spectra and molecular networking. Here, we present an automated and reproducible Metabolome Annotation Workflow (MAW) for untargeted metabolomics data to further facilitate and automate the complex annotation by combining tandem mass spectrometry (MS2) input data pre-processing, spectral and compound database matching with computational classification, and in silico annotation. MAW takes the LC-MS2 spectra as input and generates a list of putative candidates from spectral and compound databases. The databases are integrated via the R package Spectra and the metabolite annotation tool SIRIUS as part of the R segment of the workflow (MAW-R). The final candidate selection is performed using the cheminformatics tool RDKit in the Python segment (MAW-Py). Furthermore, each feature is assigned a chemical structure and can be imported to a chemical structure similarity network. MAW follows the FAIR (Findable, Accessible, Interoperable, Reusable) principles and has been made available as the Docker images maw-r and maw-py. The source code and documentation are available on GitHub (https://github.com/zmahnoor14/MAW). The performance of MAW is evaluated on two case studies. MAW can improve candidate ranking by integrating spectral databases with annotation tools like SIRIUS, which contributes to an efficient candidate selection procedure. The results from MAW are also reproducible and traceable, compliant with the FAIR guidelines. Taken together, MAW could greatly facilitate automated metabolite characterization in diverse fields such as clinical metabolomics and natural product discovery.
Affiliation(s)
- Mahnoor Zulfiqar
  - Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University, 07743, Jena, Germany
- Luiz Gadelha
  - Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University, 07743, Jena, Germany
- Christoph Steinbeck
  - Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University, 07743, Jena, Germany
- Maria Sorokina
  - Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University, 07743, Jena, Germany
  - Data Science and Artificial Intelligence, Research and Development, Bayer Pharmaceuticals, 13353, Berlin, Germany
- Kristian Peters
  - iDiv - German Centre for Integrative Biodiversity Research, Halle-Jena-Leipzig, Leipzig, 04103, Germany
  - Geobotany and Botanical Gardens, Martin-Luther University of Halle-Wittenberg, 06108, Halle, Germany
  - Leibniz Institute of Plant Biochemistry, 06120, Halle, Germany
9
Reference bioimaging to assess the phenotypic trait diversity of bryophytes within the family Scapaniaceae. Sci Data 2022; 9:598. [PMID: 36195605 PMCID: PMC9532418 DOI: 10.1038/s41597-022-01691-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Accepted: 09/08/2022] [Indexed: 11/18/2022]
Abstract
Macro- and microscopic images of organisms are pivotal in biodiversity research. Although bioimages have manifold applications such as assessing the diversity of form and function, FAIR bioimaging data in the context of biodiversity are still very scarce, especially for difficult taxonomic groups such as bryophytes. Here, we present a high-quality reference dataset containing macroscopic and bright-field microscopic images documenting various phenotypic characters of the species belonging to the liverwort family of Scapaniaceae occurring in Europe. To encourage data reuse in biodiversity and adjacent research areas, we annotated the imaging data with machine-actionable metadata using community-accepted semantics. Furthermore, raw imaging data are retained, and any contextual image processing like multi-focus image fusion and stitching was documented to foster good scientific practices through source tracking and provenance. The information contained in the raw images is also of particular interest for machine learning and image segmentation used in bioinformatics and computational ecology. We expect that this richly annotated reference dataset will encourage future studies to follow our principles.
Measurement(s): phenotype
Technology Type(s): bright-field microscopy
Factor Type(s): taxonomic identification of different species
Sample Characteristic - Organism: Scapaniaceae
10
Tzanakis K, Nattkemper TW, Niehaus K, Albaum SP. MetHoS: a platform for large-scale processing, storage and analysis of metabolomics data. BMC Bioinformatics 2022; 23:267. [PMID: 35804309 PMCID: PMC9270834 DOI: 10.1186/s12859-022-04793-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2021] [Accepted: 06/14/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND Modern mass spectrometry has revolutionized the detection and analysis of metabolites but likewise, let the data skyrocket with repositories for metabolomics data filling up with thousands of datasets. While there are many software tools for the analysis of individual experiments with a few to dozens of chromatograms, we see a demand for a contemporary software solution capable of processing and analyzing hundreds or even thousands of experiments in an integrative manner with standardized workflows. RESULTS Here, we introduce MetHoS as an automated web-based software platform for the processing, storage and analysis of great amounts of mass spectrometry-based metabolomics data sets originating from different metabolomics studies. MetHoS is based on Big Data frameworks to enable parallel processing, distributed storage and distributed analysis of even larger data sets across clusters of computers in a highly scalable manner. It has been designed to allow the processing and analysis of any amount of experiments and samples in an integrative manner. In order to demonstrate the capabilities of MetHoS, thousands of experiments were downloaded from the MetaboLights database and used to perform a large-scale processing, storage and statistical analysis in a proof-of-concept study. CONCLUSIONS MetHoS is suitable for large-scale processing, storage and analysis of metabolomics data aiming at untargeted metabolomic analyses. It is freely available at: https://methos.cebitec.uni-bielefeld.de/ . Users interested in analyzing their own data are encouraged to apply for an account.
Affiliation(s)
- Konstantinos Tzanakis
- International Research Training Group "Computational Methods for the Analysis of the Diversity and Dynamics of Genomes", Faculty of Technology, Bielefeld University, Bielefeld, Germany.
| | - Tim W Nattkemper
- Biodata Mining Group, Center for Biotechnology (CeBiTec), Faculty of Technology, Bielefeld University, Bielefeld, Germany
| | - Karsten Niehaus
- Proteome and Metabolome Research, Center for Biotechnology (CeBiTec), Faculty of Biology, Bielefeld University, Bielefeld, Germany
| | - Stefan P Albaum
- Bioinformatics Resource Facility, Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
|
11
|
Hall RD, D'Auria JC, Silva Ferreira AC, Gibon Y, Kruszka D, Mishra P, van de Zedde R. High-throughput plant phenotyping: a role for metabolomics? TRENDS IN PLANT SCIENCE 2022; 27:549-563. [PMID: 35248492 DOI: 10.1016/j.tplants.2022.02.001] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 09/06/2021] [Revised: 01/18/2022] [Accepted: 02/02/2022] [Indexed: 05/17/2023]
Abstract
High-throughput (HTP) plant phenotyping approaches are developing rapidly and are already helping to bridge the genotype-phenotype gap. However, technologies should be developed beyond current physico-spectral evaluations to extend our analytical capacities to the subcellular level. Metabolites define and determine many key physiological and agronomic features in plants and an ability to integrate a metabolomics approach within current HTP phenotyping platforms has huge potential for added value. While key challenges remain on several fronts, novel technological innovations are upcoming yet under-exploited in a phenotyping context. In this review, we present an overview of the state of the art and how current limitations might be overcome to enable full integration of metabolomics approaches into a generic phenotyping pipeline in the near future.
Affiliation(s)
- Robert D Hall
- BU Bioscience, Wageningen University & Research, 6700 AA, Wageningen, The Netherlands; Laboratory of Plant Physiology, Wageningen University, 6700 AA, Wageningen, The Netherlands; Netherlands Metabolomics Centre, Einsteinweg 55, Leiden, The Netherlands.
| | - John C D'Auria
- Department of Molecular Genetics, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK Gatersleben), Gatersleben, Corrensstraße 3, 06466 Seeland, Germany
| | - Antonio C Silva Ferreira
- Universidade Católica Portuguesa, CBQF-Centro de Biotecnologia e Química Fina-Laboratório Associado, Escola Superior de Biotecnologia, Rua Arquiteto Lobão Vital, Apartado 2511, 4202-401 Porto, Portugal; Faculty of AgriSciences, University of Stellenbosch, Matieland 7602, South Africa; Cork Supply Portugal, S.A., Rua Nova do Fial, 4535, Portugal
| | - Yves Gibon
- UMR 1332 Biologie du Fruit et Pathologie, INRAE, Univ. Bordeaux, INRAE Nouvelle Aquitaine - Bordeaux, Avenue Edouard Bourlaux, Villenave d'Ornon, France; Bordeaux Metabolome, MetaboHUB, INRAE, Univ. Bordeaux, Avenue Edouard Bourlaux, Villenave d'Ornon, France; PMB-Metabolome, INRAE, Centre INRAE de Nouvelle, Aquitaine-Bordeaux, Villenave d'Ornon, France
| | - Dariusz Kruszka
- Institute of Plant Genetics, Polish Academy of Sciences, 60-479 Poznan, Poland
| | - Puneet Mishra
- Food and Biobased Research, Wageningen University & Research, 6708 WE, Wageningen, The Netherlands
| | - Rick van de Zedde
- Plant Sciences Group, Wageningen University & Research, 6700 AA, Wageningen, The Netherlands
|
12
|
Pinter N, Glätzer D, Fahrner M, Fröhlich K, Johnson J, Grüning BA, Warscheid B, Drepper F, Schilling O, Föll MC. MaxQuant and MSstats in Galaxy Enable Reproducible Cloud-Based Analysis of Quantitative Proteomics Experiments for Everyone. J Proteome Res 2022; 21:1558-1565. [PMID: 35503992 DOI: 10.1021/acs.jproteome.2c00051] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/29/2022]
Abstract
Quantitative mass spectrometry-based proteomics has become a high-throughput technology for the identification and quantification of thousands of proteins in complex biological samples. Two frequently used tools, MaxQuant and MSstats, allow for the analysis of raw data and finding proteins with differential abundance between conditions of interest. To enable accessible and reproducible quantitative proteomics analyses in a cloud environment, we have integrated MaxQuant (including TMTpro 16/18plex), Proteomics Quality Control (PTXQC), MSstats, and MSstatsTMT into the open-source Galaxy framework. This enables the web-based analysis of label-free and isobaric labeling proteomics experiments via Galaxy's graphical user interface on public clouds. MaxQuant and MSstats in Galaxy can be applied in conjunction with thousands of existing Galaxy tools and integrated into standardized, sharable workflows. Galaxy tracks all metadata and intermediate results in analysis histories, which can be shared privately for collaborations or publicly, allowing full reproducibility and transparency of published analysis. To further increase accessibility, we provide detailed hands-on training materials. The integration of MaxQuant and MSstats into the Galaxy framework enables their usage in a reproducible way on accessible large computational infrastructures, hence realizing the foundation for high-throughput proteomics data science for everyone.
Affiliation(s)
- Niko Pinter
- Institute for Surgical Pathology, Medical Center, University of Freiburg, 79106 Freiburg, Germany; Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany
| | - Damian Glätzer
- Biochemistry and Functional Proteomics, Institute of Biology II, Faculty of Biology, University of Freiburg, 79104 Freiburg, Germany
| | - Matthias Fahrner
- Institute for Surgical Pathology, Medical Center, University of Freiburg, 79106 Freiburg, Germany; Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany; Faculty of Biology, University of Freiburg, 79104 Freiburg, Germany
| | - Klemens Fröhlich
- Institute for Surgical Pathology, Medical Center, University of Freiburg, 79106 Freiburg, Germany; Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany; Faculty of Biology, University of Freiburg, 79104 Freiburg, Germany; Spemann Graduate School of Biology and Medicine (SGBM), Albert-Ludwigs-University Freiburg, 79104 Freiburg, Germany
| | - James Johnson
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota 55455, United States
| | | | - Bettina Warscheid
- Biochemistry and Functional Proteomics, Institute of Biology II, Faculty of Biology, University of Freiburg, 79104 Freiburg, Germany; Faculty of Chemistry and Pharmacy, Department of Biochemistry, Julius Maximilian University of Würzburg, 97074 Würzburg, Germany
| | - Friedel Drepper
- Biochemistry and Functional Proteomics, Institute of Biology II, Faculty of Biology, University of Freiburg, 79104 Freiburg, Germany
| | - Oliver Schilling
- Institute for Surgical Pathology, Medical Center, University of Freiburg, 79106 Freiburg, Germany; Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany; German Cancer Consortium (DKTK) and Cancer Research Center (DKFZ), 79106 Freiburg, Germany
| | - Melanie Christine Föll
- Institute for Surgical Pathology, Medical Center, University of Freiburg, 79106 Freiburg, Germany; Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany; Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts 02115, United States
|
13
|
Fukushima A, Takahashi M, Nagasaki H, Aono Y, Kobayashi M, Kusano M, Saito K, Kobayashi N, Arita M. Development of RIKEN Plant Metabolome MetaDatabase. PLANT & CELL PHYSIOLOGY 2022; 63:433-440. [PMID: 34918130 PMCID: PMC8917833 DOI: 10.1093/pcp/pcab173] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/05/2021] [Revised: 11/15/2021] [Accepted: 12/16/2021] [Indexed: 06/14/2023]
Abstract
The advancement of metabolomics in terms of techniques for measuring small molecules has enabled the rapid detection and quantification of numerous cellular metabolites. Metabolomic data provide new opportunities to gain a deeper understanding of plant metabolism that can improve the health of both plants and humans that consume them. Although major public repositories for general metabolomic data have been established, the community still has shortcomings related to data sharing, especially in terms of data reanalysis, reusability and reproducibility. To address these issues, we developed the RIKEN Plant Metabolome MetaDatabase (RIKEN PMM, http://metabobank.riken.jp/pmm/db/plantMetabolomics), which stores mass spectrometry-based (e.g. gas chromatography-MS-based) metabolite profiling data of plants together with their detailed, structured experimental metadata, including sampling and experimental procedures. Our metadata are described as Linked Open Data based on the Resource Description Framework using standardized and controlled vocabularies, such as the Metabolomics Standards Initiative Ontology, which are to be integrated with various life and biomedical science data using the World Wide Web. RIKEN PMM implements intuitive and interactive operations for plant metabolome data, including raw data (netCDF format), mass spectra (NIST MSP format) and metabolite annotations. The feature is suitable not only for biologists who are interested in metabolomic phenotypes, but also for researchers who would like to investigate life science in general through plant metabolomic approaches.
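The abstract above mentions mass spectra stored in NIST MSP format. As a rough illustration of what reading such a file involves, the following is a minimal sketch, assuming the common MSP layout of `Key: value` header lines followed by m/z and intensity pairs, with blank lines between records. The `Spectrum` class, the `parse_msp` function and the example record are hypothetical and are not taken from RIKEN PMM.

```python
from dataclasses import dataclass, field


@dataclass
class Spectrum:
    """One MSP record: header fields plus (m/z, intensity) peaks."""
    fields: dict = field(default_factory=dict)
    peaks: list = field(default_factory=list)


def parse_msp(text: str) -> list:
    """Parse MSP-style text into a list of Spectrum records (illustrative only)."""
    spectra = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the current record.
            current = None
            continue
        if current is None:
            current = Spectrum()
            spectra.append(current)
        if ":" in line and not line[0].isdigit():
            # Header line such as "Name: ..." or "Num Peaks: 3".
            key, _, value = line.partition(":")
            current.fields[key.strip().lower()] = value.strip()
        else:
            # Peak line: one or more "m/z intensity" pairs, optionally ';'-separated.
            for pair in line.replace(",", " ").split(";"):
                parts = pair.split()
                if len(parts) >= 2:
                    current.peaks.append((float(parts[0]), float(parts[1])))
    return spectra


# Made-up example data, not a real library entry.
EXAMPLE = """Name: Example compound
Formula: C6H12O6
Num Peaks: 3
73 999; 147 350;
217 120

Name: Second compound
Num Peaks: 1
57 1000
"""

spectra = parse_msp(EXAMPLE)
for s in spectra:
    print(s.fields["name"], len(s.peaks))
```

Real MSP files vary between vendors (peak annotations, tab separators, extra header keys), so a production parser would need more tolerant handling than this sketch provides.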
Affiliation(s)
| | - Mikiko Takahashi
- Metabolome Informatics Research Team, RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
| | | | - Yusuke Aono
- Degree Programs in Life and Earth Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8572, Japan
| | - Makoto Kobayashi
- Metabolome Informatics Research Team, RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
| | - Miyako Kusano
- Metabolome Informatics Research Team, RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
- Faculty of Life and Environmental Science, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8572, Japan
- Tsukuba Plant Innovation Research Center, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8572, Japan
| | - Kazuki Saito
- Metabolome Informatics Research Team, RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
| | - Norio Kobayashi
- Metabolome Informatics Research Team, RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
- Data Knowledge Organization Unit, RIKEN Information R&D and Strategy Headquarters, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
| | - Masanori Arita
- Metabolome Informatics Research Team, RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
- Bioinformation and DDBJ Center, National Institute of Genetics, Yata 1111, Mishima, Shizuoka 411-8540, Japan
|
14
|
Trapotsi MA, Hosseini-Gerami L, Bender A. Computational analyses of mechanism of action (MoA): data, methods and integration. RSC Chem Biol 2022; 3:170-200. [PMID: 35360890 PMCID: PMC8827085 DOI: 10.1039/d1cb00069a] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/30/2021] [Accepted: 12/09/2021] [Indexed: 12/15/2022] Open
Abstract
The elucidation of a compound's Mechanism of Action (MoA) is a challenging task in the drug discovery process, but it is important in order to rationalise phenotypic findings and to anticipate potential side-effects. Bioinformatic approaches, advances in machine learning techniques and the increasing deposition of high-throughput data in public databases have significantly contributed to recent advances in the field, but it is not straightforward to decide which data and methods are most suitable to use in a given case. In this review, we focus on these methods and data and their applications in generating MoA hypotheses for subsequent experimental validation. We discuss compound-specific data such as -omics, cell morphology and bioactivity data, as well as commonly used supplementary prior knowledge such as network and pathway data, and provide information on databases where this data can be accessed. In terms of methodologies, we discuss both well-established methods (connectivity mapping, pathway enrichment) as well as more developing methods (neural networks and multi-omics integration). Finally, we review case studies where the MoA of a compound was successfully suggested from computational analysis by incorporating multiple data modalities and/or methodologies. Our aim for this review is to provide researchers with insights into the benefits and drawbacks of both the data and methods in terms of level of understanding, biases and interpretation - and to highlight future avenues of investigation which we foresee will improve the field of MoA elucidation, including greater public access to -omics data and methodologies which are capable of data integration.
Affiliation(s)
- Maria-Anna Trapotsi
- Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge UK
| | - Layla Hosseini-Gerami
- Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge UK
| | - Andreas Bender
- Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge UK
|
15
|
Gupta S, Sharma U. Metabolomics of neurological disorders in India. ANALYTICAL SCIENCE ADVANCES 2021; 2:594-610. [PMID: 38715858 PMCID: PMC10989583 DOI: 10.1002/ansa.202000169] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/19/2020] [Revised: 10/31/2021] [Accepted: 11/01/2021] [Indexed: 06/11/2024]
Abstract
Metabolomics is the comprehensive study of the metabolome and its alterations within biological fluids and tissues. Over the years, applications of metabolomics have been explored in several areas, including personalised medicine in diseases, metabolome-wide association studies (MWAS), pharmacometabolomics and in combination with other branches of omics such as proteomics, transcriptomics and genomics. Mass spectrometry (MS) and nuclear magnetic resonance (NMR) spectroscopy are the major analytical techniques widely employed in metabolomics. In addition, MS is coupled with chromatography techniques like gas chromatography (GC) and liquid chromatography (LC) to separate metabolites before analysis. These analytical techniques have made possible identification and quantification of large numbers of metabolites, encompassing characterization of diseases and facilitating a systematic and rational therapeutic strategy based on metabolic patterns. In recent years, the metabolomics approach has been used to obtain a deeper insight into the underlying biochemistry of neurodegenerative disorders and the discovery of biomarkers of clinical implications. The current review mainly focuses on an Indian perspective of metabolomics for the identification of metabolites and metabolic alterations serving as potential diagnostic biomarkers for neurological diseases including acute spinal cord injury, amyotrophic lateral sclerosis, tethered cord syndrome, spina bifida, stroke, Parkinson's disease, glioblastoma and neurological disorders with inborn errors of metabolism.
Affiliation(s)
- Sangeetha Gupta
- Amity Institute of Pharmacy, Amity University, Noida, Uttar Pradesh, India
| | - Uma Sharma
- Department of NMR & MRI Facility, All India Institute of Medical Sciences, New Delhi, India
|
16
|
Shrivastava AD, Swainston N, Samanta S, Roberts I, Wright Muelas M, Kell DB. MassGenie: A Transformer-Based Deep Learning Method for Identifying Small Molecules from Their Mass Spectra. Biomolecules 2021; 11:1793. [PMID: 34944436 PMCID: PMC8699281 DOI: 10.3390/biom11121793] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/24/2021] [Revised: 11/14/2021] [Accepted: 11/27/2021] [Indexed: 12/15/2022] Open
Abstract
The 'inverse problem' of mass spectrometric molecular identification ('given a mass spectrum, calculate/predict the 2D structure of the molecule whence it came') is largely unsolved, and is especially acute in metabolomics where many small molecules remain unidentified. This is largely because the number of experimentally available electrospray mass spectra of small molecules is quite limited. However, the forward problem ('calculate a small molecule's likely fragmentation and hence at least some of its mass spectrum from its structure alone') is much more tractable, because the strengths of different chemical bonds are roughly known. This kind of molecular identification problem may be cast as a language translation problem in which the source language is a list of high-resolution mass spectral peaks and the 'translation' a representation (for instance in SMILES) of the molecule. It is thus suitable for attack using the deep neural networks known as transformers. We here present MassGenie, a method that uses a transformer-based deep neural network, trained on ~6 million chemical structures with augmented SMILES encoding and their paired molecular fragments as generated in silico, explicitly including the protonated molecular ion. This architecture (containing some 400 million elements) is used to predict the structure of a molecule from the various fragments that may be expected to be observed when some of its bonds are broken. Despite being given essentially no detailed nor explicit rules about molecular fragmentation methods, isotope patterns, rearrangements, neutral losses, and the like, MassGenie learns the effective properties of the mass spectral fragment and valency space, and can generate candidate molecular structures that are very close or identical to those of the 'true' molecules. We also use VAE-Sim, a previously published variational autoencoder, to generate candidate molecules that are 'similar' to the top hit. 
In addition to using the 'top hits' directly, we can produce a rank order of these by 'round-tripping' candidate molecules and comparing them with the true molecules, where known. As a proof of principle, we confine ourselves to positive electrospray mass spectra from molecules with a molecular mass of 500Da or lower, including those in the last CASMI challenge (for which the results are known), getting 49/93 (53%) precisely correct. The transformer method, applied here for the first time to mass spectral interpretation, works extremely effectively both for mass spectra generated in silico and on experimentally obtained mass spectra from pure compounds. It seems to act as a Las Vegas algorithm, in that it either gives the correct answer or simply states that it cannot find one. The ability to create and to 'learn' millions of fragmentation patterns in silico, and therefrom generate candidate structures (that do not have to be in existing libraries) directly, thus opens up entirely the field of de novo small molecule structure prediction from experimental mass spectra.
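The abstract above refers to the protonated molecular ion and to a 500 Da mass limit. A minimal sketch of the underlying arithmetic follows, assuming standard monoisotopic masses and a singly protonated ion [M+H]+. The function names and the glucose example are illustrative choices and are not part of MassGenie.

```python
# Monoisotopic masses (Da) of the most abundant isotopes (standard reference values).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON_MASS = 1.00727646  # Da

MASS_LIMIT = 500.0  # Da, the upper bound quoted in the abstract


def monoisotopic_mass(composition: dict) -> float:
    """Sum element masses for a composition such as {"C": 6, "H": 12, "O": 6}."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


def protonated_mz(composition: dict) -> float:
    """m/z of the singly charged [M+H]+ ion: neutral mass plus one proton."""
    return monoisotopic_mass(composition) + PROTON_MASS


def within_scope(composition: dict) -> bool:
    """True if the neutral molecule is at or below the 500 Da limit."""
    return monoisotopic_mass(composition) <= MASS_LIMIT


glucose = {"C": 6, "H": 12, "O": 6}  # illustrative example molecule
print(round(monoisotopic_mass(glucose), 4))
print(round(protonated_mz(glucose), 4))
print(within_scope(glucose))
```

The reported CASMI result can be checked the same way: 49 correct out of 93 is 49/93 ≈ 0.527, which rounds to the 53% stated in the abstract.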
Affiliation(s)
- Aditya Divyakant Shrivastava
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Department of Computer Science and Engineering, Nirma University, Ahmedabad 382481, India
| | - Neil Swainston
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Mellizyme Biotechnology Ltd., Liverpool Science Park IC1, 131 Mount Pleasant, Liverpool L3 5TF, UK
| | - Soumitra Samanta
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
| | - Ivayla Roberts
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
| | - Marina Wright Muelas
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
| | - Douglas B. Kell
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Mellizyme Biotechnology Ltd., Liverpool Science Park IC1, 131 Mount Pleasant, Liverpool L3 5TF, UK
- Novo Nordisk Foundation Centre for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kongens Lyngby, Denmark
|
17
|
David A, Chaker J, Price EJ, Bessonneau V, Chetwynd AJ, Vitale CM, Klánová J, Walker DI, Antignac JP, Barouki R, Miller GW. Towards a comprehensive characterisation of the human internal chemical exposome: Challenges and perspectives. ENVIRONMENT INTERNATIONAL 2021; 156:106630. [PMID: 34004450 DOI: 10.1016/j.envint.2021.106630] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 02/07/2021] [Revised: 04/15/2021] [Accepted: 05/03/2021] [Indexed: 05/18/2023]
Abstract
The holistic characterisation of the human internal chemical exposome using high-resolution mass spectrometry (HRMS) would be a step forward to investigate the environmental ætiology of chronic diseases with an unprecedented precision. HRMS-based methods are currently operational to reproducibly profile thousands of endogenous metabolites as well as externally-derived chemicals and their biotransformation products in a large number of biological samples from human cohorts. These approaches provide a solid ground for the discovery of unrecognised biomarkers of exposure and metabolic effects associated with many chronic diseases. Nevertheless, some limitations remain and have to be overcome so that chemical exposomics can provide unbiased detection of chemical exposures affecting disease susceptibility in epidemiological studies. Some of these limitations include (i) the lack of versatility of analytical techniques to capture the wide diversity of chemicals; (ii) the lack of analytical sensitivity that prevents the detection of exogenous (and endogenous) chemicals occurring at (ultra) trace levels from restricted sample amounts, and (iii) the lack of automation of the annotation/identification process. In this article, we discuss a number of technological and methodological limitations hindering applications of HRMS-based methods and propose initial steps to push towards a more comprehensive characterisation of the internal chemical exposome. We also discuss other challenges including the need for harmonisation and the difficulty inherent in assessing the dynamic nature of the internal chemical exposome, as well as the need for establishing a strong international collaboration, high level networking, and sustainable research infrastructure. 
A great amount of research, technological development and innovative bio-informatics tools are still needed to profile and characterise the "invisible" (not profiled), "hidden" (not detected) and "dark" (not annotated) components of the internal chemical exposome and concerted efforts across numerous research fields are paramount.
Affiliation(s)
- Arthur David
- Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35000 Rennes, France.
| | - Jade Chaker
- Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35000 Rennes, France
| | - Elliott J Price
- Faculty of Sports Studies, Masaryk University, Brno, Czech Republic; RECETOX Centre, Masaryk University, Brno, Czech Republic
| | - Vincent Bessonneau
- Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35000 Rennes, France
| | - Andrew J Chetwynd
- School of Geography Earth and Environmental Sciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK
| | | | - Jana Klánová
- RECETOX Centre, Masaryk University, Brno, Czech Republic
| | - Douglas I Walker
- Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | | | - Robert Barouki
- Unité UMR-S 1124 Inserm-Université Paris Descartes "Toxicologie Pharmacologie et Signalisation Cellulaire", Paris, France
| | - Gary W Miller
- Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, NY, USA
|
18
|
MSCAT: A Machine Learning Assisted Catalog of Metabolomics Software Tools. Metabolites 2021; 11:metabo11100678. [PMID: 34677393 PMCID: PMC8540572 DOI: 10.3390/metabo11100678] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/30/2021] [Revised: 09/18/2021] [Accepted: 09/22/2021] [Indexed: 01/06/2023] Open
Abstract
The bottleneck for taking full advantage of metabolomics data is often the availability, awareness, and usability of analysis tools. Software tools specifically designed for metabolomics data are being developed at an increasing rate, with hundreds of available tools already in the literature. Many of these tools are open-source and freely available but are very diverse with respect to language, data formats, and stages in the metabolomics pipeline. To help mitigate the challenges of meeting the increasing demand for guidance in choosing analytical tools and coordinating the adoption of best practices for reproducibility, we have designed and built the MSCAT (Metabolomics Software CATalog) database of metabolomics software tools that can be sustainably and continuously updated. This database provides a survey of the landscape of available tools and can assist researchers in their selection of data analysis workflows for metabolomics studies according to their specific needs. We used machine learning (ML) methodology for the purpose of semi-automating the identification of metabolomics software tool names within abstracts. MSCAT searches the literature to find new software tools by implementing a Named Entity Recognition (NER) model based on a neural network model at the sentence level composed of a character-level convolutional neural network (CNN) combined with a bidirectional long-short-term memory (LSTM) layer and a conditional random fields (CRF) layer. The list of potential new tools (and their associated publication) is then forwarded to the database maintainer for the curation of the database entry corresponding to the tool. The end-user interface allows for filtering of tools by multiple characteristics as well as plotting of the aggregate tool data to monitor the metabolomics software landscape.
|
19
|
Johnson D, Batista D, Cochrane K, Davey RP, Etuk A, Gonzalez-Beltran A, Haug K, Izzo M, Larralde M, Lawson TN, Minotto A, Moreno P, Nainala VC, O'Donovan C, Pireddu L, Roger P, Shaw F, Steinbeck C, Weber RJM, Sansone SA, Rocca-Serra P. ISA API: An open platform for interoperable life science experimental metadata. Gigascience 2021; 10:giab060. [PMID: 34528664 PMCID: PMC8444265 DOI: 10.1093/gigascience/giab060] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 11/13/2020] [Revised: 03/19/2021] [Accepted: 08/23/2021] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND The Investigation/Study/Assay (ISA) Metadata Framework is an established and widely used set of open source community specifications and software tools for enabling discovery, exchange, and publication of metadata from experiments in the life sciences. The original ISA software suite provided a set of user-facing Java tools for creating and manipulating the information structured in ISA-Tab, a now widely used tabular format. To make the ISA framework more accessible to machines and enable programmatic manipulation of experiment metadata, the JSON serialization ISA-JSON was developed. RESULTS In this work, we present the ISA API, a Python library for the creation, editing, parsing, and validating of ISA-Tab and ISA-JSON formats by using a common data model engineered as Python object classes. We describe the ISA API feature set, early adopters, and its growing user community. CONCLUSIONS The ISA API provides users with rich programmatic metadata-handling functionality to support automation, a common interface, and an interoperable medium between the 2 ISA formats, as well as with other life science data formats required for depositing data in public databases.
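The idea of a common data model expressed as Python object classes and serialized to JSON can be sketched with the standard library alone. The classes, field names and example values below are hypothetical stand-ins chosen for illustration. They are not the ISA API's actual classes, and the output is not valid ISA-JSON.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Assay:
    """Hypothetical stand-in for an assay-level metadata record."""
    filename: str
    measurement_type: str
    technology_type: str


@dataclass
class Study:
    """Hypothetical stand-in for a study that groups assays."""
    identifier: str
    title: str
    assays: list = field(default_factory=list)


@dataclass
class Investigation:
    """Hypothetical top-level container that groups studies."""
    identifier: str
    title: str
    studies: list = field(default_factory=list)


def to_json(investigation: Investigation) -> str:
    """Serialize the nested object model to a JSON string."""
    return json.dumps(asdict(investigation), indent=2, sort_keys=True)


# Build the Investigation > Study > Assay hierarchy with made-up values.
assay = Assay("a_example.txt", "metabolite profiling", "mass spectrometry")
study = Study("S1", "Example study", assays=[assay])
investigation = Investigation("I1", "Example investigation", studies=[study])

print(to_json(investigation))
```

Keeping a single in-memory model and writing separate serializers on top of it is what lets one library act as an interoperable medium between a tabular format and a JSON format, as the abstract describes.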
Affiliation(s)
- David Johnson
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
- Department of Informatics and Media, Uppsala University, Box 513, 75120 Uppsala, Sweden
| | - Dominique Batista
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
| | - Keeva Cochrane
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Robert P Davey
- Earlham Institute, Data infrastructure and algorithms, Norwich Research Park, Norwich NR4 7UZ, UK
| | - Anthony Etuk
- Earlham Institute, Data infrastructure and algorithms, Norwich Research Park, Norwich NR4 7UZ, UK
| | - Alejandra Gonzalez-Beltran
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
- Science and Technology Facilities Council, Scientific Computing Department, Rutherford Appleton Laboratory, Harwell Campus, Didcot, OX11 0QX, UK
| | - Kenneth Haug
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Genome Research Limited, Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Saffron Walden, CB10 1RQ, UK
| | - Massimiliano Izzo
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
| | - Martin Larralde
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Thomas N Lawson
- School of Biosciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
| | - Alice Minotto
- Earlham Institute, Data infrastructure and algorithms, Norwich Research Park, Norwich NR4 7UZ, UK
| | - Pablo Moreno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Venkata Chandrasekhar Nainala
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Claire O'Donovan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Luca Pireddu
- Distributed Computing Group, CRS4: Center for Advanced Studies, Research & Development in Sardinia, Pula 09050, Italy
| | - Pierrick Roger
- CEA, LIST, Laboratory for Data Analysis and Systems’ Intelligence, MetaboHUB, Gif-Sur-Yvette F-91191, France
| | - Felix Shaw
- Earlham Institute, Data infrastructure and algorithms, Norwich Research Park, Norwich NR4 7UZ, UK
| | - Christoph Steinbeck
- Cheminformatics and Computational Metabolomics, Institute for Analytical Chemistry, Lessingstr. 8, 07743 Jena, Germany
| | - Ralf J M Weber
- School of Biosciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
- Phenome Centre Birmingham, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
| | - Susanna-Assunta Sansone
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
| | - Philippe Rocca-Serra
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
|
20
|
Kikuchi J, Yamada S. The exposome paradigm to predict environmental health in terms of systemic homeostasis and resource balance based on NMR data science. RSC Adv 2021; 11:30426-30447. [PMID: 35480260 PMCID: PMC9041152 DOI: 10.1039/d1ra03008f] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/18/2021] [Accepted: 08/31/2021] [Indexed: 12/22/2022] Open
Abstract
The environment, from microbial ecosystems to recycled resources, fluctuates dynamically due to many physical, chemical and biological factors, the profile of which reflects changes in overall state, such as environmental illness caused by a collapse of homeostasis. To evaluate and predict environmental health in terms of systemic homeostasis and resource balance, a comprehensive understanding of these factors requires an approach based on the "exposome paradigm", namely the totality of exposure to all substances. Furthermore, in considering sustainable development to meet global population growth, it is important to gain an understanding of both the circulation of biological resources and waste recycling in human society. From this perspective, natural environment, agriculture, aquaculture, wastewater treatment in industry, biomass degradation and biodegradable materials design are at the forefront of current research. In this respect, nuclear magnetic resonance (NMR) offers tremendous advantages in the analysis of samples of molecular complexity, such as crude bio-extracts, intact cells and tissues, fibres, foods, feeds, fertilizers and environmental samples. Here we outline examples to promote an understanding of recent applications of solution-state, solid-state, time-domain NMR and magnetic resonance imaging (MRI) to the complex evaluation of organisms, materials and the environment. We also describe useful databases and informatics tools, as well as machine learning techniques for NMR analysis, demonstrating that NMR data science can be used to evaluate the exposome in both the natural environment and human society towards a sustainable future.
Affiliation(s)
- Jun Kikuchi
- Environmental Metabolic Analysis Research Team, RIKEN Center for Sustainable Resource Science 1-7-22 Suehiro-cho, Tsurumi-ku Yokohama 230-0045 Japan
- Graduate School of Bioagricultural Sciences, Nagoya University Furo-cho, Chikusa-ku Nagoya 464-8601 Japan
- Graduate School of Medical Life Science, Yokohama City University 1-7-29 Suehiro-cho, Tsurumi-ku Yokohama 230-0045 Japan
| | - Shunji Yamada
- Environmental Metabolic Analysis Research Team, RIKEN Center for Sustainable Resource Science 1-7-22 Suehiro-cho, Tsurumi-ku Yokohama 230-0045 Japan
- Prediction Science Laboratory, RIKEN Cluster for Pioneering Research 7-1-26 Minatojima-minami-machi, Chuo-ku Kobe 650-0047 Japan
- Data Assimilation Research Team, RIKEN Center for Computational Science 7-1-26 Minatojima-minami-machi, Chuo-ku Kobe 650-0047 Japan
|
21
|
Goonasekera N, Mahmoud A, Chilton J, Afgan E. GalaxyCloudRunner: enhancing scalable computing for Galaxy. Bioinformatics 2021; 37:1763-1765. [PMID: 33104194 DOI: 10.1093/bioinformatics/btaa860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2020] [Revised: 08/18/2020] [Accepted: 10/11/2020] [Indexed: 11/13/2022] Open
Abstract
SUMMARY The existence of more than 100 public Galaxy servers with service quotas is indicative of the need for an increased availability of compute resources for Galaxy to use. The GalaxyCloudRunner enables a Galaxy server to easily expand its available compute capacity by sending user jobs to cloud resources. User jobs are routed to the acquired resources based on a set of configurable rules and the resources can be dynamically acquired from any of four popular cloud providers (AWS, Azure, GCP or OpenStack) in an automated fashion. AVAILABILITY AND IMPLEMENTATION GalaxyCloudRunner is implemented in Python and leverages Docker containers. The source code is MIT licensed and available at https://github.com/cloudve/galaxycloudrunner. The documentation is available at http://gcr.cloudve.org/.
Affiliation(s)
- Nuwan Goonasekera
- Melbourne Bioinformatics, Faculty of Medicine, Dentistry & Health Sciences, University of Melbourne, Melbourne, VIC 3010, Australia
| | - Alexandru Mahmoud
- Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA
| | - John Chilton
- Department of Biochemistry and Molecular Biology, Penn State University, State College, PA 16801, USA
| | - Enis Afgan
- Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA
|
22
|
Pérez-Jiménez M, Sherman E, Pozo-Bayón MA, Pinu FR. Application of untargeted volatile profiling and data driven approaches in wine flavoromics research. Food Res Int 2021; 145:110392. [PMID: 34112395 DOI: 10.1016/j.foodres.2021.110392] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 02/04/2021] [Revised: 03/31/2021] [Accepted: 05/04/2021] [Indexed: 11/28/2022]
Abstract
Traditional flavor chemistry research usually makes use of targeted approaches by focusing on the detection and quantification of key flavor active metabolites that are present in food and beverages. In the last decade, flavoromics has emerged as an alternative to targeted methods where non-targeted and data driven approaches have been used to determine as many metabolites as possible with the aim to establish relationships among the chemical composition of foods and their sensory properties. Flavoromics has been successfully applied in wine research to gain more insights into the impact of a wide range of flavor active metabolites on wine quality. In this review, we aim to provide an overview of the applications of flavoromics approaches in wine research based on existing literature mainly by focusing on untargeted volatile profiling of wines and how this can be used as a powerful tool to generate novel insights. We highlight the fact that untargeted volatile profiling used in flavoromics approaches ultimately can assist the wine industry to produce different wine styles and to market existing wines appropriately based on consumer preference. In addition to summarizing the main steps involved in untargeted volatile profiling, we also provide an outlook about future perspectives and challenges of wine flavoromics research.
Affiliation(s)
- Maria Pérez-Jiménez
- Institute of Food Science Research (CIAL), CSIC-UAM, C/Nicolás Cabrera, 28049 Madrid, Spain
| | - Emma Sherman
- The New Zealand Institute for Plant and Food Research Limited, Private Bag 92169, Auckland 1142, New Zealand
| | - M A Pozo-Bayón
- Institute of Food Science Research (CIAL), CSIC-UAM, C/Nicolás Cabrera, 28049 Madrid, Spain
| | - Farhana R Pinu
- The New Zealand Institute for Plant and Food Research Limited, Private Bag 92169, Auckland 1142, New Zealand.
|
23
|
Spjuth O, Frid J, Hellander A. The machine learning life cycle and the cloud: implications for drug discovery. Expert Opin Drug Discov 2021; 16:1071-1079. [PMID: 34057379 DOI: 10.1080/17460441.2021.1932812] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/30/2022]
Abstract
Introduction: Artificial intelligence (AI) and machine learning (ML) are increasingly used in many aspects of drug discovery. Larger data sizes and methods such as Deep Neural Networks contribute to challenges in data management, the required software stack, and computational infrastructure. There is an increasing need in drug discovery to continuously re-train models and make them available in production environments. Areas covered: This article describes how cloud computing can aid the ML life cycle in drug discovery. The authors discuss opportunities with containerization and scientific workflows, introduce the concept of MLOps, and describe how it can facilitate reproducible and robust ML modeling in drug discovery organizations. They also discuss ML on private, sensitive and regulated data. Expert opinion: Cloud computing offers a compelling suite of building blocks to sustain the ML life cycle integrated in iterative drug discovery. Containerization and platforms such as Kubernetes together with scientific workflows can enable reproducible and resilient analysis pipelines, and the elasticity and flexibility of cloud infrastructures enable scalable and efficient access to compute resources. Drug discovery commonly involves working with sensitive or private data, and cloud computing and federated learning can contribute toward enabling collaborative drug discovery within and between organizations. Abbreviations: AI = Artificial Intelligence; DL = Deep Learning; GPU = Graphics Processing Unit; IaaS = Infrastructure as a Service; K8S = Kubernetes; ML = Machine Learning; MLOps = Machine Learning and Operations; PaaS = Platform as a Service; QC = Quality Control; SaaS = Software as a Service.
Affiliation(s)
- Ola Spjuth
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala Sweden.,Scaleout Systems AB, Sweden
| | | | - Andreas Hellander
- Scaleout Systems AB, Sweden.,Department of Information Technology, Uppsala University, Sweden
|
24
|
Misra BB. Advances in high resolution GC-MS technology: a focus on the application of GC-Orbitrap-MS in metabolomics and exposomics for FAIR practices. ANALYTICAL METHODS : ADVANCING METHODS AND APPLICATIONS 2021; 13:2265-2282. [PMID: 33987631 DOI: 10.1039/d1ay00173f] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 06/12/2023]
Abstract
Gas chromatography-mass spectrometry (GC-MS) provides a complementary analytical platform for capturing volatiles, non-polar and (derivatized) polar metabolites and exposures from a diverse array of matrixes. High resolution (HR) GC-MS as a data generation platform can capture data on analytes that are usually not detectable/quantifiable in liquid chromatography mass-spectrometry-based solutions. With the rise of high-resolution accurate mass (HRAM) GC-MS systems such as GC-Orbitrap-MS in the last decade after the time-of-flight (ToF) renaissance, numerous applications have been found in the fields of metabolomics and exposomics. In a short span of time, a multitude of studies have used GC-Orbitrap-MS to generate exciting new high throughput data spanning from diverse basic to applied research areas. The GC-Orbitrap-MS has found application in both targeted and untargeted efforts for capturing metabolomes and exposomes across diverse studies. In this review, I capture and summarize all the reported studies to date, and provide a snapshot of the milieu of commercial and open-source software solutions, spectral libraries, and informatics solutions available to a GC-Orbitrap-MS system instrument user or a data analyst dealing with these datasets. Lastly, but importantly, I provide an account on data sharing and meta-data capturing solutions that are available to make HRAM GC-MS based metabolomics and exposomics studies findable, accessible, interoperable, and reproducible (FAIR). These FAIR practices would allow data generators and users of GC-HRMS instruments to help the community of GC-MS researchers to collaborate and co-develop exciting tools and algorithms in the future.
Affiliation(s)
- Biswapriya B Misra
- Independent Researcher, Pine-211, Raintree Park Dwaraka Krishna, Namburu, AP-522508, India.
|
25
|
Alvarez RV, Mariño-Ramírez L, Landsman D. Transcriptome annotation in the cloud: complexity, best practices, and cost. Gigascience 2021; 10:6123656. [PMID: 33511996 DOI: 10.1093/gigascience/giaa163] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/02/2020] [Revised: 11/13/2020] [Accepted: 12/23/2020] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND The NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) initiative provides NIH-funded researchers cost-effective access to commercial cloud providers, such as Amazon Web Services (AWS) and Google Cloud Platform (GCP). These cloud providers represent an alternative for the execution of large computational biology experiments like transcriptome annotation, which is a complex analytical process that requires the interrogation of multiple biological databases with several advanced computational tools. The core components of annotation pipelines published since 2012 are BLAST sequence alignments using annotated databases of either nucleotide or protein sequences, run almost exclusively on networked on-premises compute systems. FINDINGS We compare multiple BLAST sequence alignments using AWS and GCP. We prepared several Jupyter Notebooks with all the code required to submit computing jobs to the batch system on each cloud provider. We consider the consequence of the number of query transcripts in input files and the effect on cost and processing time. We tested compute instances with 16, 32, and 64 vCPUs on each cloud provider. Four classes of timing results were collected: the total run time, the time for transferring the BLAST databases to the instance local solid-state disk drive, the time to execute the CWL script, and the time for the creation, set-up, and release of an instance. This study aims to establish an estimate of the cost and compute time needed for the execution of multiple BLAST runs in a cloud environment. CONCLUSIONS We demonstrate that public cloud providers are a practical alternative for the execution of advanced computational biology experiments at low cost. Using our cloud recipes, the BLAST alignments required to annotate a transcriptome with ∼500,000 transcripts can be processed in <2 hours with a compute cost of ∼$200-$250.
In our opinion, for BLAST-based workflows, the choice of cloud platform is not dependent on the workflow but, rather, on the specific details and requirements of the cloud provider. These choices include the accessibility for institutional use, the technical knowledge required for effective use of the platform services, and the availability of open source frameworks such as APIs to deploy the workflow.
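The headline figures in the conclusions above can be converted into per-unit numbers with a few lines of arithmetic. The sketch below assumes only the approximate values quoted in the abstract (about 500,000 transcripts, under 2 hours, roughly $200 to $250); the function and constant names are illustrative, and the 2-hour figure is treated as an upper bound, so the throughput is a lower bound.

```python
def cost_per_thousand(total_cost_usd: float, n_transcripts: int) -> float:
    """Return the compute cost in USD per 1,000 transcripts."""
    return total_cost_usd * 1000 / n_transcripts


# Approximate values quoted in the abstract (not exact measurements).
N_TRANSCRIPTS = 500_000
COST_LOW_USD, COST_HIGH_USD = 200.0, 250.0
MAX_HOURS = 2.0  # the abstract reports "<2 hours", so this is an upper bound

cost_low = cost_per_thousand(COST_LOW_USD, N_TRANSCRIPTS)
cost_high = cost_per_thousand(COST_HIGH_USD, N_TRANSCRIPTS)
min_throughput = N_TRANSCRIPTS / MAX_HOURS  # transcripts per hour, lower bound

print(f"Cost per 1,000 transcripts: ${cost_low:.2f} to ${cost_high:.2f}")
print(f"Throughput: at least {min_throughput:,.0f} transcripts per hour")
```

Under these assumptions the cost works out to roughly $0.40 to $0.50 per 1,000 transcripts.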
Affiliation(s)
- Roberto Vera Alvarez
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, NIH, 9000 Rockville Pike, Bethesda, MD 20890, USA
| | - Leonardo Mariño-Ramírez
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, NIH, 9000 Rockville Pike, Bethesda, MD 20890, USA
| | - David Landsman
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, NIH, 9000 Rockville Pike, Bethesda, MD 20890, USA
|
26
|
Chang HY, Colby SM, Du X, Gomez JD, Helf MJ, Kechris K, Kirkpatrick CR, Li S, Patti GJ, Renslow RS, Subramaniam S, Verma M, Xia J, Young JD. A Practical Guide to Metabolomics Software Development. Anal Chem 2021; 93:1912-1923. [PMID: 33467846 PMCID: PMC7859930 DOI: 10.1021/acs.analchem.0c03581] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 12/22/2022]
Abstract
A growing number of software tools have been developed for metabolomics data processing and analysis. Many new tools are contributed by metabolomics practitioners who have limited prior experience with software development, and the tools are subsequently implemented by users with expertise that ranges from basic point-and-click data analysis to advanced coding. This Perspective is intended to introduce metabolomics software users and developers to important considerations that determine the overall impact of a publicly available tool within the scientific community. The recommendations reflect the collective experience of an NIH-sponsored Metabolomics Consortium working group that was formed with the goal of researching guidelines and best practices for metabolomics tool development. The recommendations are aimed at metabolomics researchers with little formal background in programming and are organized into three stages: (i) preparation, (ii) tool development, and (iii) distribution and maintenance.
Affiliation(s)
- Hui-Yin Chang
- Department of Pathology, University of Michigan, 1301 Catherine Street, Ann Arbor, Michigan 48109, United States.,Department of Biomedical Sciences and Engineering, National Central University, No. 300, Zhongda Road, Zhongli District, Taoyuan City 320, Taiwan
| | - Sean M Colby
- Biological Sciences Division, Pacific Northwest National Laboratory, P.O. Box 999, MSIN: K8-98, Richland, Washington 99352, United States
| | - Xiuxia Du
- Department of Bioinformatics & Genomics, University of North Carolina at Charlotte, 9201 University City Boulevard, Charlotte, North Carolina 28223, United States
| | - Javier D Gomez
- Department of Chemical and Biomolecular Engineering, Vanderbilt University, PMB 351604, 2301 Vanderbilt Place, Nashville, Tennessee 37235, United States
| | - Maximilian J Helf
- Boyce Thompson Institute and Department of Chemistry and Chemical Biology, Cornell University, 533 Tower Road, Ithaca, New York 14853, United States
| | - Katerina Kechris
- Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, 13001 East 17th Place B119, Aurora, Colorado 80045, United States
| | - Christine R Kirkpatrick
- San Diego Supercomputer Center, University of California San Diego, MC 0505, 9500 Gilman Drive, La Jolla, California 92093, United States
| | - Shuzhao Li
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, Connecticut 06032, United States
| | - Gary J Patti
- Department of Chemistry, Department of Medicine, and Siteman Cancer Center, Washington University in St. Louis, CB 1134, One Brookings Drive, St. Louis, Missouri 63130, United States
| | - Ryan S Renslow
- Biological Sciences Division, Pacific Northwest National Laboratory, P.O. Box 999, MSIN: K8-98, Richland, Washington 99352, United States.,Gene and Linda Voiland School of Chemical Engineering and Bioengineering, Washington State University, P.O. Box 646515, Pullman, Washington 99164, United States
| | - Shankar Subramaniam
- San Diego Supercomputer Center, University of California San Diego, MC 0505, 9500 Gilman Drive, La Jolla, California 92093, United States.,Department of Bioengineering, Department of Computer Science and Engineering, Department of Cellular and Molecular Medicine, and Department of Chemistry and Biochemistry, University of California San Diego, 9500 Gilman Drive #0412, La Jolla, California 92093, United States
| | - Mukesh Verma
- Epidemiology and Genomics Research Program, National Cancer Institute, National Institutes of Health, Suite 4E102, 9609 Medical Center Drive, MSC 9763, Rockville, Maryland 20850, United States
| | - Jianguo Xia
- Faculty of Agricultural and Environmental Sciences, McGill University, 21111 Lakeshore Road, Ste. Anne de Bellevue, Quebec H9X 3V9, Canada
| | - Jamey D Young
- Department of Chemical and Biomolecular Engineering, Vanderbilt University, PMB 351604, 2301 Vanderbilt Place, Nashville, Tennessee 37235, United States.,Department of Molecular Physiology and Biophysics, Vanderbilt University, PMB 351604, 2301 Vanderbilt Place, Nashville, Tennessee 37235, United States
|
27
|
Vera Alvarez R, Pongor L, Mariño-Ramírez L, Landsman D. PM4NGS, a project management framework for next-generation sequencing data analysis. Gigascience 2021; 10:giaa141. [PMID: 33410471 PMCID: PMC7788391 DOI: 10.1093/gigascience/giaa141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2020] [Revised: 10/14/2020] [Accepted: 11/16/2020] [Indexed: 11/14/2022] Open
Abstract
BACKGROUND FAIR (Findability, Accessibility, Interoperability, and Reusability) next-generation sequencing (NGS) data analysis relies on complex computational biology workflows and pipelines to guarantee reproducibility, portability, and scalability. Moreover, workflow languages, managers, and container technologies have helped address the problem of data analysis pipeline execution across multiple platforms in scalable ways. FINDINGS Here, we present a project management framework for NGS data analysis called PM4NGS. This framework is composed of an automatic creation of a standard organizational structure of directories and files, bioinformatics tool management using Docker or Bioconda, and data analysis pipelines in CWL format. Pre-configured Jupyter notebooks with minimum Python code are included in PM4NGS to produce a project report and publication-ready figures. We present 3 pipelines for demonstration purposes including the analysis of RNA-Seq, ChIP-Seq, and ChIP-exo datasets. CONCLUSIONS PM4NGS is an open source framework that creates a standard organizational structure for NGS data analysis projects. PM4NGS is easy to install, configure, and use by non-bioinformaticians on personal computers and laptops. It permits execution of the NGS data analysis on Windows 10 with the Windows Subsystem for Linux feature activated. The framework aims to reduce the gap between researchers in experimental laboratories producing NGS data and workflows for data analysis. PM4NGS documentation can be accessed at https://pm4ngs.readthedocs.io/.
Affiliation(s)
- Roberto Vera Alvarez
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, 8900 Rockville Pike, NIH, Bethesda, MD 20894, USA
| | - Lorinc Pongor
- Developmental Therapeutics Branch and Laboratory of Molecular Pharmacology, Center for Cancer Research, National Cancer Institute, 8900 Rockville Pike, NIH, Bethesda, MD 20894, USA
| | - Leonardo Mariño-Ramírez
- Division of Intramural Research, National Institute on Minority Health and Health Disparities, 8900 Rockville Pike, NIH, Bethesda, MD 20894, USA
| | - David Landsman
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, 8900 Rockville Pike, NIH, Bethesda, MD 20894, USA
|
28
|
Edison AS, Colonna M, Gouveia GJ, Holderman NR, Judge MT, Shen X, Zhang S. NMR: Unique Strengths That Enhance Modern Metabolomics Research. Anal Chem 2020; 93:478-499. [DOI: 10.1021/acs.analchem.0c04414] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Indexed: 12/16/2022]
|
29
|
Yang Q, Wang Y, Zhang Y, Li F, Xia W, Zhou Y, Qiu Y, Li H, Zhu F. NOREVA: enhanced normalization and evaluation of time-course and multi-class metabolomic data. Nucleic Acids Res 2020; 48:W436-W448. [PMID: 32324219 PMCID: PMC7319444 DOI: 10.1093/nar/gkaa258] [Citation(s) in RCA: 129] [Impact Index Per Article: 32.3] [Received: 02/09/2020] [Revised: 03/21/2020] [Accepted: 04/04/2020] [Indexed: 12/23/2022] Open
Abstract
Biological processes (like microbial growth & physiological response) are usually dynamic and require the monitoring of metabolic variation at different time-points. Moreover, there is a clear shift from case-control (N=2) studies to multi-class (N>2) problems in current metabolomics, which is crucial for revealing the mechanisms underlying certain physiological processes, disease metastasis, etc. These time-course and multi-class metabolomics have attracted great attention, and data normalization is essential for removing unwanted biological/experimental variations in these studies. However, no tool (including NOREVA 1.0, which focuses only on case-control studies) is available for effectively assessing the performance of normalization methods on time-course/multi-class metabolomic data. Thus, NOREVA was updated to version 2.0 by (i) realizing normalization and evaluation of both time-course and multi-class metabolomic data, (ii) integrating 144 normalization methods of a recently proposed combination strategy and (iii) identifying the well-performing methods by comprehensively assessing the largest set of normalizations (168 in total, significantly larger than the 24 in NOREVA 1.0). The significance of this update was extensively validated by case studies on benchmark datasets. All in all, NOREVA 2.0 is distinguished for its capability in identifying well-performing normalization method(s) for time-course and multi-class metabolomics, which makes it an indispensable complement to other available tools. NOREVA can be accessed at https://idrblab.org/noreva/.
Affiliation(s)
- Qingxia Yang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
| | - Yunxia Wang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Ying Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Fengcheng Li
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Weiqi Xia
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Ying Zhou
- Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation & The First Affiliated Hospital, Zhejiang University, Hangzhou 310000, China
| | - Yunqing Qiu
- Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation & The First Affiliated Hospital, Zhejiang University, Hangzhou 310000, China
| | - Honglin Li
- School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
|
30
|
Capuccini M, Dahlö M, Toor S, Spjuth O. MaRe: Processing Big Data with application containers on Apache Spark. Gigascience 2020; 9:giaa042. [PMID: 32369166 PMCID: PMC7199472 DOI: 10.1093/gigascience/giaa042] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/09/2019] [Revised: 02/10/2020] [Accepted: 04/07/2020] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Life science is increasingly driven by Big Data analytics, and the MapReduce programming model has been proven successful for data-intensive analyses. However, current MapReduce frameworks offer poor support for reusing existing processing tools in bioinformatics pipelines. Furthermore, these frameworks do not have native support for application containers, which are becoming popular in scientific data processing. RESULTS Here we present MaRe, an open source programming library that introduces support for Docker containers in Apache Spark. Apache Spark and Docker are the MapReduce framework and container engine that have collected the largest open source community; thus, MaRe provides interoperability with the cutting-edge software ecosystem. We demonstrate MaRe on 2 data-intensive applications in life science, showing ease of use and scalability. CONCLUSIONS MaRe enables scalable data-intensive processing in life science with Apache Spark and application containers. When compared with current best practices, which involve the use of workflow systems, MaRe has the advantage of providing data locality, ingestion from heterogeneous storage systems, and interactive processing. MaRe is generally applicable and available as open source software.
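The MapReduce model referred to in this abstract can be illustrated in plain Python. This is a minimal sketch of the programming model only, not the MaRe API: in MaRe the per-partition step would be a command run inside a Docker container on Apache Spark, whereas here `map_reduce` and `gc_count` are hypothetical stand-ins that run locally on in-memory lists.

```python
from functools import reduce
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")  # type of one data partition
R = TypeVar("R")  # type of a per-partition result


def map_reduce(
    partitions: Sequence[P],
    map_fn: Callable[[P], R],
    reduce_fn: Callable[[R, R], R],
) -> R:
    """Apply map_fn to every partition, then fold the results with reduce_fn."""
    mapped = [map_fn(partition) for partition in partitions]
    return reduce(reduce_fn, mapped)


def gc_count(partition: List[str]) -> int:
    """Map step: count G and C bases in one partition of DNA sequences."""
    return sum(seq.count("G") + seq.count("C") for seq in partition)


# Toy data split into three partitions, as a cluster framework would do.
partitions = [["ACGT", "GGCC"], ["ATAT"], ["CGCG", "TTGA"]]
total_gc = map_reduce(partitions, gc_count, lambda a, b: a + b)
print(total_gc)
```

Because each partition is processed independently, the map step can run where the data already resides, which is the data-locality advantage the abstract mentions.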
Affiliation(s)
- Marco Capuccini
- Department of Information Technology, Uppsala University, Box 337, 75105, Uppsala, Sweden
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden
| | - Martin Dahlö
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden
- Science for Life Laboratory, Uppsala University, Box 591, 751 24, Uppsala, Sweden
- Uppsala Multidisciplinary Center for Advanced Computational Science, Uppsala University, Box 337, 75105, Uppsala, Sweden
| | - Salman Toor
- Department of Information Technology, Uppsala University, Box 337, 75105, Uppsala, Sweden
| | - Ola Spjuth
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden
|
31
|
|
32
|
McLean C, Kujawinski EB. AutoTuner: High Fidelity and Robust Parameter Selection for Metabolomics Data Processing. Anal Chem 2020; 92:5724-5732. [PMID: 32212641 PMCID: PMC7310949 DOI: 10.1021/acs.analchem.9b04804] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 12/15/2022]
Abstract
Untargeted metabolomics experiments provide a snapshot of cellular metabolism but remain challenging to interpret due to the computational complexity involved in data processing and analysis. Prior to any interpretation, raw data must be processed to remove noise and to align mass-spectral peaks across samples. This step requires selection of dataset-specific parameters, as erroneous parameters can result in noise inflation. While several algorithms exist to automate parameter selection, each depends on gradient descent optimization functions. In contrast, our new parameter optimization algorithm, AutoTuner, obtains parameter estimates from raw data in a single step as opposed to many iterations. Here, we tested the accuracy and the run-time of AutoTuner in comparison to isotopologue parameter optimization (IPO), the most commonly used parameter selection tool, and compared the resulting parameters’ influence on the properties of feature tables after processing. We performed a Monte Carlo experiment to test the robustness of AutoTuner parameter selection and found that AutoTuner generated similar parameter estimates from random subsets of samples. We conclude that AutoTuner is a desirable alternative to existing tools, because it is scalable, highly robust, and very fast (∼100–1000× speed improvement from other algorithms going from days to minutes). AutoTuner is freely available as an R package through BioConductor.
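The Monte Carlo robustness check mentioned in this abstract can be illustrated with a small sketch. This is not AutoTuner's code or API: the estimator (`estimate_parameter`, a plain median), the synthetic `peak_widths` data, and the subset size are all assumptions chosen only to show the general idea of re-estimating a parameter on random subsets of samples and looking at how much the estimates vary.

```python
import random
import statistics


def estimate_parameter(samples):
    """Hypothetical stand-in estimator: the median of per-sample values."""
    return statistics.median(samples)


def subset_estimates(samples, subset_size, n_trials, seed=0):
    """Re-estimate the parameter on n_trials random subsets drawn without replacement."""
    rng = random.Random(seed)
    return [
        estimate_parameter(rng.sample(samples, subset_size))
        for _ in range(n_trials)
    ]


# Synthetic per-sample values (assumed data, not taken from the paper).
data_rng = random.Random(42)
peak_widths = [data_rng.gauss(10.0, 1.0) for _ in range(60)]

estimates = subset_estimates(peak_widths, subset_size=20, n_trials=200)

# Relative spread of the subset estimates: a small value suggests the
# estimate does not depend strongly on which samples were drawn.
spread = statistics.stdev(estimates) / statistics.mean(estimates)
print(f"relative spread across subsets: {spread:.3f}")
```

A small relative spread is what "similar parameter estimates from random subsets" would look like under these assumptions; the threshold for "similar" is a judgment the sketch does not settle.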
Affiliation(s)
- Craig McLean
- Department of Marine Chemistry and Geochemistry, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, United States; MIT/WHOI Joint Program in Oceanography/Applied Ocean Science and Engineering, Department of Marine Chemistry and Geochemistry, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, United States
| | - Elizabeth B Kujawinski
- Department of Marine Chemistry and Geochemistry, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, United States
| |
|
33
|
Tangaro MA, Donvito G, Antonacci M, Chiara M, Mandreoli P, Pesole G, Zambelli F. Laniakea: an open solution to provide Galaxy "on-demand" instances over heterogeneous cloud infrastructures. Gigascience 2020; 9:giaa033. [PMID: 32252069 PMCID: PMC7136032 DOI: 10.1093/gigascience/giaa033] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/10/2020] [Revised: 03/13/2020] [Accepted: 03/17/2020] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND While the popular workflow manager Galaxy is currently made available through several publicly accessible servers, there are scenarios where users can be better served by full administrative control over a private Galaxy instance, including, but not limited to, concerns about data privacy, customisation needs, prioritisation of particular job types, tools development, and training activities. In such cases, a cloud-based Galaxy virtual instance represents an alternative that equips the user with complete control over the Galaxy instance itself without the burden of the hardware and software infrastructure involved in running and maintaining a Galaxy server. RESULTS We present Laniakea, a complete software solution to set up a "Galaxy on-demand" platform as a service. Building on the INDIGO-DataCloud software stack, Laniakea can be deployed over common cloud architectures usually supported both by public and private e-infrastructures. The user interacts with a Laniakea-based service through a simple front-end that allows a general setup of a Galaxy instance, and then Laniakea takes care of the automatic deployment of the virtual hardware and the software components. At the end of the process, the user gains access with full administrative privileges to a private, production-grade, fully customisable, Galaxy virtual instance and to the underlying virtual machine (VM). Laniakea features deployment of single-server or cluster-backed Galaxy instances, sharing of reference data across multiple instances, data volume encryption, and support for VM image-based, Docker-based, and Ansible recipe-based Galaxy deployments. A Laniakea-based Galaxy on-demand service, named Laniakea@ReCaS, is currently hosted at the ELIXIR-IT ReCaS cloud facility. CONCLUSIONS Laniakea offers to scientific e-infrastructures a complete and easy-to-use software solution to provide a Galaxy on-demand service to their users. 
Laniakea-based cloud services will help in making Galaxy more accessible to a broader user base by removing most of the burdens involved in deploying and running a Galaxy service. In turn, this will facilitate the adoption of Galaxy in scenarios where classic public instances do not represent an optimal solution. Finally, the implementation of Laniakea can be easily adapted and expanded to support different services and platforms beyond Galaxy.
Affiliation(s)
- Marco Antonio Tangaro
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR), Via Giovanni Amendola 122/O, 70126 Bari, Italy
| | - Giacinto Donvito
- National Institute for Nuclear Physics (INFN), Section of Bari, Via Orabona 4, 70126 Bari, Italy
| | - Marica Antonacci
- National Institute for Nuclear Physics (INFN), Section of Bari, Via Orabona 4, 70126 Bari, Italy
| | - Matteo Chiara
- Department of Biosciences, University of Milan, via Celoria 26, 20133 Milano, Italy
| | - Pietro Mandreoli
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR), Via Giovanni Amendola 122/O, 70126 Bari, Italy
- Department of Biosciences, University of Milan, via Celoria 26, 20133 Milano, Italy
| | - Graziano Pesole
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR), Via Giovanni Amendola 122/O, 70126 Bari, Italy
- Department of Biosciences, Biotechnologies and Biopharmaceutics, University of Bari, Via Orabona 4, 70126 Bari, Italy
| | - Federico Zambelli
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR), Via Giovanni Amendola 122/O, 70126 Bari, Italy
- Department of Biosciences, University of Milan, via Celoria 26, 20133 Milano, Italy
| |
|
34
|
Carlsson H, Abujrais S, Herman S, Khoonsari PE, Åkerfeldt T, Svenningsson A, Burman J, Kultima K. Targeted metabolomics of CSF in healthy individuals and patients with secondary progressive multiple sclerosis using high-resolution mass spectrometry. Metabolomics 2020; 16:26. [PMID: 32052189 PMCID: PMC7015966 DOI: 10.1007/s11306-020-1648-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/27/2019] [Accepted: 02/01/2020] [Indexed: 12/24/2022]
Abstract
INTRODUCTION Standardized commercial kits enable targeted metabolomics analysis and may thus provide an attractive complement to the more explorative approaches. The kits are typically developed for triple quadrupole mass spectrometers using serum and plasma. OBJECTIVES Here we measure the concentrations of preselected metabolites in cerebrospinal fluid (CSF) using a kit developed for high-resolution mass spectrometry (HRMS). Secondarily, the study aimed to investigate metabolite alterations in patients with secondary progressive multiple sclerosis (SPMS) compared to controls. METHODS We performed targeted metabolomics in human CSF on twelve SPMS patients and twelve age and sex-matched healthy controls using the Absolute IDQ-p400 kit (Biocrates Life Sciences AG) developed for HRMS. The extracts were analysed using two methods; liquid chromatography-mass spectrometry (LC-HRMS) and flow injection analysis-MS (FIA-HRMS). RESULTS Out of 408 targeted metabolites, 196 (48%) were detected above limit of detection and 35 were absolutely quantified. Metabolites analyzed using LC-HRMS had a median coefficient of variation (CV) of 3% and 2.5% between reinjections the same day and after prolonged storage, respectively. The corresponding results for the FIA-HRMS were a median CV of 27% and 21%, respectively. We found significantly (p < 0.05) elevated levels of glycine, asymmetric dimethylarginine (ADMA), glycerophospholipid PC-O (34:0) and sum of hexoses in SPMS patients compared to controls. CONCLUSION The Absolute IDQ-p400 kit could successfully be used for quantifying targeted metabolites in the CSF. Metabolites quantified using LC-HRMS showed superior reproducibility compared to FIA-HRMS.
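The reproducibility figures in this abstract are median coefficients of variation (CV), where the CV of a metabolite is the standard deviation of its repeated measurements divided by their mean. A minimal sketch of that calculation follows; the metabolite names and intensity values are invented for illustration and are not data from the study, and the use of the sample standard deviation is an assumption.

```python
import statistics


def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation / mean * 100."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return statistics.stdev(values) / mean * 100


# Hypothetical intensities from repeated injections of the same extract.
reinjections = {
    "metabolite_a": [100.0, 102.0, 98.0],
    "metabolite_b": [50.0, 55.0, 45.0],
    "metabolite_c": [200.0, 201.0, 199.0],
}

# One CV per metabolite, then the median across metabolites.
cvs = {name: coefficient_of_variation(v) for name, v in reinjections.items()}
median_cv = statistics.median(cvs.values())
print(f"median CV: {median_cv:.1f}%")
```

Reporting the median rather than the mean keeps a few poorly reproducible metabolites from dominating the summary.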
Affiliation(s)
- Henrik Carlsson
- Department of Medical Sciences, Clinical Chemistry, Uppsala University, Uppsala University Hospital, Entrance 61, 3rd Floor, Dag Hammarskjölds Väg 18, 751 85, Uppsala, Sweden
| | - Sandy Abujrais
- Department of Medical Sciences, Clinical Chemistry, Uppsala University, Uppsala University Hospital, Entrance 61, 3rd Floor, Dag Hammarskjölds Väg 18, 751 85, Uppsala, Sweden
| | - Stephanie Herman
- Department of Medical Sciences, Clinical Chemistry, Uppsala University, Uppsala University Hospital, Entrance 61, 3rd Floor, Dag Hammarskjölds Väg 18, 751 85, Uppsala, Sweden
| | - Payam Emami Khoonsari
- Department of Medical Sciences, Clinical Chemistry, Uppsala University, Uppsala University Hospital, Entrance 61, 3rd Floor, Dag Hammarskjölds Väg 18, 751 85, Uppsala, Sweden
| | - Torbjörn Åkerfeldt
- Department of Medical Sciences, Clinical Chemistry, Uppsala University, Uppsala University Hospital, Entrance 61, 3rd Floor, Dag Hammarskjölds Väg 18, 751 85, Uppsala, Sweden
| | - Anders Svenningsson
- Department of Clinical Sciences, Danderyd Hospital, Karolinska Institutet, Stockholm, Sweden
| | - Joachim Burman
- Department of Neuroscience, Uppsala University, Uppsala, Sweden
| | - Kim Kultima
- Department of Medical Sciences, Clinical Chemistry, Uppsala University, Uppsala University Hospital, Entrance 61, 3rd Floor, Dag Hammarskjölds Väg 18, 751 85, Uppsala, Sweden.
| |
|
35
|
Long NP, Nghi TD, Kang YP, Anh NH, Kim HM, Park SK, Kwon SW. Toward a Standardized Strategy of Clinical Metabolomics for the Advancement of Precision Medicine. Metabolites 2020; 10:E51. [PMID: 32013105 PMCID: PMC7074059 DOI: 10.3390/metabo10020051] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 12/04/2019] [Revised: 01/17/2020] [Accepted: 01/21/2020] [Indexed: 12/18/2022] Open
Abstract
Despite the tremendous success, pitfalls have been observed in every step of a clinical metabolomics workflow, which impedes the internal validity of the study. Furthermore, the demand for logistics, instrumentations, and computational resources for metabolic phenotyping studies has far exceeded our expectations. In this conceptual review, we will cover inclusive barriers of a metabolomics-based clinical study and suggest potential solutions in the hope of enhancing study robustness, usability, and transferability. The importance of quality assurance and quality control procedures is discussed, followed by a practical rule containing five phases, including two additional "pre-pre-" and "post-post-" analytical steps. Besides, we will elucidate the potential involvement of machine learning and demonstrate that the need for automated data mining algorithms to improve the quality of future research is undeniable. Consequently, we propose a comprehensive metabolomics framework, along with an appropriate checklist refined from current guidelines and our previously published assessment, in the attempt to accurately translate achievements in metabolomics into clinical and epidemiological research. Furthermore, the integration of multifaceted multi-omics approaches with metabolomics as the pillar member is in urgent need. When combining with other social or nutritional factors, we can gather complete omics profiles for a particular disease. Our discussion reflects the current obstacles and potential solutions toward the progressing trend of utilizing metabolomics in clinical research to create the next-generation healthcare system.
Affiliation(s)
- Nguyen Phuoc Long
- College of Pharmacy, Seoul National University, Seoul 08826, Korea; (N.P.L.); (N.H.A.); (H.M.K.)
| | - Tran Diem Nghi
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea; (T.D.N.); (S.K.P.)
| | - Yun Pyo Kang
- Department of Cancer Physiology, Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA
| | - Nguyen Hoang Anh
- College of Pharmacy, Seoul National University, Seoul 08826, Korea; (N.P.L.); (N.H.A.); (H.M.K.)
| | - Hyung Min Kim
- College of Pharmacy, Seoul National University, Seoul 08826, Korea; (N.P.L.); (N.H.A.); (H.M.K.)
| | - Sang Ki Park
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea; (T.D.N.); (S.K.P.)
| | - Sung Won Kwon
- College of Pharmacy, Seoul National University, Seoul 08826, Korea; (N.P.L.); (N.H.A.); (H.M.K.)
| |
|
36
|
Verhoeven A, Giera M, Mayboroda OA. Scientific workflow managers in metabolomics: an overview. Analyst 2020; 145:3801-3808. [DOI: 10.1039/d0an00272k] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/21/2022]
Abstract
Metabolomics workflows for data processing reproducibility and accelerated clinical deployment.
Affiliation(s)
- Aswin Verhoeven
- Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, The Netherlands
| | - Martin Giera
- Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, The Netherlands
| | - Oleg A. Mayboroda
- Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, The Netherlands
| |
|
37
|
Goble C, Cohen-Boulakia S, Soiland-Reyes S, Garijo D, Gil Y, Crusoe MR, Peters K, Schober D. FAIR Computational Workflows. DATA INTELLIGENCE 2020. [DOI: 10.1162/dint_a_00033] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Indexed: 11/04/2022] Open
Abstract
Computational workflows describe the complex multi-step methods that are used for data collection, data preparation, analytics, predictive modelling, and simulation that lead to new data products. They can inherently contribute to the FAIR data principles: by processing data according to established metadata; by creating metadata themselves during the processing of data; and by tracking and recording data provenance. These properties aid data quality assessment and contribute to secondary data usage. Moreover, workflows are digital objects in their own right. This paper argues that FAIR principles for workflows need to address their specific nature in terms of their composition of executable software steps, their provenance, and their development.
Affiliation(s)
- Carole Goble
- Department of Computer Science, The University of Manchester, Oxford Road, Manchester M13 9PL, UK
| | - Sarah Cohen-Boulakia
- Laboratoire de Recherche en Informatique, CNRS, Université Paris-Saclay, Batiment 650, Université Paris-Sud, 91405 ORSAY Cedex, France
| | - Stian Soiland-Reyes
- Department of Computer Science, The University of Manchester, Oxford Road, Manchester M13 9PL, UK
- Common Workflow Language project, Software Freedom Conservancy, Inc. 137 Montague St STE 380, NY 11201-3548, USA
| | - Daniel Garijo
- Information Sciences Institute, University of Southern California, Marina Del Rey CA 90292, USA
| | - Yolanda Gil
- Information Sciences Institute, University of Southern California, Marina Del Rey CA 90292, USA
| | - Michael R. Crusoe
- Common Workflow Language project, Software Freedom Conservancy, Inc. 137 Montague St STE 380, NY 11201-3548, USA
| | - Kristian Peters
- Leibniz Institute of Plant Biochemistry (IPB Halle), Department of Biochemistry of Plant Interactions, Weinberg 3, 06120 Halle (Saale), Germany
| | - Daniel Schober
- Leibniz Institute of Plant Biochemistry (IPB Halle), Department of Biochemistry of Plant Interactions, Weinberg 3, 06120 Halle (Saale), Germany
| |
|
38
|
Perez‐Riverol Y, Moreno P. Scalable Data Analysis in Proteomics and Metabolomics Using BioContainers and Workflows Engines. Proteomics 2019; 20:e1900147. [DOI: 10.1002/pmic.201900147] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 04/12/2019] [Revised: 09/30/2019] [Indexed: 12/29/2022]
Affiliation(s)
- Yasset Perez‐Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Pablo Moreno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
|
39
|
Föll MC, Moritz L, Wollmann T, Stillger MN, Vockert N, Werner M, Bronsert P, Rohr K, Grüning BA, Schilling O. Accessible and reproducible mass spectrometry imaging data analysis in Galaxy. Gigascience 2019; 8:giz143. [PMID: 31816088 PMCID: PMC6901077 DOI: 10.1093/gigascience/giz143] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 05/05/2019] [Revised: 09/10/2019] [Accepted: 11/10/2019] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Mass spectrometry imaging is increasingly used in biological and translational research because it has the ability to determine the spatial distribution of hundreds of analytes in a sample. Being at the interface of proteomics/metabolomics and imaging, the acquired datasets are large and complex and often analyzed with proprietary software or in-house scripts, which hinders reproducibility. Open source software solutions that enable reproducible data analysis often require programming skills and are therefore not accessible to many mass spectrometry imaging (MSI) researchers. FINDINGS We have integrated 18 dedicated mass spectrometry imaging tools into the Galaxy framework to allow accessible, reproducible, and transparent data analysis. Our tools are based on Cardinal, MALDIquant, and scikit-image and enable all major MSI analysis steps such as quality control, visualization, preprocessing, statistical analysis, and image co-registration. Furthermore, we created hands-on training material for use cases in proteomics and metabolomics. To demonstrate the utility of our tools, we re-analyzed a publicly available N-linked glycan imaging dataset. By providing the entire analysis history online, we highlight how the Galaxy framework fosters transparent and reproducible research. CONCLUSION The Galaxy framework has emerged as a powerful analysis platform for the analysis of MSI data with ease of use and access, together with high levels of reproducibility and transparency.
Collapse
Affiliation(s)
- Melanie Christine Föll
- Institute of Surgical Pathology, Medical Center – University of Freiburg, Breisacher Straße 115a, 79106 Freiburg, Germany
- Faculty of Biology, University of Freiburg, Schänzlestraße 1, 79104 Freiburg, Germany
| | - Lennart Moritz
- Institute of Surgical Pathology, Medical Center – University of Freiburg, Breisacher Straße 115a, 79106 Freiburg, Germany
| | - Thomas Wollmann
- Biomedical Computer Vision Group, BioQuant, IPMB, Heidelberg University, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany
| | - Maren Nicole Stillger
- Institute of Surgical Pathology, Medical Center – University of Freiburg, Breisacher Straße 115a, 79106 Freiburg, Germany
- Faculty of Biology, University of Freiburg, Schänzlestraße 1, 79104 Freiburg, Germany
- Institute of Molecular Medicine and Cell Research, Faculty of Medicine, University of Freiburg, Stefan-Meier-Straße 17, 79104 Freiburg, Germany
| | - Niklas Vockert
- Biomedical Computer Vision Group, BioQuant, IPMB, Heidelberg University, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany
| | - Martin Werner
- Institute of Surgical Pathology, Medical Center – University of Freiburg, Breisacher Straße 115a, 79106 Freiburg, Germany
- Faculty of Medicine - University of Freiburg, Breisacher Straße 153, 79110 Freiburg, Germany
- Tumorbank Comprehensive Cancer Center Freiburg, Medical Center – University of Freiburg, Breisacher Straße 115a, 79106 Freiburg, Germany
- German Cancer Consortium (DKTK) and Cancer Research Center (DKFZ), Hugstetter Straße 55, 79106 Freiburg, Germany
| | - Peter Bronsert
- Institute of Surgical Pathology, Medical Center – University of Freiburg, Breisacher Straße 115a, 79106 Freiburg, Germany
- Faculty of Medicine - University of Freiburg, Breisacher Straße 153, 79110 Freiburg, Germany
- Tumorbank Comprehensive Cancer Center Freiburg, Medical Center – University of Freiburg, Breisacher Straße 115a, 79106 Freiburg, Germany
- German Cancer Consortium (DKTK) and Cancer Research Center (DKFZ), Hugstetter Straße 55, 79106 Freiburg, Germany
| | - Karl Rohr
- Biomedical Computer Vision Group, BioQuant, IPMB, Heidelberg University, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany
| | - Björn Andreas Grüning
- Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
| | - Oliver Schilling
- Institute of Surgical Pathology, Medical Center – University of Freiburg, Breisacher Straße 115a, 79106 Freiburg, Germany
- Faculty of Medicine - University of Freiburg, Breisacher Straße 153, 79110 Freiburg, Germany
- German Cancer Consortium (DKTK) and Cancer Research Center (DKFZ), Hugstetter Straße 55, 79106 Freiburg, Germany
| |
|
40
|
Capuccini M, Larsson A, Carone M, Novella JA, Sadawi N, Gao J, Toor S, Spjuth O. On-demand virtual research environments using microservices. PeerJ Comput Sci 2019; 5:e232. [PMID: 33816885 PMCID: PMC7924445 DOI: 10.7717/peerj-cs.232] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/21/2019] [Accepted: 10/10/2019] [Indexed: 06/12/2023]
Abstract
The computational demands for scientific applications are continuously increasing. The emergence of cloud computing has enabled on-demand resource allocation. However, relying solely on infrastructure as a service does not achieve the degree of flexibility required by the scientific community. Here we present a microservice-oriented methodology, where scientific applications run in a distributed orchestration platform as software containers, referred to as on-demand, virtual research environments. The methodology is vendor agnostic and we provide an open source implementation that supports the major cloud providers, offering scalable management of scientific pipelines. We demonstrate applicability and scalability of our methodology in life science applications, but the methodology is general and can be applied to other scientific domains.
Affiliation(s)
- Marco Capuccini
- Department of Information Technology, Uppsala University, Uppsala, Sweden
- Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
| | - Anders Larsson
- National Bioinformatics Infrastructure Sweden, Uppsala University, Uppsala, Sweden
| | - Matteo Carone
- Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
| | - Jon Ander Novella
- National Bioinformatics Infrastructure Sweden, Uppsala University, Uppsala, Sweden
| | - Noureddin Sadawi
- Department of Surgery and Cancer, Imperial College London, London, United Kingdom
| | - Jianliang Gao
- Department of Surgery and Cancer, Imperial College London, London, United Kingdom
| | - Salman Toor
- Department of Information Technology, Uppsala University, Uppsala, Sweden
| | - Ola Spjuth
- Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
| |
|
41
|
Cardoso S, Afonso T, Maraschin M, Rocha M. WebSpecmine: A Website for Metabolomics Data Analysis and Mining. Metabolites 2019; 9:metabo9100237. [PMID: 31635085 PMCID: PMC6835413 DOI: 10.3390/metabo9100237] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/08/2019] [Revised: 10/09/2019] [Accepted: 10/15/2019] [Indexed: 11/16/2022] Open
Abstract
Metabolomics data analysis is an important task in biomedical research. The available tools do not provide a wide variety of methods and data types, nor ways to store and share data and results generated. Thus, we have developed WebSpecmine to overcome the aforementioned limitations. WebSpecmine is a web-based application designed to perform the analysis of metabolomics data based on spectroscopic and chromatographic techniques (NMR, Infrared, UV-visible, and Raman, and LC/GC-MS) and compound concentrations. Users, even those not possessing programming skills, can access several analysis methods including univariate, unsupervised and supervised multivariate statistical analysis, as well as metabolite identification and pathway analysis, also being able to create accounts to store their data and results, either privately or publicly. The tool's implementation is based in the R project, including its shiny web-based framework. WebSpecmine is freely available, supporting all major browsers. We provide abundant documentation, including tutorials and a user guide with case studies.
Affiliation(s)
- Sara Cardoso
- CEB-Centre Biological Engineering, University of Minho, 4710-057 Braga, Portugal.
| | - Telma Afonso
- CEB-Centre Biological Engineering, University of Minho, 4710-057 Braga, Portugal.
| | - Marcelo Maraschin
- Plant Morphogenesis and Biochemistry Laboratory, Federal University of Santa Catarina, Florianópolis SC 88040-900, Brazil.
| | - Miguel Rocha
- CEB-Centre Biological Engineering, University of Minho, 4710-057 Braga, Portugal.
| |
|
42
|
Mendez KM, Pritchard L, Reinke SN, Broadhurst DI. Toward collaborative open data science in metabolomics using Jupyter Notebooks and cloud computing. Metabolomics 2019; 15:125. [PMID: 31522294 PMCID: PMC6745024 DOI: 10.1007/s11306-019-1588-0] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 05/30/2019] [Accepted: 09/07/2019] [Indexed: 12/20/2022]
Abstract
BACKGROUND A lack of transparency and reporting standards in the scientific community has led to increasing and widespread concerns relating to reproduction and integrity of results. As an omics science, which generates vast amounts of data and relies heavily on data science for deriving biological meaning, metabolomics is highly vulnerable to irreproducibility. The metabolomics community has made substantial efforts to align with FAIR data standards by promoting open data formats, data repositories, online spectral libraries, and metabolite databases. Open data analysis platforms also exist; however, they tend to be inflexible and rely on the user to adequately report their methods and results. To enable FAIR data science in metabolomics, methods and results need to be transparently disseminated in a manner that is rapid, reusable, and fully integrated with the published work. To ensure broad use within the community such a framework also needs to be inclusive and intuitive for both computational novices and experts alike. AIM OF REVIEW To encourage metabolomics researchers from all backgrounds to take control of their own data science, mould it to their personal requirements, and enthusiastically share resources through open science. KEY SCIENTIFIC CONCEPTS OF REVIEW This tutorial introduces the concept of interactive web-based computational laboratory notebooks. The reader is guided through a set of experiential tutorials specifically targeted at metabolomics researchers, based around the Jupyter Notebook web application, GitHub data repository, and Binder cloud computing platform.
Affiliation(s)
- Kevin M Mendez
- Centre for Metabolomics & Computational Biology, School of Science, Edith Cowan University, Joondalup, 6027, Australia
| | - Leighton Pritchard
- Strathclyde Institute of Pharmacy & Biomedical Sciences, University of Strathclyde, Cathedral Street, Glasgow, G1 1XQ, Scotland, UK
| | - Stacey N Reinke
- Centre for Metabolomics & Computational Biology, School of Science, Edith Cowan University, Joondalup, 6027, Australia.
| | - David I Broadhurst
- Centre for Metabolomics & Computational Biology, School of Science, Edith Cowan University, Joondalup, 6027, Australia.
| |
|
43
|
Playdon MC, Joshi AD, Tabung FK, Cheng S, Henglin M, Kim A, Lin T, van Roekel EH, Huang J, Krumsiek J, Wang Y, Mathé E, Temprosa M, Moore S, Chawes B, Eliassen AH, Gsur A, Gunter MJ, Harada S, Langenberg C, Oresic M, Perng W, Seow WJ, Zeleznik OA. Metabolomics Analytics Workflow for Epidemiological Research: Perspectives from the Consortium of Metabolomics Studies (COMETS). Metabolites 2019; 9:E145. [PMID: 31319517 PMCID: PMC6681081 DOI: 10.3390/metabo9070145] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 06/10/2019] [Revised: 06/28/2019] [Accepted: 07/04/2019] [Indexed: 12/13/2022] Open
Abstract
The application of metabolomics technology to epidemiological studies is emerging as a new approach to elucidate disease etiology and for biomarker discovery. However, analysis of metabolomics data is complex and there is an urgent need for the standardization of analysis workflow and reporting of study findings. To inform the development of such guidelines, we conducted a survey of 47 cohort representatives from the Consortium of Metabolomics Studies (COMETS) to gain insights into the current strategies and procedures used for analyzing metabolomics data in epidemiological studies worldwide. The results indicated a variety of applied analytical strategies, from biospecimen and data pre-processing and quality control to statistical analysis and reporting of study findings. These strategies included methods commonly used within the metabolomics community and applied in epidemiological research, as well as novel approaches to pre-processing pipelines and data analysis. To help with these discrepancies, we propose use of open-source initiatives such as the online web-based tool COMETS Analytics, which includes helpful tools to guide analytical workflow and the standardized reporting of findings from metabolomics analyses within epidemiological studies. Ultimately, this will improve the quality of statistical analyses, research findings, and study reproducibility.
Affiliation(s)
- Mary C Playdon
  - Department of Nutrition and Integrative Physiology, College of Health, University of Utah, Salt Lake City, UT 84112, USA
  - Division of Cancer Population Sciences, Huntsman Cancer Institute, Salt Lake City, UT 84112, USA
- Amit D Joshi
  - Clinical and Translational Epidemiology Unit, Mongan Institute, Massachusetts General Hospital, Boston, MA 02114, USA
  - Division of Gastroenterology, Department of Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
  - Program in Genetic Epidemiology and Statistical Genetics, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
- Fred K Tabung
  - Division of Medical Oncology, Department of Internal Medicine, The Ohio State University College of Medicine, Columbus, OH 43210, USA
  - The Ohio State University Comprehensive Cancer Center, Arthur G. James Cancer Hospital and Richard J. Solove Research Institute, Columbus, OH 43210, USA
  - Division of Epidemiology, The Ohio State University College of Public Health, Columbus, OH 43210, USA
- Susan Cheng
  - Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA
- Mir Henglin
  - Cardiovascular Division, Brigham and Women's Hospital, Boston, MA 02115, USA
- Andy Kim
  - Cardiovascular Division, Brigham and Women's Hospital, Boston, MA 02115, USA
- Tengda Lin
  - Division of Cancer Population Sciences, Huntsman Cancer Institute, Salt Lake City, UT 84112, USA
  - Department of Population Health Sciences, School of Medicine, University of Utah, Salt Lake City, UT 84112, USA
- Eline H van Roekel
  - Department of Epidemiology, GROW School for Oncology and Developmental Biology, Maastricht University, 6200 MD Maastricht, The Netherlands
- Jiaqi Huang
  - Division of Cancer Epidemiology and Genetics, Metabolic Epidemiology Branch, National Cancer Institute, Rockville, MD 20850, USA
- Jan Krumsiek
  - Institute for Computational Biomedicine, Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10021, USA
- Ying Wang
  - Behavioral and Epidemiology Research Group, American Cancer Society, Atlanta, GA 30303, USA
- Ewy Mathé
  - College of Medicine, Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA
- Marinella Temprosa
  - Department of Epidemiology and Biostatistics, Milken Institute School of Public Health, George Washington University, Washington, DC 20052, USA
- Steven Moore
  - Division of Cancer Epidemiology and Genetics, Metabolic Epidemiology Branch, National Cancer Institute, Rockville, MD 20850, USA
- Bo Chawes
  - COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, 1165 Copenhagen, Denmark
- A Heather Eliassen
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Andrea Gsur
  - Institute of Cancer Research, Department of Medicine, Medical University of Vienna, 1090 Vienna, Austria
- Marc J Gunter
  - Section of Nutrition and Metabolism, International Agency for Research on Cancer, World Health Organization, 69008 Lyon, France
- Sei Harada
  - Department of Preventive Medicine and Public Health, Keio University School of Medicine, Tokyo 160-8582, Japan
- Claudia Langenberg
  - MRC Epidemiology Unit, Public Health, University of Cambridge, Cambridge CB2 1 TN, UK
  - The Francis Crick Institute, London NW1 1ST, UK
- Matej Oresic
  - Turku Centre for Biotechnology, University of Turku, 20500 Turku, Finland
  - School of Medical Sciences, Örebro University, 702 81 Örebro, Sweden
- Wei Perng
  - Department of Epidemiology, Colorado School of Public Health, University of Colorado Denver, Anschutz Medical Campus, Aurora, CO 80045, USA
  - Life course epidemiology of adiposity and diabetes (LEAD) Center, University of Colorado Denver, Anschutz Medical Campus, Aurora, CO 80045, USA
- Wei Jie Seow
  - Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore 117549, Singapore
  - Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore 119228, Singapore
- Oana A Zeleznik
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
|
44
|
Abbiss H, Maker GL, Trengove RD. Metabolomics Approaches for the Diagnosis and Understanding of Kidney Diseases. Metabolites 2019; 9:E34. [PMID: 30769897 PMCID: PMC6410198 DOI: 10.3390/metabo9020034] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Received: 10/10/2018] [Revised: 01/29/2019] [Accepted: 02/05/2019] [Indexed: 02/07/2023]
Abstract
Diseases of the kidney are difficult to diagnose and treat. This review summarises the definition, cause, epidemiology and treatment of some of these diseases including chronic kidney disease, diabetic nephropathy, acute kidney injury, kidney cancer, kidney transplantation and polycystic kidney diseases. Numerous studies have adopted a metabolomics approach to uncover new small molecule biomarkers of kidney diseases to improve specificity and sensitivity of diagnosis and to uncover biochemical mechanisms that may elucidate the cause and progression of these diseases. This work includes a description of mass spectrometry-based metabolomics approaches, including some of the currently available tools, and emphasises findings from metabolomics studies of kidney diseases. We have included a varied selection of studies (disease, model, sample number, analytical platform) and focused on metabolites which were commonly reported as discriminating features between kidney disease and a control. These metabolites are likely to be robust indicators of kidney disease processes, and therefore potential biomarkers, warranting further investigation.
Affiliation(s)
- Hayley Abbiss
  - School of Veterinary and Life Sciences, Murdoch University, 90 South Street, Perth 6150, Australia
  - Separation Science and Metabolomics Laboratory, Murdoch University, 90 South Street, Perth 6150, Australia
- Garth L Maker
  - School of Veterinary and Life Sciences, Murdoch University, 90 South Street, Perth 6150, Australia
  - Separation Science and Metabolomics Laboratory, Murdoch University, 90 South Street, Perth 6150, Australia
- Robert D Trengove
  - Separation Science and Metabolomics Laboratory, Murdoch University, 90 South Street, Perth 6150, Australia
  - Metabolomics Australia, Murdoch University Node, Murdoch University, 90 South Street, Perth 6150, Australia
|