1
|
Schiml VC, Delogu F, Kumar P, Kunath B, Batut B, Mehta S, Johnson JE, Grüning B, Pope PB, Jagtap PD, Griffin TJ, Arntzen MØ. Integrative meta-omics in Galaxy and beyond. ENVIRONMENTAL MICROBIOME 2023; 18:56. [PMID: 37420292 PMCID: PMC10329324 DOI: 10.1186/s40793-023-00514-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/16/2023] [Accepted: 07/05/2023] [Indexed: 07/09/2023]
Abstract
BACKGROUND 'Omics methods have empowered scientists to tackle the complexity of microbial communities on a scale not attainable before. Individually, omics analyses can provide great insight; while combined as "meta-omics", they enhance the understanding of which organisms occupy specific metabolic niches, how they interact, and how they utilize environmental nutrients. Here we present three integrative meta-omics workflows, developed in Galaxy, for enhanced analysis and integration of metagenomics, metatranscriptomics, and metaproteomics, combined with our newly developed web-application, ViMO (Visualizer for Meta-Omics) to analyse metabolisms in complex microbial communities. RESULTS In this study, we applied the workflows on a highly efficient cellulose-degrading minimal consortium enriched from a biogas reactor to analyse the key roles of uncultured microorganisms in complex biomass degradation processes. Metagenomic analysis recovered metagenome-assembled genomes (MAGs) for several constituent populations including Hungateiclostridium thermocellum, Thermoclostridium stercorarium and multiple heterogenic strains affiliated to Coprothermobacter proteolyticus. The metagenomics workflow was developed as two modules, one standard, and one optimized for improving the MAG quality in complex samples by implementing a combination of single- and co-assembly, and dereplication after binning. The exploration of the active pathways within the recovered MAGs can be visualized in ViMO, which also provides an overview of the MAG taxonomy and quality (contamination and completeness), and information about carbohydrate-active enzymes (CAZymes), as well as KEGG annotations and pathways, with counts and abundances at both mRNA and protein level. To achieve this, the metatranscriptomic reads and metaproteomic mass-spectrometry spectra are mapped onto predicted genes from the metagenome to analyse the functional potential of MAGs, as well as the actual expressed proteins and functions of the microbiome, all visualized in ViMO. CONCLUSION Our three workflows for integrative meta-omics in combination with ViMO presents a progression in the analysis of 'omics data, particularly within Galaxy, but also beyond. The optimized metagenomics workflow allows for detailed reconstruction of microbial community consisting of MAGs with high quality, and thus improves analyses of the metabolism of the microbiome, using the metatranscriptomics and metaproteomics workflows.
Collapse
Affiliation(s)
- Valerie C Schiml
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences (NMBU), P.O. Box 5003, 1432, Ås, Norway
| | - Francesco Delogu
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences (NMBU), P.O. Box 5003, 1432, Ås, Norway
| | - Praveen Kumar
- Department of Biochemistry, Biophysics and Molecular Biology, University of Minnesota, Minneapolis, MN, 55455, USA
| | - Benoit Kunath
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences (NMBU), P.O. Box 5003, 1432, Ås, Norway
| | - Bérénice Batut
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
| | - Subina Mehta
- Department of Biochemistry, Biophysics and Molecular Biology, University of Minnesota, Minneapolis, MN, 55455, USA
| | - James E Johnson
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN, 55455, USA
| | - Björn Grüning
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
| | - Phillip B Pope
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences (NMBU), P.O. Box 5003, 1432, Ås, Norway
- Faculty of Biosciences, Norwegian University of Life Sciences (NMBU), P.O. Box 5003, 1432, Ås, Norway
| | - Pratik D Jagtap
- Department of Biochemistry, Biophysics and Molecular Biology, University of Minnesota, Minneapolis, MN, 55455, USA
| | - Timothy J Griffin
- Department of Biochemistry, Biophysics and Molecular Biology, University of Minnesota, Minneapolis, MN, 55455, USA
| | - Magnus Ø Arntzen
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences (NMBU), P.O. Box 5003, 1432, Ås, Norway.
| |
Collapse
|
2
|
Mehta S, Bernt M, Chambers M, Fahrner M, Föll MC, Gruening B, Horro C, Johnson JE, Loux V, Rajczewski AT, Schilling O, Vandenbrouck Y, Gustafsson OJR, Thang WCM, Hyde C, Price G, Jagtap PD, Griffin TJ. A Galaxy of informatics resources for MS-based proteomics. Expert Rev Proteomics 2023; 20:251-266. [PMID: 37787106 DOI: 10.1080/14789450.2023.2265062] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2023] [Accepted: 09/06/2023] [Indexed: 10/04/2023]
Abstract
INTRODUCTION Continuous advances in mass spectrometry (MS) technologies have enabled deeper and more reproducible proteome characterization and a better understanding of biological systems when integrated with other 'omics data. Bioinformatic resources meeting the analysis requirements of increasingly complex MS-based proteomic data and associated multi-omic data are critically needed. These requirements included availability of software that would span diverse types of analyses, scalability for large-scale, compute-intensive applications, and mechanisms to ease adoption of the software. AREAS COVERED The Galaxy ecosystem meets these requirements by offering a multitude of open-source tools for MS-based proteomics analyses and applications, all in an adaptable, scalable, and accessible computing environment. A thriving global community maintains these software and associated training resources to empower researcher-driven analyses. EXPERT OPINION The community-supported Galaxy ecosystem remains a crucial contributor to basic biological and clinical studies using MS-based proteomics. In addition to the current status of Galaxy-based resources, we describe ongoing developments for meeting emerging challenges in MS-based proteomic informatics. We hope this review will catalyze increased use of Galaxy by researchers employing MS-based proteomics and inspire software developers to join the community and implement new tools, workflows, and associated training content that will add further value to this already rich ecosystem.
Collapse
Affiliation(s)
- Subina Mehta
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
| | - Matthias Bernt
- Helmholtz Centre for Environmental Research - UFZ, Department Computational Biology, Leipzig, Germany
| | | | - Matthias Fahrner
- Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany
- German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Melanie Christine Föll
- Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany
- German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| | - Bjoern Gruening
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
| | - Carlos Horro
- Proteomics Unit, Department of Biomedicine, University of Bergen, Bergen, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
| | - James E Johnson
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN, USA
| | - Valentin Loux
- Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France
- Université Paris-Saclay, INRAE, BioinfOmics, MIGALE bioinformatics facility, Jouy-en-Josas, France
| | - Andrew T Rajczewski
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
| | - Oliver Schilling
- Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany
- German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
| | | | | | - W C Mike Thang
- Queensland Cyber Infrastructure Foundation (QCIF), Australia
- Institute of Molecular Bioscience, University of Queensland, St Lucia, Australia
| | - Cameron Hyde
- Queensland Cyber Infrastructure Foundation (QCIF), Australia
- Sippy Downs, University of the Sunshine Coast, Australia
| | - Gareth Price
- Queensland Cyber Infrastructure Foundation (QCIF), Australia
- Institute of Molecular Bioscience, University of Queensland, St Lucia, Australia
| | - Pratik D Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
| | - Timothy J Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
| |
Collapse
|
3
|
Chen JW, Shrestha L, Green G, Leier A, Marquez-Lago TT. The hitchhikers' guide to RNA sequencing and functional analysis. Brief Bioinform 2023; 24:bbac529. [PMID: 36617463 PMCID: PMC9851315 DOI: 10.1093/bib/bbac529] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2022] [Revised: 10/18/2022] [Accepted: 11/07/2022] [Indexed: 01/10/2023] Open
Abstract
DNA and RNA sequencing technologies have revolutionized biology and biomedical sciences, sequencing full genomes and transcriptomes at very high speeds and reasonably low costs. RNA sequencing (RNA-Seq) enables transcript identification and quantification, but once sequencing has concluded researchers can be easily overwhelmed with questions such as how to go from raw data to differential expression (DE), pathway analysis and interpretation. Several pipelines and procedures have been developed to this effect. Even though there is no unique way to perform RNA-Seq analysis, it usually follows these steps: 1) raw reads quality check, 2) alignment of reads to a reference genome, 3) aligned reads' summarization according to an annotation file, 4) DE analysis and 5) gene set analysis and/or functional enrichment analysis. Each step requires researchers to make decisions, and the wide variety of options and resulting large volumes of data often lead to interpretation challenges. There also seems to be insufficient guidance on how best to obtain relevant information and derive actionable knowledge from transcription experiments. In this paper, we explain RNA-Seq steps in detail and outline differences and similarities of different popular options, as well as advantages and disadvantages. We also discuss non-coding RNA analysis, multi-omics, meta-transcriptomics and the use of artificial intelligence methods complementing the arsenal of tools available to researchers. Lastly, we perform a complete analysis from raw reads to DE and functional enrichment analysis, visually illustrating how results are not absolute truths and how algorithmic decisions can greatly impact results and interpretation.
Collapse
Affiliation(s)
- Jiung-Wen Chen
- Department of Biology, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Lisa Shrestha
- Department of Genetics, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
| | - George Green
- Department of Biology, University of Alabama at Birmingham, Birmingham, AL, USA
| | - André Leier
- Department of Genetics, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
| | - Tatiana T Marquez-Lago
- Department of Genetics, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
- Department of Microbiology, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
| |
Collapse
|
4
|
Hiltemann S, Rasche H, Gladman S, Hotz HR, Larivière D, Blankenberg D, Jagtap PD, Wollmann T, Bretaudeau A, Goué N, Griffin TJ, Royaux C, Le Bras Y, Mehta S, Syme A, Coppens F, Droesbeke B, Soranzo N, Bacon W, Psomopoulos F, Gallardo-Alba C, Davis J, Föll MC, Fahrner M, Doyle MA, Serrano-Solano B, Fouilloux AC, van Heusden P, Maier W, Clements D, Heyl F, Grüning B, Batut B. Galaxy Training: A powerful framework for teaching! PLoS Comput Biol 2023; 19:e1010752. [PMID: 36622853 PMCID: PMC9829167 DOI: 10.1371/journal.pcbi.1010752] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2023] Open
Abstract
There is an ongoing explosion of scientific datasets being generated, brought on by recent technological advances in many areas of the natural sciences. As a result, the life sciences have become increasingly computational in nature, and bioinformatics has taken on a central role in research studies. However, basic computational skills, data analysis, and stewardship are still rarely taught in life science educational programs, resulting in a skills gap in many of the researchers tasked with analysing these big datasets. In order to address this skills gap and empower researchers to perform their own data analyses, the Galaxy Training Network (GTN) has previously developed the Galaxy Training Platform (https://training.galaxyproject.org), an open access, community-driven framework for the collection of FAIR (Findable, Accessible, Interoperable, Reusable) training materials for data analysis utilizing the user-friendly Galaxy framework as its primary data analysis platform. Since its inception, this training platform has thrived, with the number of tutorials and contributors growing rapidly, and the range of topics extending beyond life sciences to include topics such as climatology, cheminformatics, and machine learning. While initially aimed at supporting researchers directly, the GTN framework has proven to be an invaluable resource for educators as well. We have focused our efforts in recent years on adding increased support for this growing community of instructors. New features have been added to facilitate the use of the materials in a classroom setting, simplifying the contribution flow for new materials, and have added a set of train-the-trainer lessons. Here, we present the latest developments in the GTN project, aimed at facilitating the use of the Galaxy Training materials by educators, and its usage in different learning environments.
Collapse
Affiliation(s)
- Saskia Hiltemann
- Clinical Bioinformatics Group, Department of Pathology, Erasmus Medical Center, Rotterdam, the Netherlands
| | - Helena Rasche
- Clinical Bioinformatics Group, Department of Pathology, Erasmus Medical Center, Rotterdam, the Netherlands
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Academie voor de Technologie van Gezondheid en Milieu, Avans Hogeschool, Breda, the Netherlands
| | - Simon Gladman
- Melbourne Bioinformatics, University of Melbourne, Parkville, Victoria, Australia
| | - Hans-Rudolf Hotz
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Delphine Larivière
- Eberly College of Science, Pennsylvania State University, State College, Pennsylvania, United States of America
| | - Daniel Blankenberg
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, United States of America
| | - Pratik D. Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Thomas Wollmann
- Biomedical Computer Vision Group, BioQuant, IPMB, Heidelberg University, Heidelberg, Germany
| | - Anthony Bretaudeau
- IGEPP, INRAE, Institut Agro, Univ Rennes, Rennes, France
- GenOuest Core Facility, Univ Rennes, Inria, CNRS, IRISA, Rennes, France
| | - Nadia Goué
- AuBi, Mesocenter, Clermont Auvergne University, Aubière, France
| | - Timothy J. Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Coline Royaux
- PNDB French Biodiversity e-infrastructure, PatriNat, French Museum of Natural History, Concarneau, France
| | - Yvan Le Bras
- PNDB French Biodiversity e-infrastructure, PatriNat, French Museum of Natural History, Concarneau, France
| | - Subina Mehta
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Anna Syme
- Melbourne Bioinformatics, University of Melbourne, Parkville, Victoria, Australia
| | - Frederik Coppens
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- VIB Center for Plant Systems Biology, Ghent, Belgium
| | - Bert Droesbeke
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- VIB Center for Plant Systems Biology, Ghent, Belgium
| | - Nicola Soranzo
- Earlham Institute, Norwich Research Park, Norwich, United Kingdom
| | - Wendi Bacon
- School of Life, Health and Chemical Sciences, The Open University, Milton Keynes, United Kingdom
| | - Fotis Psomopoulos
- Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
| | - Cristóbal Gallardo-Alba
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
| | - John Davis
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, United States of America
| | - Melanie Christine Föll
- Institute for Surgical Pathology, Medical Center—University of Freiburg, Faculty of Medicine, Freiburg, Germany
- Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts, United States of America
- German Cancer Consortium (DKTK) and Cancer Research Center (DKFZ), Freiburg, Germany
| | - Matthias Fahrner
- Institute for Surgical Pathology, Medical Center—University of Freiburg, Faculty of Medicine, Freiburg, Germany
- German Cancer Consortium (DKTK) and Cancer Research Center (DKFZ), Freiburg, Germany
| | - Maria A. Doyle
- Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia
- Sir Peter MacCallum Department of Oncology, The University of Melbourne, Melbourne, Victoria, Australia
| | - Beatriz Serrano-Solano
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- ELIXIR Germany, Institute of Bio- and Geosciences (IBG-5)—Computational Metagenomics, Forschungszentrum Jülich GmbH, Jülich, Germany
- Euro-BioImaging ERIC Bio-Hub, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
| | | | - Peter van Heusden
- South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa
| | - Wolfgang Maier
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
| | - Dave Clements
- Anaconda Inc, Austin, Texas, United States of America
| | - Florian Heyl
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
| | | | - Björn Grüning
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
| | - Bérénice Batut
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
| |
Collapse
|
5
|
Thind AS, Monga I, Thakur PK, Kumari P, Dindhoria K, Krzak M, Ranson M, Ashford B. Demystifying emerging bulk RNA-Seq applications: the application and utility of bioinformatic methodology. Brief Bioinform 2021; 22:6330938. [PMID: 34329375 DOI: 10.1093/bib/bbab259] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2021] [Revised: 06/14/2021] [Accepted: 06/18/2021] [Indexed: 12/13/2022] Open
Abstract
Significant innovations in next-generation sequencing techniques and bioinformatics tools have impacted our appreciation and understanding of RNA. Practical RNA sequencing (RNA-Seq) applications have evolved in conjunction with sequence technology and bioinformatic tools advances. In most projects, bulk RNA-Seq data is used to measure gene expression patterns, isoform expression, alternative splicing and single-nucleotide polymorphisms. However, RNA-Seq holds far more hidden biological information including details of copy number alteration, microbial contamination, transposable elements, cell type (deconvolution) and the presence of neoantigens. Recent novel and advanced bioinformatic algorithms developed the capacity to retrieve this information from bulk RNA-Seq data, thus broadening its scope. The focus of this review is to comprehend the emerging bulk RNA-Seq-based analyses, emphasizing less familiar and underused applications. In doing so, we highlight the power of bulk RNA-Seq in providing biological insights.
Collapse
Affiliation(s)
- Amarinder Singh Thind
- University of Wollongong, Wollongong, Australia.,Illawarra Health and Medical Research Institute, Wollongong, Australia
| | - Isha Monga
- Columbia University, New York City, NY, USA
| | | | - Pallawi Kumari
- Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
| | - Kiran Dindhoria
- Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
| | | | - Marie Ranson
- University of Wollongong, Wollongong, Australia.,Illawarra Health and Medical Research Institute, Wollongong, Australia
| | - Bruce Ashford
- University of Wollongong, Wollongong, Australia.,Illawarra Health and Medical Research Institute, Wollongong, Australia
| |
Collapse
|