1
Mehta S, Bernt M, Chambers M, Fahrner M, Föll MC, Gruening B, Horro C, Johnson JE, Loux V, Rajczewski AT, Schilling O, Vandenbrouck Y, Gustafsson OJR, Thang WCM, Hyde C, Price G, Jagtap PD, Griffin TJ. A Galaxy of informatics resources for MS-based proteomics. Expert Rev Proteomics 2023; 20:251-266. [PMID: 37787106 DOI: 10.1080/14789450.2023.2265062] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/05/2023] [Accepted: 09/06/2023] [Indexed: 10/04/2023]
Abstract
INTRODUCTION: Continuous advances in mass spectrometry (MS) technologies have enabled deeper and more reproducible proteome characterization and, when integrated with other 'omics data, a better understanding of biological systems. Bioinformatic resources that meet the analysis requirements of increasingly complex MS-based proteomic data and associated multi-omic data are critically needed. These requirements include software spanning diverse types of analyses, scalability for large-scale, compute-intensive applications, and mechanisms that ease adoption of the software. AREAS COVERED: The Galaxy ecosystem meets these requirements by offering a multitude of open-source tools for MS-based proteomics analyses and applications, all within an adaptable, scalable, and accessible computing environment. A thriving global community maintains this software and the associated training resources to empower researcher-driven analyses. EXPERT OPINION: The community-supported Galaxy ecosystem remains a crucial contributor to basic biological and clinical studies using MS-based proteomics. In addition to the current status of Galaxy-based resources, we describe ongoing developments for meeting emerging challenges in MS-based proteomic informatics. We hope this review will catalyze increased use of Galaxy by researchers employing MS-based proteomics and inspire software developers to join the community and implement new tools, workflows, and associated training content that will add further value to this already rich ecosystem.
Affiliation(s)
- Subina Mehta
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Matthias Bernt
- Helmholtz Centre for Environmental Research - UFZ, Department Computational Biology, Leipzig, Germany
- Matthias Fahrner
- Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany
- German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Melanie Christine Föll
- Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany
- German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Bjoern Gruening
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Carlos Horro
- Proteomics Unit, Department of Biomedicine, University of Bergen, Bergen, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- James E Johnson
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN, USA
- Valentin Loux
- Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France
- Université Paris-Saclay, INRAE, BioinfOmics, MIGALE bioinformatics facility, Jouy-en-Josas, France
- Andrew T Rajczewski
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Oliver Schilling
- Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany
- German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- W C Mike Thang
- Queensland Cyber Infrastructure Foundation (QCIF), Australia
- Institute of Molecular Bioscience, University of Queensland, St Lucia, Australia
- Cameron Hyde
- Queensland Cyber Infrastructure Foundation (QCIF), Australia
- Sippy Downs, University of the Sunshine Coast, Australia
- Gareth Price
- Queensland Cyber Infrastructure Foundation (QCIF), Australia
- Institute of Molecular Bioscience, University of Queensland, St Lucia, Australia
- Pratik D Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Timothy J Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
2
Iacovacci J, Lin W, Griffin JL, Glen RC. IonFlow: a galaxy tool for the analysis of ionomics data sets. Metabolomics 2021; 17:91. [PMID: 34562172 PMCID: PMC8464566 DOI: 10.1007/s11306-021-01841-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2021] [Accepted: 09/13/2021] [Indexed: 10/28/2022]
Abstract
INTRODUCTION: Inductively coupled plasma mass spectrometry (ICP-MS) experiments generate complex, multi-dimensional data sets that require specialist data analysis tools. OBJECTIVE: Here we describe tools that facilitate analysis of the ionome using high-throughput elemental profiling data. METHODS: IonFlow is a Galaxy tool written in R for ionomics data analysis and is freely accessible at https://github.com/wanchanglin/ionflow. It is designed as a pipeline that processes raw data to enable exploration and interpretation using multivariate statistical techniques and network-based algorithms, including principal component analysis, hierarchical clustering, relevance network extraction and analysis, and gene set enrichment analysis. RESULTS AND CONCLUSION: The pipeline is described and tested on two benchmark data sets: the haploid S. cerevisiae ionome and the human HeLa cell ionome.
Affiliation(s)
- J Iacovacci
- Department of Metabolism, Digestion and Reproduction, Faculty of Medicine, Imperial College London, London, UK
- Breast Cancer Now Toby Robins Research Centre, The Institute of Cancer Research, London, UK
- W Lin
- Department of Metabolism, Digestion and Reproduction, Faculty of Medicine, Imperial College London, London, UK
- J L Griffin
- Department of Metabolism, Digestion and Reproduction, Faculty of Medicine, Imperial College London, London, UK
- Department of Biochemistry and Systems Biology Centre, University of Cambridge, Cambridge, UK
- R C Glen
- Department of Metabolism, Digestion and Reproduction, Faculty of Medicine, Imperial College London, London, UK
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Cambridge, UK
3
Mirela-Bota P, Aguirre-Plans J, Meseguer A, Galletti C, Segura J, Planas-Iglesias J, Garcia-Garcia J, Guney E, Oliva B, Fernandez-Fuentes N. Galaxy InteractoMIX: An Integrated Computational Platform for the Study of Protein-Protein Interaction Data. J Mol Biol 2021; 433:166656. [PMID: 32976910 DOI: 10.1016/j.jmb.2020.09.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/29/2020] [Revised: 08/30/2020] [Accepted: 09/16/2020] [Indexed: 12/19/2022]
Abstract
Protein interactions play a crucial role in the diverse functions of a cell and are central to our understanding of cellular processes in both health and disease. Here we present Galaxy InteractoMIX (http://galaxy.interactomix.com), a platform composed of 13 computational tools, each addressing a specific aspect of the study of protein-protein interactions, ranging from large-scale, cross-species, proteome-wide interactomes to protein complexes at atomic resolution. Galaxy InteractoMIX provides an intuitive interface where users can retrieve consolidated interactomics data distributed across several databases, or uncover links between diseases and genes by analyzing the interactomes underlying these diseases. The platform enables large-scale prediction and curation of protein interactions using the conservation of motifs, interology, or the presence or absence of key sequence signatures. The structure-based tools include modeling and analysis of protein complexes, delineation of interfaces, and modeling of peptides that act as inhibitors of protein-protein interactions. Galaxy InteractoMIX includes a range of ready-to-use workflows that run complex analyses with minimal user intervention. Potential applications of the platform span life science, biomedicine, biotechnology, and drug discovery, wherever protein associations are studied.
4
Su SY, Lu IH, Cheng WC, Chung WC, Chen PY, Ho JM, Chen SH, Lin CY. EpiMOLAS: an intuitive web-based framework for genome-wide DNA methylation analysis. BMC Genomics 2020; 21:163. [PMID: 32241255 PMCID: PMC7114791 DOI: 10.1186/s12864-019-6404-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2019] [Accepted: 12/16/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND: DNA methylation is a crucial epigenomic mechanism in various biological processes. Whole-genome bisulfite sequencing (WGBS) can reveal methylated cytosine sites at single-nucleotide resolution. However, WGBS data analysis is usually complicated and challenging. RESULTS: To alleviate these difficulties, we integrated the WGBS data processing steps and downstream analysis into a two-phase approach. First, we set up the required tools in Galaxy and developed workflows to calculate methylation levels from raw WGBS data and generate a methylation status summary, the mtable. This computing environment is wrapped into the Docker container image DocMethyl, which allows users to rapidly deploy an executable environment without tedious software installation or library dependency problems. Next, the mtable files are uploaded to the web server EpiMOLAS_web, which links them with gene annotation databases to enable rapid data retrieval and analysis. CONCLUSION: To our knowledge, the EpiMOLAS framework, consisting of DocMethyl and EpiMOLAS_web, is the first approach to combine containerization technology and a web-based system for WGBS data analysis, from raw data processing to downstream analysis. EpiMOLAS will help users handle their WGBS data and conduct reproducible analyses of publicly available data, thereby gaining insights into the mechanisms underlying complex biological phenomena. The Galaxy Docker image DocMethyl is available at https://hub.docker.com/r/lsbnb/docmethyl/. EpiMOLAS_web is publicly accessible at http://symbiosis.iis.sinica.edu.tw/epimolas/.
Affiliation(s)
- Sheng-Yao Su
- Taiwan International Graduate Program (TIGP) on Bioinformatics, Academia Sinica, Taipei, Taiwan
- Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
- I-Hsuan Lu
- Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Wen-Chih Cheng
- Institute of Population Health Sciences, National Health Research Institutes, Zhunan Miaoli, Taiwan
- Wei-Chun Chung
- Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Pao-Yang Chen
- Institute of Plant and Microbial Biology, Academia Sinica, Taipei, Taiwan
- Jan-Ming Ho
- Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Shu-Hwa Chen
- TMU Research Center of Cancer Translational Medicine, Taipei Medical University, Taipei, Taiwan
- Chung-Yen Lin
- Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Institute of Population Health Sciences, National Health Research Institutes, Zhunan Miaoli, Taiwan
- Institute of Fisheries Science, College of Life Science, National Taiwan University, Taipei, Taiwan
5
Schmidt L, Werner S, Kemmer T, Niebler S, Kristen M, Ayadi L, Johe P, Marchand V, Schirmeister T, Motorin Y, Hildebrandt A, Schmidt B, Helm M. Graphical Workflow System for Modification Calling by Machine Learning of Reverse Transcription Signatures. Front Genet 2019; 10:876. [PMID: 31608115 PMCID: PMC6774277 DOI: 10.3389/fgene.2019.00876] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/17/2019] [Accepted: 08/21/2019] [Indexed: 01/28/2023]
Abstract
Modification mapping from cDNA data has become a tremendously important approach in epitranscriptomics. So-called reverse transcription signatures in cDNA contain information on the position and nature of their causative RNA modifications. Data mining of, e.g., Illumina-based high-throughput sequencing data is therefore rapidly growing in importance, yet the field still lacks effective tools. Here we present a versatile, user-friendly graphical workflow system for modification calling based on machine learning. The workflow begins with a principal module for trimming, mapping, and postprocessing; postprocessing includes quantification of mismatch and arrest rates at single-nucleotide resolution across the mapped transcriptome. Further downstream modules include tools for visualization, machine learning, and modification calling. The machine-learning module provides quality assessment parameters that gauge whether the initial dataset is suitable for effective machine learning and modification calling. This output is useful for improving the experimental parameters of library preparation and sequencing. In summary, automating the bioinformatics workflow allows faster turnaround of the optimization cycles in modification calling.
Affiliation(s)
- Lukas Schmidt
- Institute of Pharmacy and Biochemistry, Johannes Gutenberg-University, Mainz, Germany
- Stephan Werner
- Institute of Pharmacy and Biochemistry, Johannes Gutenberg-University, Mainz, Germany
- Thomas Kemmer
- Institute of Computer Science, Scientific Computing and Bioinformatics, Johannes Gutenberg-University, Mainz, Germany
- Stefan Niebler
- Institute of Computer Science, High Performance Computing, Johannes Gutenberg-University, Mainz, Germany
- Marco Kristen
- Institute of Pharmacy and Biochemistry, Johannes Gutenberg-University, Mainz, Germany
- Lilia Ayadi
- Next-Generation Sequencing Core Facility UMS2008 IBSLor CNRS-UL-INSERM, Biopôle, University of Lorraine, Vandœuvre-lès-Nancy, France
- IMoPA UMR7365 CNRS-UL, Biopôle, University of Lorraine, Vandœuvre-lès-Nancy, France
- Patrick Johe
- Institute of Pharmacy and Biochemistry, Johannes Gutenberg-University, Mainz, Germany
- Virginie Marchand
- Next-Generation Sequencing Core Facility UMS2008 IBSLor CNRS-UL-INSERM, Biopôle, University of Lorraine, Vandœuvre-lès-Nancy, France
- Tanja Schirmeister
- Institute of Pharmacy and Biochemistry, Johannes Gutenberg-University, Mainz, Germany
- Yuri Motorin
- Next-Generation Sequencing Core Facility UMS2008 IBSLor CNRS-UL-INSERM, Biopôle, University of Lorraine, Vandœuvre-lès-Nancy, France
- IMoPA UMR7365 CNRS-UL, Biopôle, University of Lorraine, Vandœuvre-lès-Nancy, France
- Andreas Hildebrandt
- Institute of Computer Science, Scientific Computing and Bioinformatics, Johannes Gutenberg-University, Mainz, Germany
- Bertil Schmidt
- Institute of Computer Science, High Performance Computing, Johannes Gutenberg-University, Mainz, Germany
- Mark Helm
- Institute of Pharmacy and Biochemistry, Johannes Gutenberg-University, Mainz, Germany
6
Vekris A, Pilalis E, Chatziioannou A, Petry KG. A Computational Pipeline for the Extraction of Actionable Biological Information From NGS-Phage Display Experiments. Front Physiol 2019; 10:1160. [PMID: 31607941 PMCID: PMC6769401 DOI: 10.3389/fphys.2019.01160] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/23/2018] [Accepted: 08/28/2019] [Indexed: 12/20/2022]
Abstract
Phage Display is a powerful method for identifying peptides that bind to targets of varying complexity, ranging from unique molecules to tissues such as the internal surfaces of vessels in living organisms. Particularly for in vivo screenings, the resulting repertoires can be very complex and difficult to study with traditional approaches. Next Generation Sequencing (NGS) has made it possible to acquire high-resolution overviews of such repertoires and thus facilitates the identification of binders of interest. Additionally, the ever-increasing amount of available genome and proteome information has made it feasible to identify putative mimicked proteins, because partial sequence homology can be assessed on a large scale. However, the resulting massive data stress the need for high-performance computational approaches to perform standardized and insightful molecular network analysis. Systems-level analysis is essential to efficiently resolve the underlying molecular complexity and extract actionable interpretation, in terms of the systemic biological processes and pathways that are systematically perturbed. In this work we introduce PepSimili, an integrated workflow tool that maps massive peptide repertoires onto whole proteomes and delivers a streamlined, systems-level biological interpretation. The tool employs modules for modeling and filtering background noise due to random mappings, and it amplifies the biologically meaningful signal through coupling with BioInfoMiner, a systems interpretation tool that uses graph-theoretic methods to prioritize systemic processes and corresponding driver genes. The current implementation runs in the Galaxy environment and is available online. A case study using public data is presented, with and without a control selection.
Affiliation(s)
- Eleftherios Pilalis
- Metabolic Engineering and Bioinformatics Program, Institute of Chemical Biology, National Hellenic Research Foundation, Athens, Greece
- eNIOS Applications P.C., Athens, Greece
- Aristotelis Chatziioannou
- Metabolic Engineering and Bioinformatics Program, Institute of Chemical Biology, National Hellenic Research Foundation, Athens, Greece
- eNIOS Applications P.C., Athens, Greece
7
Blank C, Easterly C, Gruening B, Johnson J, Kolmeder CA, Kumar P, May D, Mehta S, Mesuere B, Brown Z, Elias JE, Hervey WJ, McGowan T, Muth T, Nunn B, Rudney J, Tanca A, Griffin TJ, Jagtap PD. Disseminating Metaproteomic Informatics Capabilities and Knowledge Using the Galaxy-P Framework. Proteomes 2018; 6:proteomes6010007. [PMID: 29385081 PMCID: PMC5874766 DOI: 10.3390/proteomes6010007] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 12/11/2017] [Revised: 01/26/2018] [Accepted: 01/26/2018] [Indexed: 01/12/2023]
Abstract
The impact of microbial communities, also known as the microbiome, on human health and the environment is receiving increased attention. Studying translated gene products (proteins) and comparing metaproteomic profiles may elucidate how microbiomes respond to specific environmental stimuli and interact with host organisms. Characterizing proteins expressed by a complex microbiome and interpreting their functional signature requires sophisticated informatics tools and workflows tailored to metaproteomics. Additionally, these informatics resources need to be disseminated to researchers undertaking metaproteomic studies, who could use them to make new and important discoveries in microbiome research. The Galaxy for proteomics platform (Galaxy-P) offers an open-source, web-based bioinformatics platform for disseminating metaproteomics software and workflows. Within this platform, we have developed easily accessible and documented metaproteomic software tools and workflows, aimed at training researchers in their operation and disseminating the tools for more widespread use. The modular workflows encompass the core requirements of metaproteomic informatics: (a) database generation; (b) peptide spectral matching; (c) taxonomic analysis; and (d) functional analysis. Much of the software available via the Galaxy-P platform was selected, packaged, and deployed through an online metaproteomics "Contribution Fest" undertaken by a unique consortium of expert software developers and users from the metaproteomics research community, who have co-authored this manuscript. These resources are documented on GitHub and freely available through the Galaxy Tool Shed, as well as through a publicly accessible metaproteomics gateway Galaxy instance. These documented workflows are well suited for training novice metaproteomics researchers, through online resources such as the Galaxy Training Network as well as hands-on training workshops.
Here, we describe the metaproteomics tools available within these Galaxy-based resources, as well as the process by which they were selected and implemented in our community-based work. We hope this description will increase access to and utilization of metaproteomics tools, and offer a framework for continued community-based development and dissemination of cutting-edge metaproteomics software.
Affiliation(s)
- Clemens Blank
- Bioinformatics Group, Department of Computer Science, University of Freiburg, 79110 Freiburg im Breisgau, Germany
- Caleb Easterly
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- Bjoern Gruening
- Bioinformatics Group, Department of Computer Science, University of Freiburg, 79110 Freiburg im Breisgau, Germany
- James Johnson
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN 55455, USA
- Carolin A Kolmeder
- Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland
- Praveen Kumar
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- Damon May
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Subina Mehta
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- Bart Mesuere
- Computational Biology Group, Ghent University, Krijgslaan 281, B-9000 Ghent, Belgium
- Zachary Brown
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- Joshua E Elias
- Department of Chemical & Systems Biology, Stanford University, Stanford, CA 94305, USA
- W Judson Hervey
- Center for Bio/Molecular Science & Engineering, Naval Research Laboratory, Washington, DC 20375, USA
- Thomas McGowan
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN 55455, USA
- Thilo Muth
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
- Brook Nunn
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Joel Rudney
- Department of Diagnostic and Biological Sciences, University of Minnesota, Minneapolis, MN 55455, USA
- Alessandro Tanca
- Porto Conte Ricerche Science and Technology Park of Sardinia, 07041 Alghero, Italy
- Timothy J Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- Pratik D Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA