1
Gendoo DMA. Overview of Bioinformatics Software and Databases for Metabolic Engineering. Methods Mol Biol 2023; 2553:265-274. [PMID: 36227548 DOI: 10.1007/978-1-0716-2617-7_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
The explosion of the "omics" era has introduced a growing number of datasets and tools that facilitate molecular interrogation of the metabolome. These include various bioinformatics and pharmacogenomics resources that can be used independently or collectively to facilitate metabolic engineering across disease, clinical oncology, and understanding of molecular changes across larger systems. This review provides starting points for accessing publicly available data and computational tools that support assessment of metabolic profiles and metabolic regulation, providing both a depth-and-breadth approach toward understanding the metabolome. We focus in particular on pathway databases and tools, which provide in-depth analysis of metabolic pathways, the heart of metabolic engineering.
Affiliation(s)
- Deena M A Gendoo
- Centre for Computational Biology, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom.
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom.

2
Zhang Y, Jing G, Chen Y, Li J, Su X. Hierarchical Meta-Storms enables comprehensive and rapid comparison of microbiome functional profiles on a large scale using hierarchical dissimilarity metrics and parallel computing. Bioinformatics Advances 2021; 1:vbab003. [PMID: 36700101 PMCID: PMC9710644 DOI: 10.1093/bioadv/vbab003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/20/2021] [Accepted: 05/06/2021] [Indexed: 01/28/2023]
Abstract
Functional beta-diversity analysis on numerous microbiomes interprets the linkages between metabolic functions and their meta-data. To evaluate microbiome beta-diversity, widely used distance metrics count only overlapping gene families and omit their inherent relationships, resulting in erroneous distances due to the sparsity of high-dimensional functional profiles. Here we propose Hierarchical Meta-Storms (HMS) to tackle this problem. HMS contains two core components: (i) a dissimilarity algorithm that comprehensively measures functional distances among microbiomes using a multi-level metabolic hierarchy and (ii) a fast Principal Co-ordinates Analysis (PCoA) implementation, optimized by parallel computing, that deduces the beta-diversity pattern. Results showed that HMS can detect variations of microbial functions in upper-level metabolic pathways that are always missed by other methods. In addition, HMS computed the pairwise distance matrix and PCoA for 20 000 microbiomes in 3.9 h on a single computing node, which was 23 times faster and consumed 80% less RAM than existing methods, enabling in-depth data mining among microbiomes at high resolution. HMS takes microbiome functional profiles as input and produces their pairwise distance matrix and PCoA coordinates. Availability and implementation: HMS is coded in C/C++ with parallel computing and released in two alternative forms: a standalone software (https://github.com/qdu-bioinfo/hierarchical-meta-storms) and an equivalent R package (https://github.com/qdu-bioinfo/hrms). Supplementary information: Supplementary data are available at Bioinformatics Advances online.
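The sparsity problem described in this abstract can be made concrete with a toy example. The sketch below is not the HMS algorithm; it is a deliberately simplified two-level comparison, assuming a hypothetical mapping from gene families to pathways, with Bray-Curtis as a stand-in distance and all names (`bray_curtis`, `roll_up`, `two_level_dissimilarity`, the `K1`/`K2`/`K3` identifiers) chosen for illustration only.

```python
from collections import defaultdict


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance dicts."""
    keys = set(a) | set(b)
    num = sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)
    den = sum(a.get(k, 0.0) + b.get(k, 0.0) for k in keys)
    return num / den if den else 0.0


def roll_up(profile, parent):
    """Sum gene-family abundances into their parent pathway."""
    out = defaultdict(float)
    for feature, value in profile.items():
        out[parent[feature]] += value
    return dict(out)


def two_level_dissimilarity(a, b, parent):
    """Average the gene-family-level and pathway-level distances.

    Illustrative simplification only, not the HMS metric.
    """
    flat = bray_curtis(a, b)
    upper = bray_curtis(roll_up(a, parent), roll_up(b, parent))
    return (flat + upper) / 2


# Hypothetical toy hierarchy and single-feature profiles.
parent = {"K1": "glycolysis", "K2": "glycolysis", "K3": "TCA"}
s1 = {"K1": 1.0}
s2 = {"K2": 1.0}  # different gene family, same pathway as s1
s3 = {"K3": 1.0}  # different gene family, different pathway

flat_12 = bray_curtis(s1, s2)
flat_13 = bray_curtis(s1, s3)
hier_12 = two_level_dissimilarity(s1, s2, parent)
hier_13 = two_level_dissimilarity(s1, s3, parent)
```

In this toy case the flat metric rates both pairs as maximally distant because no gene families overlap, while the two-level version places `s1` closer to `s2` than to `s3` because they share a pathway.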
Affiliation(s)
- Yufeng Zhang
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
- Gongchao Jing
- Single-Cell Center, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong 266101, China
- Yuzhu Chen
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
- Jinhua Li
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China. To whom correspondence should be addressed.
- Xiaoquan Su
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China; Single-Cell Center, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong 266101, China. To whom correspondence should be addressed.

3
Stanstrup J, Broeckling CD, Helmus R, Hoffmann N, Mathé E, Naake T, Nicolotti L, Peters K, Rainer J, Salek RM, Schulze T, Schymanski EL, Stravs MA, Thévenot EA, Treutler H, Weber RJM, Willighagen E, Witting M, Neumann S. The metaRbolomics Toolbox in Bioconductor and beyond. Metabolites 2019; 9:E200. [PMID: 31548506 PMCID: PMC6835268 DOI: 10.3390/metabo9100200] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 08/11/2019] [Revised: 09/16/2019] [Accepted: 09/17/2019] [Indexed: 11/17/2022]
Abstract
Metabolomics aims to measure and characterise the complex composition of metabolites in a biological system. Metabolomics studies involve sophisticated analytical techniques such as mass spectrometry and nuclear magnetic resonance spectroscopy, and generate large amounts of high-dimensional and complex experimental data. Open source processing and analysis tools are of major interest in light of innovative, open and reproducible science. The scientific community has developed a wide range of open source software, providing freely available advanced processing and analysis approaches. The programming and statistics environment R has emerged as one of the most popular environments to process and analyse Metabolomics datasets. A major benefit of such an environment is the possibility of connecting different tools into more complex workflows. Combining reusable data processing R scripts with the experimental data thus allows for open, reproducible research. This review provides an extensive overview of existing packages in R for different steps in a typical computational metabolomics workflow, including data processing, biostatistics, metabolite annotation and identification, and biochemical network and pathway analysis. Multifunctional workflows, possible user interfaces and integration into workflow management systems are also reviewed. In total, this review summarises more than two hundred metabolomics specific packages primarily available on CRAN, Bioconductor and GitHub.
Affiliation(s)
- Jan Stanstrup
- Preventive and Clinical Nutrition, University of Copenhagen, Rolighedsvej 30, 1958 Frederiksberg C, Denmark.
- Corey D Broeckling
- Proteomics and Metabolomics Facility, Colorado State University, Fort Collins, CO 80523, USA.
- Rick Helmus
- Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, 1098 XH Amsterdam, The Netherlands.
- Nils Hoffmann
- Leibniz-Institut für Analytische Wissenschaften-ISAS-e.V., Otto-Hahn-Straße 6b, 44227 Dortmund, Germany.
- Ewy Mathé
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA.
- Thomas Naake
- Max Planck Institute of Molecular Plant Physiology, 14476 Potsdam-Golm, Germany.
- Luca Nicolotti
- The Australian Wine Research Institute, Metabolomics Australia, PO Box 197, Adelaide SA 5064, Australia.
- Kristian Peters
- Leibniz Institute of Plant Biochemistry (IPB Halle), Bioinformatics and Scientific Data, 06120 Halle, Germany.
- Johannes Rainer
- Institute for Biomedicine, Eurac Research, Affiliated Institute of the University of Lübeck, 39100 Bolzano, Italy.
- Reza M Salek
- The International Agency for Research on Cancer, 150 cours Albert Thomas, CEDEX 08, 69372 Lyon, France.
- Tobias Schulze
- Department of Effect-Directed Analysis, Helmholtz Centre for Environmental Research-UFZ, Permoserstraße 15, 04318 Leipzig, Germany.
- Emma L Schymanski
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 avenue du Swing, L-4367 Belvaux, Luxembourg.
- Michael A Stravs
- Eawag, Swiss Federal Institute of Aquatic Science and Technology, Überlandstrasse 133, 8600 Dübendorf, Switzerland.
- Etienne A Thévenot
- CEA, LIST, Laboratory for Data Sciences and Decision, MetaboHUB, Gif-Sur-Yvette F-91191, France.
- Hendrik Treutler
- Leibniz Institute of Plant Biochemistry (IPB Halle), Bioinformatics and Scientific Data, 06120 Halle, Germany.
- Ralf J M Weber
- Phenome Centre Birmingham and School of Biosciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK.
- Egon Willighagen
- Department of Bioinformatics-BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands.
- Michael Witting
- Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, 85764 Neuherberg, Germany.
- Chair of Analytical Food Chemistry, Technische Universität München, 85354 Weihenstephan, Germany.
- Steffen Neumann
- Leibniz Institute of Plant Biochemistry (IPB Halle), Bioinformatics and Scientific Data, 06120 Halle, Germany.
- German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Deutscher Platz 5e, 04103 Leipzig, Germany.

4
Luo W, Pant G, Bhavnasi YK, Blanchard SG, Brouwer C. Pathview Web: user friendly pathway visualization and data integration. Nucleic Acids Res 2017; 45:W501-W508. [PMID: 28482075 PMCID: PMC5570256 DOI: 10.1093/nar/gkx372] [Citation(s) in RCA: 252] [Impact Index Per Article: 50.4] [Received: 01/31/2017] [Accepted: 04/24/2017] [Indexed: 02/07/2023]
Abstract
Pathway analysis is widely used in omics studies. Pathway-based data integration and visualization is a critical component of the analysis. To address this need, we recently developed a novel R package called Pathview. Pathview maps, integrates and renders a large variety of biological data onto molecular pathway graphs. Here we developed the Pathview Web server to make pathway visualization and data integration accessible to all scientists, including those without special computing skills or resources. Pathview Web features an intuitive graphical web interface and a user-centered design. The server not only expands the core functions of Pathview, but also provides many useful features not available in the offline R package. Importantly, the server presents a comprehensive workflow for both regular and integrated pathway analysis of multiple omics data. In addition, the server provides a RESTful API for programmatic access and convenient integration in third-party software or workflows. Pathview Web is openly and freely accessible at https://pathview.uncc.edu/.
Affiliation(s)
- Weijun Luo
- Department of Bioinformatics and Genomics, UNC Charlotte, Charlotte, NC 28223, USA; UNC Charlotte Bioinformatics Service Division, North Carolina Research Campus, Kannapolis, NC 28081, USA
- Gaurav Pant
- Department of Bioinformatics and Genomics, UNC Charlotte, Charlotte, NC 28223, USA; Department of Computer Science, UNC Charlotte, Charlotte, NC 28223, USA
- Yeshvant K Bhavnasi
- Department of Bioinformatics and Genomics, UNC Charlotte, Charlotte, NC 28223, USA; Department of Computer Science, UNC Charlotte, Charlotte, NC 28223, USA
- Steven G Blanchard
- Department of Bioinformatics and Genomics, UNC Charlotte, Charlotte, NC 28223, USA; UNC Charlotte Bioinformatics Service Division, North Carolina Research Campus, Kannapolis, NC 28081, USA
- Cory Brouwer
- Department of Bioinformatics and Genomics, UNC Charlotte, Charlotte, NC 28223, USA; UNC Charlotte Bioinformatics Service Division, North Carolina Research Campus, Kannapolis, NC 28081, USA

5
Bauer CR, Knecht C, Fretter C, Baum B, Jendrossek S, Rühlemann M, Heinsen FA, Umbach N, Grimbacher B, Franke A, Lieb W, Krawczak M, Hütt MT, Sax U. Interdisciplinary approach towards a systems medicine toolbox using the example of inflammatory diseases. Brief Bioinform 2017; 18:479-487. [PMID: 27016392 PMCID: PMC5428997 DOI: 10.1093/bib/bbw024] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 11/27/2015] [Revised: 01/28/2016] [Indexed: 12/24/2022]
Abstract
Electronic access to multiple data types, from generic information on biological systems at different functional and cellular levels to high-throughput molecular data from human patients, is a prerequisite of successful systems medicine research. However, scientists often encounter technical and conceptual difficulties that forestall the efficient and effective use of these resources. We summarize and discuss some of these obstacles, and suggest ways to avoid or evade them. The methodological gap between data capturing and data analysis is huge in human medical research. Primary data producers often do not fully apprehend the scientific value of their data, whereas data analysts may be ignorant of the circumstances under which the data were collected. Therefore, the provision of easy-to-use data access tools not only helps to improve data quality on the part of the data producers but is also likely to foster an informed dialogue with the data analysts. We propose a means to integrate phenotypic data, questionnaire data and microbiome data with a user-friendly Systems Medicine toolbox embedded into i2b2/tranSMART. Our approach is exemplified by the integration of a basic outlier detection tool and a more advanced microbiome analysis (alpha diversity) script. Continuous discussion with clinicians, data managers, biostatisticians and systems medicine experts should serve to enrich even further the functionality of toolboxes like ours, which are geared to be used by 'informed non-experts' but at the same time attuned to existing, more sophisticated analysis tools.
Affiliation(s)
- Christian R Bauer
- Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
- Carolin Knecht
- Institute of Medical Informatics and Statistics, Christian-Albrechts-University, Kiel, Germany
- Christoph Fretter
- Institute of Medical Informatics and Statistics, Christian-Albrechts-University, Kiel, Germany
- Benjamin Baum
- Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
- Sandra Jendrossek
- Center for Chronic Immune Deficiency, University Medical Center Freiburg, Freiburg, Germany
- Malte Rühlemann
- Institute of Clinical Molecular Biology, Christian-Albrechts-University, Kiel, Germany
- Femke-Anouska Heinsen
- Institute of Clinical Molecular Biology, Christian-Albrechts-University, Kiel, Germany
- Nadine Umbach
- Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
- Bodo Grimbacher
- Center for Chronic Immune Deficiency, University Medical Center Freiburg, Freiburg, Germany
- Andre Franke
- Institute of Clinical Molecular Biology, Christian-Albrechts-University, Kiel, Germany
- Wolfgang Lieb
- Institute of Epidemiology, Christian-Albrechts-University, Kiel, Germany
- Michael Krawczak
- Institute of Medical Informatics and Statistics, Christian-Albrechts-University, Kiel, Germany
- Marc-Thorsten Hütt
- Department of Life Sciences and Chemistry, Jacobs University, Bremen, Germany
- Ulrich Sax
- Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany

6
Abstract
Ontologies are powerful and popular tools to encode data in a structured format and manage knowledge. A large variety of existing ontologies offer users access to biomedical knowledge. This chapter contains a short theoretical background of ontologies and introduces two notable examples: The Gene Ontology and the ontology for Biological Pathways Exchange. For both ontologies a short overview and working bioinformatic applications, i.e., Gene Ontology enrichment analyses and pathway data visualization, are provided.
Affiliation(s)
- Frank Kramer
- Department of Medical Statistics, University Medical Center Göttingen, Humboldtallee 32, 37073, Göttingen, Germany.
- Tim Beißbarth
- Department of Medical Statistics, University Medical Center Göttingen, Humboldtallee 32, 37073, Göttingen, Germany

7
Kruppa J, Kramer F, Beißbarth T, Jung K. A simulation framework for correlated count data of features subsets in high-throughput sequencing or proteomics experiments. Stat Appl Genet Mol Biol 2016; 15:401-414. [PMID: 27655448 DOI: 10.1515/sagmb-2015-0082] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/15/2022]
Abstract
As part of the data processing of high-throughput sequencing experiments, count data are produced representing the number of reads that map to specific genomic regions. Count data also arise in mass spectrometric experiments for the detection of protein-protein interactions. Artificial count data are thus required for evaluating new computational methods for the analysis of sequencing count data or spectral count data from proteomics experiments. Although some methods for the generation of artificial sequencing count data have been proposed, all of them either simulate single sequencing runs, thus omitting the correlation structure between the individual genomic features, or are limited to specific structures. We propose to draw correlated data from the multivariate normal distribution and round these continuous data in order to obtain discrete counts. In our approach, the required distribution parameters can either be constructed in different ways or estimated from real count data. Because rounding affects the correlation structure, we evaluate the use of shrinkage estimators that have already been used in the context of artificial expression data from DNA microarrays. Our approach turned out to be useful for the simulation of counts for defined subsets of features such as individual pathways or GO categories.
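The core idea in this abstract, drawing from a multivariate normal and rounding to counts, can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' framework: the function names, the toy mean vector and covariance matrix, and the clipping of negative values to zero are all assumptions made here, and the shrinkage estimation step mentioned in the abstract is not implemented.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def simulate_counts(mean, cov, n_samples, seed=0):
    """Draw correlated normal vectors and round them to counts.

    Negative values are clipped to zero (an assumption of this sketch).
    """
    rng = random.Random(seed)
    low = cholesky(cov)
    p = len(mean)
    samples = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        x = [mean[i] + sum(low[i][k] * z[k] for k in range(i + 1))
             for i in range(p)]
        samples.append([max(0, round(v)) for v in x])
    return samples


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical parameters: two features with target correlation
# 40 / (10 * 5) = 0.8 before rounding.
mean = [50.0, 30.0]
cov = [[100.0, 40.0], [40.0, 25.0]]
counts = simulate_counts(mean, cov, 2000)
r = pearson([c[0] for c in counts], [c[1] for c in counts])
```

Comparing `r` with the target correlation gives a quick check of how much rounding has distorted the correlation structure, which is the effect the abstract addresses with shrinkage estimators.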

8
Abstract
Biological pathways are increasingly available in the BioPAX format which uses an RDF model for data storage. One can retrieve the information in this data model in the scripting language R using the package rBiopaxParser, which converts the BioPAX format to one readable in R. It also has a function to build a regulatory network from the pathway information. Here we describe an extension of this function. The new function allows the user to build graphs of entire pathways, including regulated as well as non-regulated elements, and therefore provides a maximum of information. This function is available as part of the rBiopaxParser distribution from Bioconductor.
Affiliation(s)
- Nirupama Benis
- Host Microbe Interactomics, Wageningen University & Research, Wageningen, Netherlands
- Dirkjan Schokker
- Wageningen Livestock Research, Wageningen University & Research, Wageningen, Netherlands
- Frank Kramer
- Department of Medical Statistics, University Medical Center Goettingen, Goettingen, Germany
- Mari A Smits
- Wageningen Bioveterinary Research, Wageningen University & Research, Wageningen, Netherlands
- Maria Suarez-Diez
- Systems and Synthetic Biology, Wageningen University & Research, Wageningen, Netherlands

9
Luna A, Babur Ö, Aksoy BA, Demir E, Sander C. PaxtoolsR: pathway analysis in R using Pathway Commons. Bioinformatics 2015; 32:1262-4. [PMID: 26685306 PMCID: PMC4824129 DOI: 10.1093/bioinformatics/btv733] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 06/23/2015] [Accepted: 12/09/2015] [Indexed: 11/13/2022]
Abstract
Purpose: The PaxtoolsR package enables access to pathway data represented in the BioPAX format and made available through the Pathway Commons webservice for users of the R language to aid in advanced pathway analyses. Features include the extraction, merging and validation of pathway data represented in the BioPAX format. This package also provides novel pathway datasets and advanced querying features for R users through the Pathway Commons webservice, allowing users to query, extract and retrieve data and integrate these data with local BioPAX datasets. Availability and implementation: The PaxtoolsR package is compatible with versions of R 3.1.1 (and higher) on Windows, Mac OS X and Linux using Bioconductor 3.0 and is available through the Bioconductor R package repository along with source code and a tutorial vignette describing common tasks, such as data visualization and gene set enrichment analysis. Source code and documentation are at http://www.bioconductor.org/packages/paxtoolsr. This plugin is free, open-source and licensed under the LGPL-3. Contact: paxtools@cbio.mskcc.org or lunaa@cbio.mskcc.org
Affiliation(s)
- Augustin Luna
- Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Özgün Babur
- Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Bülent Arman Aksoy
- Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Emek Demir
- Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Chris Sander
- Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA

10
Bayerlová M, Klemm F, Kramer F, Pukrop T, Beißbarth T, Bleckmann A. Newly Constructed Network Models of Different WNT Signaling Cascades Applied to Breast Cancer Expression Data. PLoS One 2015; 10:e0144014. [PMID: 26632845 PMCID: PMC4669165 DOI: 10.1371/journal.pone.0144014] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 06/02/2015] [Accepted: 11/12/2015] [Indexed: 12/26/2022]
Abstract
Introduction: WNT signaling is a complex process comprising multiple pathways: the canonical β-catenin-dependent pathway and several alternative non-canonical pathways that act in a β-catenin-independent manner. Representing these intricate signaling mechanisms through bioinformatic approaches is challenging. Nevertheless, a simplified but reliable bioinformatic WNT pathway model is needed, which can be further utilized to decipher specific WNT activation states within e.g. high-throughput data. Results: In order to build such a model, we collected, parsed, and curated available WNT signaling knowledge from different pathway databases. The data were assembled to construct computationally suitable models of different WNT signaling cascades in the form of directed signaling graphs. This resulted in four networks representing canonical WNT signaling, non-canonical WNT signaling, the inhibition of canonical WNT signaling and the regulation of WNT signaling pathways, respectively. Furthermore, these networks were integrated with microarray and RNA sequencing data to gain deeper insight into the underlying biology of gene expression differences between MCF-7 and MDA-MB-231 breast cancer cell lines, representing weakly and highly invasive breast carcinomas, respectively. Differential genes up-regulated in the MDA-MB-231 compared to the MCF-7 cell line were found to display enrichment in the gene set originating from the non-canonical network. Moreover, we identified and validated differentially regulated modules representing canonical and non-canonical WNT pathway components specific for the aggressive basal-like breast cancer subtype. Conclusions: We demonstrated that these newly constructed WNT networks reliably reflect distinct WNT signaling processes. Using transcriptomic data, we shaped these networks into comprehensive modules of the genes implicated in the aggressive basal-like breast cancer subtype and demonstrated that non-canonical WNT signaling is important in this context. The topology of these networks can be further refined in the future by integration with complementary data such as protein-protein interactions, in order to gain greater insight into signaling processes.
Affiliation(s)
- Michaela Bayerlová
- Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
- Florian Klemm
- Department of Hematology and Medical Oncology, University Medical Center Göttingen, Göttingen, Germany
- Frank Kramer
- Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
- Tobias Pukrop
- Department of Hematology and Medical Oncology, University Medical Center Göttingen, Göttingen, Germany
- Department of Internal Medicine III, University Hospital Regensburg, Regensburg, Germany
- Tim Beißbarth
- Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
- Annalen Bleckmann
- Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
- Department of Hematology and Medical Oncology, University Medical Center Göttingen, Göttingen, Germany

11
Bayerlová M, Jung K, Kramer F, Klemm F, Bleckmann A, Beißbarth T. Comparative study on gene set and pathway topology-based enrichment methods. BMC Bioinformatics 2015; 16:334. [PMID: 26489510 PMCID: PMC4618947 DOI: 10.1186/s12859-015-0751-5] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Received: 08/20/2015] [Accepted: 09/29/2015] [Indexed: 01/08/2023]
Abstract
Background: Enrichment analysis is a popular approach to identify pathways or sets of genes which are significantly enriched in the context of differentially expressed genes. The traditional gene set enrichment approach considers a pathway as a simple gene list, disregarding any knowledge of gene or protein interactions. In contrast, the new group of so-called pathway topology-based methods integrates the topological structure of a pathway into the analysis. Methods: We comparatively investigated gene set and pathway topology-based enrichment approaches, considering three gene set and four topological methods. These methods were compared in two extensive simulation studies and on a benchmark of 36 real datasets, providing the same pathway input data for all methods. Results: In the benchmark data analysis both types of methods showed a comparable ability to detect enriched pathways. The first simulation study was conducted with KEGG pathways, which showed considerable gene overlaps between each other. In this study with original KEGG pathways, none of the topology-based methods outperformed the gene set approach. Therefore, a second simulation study was performed on non-overlapping pathways created by unique gene IDs. Here, methods accounting for pathway topology reached higher accuracy than the gene set methods; however, their sensitivity was lower. Conclusions: We conducted one of the first comprehensive comparative works on evaluating gene set against pathway topology-based enrichment methods. The topological methods showed better performance in the simulation scenarios with non-overlapping pathways; however, they were not conclusively better in the other scenarios. This suggests that a simple gene set approach might be sufficient to detect an enriched pathway under realistic circumstances. Nevertheless, more extensive studies and further benchmark data are needed to systematically evaluate these methods and to assess what gain and cost pathway topology information introduces into enrichment analysis. Both types of methods for enrichment analysis require further improvements in order to deal with the problem of pathway overlaps. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0751-5) contains supplementary material, which is available to authorized users.
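To illustrate what "a pathway as a simple gene list" means in practice, the sketch below implements a one-sided hypergeometric over-representation test. This is a commonly used gene set approach chosen here as an example; the abstract does not name the three gene set methods compared, so this should not be read as one of them. The gene identifiers and set sizes are hypothetical.

```python
from math import comb


def hypergeom_pvalue(n_universe, n_pathway, n_de, n_overlap):
    """P(X >= n_overlap) when drawing n_de genes without replacement
    from a universe that contains n_pathway pathway genes."""
    upper = min(n_pathway, n_de)
    total = comb(n_universe, n_de)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def enrichment(universe, pathway, de_genes):
    """Over-representation p-value; pathway topology is ignored."""
    universe = set(universe)
    pathway = set(pathway) & universe
    de = set(de_genes) & universe
    return hypergeom_pvalue(
        len(universe), len(pathway), len(de), len(pathway & de)
    )


# Hypothetical toy data: 20 genes, a 5-gene pathway, 5 differentially
# expressed (DE) genes of which 4 fall in the pathway.
universe = [f"g{i}" for i in range(20)]
pathway = ["g0", "g1", "g2", "g3", "g4"]
de_genes = ["g0", "g1", "g2", "g3", "g10"]
p_value = enrichment(universe, pathway, de_genes)
```

Because only set membership enters the calculation, two pathways with identical gene lists but different wiring receive the same p-value, which is the limitation that topology-based methods aim to address.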
Affiliation(s)
- Michaela Bayerlová
- Department of Medical Statistics, University Medical Center Göttingen, 37099, Göttingen, Germany.
- Klaus Jung
- Department of Medical Statistics, University Medical Center Göttingen, 37099, Göttingen, Germany.
- Frank Kramer
- Department of Medical Statistics, University Medical Center Göttingen, 37099, Göttingen, Germany.
- Florian Klemm
- Department of Hematology and Medical Oncology, University Medical Center Göttingen, 37099, Göttingen, Germany.
- Annalen Bleckmann
- Department of Medical Statistics, University Medical Center Göttingen, 37099, Göttingen, Germany; Department of Hematology and Medical Oncology, University Medical Center Göttingen, 37099, Göttingen, Germany.
- Tim Beißbarth
- Department of Medical Statistics, University Medical Center Göttingen, 37099, Göttingen, Germany.

12
Wachter A, Beißbarth T. pwOmics: an R package for pathway-based integration of time-series omics data using public database knowledge. Bioinformatics 2015; 31:3072-4. [PMID: 26002883 DOI: 10.1093/bioinformatics/btv323] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 02/27/2015] [Accepted: 05/17/2015] [Indexed: 02/07/2023]
Abstract
Characterization of biological processes is progressively enabled by the increased generation of omics data on different signaling levels. Here we present a straightforward approach for the integrative analysis of data from different high-throughput technologies based on pathway and interaction models from public databases. pwOmics performs pathway-based level-specific data comparison of coupled human proteomic and genomic/transcriptomic datasets based on their log fold changes. Separate downstream and upstream analyses on the functional levels of pathways, transcription factors and genes/transcripts are performed in the cross-platform consensus analysis. These provide a basis for the combined interpretation of regulatory effects over time. Via network reconstruction and inference methods (Steiner tree, dynamic Bayesian network inference), consensus graphical networks can be generated for further analyses and visualization. Availability and implementation: The R package pwOmics is freely available on Bioconductor (http://www.bioconductor.org/). Contact: astrid.wachter@med.uni-goettingen.de.
Affiliation(s)
- Astrid Wachter
- Department of Medical Statistics, Georg-August-University Göttingen, Germany
- Tim Beißbarth
- Department of Medical Statistics, Georg-August-University Göttingen, Germany

13
Dräger A, Palsson BØ. Improving collaboration by standardization efforts in systems biology. Front Bioeng Biotechnol 2014; 2:61. [PMID: 25538939 PMCID: PMC4259112 DOI: 10.3389/fbioe.2014.00061] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Received: 09/01/2014] [Accepted: 11/14/2014] [Indexed: 11/17/2022]
Abstract
Collaborative genome-scale reconstruction endeavors of metabolic networks would not be possible without a common, standardized formal representation of these systems. The ability to precisely define biological building blocks together with their dynamic behavior has even been considered a prerequisite for upcoming synthetic biology approaches. Driven by the requirements of such ambitious research goals, standardization itself has become an active field of research on nearly all levels of granularity in biology. In addition to the originally envisaged exchange of computational models and tool interoperability, new standards have been suggested for an unambiguous graphical display of biological phenomena, to annotate, archive, as well as to rank models, and to describe execution and the outcomes of simulation experiments. The spectrum now even covers the interaction of entire neurons in the brain, three-dimensional motions, and the description of pharmacometric studies. Thereby, the mathematical description of systems and approaches for their (repeated) simulation are clearly separated from each other and also from their graphical representation. Minimum information definitions constitute guidelines and common operation protocols in order to ensure reproducibility of findings and a unified knowledge representation. Central database infrastructures have been established that provide the scientific community with persistent links from model annotations to online resources. A rich variety of open-source software tools thrives for all data formats, often supporting a multitude of programming languages. Regular meetings and workshops of developers and users lead to continuous improvement and ongoing development of these standardization efforts. This article gives a brief overview of the current state of the growing number of operation protocols, mark-up languages, graphical descriptions, and fundamental software support with relevance to systems biology.
Affiliation(s)
- Andreas Dräger
- Systems Biology Research Group, Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Cognitive Systems, Center for Bioinformatics Tübingen (ZBIT), Department of Computer Science, University of Tübingen, Tübingen, Germany
- Bernhard Ø. Palsson
- Systems Biology Research Group, Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA