1
Candia J, Ferrucci L. Assessment of Gene Set Enrichment Analysis using curated RNA-seq-based benchmarks. PLoS One 2024; 19:e0302696. [PMID: 38753612 PMCID: PMC11098418 DOI: 10.1371/journal.pone.0302696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 04/09/2024] [Indexed: 05/18/2024] Open
Abstract
Pathway enrichment analysis is a ubiquitous computational biology method to interpret a list of genes (typically derived from the association of large-scale omics data with phenotypes of interest) in terms of higher-level, predefined gene sets that share biological function, chromosomal location, or other common features. Among many tools developed so far, Gene Set Enrichment Analysis (GSEA) stands out as one of the pioneering and most widely used methods. Although originally developed for microarray data, GSEA is nowadays extensively utilized for RNA-seq data analysis. Here, we quantitatively assessed the performance of a variety of GSEA modalities and provide guidance in the practical use of GSEA in RNA-seq experiments. We leveraged harmonized RNA-seq datasets available from The Cancer Genome Atlas (TCGA) in combination with large, curated pathway collections from the Molecular Signatures Database to obtain cancer-type-specific target pathway lists across multiple cancer types. We carried out a detailed analysis of GSEA performance using both gene-set and phenotype permutations combined with four different choices for the Kolmogorov-Smirnov enrichment statistic. Based on our benchmarks, we conclude that the classic/unweighted gene-set permutation approach offered comparable or better sensitivity-vs-specificity tradeoffs across cancer types compared with other, more complex and computationally intensive permutation methods. Finally, we analyzed other large cohorts for thyroid cancer and hepatocellular carcinoma. We utilized a new consensus metric, the Enrichment Evidence Score (EES), which showed a remarkable agreement between pathways identified in TCGA and those from other sources, despite differences in cancer etiology. This finding suggests an EES-based strategy to identify a core set of pathways that may be complemented by an expanded set of pathways for downstream exploratory analysis. 
This work fills the existing gap in current guidelines and benchmarks for the use of GSEA with RNA-seq data and provides a framework to enable detailed benchmarking of other RNA-seq-based pathway analysis tools.
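The classic/unweighted gene-set permutation approach named in this abstract can be illustrated with a minimal sketch. It is not the authors' code or the GSEA software: the function names, the two-sided p-value, and the toy gene list are assumptions made for illustration, and the real tool additionally normalizes scores and handles positive and negative enrichment separately.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Classic (unweighted) Kolmogorov-Smirnov-like running-sum statistic.

    Walks down the ranked list, stepping up by 1/n_hit at each gene in the
    set and down by 1/n_miss otherwise; returns the maximum deviation from 0.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def gene_set_permutation_pvalue(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Empirical p-value from random gene sets of the same size.

    Simplification (assumption): a two-sided comparison on |ES|.
    """
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, gene_set)
    size = sum(g in gene_set for g in ranked_genes)
    extreme = 0
    for _ in range(n_perm):
        null_set = set(rng.sample(ranked_genes, size))
        if abs(enrichment_score(ranked_genes, null_set)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    ranked = [f"g{i}" for i in range(20)]  # hypothetical ranked gene list
    es, p = gene_set_permutation_pvalue(ranked, {"g0", "g1", "g2"})
    print(es, p)
```

Gene-set permutation only reshuffles set membership, so it needs no access to the sample-level expression matrix, which is one reason it is cheaper than phenotype permutation.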
Affiliation(s)
- Julián Candia
  - Longitudinal Studies Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, United States of America
- Luigi Ferrucci
  - Longitudinal Studies Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, United States of America
2
Hemandhar Kumar S, Tapken I, Kuhn D, Claus P, Jung K. bootGSEA: a bootstrap and rank aggregation pipeline for multi-study and multi-omics enrichment analyses. Frontiers in Bioinformatics 2024; 4:1380928. [PMID: 38633435 PMCID: PMC11021641 DOI: 10.3389/fbinf.2024.1380928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Accepted: 03/18/2024] [Indexed: 04/19/2024] Open
Abstract
Introduction: Gene set enrichment analysis (GSEA) subsequent to differential expression analysis is a standard step in transcriptomics and proteomics data analysis. Although many tools for this step are available, the results are often difficult to reproduce because set annotations can change in the databases, that is, new features can be added or existing features can be removed. Finally, such changes in set compositions can have an impact on biological interpretation. Methods: We present bootGSEA, a novel computational pipeline, to study the robustness of GSEA. By repeating GSEA based on bootstrap samples, the variability and robustness of results can be studied. In our pipeline, not all genes or proteins are involved in the different bootstrap replicates of the analyses. Finally, we aggregate the ranks from the bootstrap replicates to obtain a score per gene set that shows whether it gains or loses evidence compared to the ranking of the standard GSEA. Rank aggregation is also used to combine GSEA results from different omics levels or from multiple independent studies at the same omics level. Results: By applying our approach to six independent cancer transcriptomics datasets, we showed that bootstrap GSEA can aid in the selection of more robust enriched gene sets. Additionally, we applied our approach to paired transcriptomics and proteomics data obtained from a mouse model of spinal muscular atrophy (SMA), a neurodegenerative and neurodevelopmental disease associated with multi-system involvement. After obtaining a robust ranking at both omics levels, both ranking lists were combined to aggregate the findings from the transcriptomics and proteomics results. Furthermore, we constructed the new R-package "bootGSEA," which implements the proposed methods and provides graphical views of the findings. 
In the example datasets, bootstrap-based GSEA was able to identify gene or protein sets that were less robust when the set composition changed during bootstrap analysis. Discussion: The rank aggregation step was useful for combining bootstrap results and making them comparable to the original findings on the single-omics level or for combining findings from multiple different omics levels.
Affiliation(s)
- Shamini Hemandhar Kumar
  - Institute for Animal Genomics, University of Veterinary Medicine, Foundation, Hannover, Germany
  - Center for Systems Neuroscience (ZSN), University of Veterinary Medicine, Foundation, Hannover, Germany
- Ines Tapken
  - Center for Systems Neuroscience (ZSN), University of Veterinary Medicine, Foundation, Hannover, Germany
  - SMATHERIA gGmbH—Non-Profit Biomedical Research Institute, Hannover, Germany
- Daniela Kuhn
  - SMATHERIA gGmbH—Non-Profit Biomedical Research Institute, Hannover, Germany
  - Clinic for Conservative Dentistry, Periodontology and Preventive Dentistry, Hannover Medical School, Hannover, Germany
- Peter Claus
  - Center for Systems Neuroscience (ZSN), University of Veterinary Medicine, Foundation, Hannover, Germany
  - SMATHERIA gGmbH—Non-Profit Biomedical Research Institute, Hannover, Germany
- Klaus Jung
  - Institute for Animal Genomics, University of Veterinary Medicine, Foundation, Hannover, Germany
  - Center for Systems Neuroscience (ZSN), University of Veterinary Medicine, Foundation, Hannover, Germany
3
Vaswani CM, Simone J, Pavelick JL, Wu X, Tan GW, Ektesabi AM, Gupta S, Tsoporis JN, Dos Santos CC. Tiny Guides, Big Impact: Focus on the Opportunities and Challenges of miR-Based Treatments for ARDS. Int J Mol Sci 2024; 25:2812. [PMID: 38474059 DOI: 10.3390/ijms25052812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2024] [Revised: 02/24/2024] [Accepted: 02/25/2024] [Indexed: 03/14/2024] Open
Abstract
Acute Respiratory Distress Syndrome (ARDS) is characterized by lung inflammation and increased membrane permeability, which represents the leading cause of mortality in ICUs. Mechanical ventilation strategies are at the forefront of supportive approaches for ARDS. Recently, an increasing understanding of RNA biology, function, and regulation, as well as the success of RNA vaccines, has spurred enthusiasm for the emergence of novel RNA-based therapeutics. The most common types of RNA seen in development are silencing (si)RNAs, antisense oligonucleotide therapy (ASO), and messenger (m)RNAs that collectively account for 80% of the RNA therapeutics pipeline. These three RNA platforms are the most mature, with approved products and demonstrated commercial success. Most recently, miRNAs have emerged as pivotal regulators of gene expression. Their dysregulation in various clinical conditions offers insights into ARDS pathogenesis and offers the innovative possibility of using microRNAs as targeted therapy. This review synthesizes the current state of the literature to contextualize the therapeutic potential of miRNA modulation. It considers the potential for miR-based therapeutics as a nuanced approach that incorporates the complexity of ARDS pathophysiology and the multifaceted nature of miRNA interactions.
Affiliation(s)
- Chirag M Vaswani
  - Department of Physiology, Temerty Faculty of Medicine, University of Toronto, Toronto, ON M5S 1A8, Canada
  - Keenan Research Centre for Biomedical Science, St. Michael's Hospital, University of Toronto, Toronto, ON M5B 1W8, Canada
- Julia Simone
  - Department of Medicine, McMaster University, Hamilton, ON L8V 5C2, Canada
- Jacqueline L Pavelick
  - Institute of Medical Sciences, Temerty Faculty of Medicine, University of Toronto, Toronto, ON M5S 1A8, Canada
- Xiao Wu
  - Keenan Research Centre for Biomedical Science, St. Michael's Hospital, University of Toronto, Toronto, ON M5B 1W8, Canada
- Greaton W Tan
  - Department of Physiology, Temerty Faculty of Medicine, University of Toronto, Toronto, ON M5S 1A8, Canada
  - Keenan Research Centre for Biomedical Science, St. Michael's Hospital, University of Toronto, Toronto, ON M5B 1W8, Canada
- Amin M Ektesabi
  - Keenan Research Centre for Biomedical Science, St. Michael's Hospital, University of Toronto, Toronto, ON M5B 1W8, Canada
  - Institute of Medical Sciences, Temerty Faculty of Medicine, University of Toronto, Toronto, ON M5S 1A8, Canada
- Sahil Gupta
  - Faculty of Medicine, School of Medicine, The University of Queensland, Herston, QLD 4006, Australia
- James N Tsoporis
  - Keenan Research Centre for Biomedical Science, St. Michael's Hospital, University of Toronto, Toronto, ON M5B 1W8, Canada
- Claudia C Dos Santos
  - Department of Physiology, Temerty Faculty of Medicine, University of Toronto, Toronto, ON M5S 1A8, Canada
  - Keenan Research Centre for Biomedical Science, St. Michael's Hospital, University of Toronto, Toronto, ON M5B 1W8, Canada
  - Institute of Medical Sciences, Temerty Faculty of Medicine, University of Toronto, Toronto, ON M5S 1A8, Canada
  - Laboratory Medicine and Pathobiology, Temerty Faculty of Medicine, University of Toronto, Toronto, ON M5S 1A8, Canada
  - Interdepartmental Division of Critical Care, St. Michael's Hospital, University of Toronto, Toronto, ON M5B 1W8, Canada
4
Buzzao D, Castresana-Aguirre M, Guala D, Sonnhammer ELL. Benchmarking enrichment analysis methods with the disease pathway network. Brief Bioinform 2024; 25:bbae069. [PMID: 38436561 PMCID: PMC10939300 DOI: 10.1093/bib/bbae069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 01/10/2024] [Accepted: 02/03/2024] [Indexed: 03/05/2024] Open
Abstract
Enrichment analysis (EA) is a common approach to gain functional insights from genome-scale experiments. As a consequence, a large number of EA methods have been developed, yet it is unclear from previous studies which method is the best for a given dataset. The main issues with previous benchmarks include the complexity of correctly assigning true pathways to a test dataset, and lack of generality of the evaluation metrics, for which the rank of a single target pathway is commonly used. We here provide a generalized EA benchmark and apply it to the most widely used EA methods, representing all four categories of current approaches. The benchmark employs a new set of 82 curated gene expression datasets from DNA microarray and RNA-Seq experiments for 26 diseases, of which only 13 are cancers. In order to address the shortcomings of the single target pathway approach and to enhance the sensitivity evaluation, we present the Disease Pathway Network, in which related Kyoto Encyclopedia of Genes and Genomes pathways are linked. We introduce a novel approach to evaluate pathway EA by combining sensitivity and specificity to provide a balanced evaluation of EA methods. This approach identifies Network Enrichment Analysis methods as the overall top performers compared with overlap-based methods. By using randomized gene expression datasets, we explore the null hypothesis bias of each method, revealing that most of them produce skewed P-values.
Affiliation(s)
- Davide Buzzao
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden
- Dimitri Guala
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden
- Erik L L Sonnhammer
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden
5
Hui TX, Kasim S, Aziz IA, Fudzee MFM, Haron NS, Sutikno T, Hassan R, Mahdin H, Sen SC. Robustness evaluations of pathway activity inference methods on gene expression data. BMC Bioinformatics 2024; 25:23. [PMID: 38216898 PMCID: PMC10785356 DOI: 10.1186/s12859-024-05632-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2023] [Accepted: 01/02/2024] [Indexed: 01/14/2024] Open
Abstract
BACKGROUND With the exponential growth of high-throughput technologies, multiple pathway analysis methods have been proposed to estimate pathway activities from gene expression profiles. These pathway activity inference methods can be divided into two main categories: non-Topology-Based (non-TB) and Pathway Topology-Based (PTB) methods. Although some review and survey articles discussed the topic from different aspects, there is a lack of systematic assessment and comparisons on the robustness of these approaches. RESULTS Thus, this study presents comprehensive robustness evaluations of seven widely used pathway activity inference methods using six cancer datasets based on two assessments. The first assessment seeks to investigate the robustness of pathway activity in pathway activity inference methods, while the second assessment aims to assess the robustness of risk-active pathways and genes predicted by these methods. The mean reproducibility power and total number of identified informative pathways and genes were evaluated. Based on the first assessment, the mean reproducibility power of pathway activity inference methods generally decreased as the number of pathway selections increased. Entropy-based Directed Random Walk (e-DRW) distinctly outperformed other methods in exhibiting the greatest reproducibility power across all cancer datasets. On the other hand, the second assessment shows that no methods provide satisfactory results across datasets. CONCLUSION However, PTB methods generally appear to perform better in producing greater reproducibility power and identifying potential cancer markers compared to non-TB methods.
Affiliation(s)
- Tay Xin Hui
  - Soft Computing and Data Mining Center, Faculty of Computer Sciences and Information Technology, Universiti Tun Hussein Onn Malaysia (UTHM), 83000, Batu Pahat, Malaysia
- Shahreen Kasim
  - Soft Computing and Data Mining Center, Faculty of Computer Sciences and Information Technology, Universiti Tun Hussein Onn Malaysia (UTHM), 83000, Batu Pahat, Malaysia
- Izzatdin Abdul Aziz
  - Computer and Information Sciences Department (CISD), Universiti Teknologi PETRONAS (UTP), 32610, Seri Iskandar, Malaysia
- Mohd Farhan Md Fudzee
  - Soft Computing and Data Mining Center, Faculty of Computer Sciences and Information Technology, Universiti Tun Hussein Onn Malaysia (UTHM), 83000, Batu Pahat, Malaysia
- Nazleeni Samiha Haron
  - Computer and Information Sciences Department (CISD), Universiti Teknologi PETRONAS (UTP), 32610, Seri Iskandar, Malaysia
- Tole Sutikno
  - Department of Electrical Engineering, Universitas Ahmad Dahlan (UAD), 55166, Yogyakarta, Indonesia
- Rohayanti Hassan
  - Faculty of Electrical Engineering, Universiti Teknologi Malaysia (UTM), 81310, Johor Bahru, Malaysia
- Hairulnizam Mahdin
  - Soft Computing and Data Mining Center, Faculty of Computer Sciences and Information Technology, Universiti Tun Hussein Onn Malaysia (UTHM), 83000, Batu Pahat, Malaysia
- Seah Choon Sen
  - Faculty of Computing, Universiti Teknologi Malaysia (UTM), 81310, Johor Bahru, Malaysia
6
Hakobyan S, Stepanyan A, Nersisyan L, Binder H, Arakelyan A. PSF toolkit: an R package for pathway curation and topology-aware analysis. Front Genet 2023; 14:1264656. [PMID: 37680201 PMCID: PMC10482229 DOI: 10.3389/fgene.2023.1264656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 08/09/2023] [Indexed: 09/09/2023] Open
Abstract
Most high throughput genomic data analysis pipelines currently rely on over-representation or gene set enrichment analysis (ORA/GSEA) approaches for functional analysis. In contrast, topology-based pathway analysis methods, which offer a more biologically informed perspective by incorporating interaction and topology information, have remained underutilized and inaccessible due to various limiting factors. These methods heavily rely on the quality of pathway topologies and often utilize predefined topologies from databases without assessing their correctness. To address these issues and make topology-aware pathway analysis more accessible and flexible, we introduce the PSF (Pathway Signal Flow) toolkit R package. Our toolkit integrates pathway curation and topology-based analysis, providing interactive and command-line tools that facilitate pathway importation, correction, and modification from diverse sources. This enables users to perform topology-based pathway signal flow analysis in both interactive and command-line modes. To showcase the toolkit's usability, we curated 36 KEGG signaling pathways and conducted several use-case studies, comparing our method with ORA and the topology-based signaling pathway impact analysis (SPIA) method. The results demonstrate that the algorithm can effectively identify ORA enriched pathways while providing more detailed branch-level information. Moreover, in contrast to the SPIA method, it offers the advantage of being cut-off free and less susceptible to the variability caused by selection thresholds. By combining pathway curation and topology-based analysis, the PSF toolkit enhances the quality, flexibility, and accessibility of topology-aware pathway analysis. Researchers can now easily import pathways from various sources, correct and modify them as needed, and perform detailed topology-based pathway signal flow analysis. 
In summary, our PSF toolkit offers an integrated solution that addresses the limitations of current topology-based pathway analysis methods. By providing interactive and command-line tools for pathway curation and topology-based analysis, we empower researchers to conduct comprehensive pathway analyses across a wide range of applications.
Affiliation(s)
- Siras Hakobyan
  - Bioinformatics Group, Institute of Molecular Biology, Armenian National Academy of Sciences, Yerevan, Armenia
  - Armenian Bioinformatics Institute (ABI), Yerevan, Armenia
- Hans Binder
  - Armenian Bioinformatics Institute, Yerevan, Armenia
  - Interdisciplinary Centre for Bioinformatics, University of Leipzig, Leipzig, Germany
- Arsen Arakelyan
  - Bioinformatics Group, Institute of Molecular Biology, Armenian National Academy of Sciences, Yerevan, Armenia
  - Russian-Armenian University, Yerevan, Armenia
7
Zhao K, Rhee SY. Interpreting omics data with pathway enrichment analysis. Trends Genet 2023; 39:308-319. [PMID: 36750393 DOI: 10.1016/j.tig.2023.01.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 08/25/2022] [Revised: 11/24/2022] [Accepted: 01/13/2023] [Indexed: 02/09/2023]
Abstract
Pathway enrichment analysis is indispensable for interpreting omics datasets and generating hypotheses. However, the foundations of enrichment analysis remain elusive to many biologists. Here, we discuss best practices in interpreting different types of omics data using pathway enrichment analysis and highlight the importance of considering intrinsic features of various types of omics data. We further explain major components that influence the outcomes of a pathway enrichment analysis, including defining background sets and choosing reference annotation databases. To improve reproducibility, we describe how to standardize reporting methodological details in publications. This article aims to serve as a primer for biologists to leverage the wealth of omics resources and motivate bioinformatics tool developers to enhance the power of pathway enrichment analysis.
Affiliation(s)
- Kangmei Zhao
  - Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94025, USA
- Seung Yon Rhee
  - Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94025, USA
8
Lu Y, Pang Z, Xia J. Comprehensive investigation of pathway enrichment methods for functional interpretation of LC-MS global metabolomics data. Brief Bioinform 2023; 24:bbac553. [PMID: 36572652 PMCID: PMC9851290 DOI: 10.1093/bib/bbac553] [Citation(s) in RCA: 30] [Impact Index Per Article: 30.0] [Received: 09/23/2022] [Revised: 10/31/2022] [Accepted: 11/15/2022] [Indexed: 12/28/2022] Open
Abstract
BACKGROUND Global or untargeted metabolomics is widely used to comprehensively investigate metabolic profiles under various pathophysiological conditions such as inflammations, infections, responses to exposures or interactions with microbial communities. However, biological interpretation of global metabolomics data remains a daunting task. Recent years have seen growing applications of pathway enrichment analysis based on putative annotations of liquid chromatography coupled with mass spectrometry (LC-MS) peaks for functional interpretation of LC-MS-based global metabolomics data. However, due to intricate peak-metabolite and metabolite-pathway relationships, considerable variations are observed among results obtained using different approaches. There is an urgent need to benchmark these approaches to inform the best practices. RESULTS We have conducted a benchmark study of common peak annotation approaches and pathway enrichment methods in current metabolomics studies. Representative approaches, including three peak annotation methods and four enrichment methods, were selected and benchmarked under different scenarios. Based on the results, we have provided a set of recommendations regarding peak annotation, ranking metrics and feature selection. The overall better performance was obtained for the mummichog approach. We have observed that a ~30% annotation rate is sufficient to achieve high recall (~90% based on mummichog), and using semi-annotated data improves functional interpretation. Based on the current platforms and enrichment methods, we further propose an identifiability index to indicate the possibility of a pathway being reliably identified. Finally, we evaluated all methods using 11 COVID-19 and 8 inflammatory bowel diseases (IBD) global metabolomics datasets.
Affiliation(s)
- Yao Lu
  - Department of Microbiology and Immunology, McGill University, Quebec, Canada
- Zhiqiang Pang
  - Institute of Parasitology, McGill University, Quebec, Canada
- Jianguo Xia
  - Department of Microbiology and Immunology, McGill University, Quebec, Canada
  - Institute of Parasitology, McGill University, Quebec, Canada
9
Data-driven analysis and druggability assessment methods to accelerate the identification of novel cancer targets. Comput Struct Biotechnol J 2022; 21:46-57. [PMID: 36514341 PMCID: PMC9732000 DOI: 10.1016/j.csbj.2022.11.042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2022] [Revised: 11/21/2022] [Accepted: 11/21/2022] [Indexed: 11/27/2022] Open
Abstract
Over the past few decades, drug discovery has greatly improved the outcomes for patients, but several challenges continue to hinder the rapid development of novel drugs. Addressing unmet clinical needs requires the pursuit of drug targets that have a higher likelihood to lead to the development of successful drugs. Here we describe a bioinformatic approach for identifying novel cancer drug targets by performing statistical analysis to ascertain quantitative changes in expression levels between protein-coding genes, as well as co-expression networks to classify these genes into groups. Subsequently, we provide an overview of druggability assessment methodologies to prioritize and select the best targets to pursue.
10
Wieder C, Lai RPJ, Ebbels TMD. Single sample pathway analysis in metabolomics: performance evaluation and application. BMC Bioinformatics 2022; 23:481. [PMID: 36376837 PMCID: PMC9664704 DOI: 10.1186/s12859-022-05005-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/22/2022] [Accepted: 10/25/2022] [Indexed: 11/15/2022] Open
Abstract
BACKGROUND Single sample pathway analysis (ssPA) transforms molecular level omics data to the pathway level, enabling the discovery of patient-specific pathway signatures. Compared to conventional pathway analysis, ssPA overcomes the limitations by enabling multi-group comparisons, alongside facilitating numerous downstream analyses such as pathway-based machine learning. While in transcriptomics ssPA is a widely used technique, there is little literature evaluating its suitability for metabolomics. Here we provide a benchmark of established ssPA methods (ssGSEA, GSVA, SVD (PLAGE), and z-score) alongside the evaluation of two novel methods we propose: ssClustPA and kPCA, using semi-synthetic metabolomics data. We then demonstrate how ssPA can facilitate pathway-based interpretation of metabolomics data by performing a case-study on inflammatory bowel disease mass spectrometry data, using clustering to determine subtype-specific pathway signatures. RESULTS While GSEA-based and z-score methods outperformed the others in terms of recall, clustering/dimensionality reduction-based methods provided higher precision at moderate-to-high effect sizes. A case study applying ssPA to inflammatory bowel disease data demonstrates how these methods yield a much richer depth of interpretation than conventional approaches, for example by clustering pathway scores to visualise a pathway-based patient subtype-specific correlation network. We also developed the sspa python package (freely available at https://pypi.org/project/sspa/ ), providing implementations of all the methods benchmarked in this study. CONCLUSION This work underscores the value ssPA methods can add to metabolomic studies and provides a useful reference for those wishing to apply ssPA methods to metabolomics data.
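Of the ssPA methods listed in this abstract, the z-score approach is the simplest to illustrate. The sketch below is not the sspa package API: the function name, input layout, and toy data are assumptions, and it follows the commonly described combined z-score (sum of per-metabolite z-scores divided by the square root of the pathway size).

```python
import math
from statistics import mean, stdev


def zscore_pathway_scores(abundance, pathways):
    """Per-sample pathway scores from combined z-scores.

    abundance: dict mapping metabolite name -> list of values, one per sample.
    pathways:  dict mapping pathway name -> list of member metabolite names.
    Returns a dict mapping pathway name -> list of scores, one per sample.
    """
    n_samples = len(next(iter(abundance.values())))

    # Standardize each metabolite across samples; skip constant features.
    z = {}
    for name, values in abundance.items():
        sd = stdev(values)
        if sd == 0:
            continue
        mu = mean(values)
        z[name] = [(v - mu) / sd for v in values]

    scores = {}
    for pathway, members in pathways.items():
        measured = [m for m in members if m in z]
        if not measured:
            continue  # no measured members, so no score for this pathway
        k = len(measured)
        scores[pathway] = [
            sum(z[m][s] for m in measured) / math.sqrt(k)
            for s in range(n_samples)
        ]
    return scores


if __name__ == "__main__":
    # Hypothetical toy data: three samples, three metabolites.
    data = {"m1": [1, 2, 3], "m2": [2, 4, 6], "m3": [5, 5, 6]}
    print(zscore_pathway_scores(data, {"P": ["m1", "m2"], "Q": ["m9"]}))
```

The resulting sample-by-pathway scores can then feed the downstream steps the abstract mentions, such as clustering or pathway-based machine learning.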
Affiliation(s)
- Cecilia Wieder
  - Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, UK
- Rachel P J Lai
  - Department of Infectious Disease, Faculty of Medicine, Imperial College London, London, UK
- Timothy M D Ebbels
  - Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, UK
11
Liu H, Yuan M, Mitra R, Zhou X, Long M, Lei W, Zhou S, Huang YE, Hou F, Eischen CM, Jiang W. CTpathway: a CrossTalk-based pathway enrichment analysis method for cancer research. Genome Med 2022; 14:118. [PMID: 36229842 PMCID: PMC9563764 DOI: 10.1186/s13073-022-01119-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/12/2022] [Accepted: 09/26/2022] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND Pathway enrichment analysis (PEA) is a common method for exploring functions of hundreds of genes and identifying disease-risk pathways. Moreover, different pathways exert their functions through crosstalk. However, existing PEA methods do not sufficiently integrate essential pathway features, including pathway crosstalk, molecular interactions, and network topologies, resulting in many risk pathways that remain uninvestigated. METHODS To overcome these limitations, we develop a new crosstalk-based PEA method, CTpathway, based on a global pathway crosstalk map (GPCM) with >440,000 edges by combining pathways from eight resources, transcription factor-gene regulations, and large-scale protein-protein interactions. Integrating gene differential expression and crosstalk effects in GPCM, we assign a risk score to genes in the GPCM and identify risk pathways enriched with the risk genes. RESULTS Analysis of >8300 expression profiles covering ten cancer tissues and blood samples indicates that CTpathway outperforms the current state-of-the-art methods in identifying risk pathways with higher accuracy, reproducibility, and speed. CTpathway recapitulates known risk pathways and exclusively identifies several previously unreported critical pathways for individual cancer types. CTpathway also outperforms other methods in identifying risk pathways across all cancer stages, including early-stage cancer with a small number of differentially expressed genes. Moreover, the robust design of CTpathway enables researchers to analyze both bulk and single-cell RNA-seq profiles to predict both cancer tissue and cell type-specific risk pathways with higher accuracy. CONCLUSIONS Collectively, CTpathway is a fast, accurate, and stable pathway enrichment analysis method for cancer research that can be used to identify cancer risk pathways. The CTpathway interactive web server can be accessed here http://www.jianglab.cn/CTpathway/ . 
The stand-alone program can be accessed here https://github.com/Bioccjw/CTpathway .
Collapse
Affiliation(s)
- Haizhou Liu
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, No. 29, Jiangjun Avenue, Nanjing, 211106, Jiangsu Province, China
| | - Mengqin Yuan
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, No. 29, Jiangjun Avenue, Nanjing, 211106, Jiangsu Province, China
| | - Ramkrishna Mitra
- Department of Pharmacology, Physiology, and Cancer Biology, Sidney Kimmel Cancer Center, Thomas Jefferson University, 233 South 10th St., Philadelphia, PA, 19107, USA
| | - Xu Zhou
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, No. 29, Jiangjun Avenue, Nanjing, 211106, Jiangsu Province, China
| | - Min Long
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, No. 29, Jiangjun Avenue, Nanjing, 211106, Jiangsu Province, China
| | - Wanyue Lei
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, No. 29, Jiangjun Avenue, Nanjing, 211106, Jiangsu Province, China
| | - Shunheng Zhou
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, No. 29, Jiangjun Avenue, Nanjing, 211106, Jiangsu Province, China
| | - Yu-E Huang
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, No. 29, Jiangjun Avenue, Nanjing, 211106, Jiangsu Province, China
| | - Fei Hou
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, No. 29, Jiangjun Avenue, Nanjing, 211106, Jiangsu Province, China
| | - Christine M Eischen
- Department of Pharmacology, Physiology, and Cancer Biology, Sidney Kimmel Cancer Center, Thomas Jefferson University, 233 South 10th St., Philadelphia, PA, 19107, USA.
| | - Wei Jiang
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, No. 29, Jiangjun Avenue, Nanjing, 211106, Jiangsu Province, China.
|
12
|
Grassi M, Tarantino B. SEMgsa: topology-based pathway enrichment analysis with structural equation models. BMC Bioinformatics 2022; 23:344. [PMID: 35978279 PMCID: PMC9385099 DOI: 10.1186/s12859-022-04884-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Accepted: 08/09/2022] [Indexed: 11/25/2022] Open
Abstract
Background Pathway enrichment analysis is extensively used in high-throughput experimental studies to gain insight into the functional roles of pre-defined subsets of genes, proteins and metabolites. Methods that leverage information on the topology of the underlying pathways outperform simpler methods that only consider pathway membership. Among all the proposed software tools, there is a need to combine high statistical power with a user-friendly framework, which makes it difficult to choose the best method for a particular experimental environment. Results We propose SEMgsa, a topology-based algorithm developed within the framework of structural equation models. SEMgsa combines the SEM p values regarding node-specific group effect estimates in terms of activation or inhibition, after statistically controlling for biological relations among genes within pathways. We used SEMgsa to identify biologically relevant results in a Coronavirus disease (COVID-19) RNA-seq dataset (GEO accession: GSE172114) together with a frontotemporal dementia (FTD) DNA methylation dataset (GEO accession: GSE53740) and compared its performance with some existing methods. SEMgsa is highly sensitive to the pathways designed for the specific disease, showing low p values (< 0.001) and ranking in high positions, outperforming existing software tools. Three pathway dysregulation mechanisms were used to generate simulated expression data and evaluate the performance of methods in terms of type I error followed by their statistical power. Simulation results confirm the best overall performance of SEMgsa. Conclusions SEMgsa is a novel yet powerful method for identifying enrichment with regard to gene expression data. It takes into account topological information and exploits pathway perturbation statistics to reveal biological information. SEMgsa is implemented in the R package SEMgraph, easily available at https://CRAN.R-project.org/package=SEMgraph. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04884-8.
Affiliation(s)
- Mario Grassi
- Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy
| | - Barbara Tarantino
- Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy.
|
13
|
Wang Y, Hong Y, Mao S, Jiang Y, Cui Y, Pan J, Luo Y. An Interaction-Based Method for Refining Results From Gene Set Enrichment Analysis. Front Genet 2022; 13:890672. [PMID: 35706447 PMCID: PMC9189359 DOI: 10.3389/fgene.2022.890672] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/06/2022] [Accepted: 05/04/2022] [Indexed: 11/13/2022] Open
Abstract
Purpose: To demonstrate an interaction-based method for the refinement of Gene Set Enrichment Analysis (GSEA) results. Method: Intravitreal injection of miR-124-3p antagomir was used to knock down the expression of miR-124-3p in mouse retina at postnatal day 3 (P3). Whole retinal RNA was extracted for mRNA transcriptome sequencing at P9. After preprocessing the dataset, GSEA was performed, and the leading-edge subsets were obtained. The Apriori algorithm was used to identify the frequent genes or gene sets from the union of the leading-edge subsets. A new statistic d was introduced to evaluate the frequent genes or gene sets. Reverse transcription quantitative PCR (RT-qPCR) was performed to validate the expression trend of candidate genes after the knockdown of miR-124-3p. Results: A total of 115,140 assembled transcript sequences were obtained from the clean data. With GSEA, the NOD-like receptor signaling pathway, C-type-like lectin receptor signaling pathway, phagosome, necroptosis, JAK-STAT signaling pathway, Toll-like receptor signaling pathway, leukocyte transendothelial migration, chemokine signaling pathway, NF-kappa B signaling pathway and RIG-I-like signaling pathway were identified as the top 10 enriched pathways, and their leading-edge subsets were obtained. After being refined by the Apriori algorithm and sorted by the value of the modulus of d, Prkcd, Irf9, Stat3, Cxcl12, Stat1, Stat2, Isg15, Eif2ak2, Il6st, Pdgfra, Socs4 and Csf2ra had a significant number of interactions and the greatest value of d to downstream genes among all frequent transactions. Results of RT-qPCR validation for the expression of candidate genes after the knockdown of miR-124-3p showed a similar trend to the RNA-Seq results. Conclusion: This study indicated that using the Apriori algorithm and defining the statistic d was a novel way to refine the GSEA results. We hope to convey the intricacies from the computational results to the low-throughput experiments, and to plan experimental investigations specifically.
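The frequent-gene step described in this abstract can be illustrated with a minimal sketch. It assumes each leading-edge subset is treated as one transaction and uses a plain level-wise Apriori search; the function name `frequent_gene_sets`, the `min_support` threshold, and the toy leading-edge lists are hypothetical, and the statistic d is not reproduced because the abstract does not define it.

```python
from itertools import combinations


def frequent_gene_sets(leading_edges, min_support, max_size=2):
    """Return {frozenset(genes): support} for gene sets found in at least
    `min_support` leading-edge subsets (level-wise Apriori search)."""
    transactions = [frozenset(genes) for genes in leading_edges]

    # Level 1: count single genes.
    counts = {}
    for transaction in transactions:
        for gene in transaction:
            key = frozenset([gene])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    size = 1
    while current and size < max_size:
        size += 1
        genes = sorted({g for s in current for g in s})
        # Apriori pruning: keep a candidate only if every (size-1)-subset
        # was frequent at the previous level.
        candidates = [
            frozenset(c)
            for c in combinations(genes, size)
            if all(frozenset(sub) in current for sub in combinations(c, size - 1))
        ]
        current = {}
        for candidate in candidates:
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support:
                current[candidate] = support
        frequent.update(current)
    return frequent


# Toy leading-edge subsets (invented for illustration only).
toy_leading_edges = [
    ["Stat1", "Stat2", "Irf9"],
    ["Stat1", "Stat3", "Il6st"],
    ["Stat1", "Stat2", "Isg15"],
]
result = frequent_gene_sets(toy_leading_edges, min_support=2)
```

With these toy inputs, only genes or gene pairs shared by at least two leading-edge subsets are retained; a real analysis would take the subsets from the GSEA output.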
Affiliation(s)
- Yishen Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, China
| | - Yiwen Hong
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, China
| | - Shudi Mao
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, China
| | - Yukang Jiang
- Department of Statistical Science, School of Mathematics, Sun Yat-Sen University, Guangzhou, China
| | - Yamei Cui
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, China
| | - Jianying Pan
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, China
| | - Yan Luo
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, China
- *Correspondence: Yan Luo,
|
14
|
Mubeen S, Tom Kodamullil A, Hofmann-Apitius M, Domingo-Fernández D. On the influence of several factors on pathway enrichment analysis. Brief Bioinform 2022; 23:bbac143. [PMID: 35453140 PMCID: PMC9116215 DOI: 10.1093/bib/bbac143] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 01/17/2022] [Revised: 03/21/2022] [Accepted: 03/30/2022] [Indexed: 02/01/2023] Open
Abstract
Pathway enrichment analysis has become a widely used knowledge-based approach for the interpretation of biomedical data. Its popularity has led to an explosion of both enrichment methods and pathway databases. While the elegance of pathway enrichment lies in its simplicity, multiple factors can impact the results of such an analysis, which may not be accounted for. Researchers may fail to give influential aspects their due, resorting instead to popular methods and gene set collections, or default settings. Despite ongoing efforts to establish set guidelines, meaningful results are still hampered by a lack of consensus or gold standards around how enrichment analysis should be conducted. Nonetheless, such concerns have prompted a series of benchmark studies specifically focused on evaluating the influence of various factors on pathway enrichment results. In this review, we organize and summarize the findings of these benchmarks to provide a comprehensive overview on the influence of these factors. Our work covers a broad spectrum of factors, spanning from methodological assumptions to those related to prior biological knowledge, such as pathway definitions and database choice. In doing so, we aim to shed light on how these aspects can lead to insignificant, uninteresting or even contradictory results. Finally, we conclude the review by proposing future benchmarks as well as solutions to overcome some of the challenges, which originate from the outlined factors.
Affiliation(s)
- Sarah Mubeen
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
- Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53115 Bonn, Germany
- Fraunhofer Center for Machine Learning, Germany
| | - Alpha Tom Kodamullil
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
| | - Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
- Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53115 Bonn, Germany
| | - Daniel Domingo-Fernández
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
- Fraunhofer Center for Machine Learning, Germany
- Enveda Biosciences, Boulder, CO, 80301, USA
|
15
|
Jaakkola MK, Elo LL. Estimating cell type-specific differential expression using deconvolution. Brief Bioinform 2021; 23:6396788. [PMID: 34651640 PMCID: PMC8769698 DOI: 10.1093/bib/bbab433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2021] [Revised: 09/17/2021] [Accepted: 09/23/2021] [Indexed: 12/02/2022] Open
Affiliation(s)
- Maria K Jaakkola
- Department of Mathematics and Statistics, University of Turku, Yliopistonmäki, 20014, Turku, Finland
| | - Laura L Elo
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520, Turku, Finland.,Institute of Biomedicine, University of Turku, Kiinamyllynkatu 10, FI-20520, Turku, Finland
|
16
|
Fabris F, Palmer D, de Magalhães JP, Freitas AA. Comparing enrichment analysis and machine learning for identifying gene properties that discriminate between gene classes. Brief Bioinform 2021; 21:803-814. [PMID: 30895300 DOI: 10.1093/bib/bbz028] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 11/15/2018] [Revised: 02/18/2019] [Accepted: 02/19/2019] [Indexed: 01/08/2023] Open
Abstract
Biologists very often use enrichment methods based on statistical hypothesis tests to identify gene properties that are significantly over-represented in a given set of genes of interest, by comparison with a 'background' set of genes. These enrichment methods, although based on rigorous statistical foundations, are not always the best single option to identify patterns in biological data. In many cases, one can also use classification algorithms from the machine-learning field. Unlike enrichment methods, classification algorithms are designed to maximize measures of predictive performance and are capable of analysing combinations of gene properties, instead of one property at a time. In practice, however, the majority of studies use either enrichment or classification methods (rather than both), and there is a lack of literature discussing the pros and cons of both types of method. The goal of this paper is to compare and contrast enrichment and classification methods, offering two contributions. First, we discuss the (to some extent complementary) advantages and disadvantages of both types of methods for identifying gene properties that discriminate between gene classes. Second, we provide a set of high-level recommendations for using enrichment and classification methods. Overall, by highlighting the strengths and the weaknesses of both types of methods we argue that both should be used in bioinformatics analyses.
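As an illustration of the kind of hypothesis test this abstract refers to, the sketch below computes a one-sided hypergeometric over-representation p-value for one gene property against a background set. The abstract does not name a specific test, so the choice of the hypergeometric test, the function name `hypergeom_enrichment_p`, and the example counts are all assumptions.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_annotated, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_background genes, of which n_annotated carry the property."""
    max_overlap = min(n_annotated, n_selected)
    total = comb(n_background, n_selected)
    # Sum the upper tail of the hypergeometric distribution exactly,
    # using integer arithmetic until the final division.
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 background genes, 200 annotated with the
# property, 100 genes of interest, 8 of which carry the property.
p_value = hypergeom_enrichment_p(20000, 200, 100, 8)
```

Each property is tested one at a time, which is the limitation the abstract contrasts with classification algorithms that can analyse combinations of properties.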
Affiliation(s)
- Fabio Fabris
- School of Computing, University of Kent, Kent, CT2 7NF, UK
| | - Daniel Palmer
- Integrative Genomics of Ageing Group, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK
| | - João Pedro de Magalhães
- Integrative Genomics of Ageing Group, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK
| | - Alex A Freitas
- School of Computing, University of Kent, Kent, CT2 7NF, UK
|
17
|
Pérez-Rodríguez D, López-Fernández H, Agís-Balboa RC. Application of miRNA-seq in neuropsychiatry: A methodological perspective. Comput Biol Med 2021; 135:104603. [PMID: 34216893 DOI: 10.1016/j.compbiomed.2021.104603] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/04/2021] [Revised: 06/21/2021] [Accepted: 06/21/2021] [Indexed: 10/21/2022]
Abstract
MiRNAs are emerging as key molecules to study neuropsychiatric diseases. However, despite the large number of methodologies and software for miRNA-seq analyses, there is little supporting literature for researchers in this area. This review focuses on evaluating how miRNA-seq has been used to study neuropsychiatric diseases to date, analyzing both the main findings discovered and the bioinformatics workflows and tools used from a methodological perspective. The objective of this review is two-fold: first, to evaluate current miRNA-seq procedures used in neuropsychiatry; and second, to offer comprehensive information that can serve as a guide to new researchers in bioinformatics. After conducting a systematic search (from 2016 to June 30, 2020) of articles using miRNA-seq in neuropsychiatry, we have seen that it has already been used for different types of studies in three main categories: diagnosis, prognosis, and mechanism. We carefully analyzed the bioinformatics workflows of each study, observing a high degree of variability with respect to the tools and methods used and several methodological complexities that are identified and discussed in this review.
Affiliation(s)
- Daniel Pérez-Rodríguez
- Translational Neuroscience Group-CIBERSAM, Galicia Sur Health Research Institute (IIS Galicia Sur), Área Sanitaria de Vigo-Hospital Álvaro Cunqueiro, SERGAS-UVIGO, 36213, Vigo, Spain; NeuroEpigenetics Lab. University Hospital Complex of Vigo, SERGAS-UVIGO, 36213, Vigo, Spain
| | - Hugo López-Fernández
- Instituto de Investigação e Inovação Em Saúde (I3S), Universidade Do Porto, Rua Alfredo Allen, 208, 4200-135, Porto, Portugal; CINBIO, Universidade de Vigo, Department of Computer Science, ESEI - Escuela Superior de Ingeniería Informática, 32004, Ourense, Spain; SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Spain.
| | - Roberto C Agís-Balboa
- Translational Neuroscience Group-CIBERSAM, Galicia Sur Health Research Institute (IIS Galicia Sur), Área Sanitaria de Vigo-Hospital Álvaro Cunqueiro, SERGAS-UVIGO, 36213, Vigo, Spain; NeuroEpigenetics Lab. University Hospital Complex of Vigo, SERGAS-UVIGO, 36213, Vigo, Spain.
|
18
|
Riddell N, Murphy MJ, Crewther SG. Electroretinography and Gene Expression Measures Implicate Phototransduction and Metabolic Shifts in Chick Myopia and Hyperopia Models. Life (Basel) 2021; 11:life11060501. [PMID: 34072440 PMCID: PMC8228081 DOI: 10.3390/life11060501] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/05/2021] [Revised: 05/23/2021] [Accepted: 05/25/2021] [Indexed: 12/26/2022] Open
Abstract
The Retinal Ion-Driven Fluid Efflux (RIDE) model theorizes that phototransduction-driven changes in trans-retinal ion and fluid transport underlie the development of myopia (short-sightedness). In support of this model, previous functional studies have identified the attenuation of outer retinal contributions to the global flash electroretinogram (gfERG) following weeks of myopia induction in chicks, while discovery-driven transcriptome studies have identified changes to the expression of ATP-driven ion transport and mitochondrial metabolism genes in the retina/RPE/choroid at the mid- to late-induction time-points. Less is known about the early time-points despite biometric analyses demonstrating changes in eye growth by 3 h in the chick lens defocus model. Thus, the present study compared gfERG and transcriptome profiles between 3 h and 3 days of negative lens-induced myopia and positive lens-induced hyperopia in chicks. Photoreceptor (a-wave and d-wave) and bipolar (b-wave and late-stage d-wave) cell responses were suppressed following negative lens-wear, particularly at the 3–4 h and 3-day time-points when active shifts in the rate of ocular growth were expected. Transcriptome measures revealed the up-regulation of oxidative phosphorylation genes following 6 h of negative lens-wear, concordant with previous reports at 2 days in this model. Signal transduction pathways, with core genes involved in glutamate and G-protein coupled receptor signalling, were down-regulated at 6 h. These findings contribute to a growing body of evidence for the dysregulation of phototransduction and mitochondrial metabolism in animal models of myopia.
|
19
|
Xie C, Jauhari S, Mora A. Popularity and performance of bioinformatics software: the case of gene set analysis. BMC Bioinformatics 2021; 22:191. [PMID: 33858350 PMCID: PMC8050894 DOI: 10.1186/s12859-021-04124-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 06/23/2020] [Accepted: 04/08/2021] [Indexed: 11/22/2022] Open
Abstract
Background Gene Set Analysis (GSA) is arguably the method of choice for the functional interpretation of omics results. The following paper explores the popularity and the performance of all the GSA methodologies and software published during the 20 years since its inception. "Popularity" is estimated according to each paper's citation counts, while "performance" is based on a comprehensive evaluation of the validation strategies used by papers in the field, as well as the consolidated results from the existing benchmark studies. Results Regarding popularity, data is collected into an online open database ("GSARefDB") which allows browsing bibliographic and method-descriptive information from 503 GSA paper references; regarding performance, we introduce a repository of jupyter workflows and shiny apps for automated benchmarking of GSA methods (“GSA-BenchmarKING”). After comparing popularity versus performance, results show discrepancies between the most popular and the best performing GSA methods. Conclusions The above-mentioned results call our attention towards the nature of the tool selection procedures followed by researchers and raise doubts regarding the quality of the functional interpretation of biological datasets in current biomedical studies. Suggestions for the future of the functional interpretation field are made, including strategies for education and discussion of GSA tools, better validation and benchmarking practices, reproducibility, and functional re-analysis of previously reported data. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04124-5.
Affiliation(s)
- Chengshu Xie
- Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health - Chinese Academy of Sciences, Guangzhou, China
| | - Shaurya Jauhari
- Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health - Chinese Academy of Sciences, Guangzhou, China
| | - Antonio Mora
- Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health - Chinese Academy of Sciences, Guangzhou, China.
|
20
|
Ietswaart R, Gyori BM, Bachman JA, Sorger PK, Churchman LS. GeneWalk identifies relevant gene functions for a biological context using network representation learning. Genome Biol 2021; 22:55. [PMID: 33526072 PMCID: PMC7852222 DOI: 10.1186/s13059-021-02264-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 07/15/2020] [Accepted: 01/05/2021] [Indexed: 12/13/2022] Open
Abstract
A bottleneck in high-throughput functional genomics experiments is identifying the most important genes and their relevant functions from a list of gene hits. Gene Ontology (GO) enrichment methods provide insight at the gene set level. Here, we introduce GeneWalk (github.com/churchmanlab/genewalk), which identifies individual genes and their relevant functions critical for the experimental setting under examination. After the automatic assembly of an experiment-specific gene regulatory network, GeneWalk uses representation learning to quantify the similarity between vector representations of each gene and its GO annotations, yielding annotation significance scores that reflect the experimental context. By performing gene- and condition-specific functional analysis, GeneWalk converts a list of genes into data-driven hypotheses.
Affiliation(s)
- Robert Ietswaart
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
| | - Benjamin M Gyori
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, 02115, USA
| | - John A Bachman
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, 02115, USA
| | - Peter K Sorger
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, 02115, USA
| | - L Stirling Churchman
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA.
|
21
|
Rosario FJ, Powell TL, Gupta MB, Cox L, Jansson T. mTORC1 Transcriptional Regulation of Ribosome Subunits, Protein Synthesis, and Molecular Transport in Primary Human Trophoblast Cells. Front Cell Dev Biol 2020; 8:583801. [PMID: 33324640 PMCID: PMC7726231 DOI: 10.3389/fcell.2020.583801] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 07/15/2020] [Accepted: 10/20/2020] [Indexed: 12/12/2022] Open
Abstract
Mechanistic Target of Rapamycin Complex 1 (mTORC1) serves as a positive regulator of placental nutrient transport and mitochondrial respiration. The role of mTORC1 signaling in modulating other placental functions is largely unexplored. We used a gene array following silencing of raptor to identify genes regulated by mTORC1 in primary human trophoblast (PHT) cells. Seven hundred and thirty-nine genes were differentially expressed; 487 genes were down-regulated and 252 up-regulated. Bioinformatic analyses demonstrated that inhibition of mTORC1 resulted in decreased expression of genes encoding ribosomal proteins in the 60S and 40S ribosome subunits. Furthermore, down-regulated genes were functionally enriched in genes involved in eIF2, sirtuin and mTOR signaling, mitochondrial function, and glutamine and zinc transport. Stress response genes were enriched among up-regulated genes following mTORC1 inhibition. The protein expression of ribosomal protein L26 (RPL26) and ribosomal protein S10 (RPS10) was decreased and positively correlated to mTORC1 signaling and System A amino acid transport in human placentas collected from pregnancies complicated by intrauterine growth restriction (IUGR). In conclusion, mTORC1 signaling regulates the expression of trophoblast genes involved in ribosome and protein synthesis, mitochondrial function, lipid metabolism, nutrient transport, and angiogenesis, representing novel links between mTOR signaling and multiple placental functions critical for normal fetal growth and development.
Affiliation(s)
- Fredrick J. Rosario
- Division of Reproductive Sciences, Department of Obstetrics and Gynecology, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| | - Theresa L. Powell
- Division of Reproductive Sciences, Department of Obstetrics and Gynecology, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Section of Neonatology, Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| | - Madhulika B. Gupta
- Department of Biochemistry, University of Western Ontario, London, ON, Canada
| | - Laura Cox
- Center for Precision Medicine, Department of Internal Medicine, Section of Molecular Medicine, Wake Forest School of Medicine, Winston-Salem, NC, United States
| | - Thomas Jansson
- Division of Reproductive Sciences, Department of Obstetrics and Gynecology, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
|
22
|
Maleki F, Ovens K, Hogan DJ, Kusalik AJ. Gene Set Analysis: Challenges, Opportunities, and Future Research. Front Genet 2020; 11:654. [PMID: 32695141 PMCID: PMC7339292 DOI: 10.3389/fgene.2020.00654] [Citation(s) in RCA: 93] [Impact Index Per Article: 23.3] [Received: 02/01/2020] [Accepted: 05/29/2020] [Indexed: 12/14/2022] Open
Abstract
Gene set analysis methods are widely used to provide insight into high-throughput gene expression data. There are many gene set analysis methods available. These methods rely on various assumptions and have different requirements, strengths and weaknesses. In this paper, we classify gene set analysis methods based on their components, describe the underlying requirements and assumptions for each class, and provide directions for future research in developing and evaluating gene set analysis methods.
|
23
|
Zeng X, Zong W, Lin CW, Fang Z, Ma T, Lewis DA, Enwright JF, Tseng GC. Comparative Pathway Integrator: A Framework of Meta-Analytic Integration of Multiple Transcriptomic Studies for Consensual and Differential Pathway Analysis. Genes (Basel) 2020; 11:E696. [PMID: 32599927 PMCID: PMC7348908 DOI: 10.3390/genes11060696] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/16/2020] [Revised: 06/15/2020] [Accepted: 06/17/2020] [Indexed: 11/16/2022] Open
Abstract
Pathway enrichment analysis provides a knowledge-driven approach to interpret differentially expressed genes associated with disease status. Many tools have been developed to analyze a single study. However, when multiple studies of different conditions are jointly analyzed, novel integrative tools are needed. In addition, pathway redundancy introduced by combining multiple public pathway databases hinders interpretation and knowledge discovery. We present a meta-analytic integration tool, Comparative Pathway Integrator (CPI), to address these issues using adaptively weighted Fisher's method to discover consensual and differential enrichment patterns, a tight clustering algorithm to reduce pathway redundancy, and a text mining algorithm to assist interpretation of the pathway clusters. We applied CPI to jointly analyze six psychiatric disorder transcriptomic studies to demonstrate its effectiveness, and found functions confirmed by previous biological studies as well as novel enrichment patterns. CPI's R package is accessible online on Github metaOmics/MetaPath.
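The p-value combination idea behind this abstract can be illustrated with the classical, unweighted Fisher's method. This is a minimal sketch only: the adaptively weighted variant that CPI uses is not reproduced here, and the function name `fisher_combined_p` and the example p-values are hypothetical. The sketch assumes independent studies and strictly positive p-values.

```python
from math import exp, log


def fisher_combined_p(p_values):
    """Combine independent p-values with the unweighted Fisher's method.

    Under the null, -2 * sum(log p_i) follows a chi-square distribution with
    2k degrees of freedom, whose upper tail has a closed form for even df.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one p-value is required")
    half_statistic = -sum(log(p) for p in p_values)  # equals statistic / 2
    # Chi-square survival function with 2k df:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    series = 1.0
    for i in range(1, k):
        term *= half_statistic / i
        series += term
    return exp(-half_statistic) * series


# Hypothetical per-study enrichment p-values for one pathway.
combined = fisher_combined_p([0.05, 0.05])
```

A single p-value is returned unchanged, which is a quick sanity check on the closed-form tail used above.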
Affiliation(s)
- Xiangrui Zeng
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA;
| | - Wei Zong
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15260, USA; (W.Z.); (Z.F.)
| | - Chien-Wei Lin
- Division of Biostatistics, Medical College of Wisconsin, Wauwatosa, WI 53226, USA;
| | - Zhou Fang
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15260, USA; (W.Z.); (Z.F.)
| | - Tianzhou Ma
- Department of Epidemiology and Biostatistics, University of Maryland, College Park, MD 20742, USA;
| | - David A. Lewis
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA 15260, USA; (D.A.L.); (J.F.E.)
| | - John F. Enwright
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA 15260, USA; (D.A.L.); (J.F.E.)
| | - George C. Tseng
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15260, USA; (W.Z.); (Z.F.)
|
24
|
Getachew A, Abejew TA, Wu J, Xu J, Yu H, Tan J, Wu P, Tu Y, Kang W, Wang Z, Xu S. Transcriptome profiling reveals insertional mutagenesis suppressed the expression of candidate pathogenicity genes in honeybee fungal pathogen, Ascosphaera apis. Sci Rep 2020; 10:7532. [PMID: 32372055 PMCID: PMC7200787 DOI: 10.1038/s41598-020-64022-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/17/2019] [Accepted: 04/03/2020] [Indexed: 11/30/2022] Open
Abstract
Chalkbrood disease is caused by Ascosphaera apis, which severely affects honeybee brood. Spore inoculation experiments have shown that pathogenicity varies among different strains and mutants; however, the molecular mechanism of pathogenicity is unclear. We sequenced, assembled and annotated the transcriptomes of wild type (SPE1) and three mutants (SPE2, SPE3 and SPE4) with reduced pathogenicity that were constructed in our previous study. Illumina sequencing generated a total of 394,910,604 clean reads, which were assembled de novo with Trinity into 12,989 unigenes; among these, 9,598 genes were successfully annotated to known proteins in the UniProt database. A total of 172, 3,996, and 650 genes were up-regulated and 4,403, 2,845, and 3,016 genes were down-regulated between SPE2-SPE1, SPE3-SPE1, and SPE4-SPE1, respectively. Overall, several genes with a potential role in fungal pathogenicity were found to be down-regulated in mutants, including 100 hydrolytic enzymes, 117 transcriptional factors, and 47 cell wall related genes. KEGG pathway enrichment analysis revealed that 216 genes involved in nine pathways were down-regulated in mutants compared to wild type. The down-regulation of more pathways involved in pathogenicity in SPE2 and SPE4 than in SPE3 supports their lower pathogenicity in the in-vitro bioassay experiment. Expression of 12 down-regulated genes in mutants was validated by quantitative real time PCR. This study provides valuable information on transcriptome variation caused by mutation for further functional validation of candidate pathogenicity genes in A. apis.
Affiliation(s)
- Awraris Getachew
- Key Laboratory of Pollinating Insect Biology, Ministry of Agriculture; Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, 100093, Beijing, China
- College of Agriculture and Environmental Sciences, Bahir Dar University, Bahir Dar, Ethiopia
| | - Tessema Aynalem Abejew
- Key Laboratory of Pollinating Insect Biology, Ministry of Agriculture; Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, 100093, Beijing, China
- College of Agriculture and Environmental Sciences, Bahir Dar University, Bahir Dar, Ethiopia
| | - Jiangli Wu
- Key Laboratory of Pollinating Insect Biology, Ministry of Agriculture; Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, 100093, Beijing, China
| | - Jin Xu
- Key Laboratory of Pollinating Insect Biology, Ministry of Agriculture; Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, 100093, Beijing, China
| | - Huimin Yu
- Key Laboratory of Pollinating Insect Biology, Ministry of Agriculture; Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, 100093, Beijing, China
| | - Jing Tan
- Key Laboratory of Pollinating Insect Biology, Ministry of Agriculture; Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, 100093, Beijing, China
| | - Pengjie Wu
- Key Laboratory of Pollinating Insect Biology, Ministry of Agriculture; Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, 100093, Beijing, China
| | - Yangyang Tu
- Key Laboratory of Pollinating Insect Biology, Ministry of Agriculture; Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, 100093, Beijing, China
| | - Weipeng Kang
- Key Laboratory of Pollinating Insect Biology, Ministry of Agriculture; Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, 100093, Beijing, China
| | - Zheng Wang
- Key Laboratory of Pollinating Insect Biology, Ministry of Agriculture; Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, 100093, Beijing, China
| | - Shufa Xu
- Key Laboratory of Pollinating Insect Biology, Ministry of Agriculture; Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, 100093, Beijing, China.
|
25
|
Liu W, Venugopal S, Majid S, Ahn IS, Diamante G, Hong J, Yang X, Chandler SH. Single-cell RNA-seq analysis of the brainstem of mutant SOD1 mice reveals perturbed cell types and pathways of amyotrophic lateral sclerosis. Neurobiol Dis 2020; 141:104877. [PMID: 32360664 PMCID: PMC7519882 DOI: 10.1016/j.nbd.2020.104877] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 02/28/2020] [Revised: 04/13/2020] [Accepted: 04/22/2020] [Indexed: 12/13/2022] Open
Abstract
Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease in which motor neurons throughout the brain and spinal cord progressively degenerate resulting in muscle atrophy, paralysis and death. Recent studies using animal models of ALS implicate multiple cell-types (e.g., astrocytes and microglia) in ALS pathogenesis in the spinal motor systems. To ascertain cellular vulnerability and cell-type specific mechanisms of ALS in the brainstem that orchestrates oral-motor functions, we conducted parallel single cell RNA sequencing (scRNA-seq) analysis using the high-throughput Drop-seq method. We isolated 1894 and 3199 cells from the brainstem of wildtype and mutant SOD1 symptomatic mice respectively, at postnatal day 100. We recovered major known cell types and neuronal subpopulations, such as interneurons and motor neurons, and trigeminal ganglion (TG) peripheral sensory neurons, as well as, previously uncharacterized interneuron subtypes. We found that the majority of the cell types displayed transcriptomic alterations in ALS mice. Differentially expressed genes (DEGs) of individual cell populations revealed cell-type specific alterations in numerous pathways, including previously known ALS pathways such as inflammation (in microglia), stress response (ependymal and an uncharacterized cell population), neurogenesis (astrocytes, oligodendrocytes, neurons), synapse organization and transmission (microglia, oligodendrocyte precursor cells, and neuronal subtypes), and mitochondrial function (uncharacterized cell populations). Other cell-type specific processes altered in SOD1 mutant brainstem include those from motor neurons (axon regeneration, voltage-gated sodium and potassium channels underlying excitability, potassium ion transport), trigeminal sensory neurons (detection of temperature stimulus involved in sensory perception), and cellular response to toxic substances (uncharacterized cell populations). 
DEGs consistently altered across cell types (e.g., Malat1), as well as cell-type specific DEGs, were identified. Importantly, DEGs from various cell types overlapped with known ALS genes from the literature and with top hits from an existing human ALS genome-wide association study (GWAS), implicating the potential cell types in which the ALS genes function with ALS pathogenesis. Our molecular investigation at single cell resolution provides comprehensive insights into the cell types, genes and pathways altered in the brainstem in a widely used ALS mouse model.
Affiliation(s)
- Wenting Liu
- Department of Integrative Biology & Physiology, University of California, 2024 Terasaki Bld, 610 Charles E. Young Dr. East, Los Angeles, USA
| | - Sharmila Venugopal
- Department of Integrative Biology & Physiology, University of California, 2024 Terasaki Bld, 610 Charles E. Young Dr. East, Los Angeles, USA
| | - Sana Majid
- Department of Integrative Biology & Physiology, University of California, 2024 Terasaki Bld, 610 Charles E. Young Dr. East, Los Angeles, USA
| | - In Sook Ahn
- Department of Integrative Biology & Physiology, University of California, 2024 Terasaki Bld, 610 Charles E. Young Dr. East, Los Angeles, USA
| | - Graciel Diamante
- Department of Integrative Biology & Physiology, University of California, 2024 Terasaki Bld, 610 Charles E. Young Dr. East, Los Angeles, USA
| | - Jason Hong
- Department of Integrative Biology & Physiology, University of California, 2024 Terasaki Bld, 610 Charles E. Young Dr. East, Los Angeles, USA
| | - Xia Yang
- Department of Integrative Biology & Physiology, University of California, 2024 Terasaki Bld, 610 Charles E. Young Dr. East, Los Angeles, USA; Brain Research Institute, University of California, Los Angeles, USA; Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, USA.
| | - Scott H Chandler
- Department of Integrative Biology & Physiology, University of California, 2024 Terasaki Bld, 610 Charles E. Young Dr. East, Los Angeles, USA; Brain Research Institute, University of California, Los Angeles, USA.
|
26
|
Geistlinger L, Csaba G, Santarelli M, Ramos M, Schiffer L, Turaga N, Law C, Davis S, Carey V, Morgan M, Zimmer R, Waldron L. Toward a gold standard for benchmarking gene set enrichment analysis. Brief Bioinform 2020; 22:545-556. [PMID: 32026945 PMCID: PMC7820859 DOI: 10.1093/bib/bbz158] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 08/12/2019] [Revised: 10/11/2019] [Accepted: 11/09/2019] [Indexed: 12/22/2022] Open
Abstract
MOTIVATION Although gene set enrichment analysis has become an integral part of high-throughput gene expression data analysis, the assessment of enrichment methods remains rudimentary and ad hoc. In the absence of suitable gold standards, evaluations are commonly restricted to selected datasets and biological reasoning on the relevance of resulting enriched gene sets. RESULTS We develop an extensible framework for reproducible benchmarking of enrichment methods based on defined criteria for applicability, gene set prioritization and detection of relevant processes. This framework incorporates a curated compendium of 75 expression datasets investigating 42 human diseases. The compendium features microarray and RNA-seq measurements, and each dataset is associated with a precompiled GO/KEGG relevance ranking for the corresponding disease under investigation. We perform a comprehensive assessment of 10 major enrichment methods, identifying significant differences in runtime and applicability to RNA-seq data, fraction of enriched gene sets depending on the null hypothesis tested and recovery of the predefined relevance rankings. We make practical recommendations on how methods originally developed for microarray data can efficiently be applied to RNA-seq data, how to interpret results depending on the type of gene set test conducted and which methods are best suited to effectively prioritize gene sets with high phenotype relevance. AVAILABILITY http://bioconductor.org/packages/GSEABenchmarkeR. CONTACT ludwig.geistlinger@sph.cuny.edu.
Affiliation(s)
- Ludwig Geistlinger
- Graduate School of Public Health and Health Policy, City University of New York, New York, NY 10027, USA
| | - Gergely Csaba
- Institute for Implementation Science and Population Health, City University of New York, New York, NY 10027, USA
| | - Mara Santarelli
- Institute for Bioinformatics, Ludwig-Maximilians-Universität München, 80333 Munich, Germany
| | - Marcel Ramos
- Roswell Park Cancer Institute, Buffalo, NY 14203, USA
| | - Lucas Schiffer
- Graduate School of Arts and Sciences, Boston University, Boston, MA 02215, USA
| | - Nitesh Turaga
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria 3052, Australia
| | - Charity Law
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria 3010, Australia
| | - Sean Davis
- Center for Cancer Research, National Cancer Institute, Bethesda, MD 20892, USA
| | - Levi Waldron
- Graduate School of Public Health and Health Policy, City University of New York, New York, NY 10027, USA
|
27
|
Zaffaroni G, Okawa S, Morales-Ruiz M, del Sol A. An integrative method to predict signalling perturbations for cellular transitions. Nucleic Acids Res 2020; 47:e72. [PMID: 30949696 PMCID: PMC6614844 DOI: 10.1093/nar/gkz232] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/27/2018] [Revised: 02/22/2019] [Accepted: 03/22/2019] [Indexed: 12/19/2022] Open
Abstract
Induction of specific cellular transitions is of clinical importance, as it makes it possible to revert a disease cellular phenotype or to induce cellular reprogramming and differentiation for regenerative medicine. Signalling is a convenient way to accomplish such transitions without transfer of genetic material. Here we present the first general computational method that systematically predicts signalling molecules whose perturbations induce desired cellular transitions. This probabilistic method integrates gene regulatory networks (GRNs) with manually curated signalling pathways obtained from MetaCore from Clarivate Analytics, to model how signalling cues are received and processed in the GRN. The method was applied to 219 cellular transition examples, including cell type transitions, and overall correctly predicted experimentally validated signalling molecules, consistently outperforming other well-established approaches, such as differential gene expression and pathway enrichment analyses. Further, we validated our method's predictions in the case of rat cirrhotic liver, and identified the activation of the angiopoietin receptor Tie2 as a potential target for reverting the disease phenotype. Experimental results indicated that this perturbation induced the desired changes in the gene expression of key TFs involved in fibrosis and angiogenesis. Importantly, this method only requires gene expression data of the initial and desired cell states, and is therefore suited for the discovery of signalling interventions for disease treatments and cellular therapies.
Affiliation(s)
- Gaia Zaffaroni
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette L-4362, Luxembourg
| | - Satoshi Okawa
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette L-4362, Luxembourg
- Integrated BioBank of Luxembourg, Dudelange L-3555, Luxembourg
| | - Manuel Morales-Ruiz
- Biochemistry and Molecular Genetics Department-Hospital Clínic of Barcelona, Institut d’Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona 08036, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Barcelona 08036, Spain
- Working group for the biochemical assessment of hepatic disease-SEQC, Barcelona 08036, Spain
- Department of Biomedicine-Biochemistry Unit, School of Medicine-University of Barcelona, Barcelona 08036, Spain
| | - Antonio del Sol
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette L-4362, Luxembourg
- CIC bioGUNE, Bizkaia Technology Park, Derio 48160, Spain
- IKERBASQUE, Basque Foundation for Science, Bilbao 48013, Spain
- To whom correspondence should be addressed. Tel: +352 46 66 44 6982; Fax: +352 46 66 44 6949;
|
28
|
Benis N, Wells JM, Smits MA, Kar SK, van der Hee B, Dos Santos VAPM, Suarez-Diez M, Schokker D. High-level integration of murine intestinal transcriptomics data highlights the importance of the complement system in mucosal homeostasis. BMC Genomics 2019; 20:1028. [PMID: 31888466 PMCID: PMC6937694 DOI: 10.1186/s12864-019-6390-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/20/2018] [Accepted: 12/12/2019] [Indexed: 12/25/2022] Open
Abstract
Background The mammalian intestine is a complex biological system that exhibits functional plasticity in its response to diverse stimuli to maintain homeostasis. To improve our understanding of this plasticity, we performed a high-level data integration of 14 whole-genome transcriptomics datasets from samples of intestinal mouse mucosa. We used the tool Centrality based Pathway Analysis (CePa), along with information from the Reactome database. Results The results show an integrated response of the mouse intestinal mucosa to challenges with agents introduced orally that were expected to perturb homeostasis. We observed that a common set of pathways respond to different stimuli, of which the most reactive was the Regulation of Complement Cascade pathway. Altered expression of the Regulation of Complement Cascade pathway was verified in mouse organoids challenged with different stimuli in vitro. Conclusions Results of the integrated transcriptomics analysis and data driven experiment suggest an important role of epithelial production of complement and host complement defence factors in the maintenance of homeostasis.
Affiliation(s)
- Nirupama Benis
- Host Microbe Interactomics, Wageningen University & Research, Wageningen, The Netherlands; Systems and Synthetic Biology, Wageningen University & Research, Wageningen, The Netherlands.
| | - Jerry M Wells
- Host Microbe Interactomics, Wageningen University & Research, Wageningen, The Netherlands
| | - Mari A Smits
- Host Microbe Interactomics, Wageningen University & Research, Wageningen, The Netherlands; Wageningen Livestock Research, Wageningen University & Research, Wageningen, The Netherlands; Wageningen Bioveterinary Research, Wageningen University, Wageningen, The Netherlands
| | - Soumya Kanti Kar
- Host Microbe Interactomics, Wageningen University & Research, Wageningen, The Netherlands; Wageningen Livestock Research, Wageningen University & Research, Wageningen, The Netherlands
| | - Bart van der Hee
- Host Microbe Interactomics, Wageningen University & Research, Wageningen, The Netherlands
| | - Vitor A P Martins Dos Santos
- Systems and Synthetic Biology, Wageningen University & Research, Wageningen, The Netherlands; LifeGlimmer GmbH, Berlin, Germany
| | - Maria Suarez-Diez
- Systems and Synthetic Biology, Wageningen University & Research, Wageningen, The Netherlands
| | - Dirkjan Schokker
- Wageningen Livestock Research, Wageningen University & Research, Wageningen, The Netherlands
|
29
|
Zyla J, Marczyk M, Domaszewska T, Kaufmann SHE, Polanska J, Weiner J. Gene set enrichment for reproducible science: comparison of CERNO and eight other algorithms. Bioinformatics 2019; 35:5146-5154. [PMID: 31165139 PMCID: PMC6954644 DOI: 10.1093/bioinformatics/btz447] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 12/10/2018] [Revised: 05/08/2019] [Accepted: 06/10/2019] [Indexed: 01/12/2023] Open
Abstract
MOTIVATION Analysis of gene set (GS) enrichment is an essential part of functional omics studies. Here, we complement the established evaluation metrics of GS enrichment algorithms with a novel approach to assess the practical reproducibility of scientific results obtained from GS enrichment tests when applied to related data from different studies. RESULTS We evaluated eight established and one novel algorithm for reproducibility, sensitivity, prioritization, false positive rate and computational time. In addition to eight established algorithms, we also included Coincident Extreme Ranks in Numerical Observations (CERNO), a flexible and fast algorithm based on modified Fisher P-value integration. Using real-world datasets, we demonstrate that CERNO is robust to ranking metrics, as well as sample and GS size. CERNO had the highest reproducibility while remaining sensitive, specific and fast. In the overall ranking, Pathway Analysis with Down-weighting of Overlapping Genes, CERNO and over-representation analysis performed best, while CERNO and GeneSetTest scored high in terms of reproducibility. AVAILABILITY AND IMPLEMENTATION The tmod package implementing the CERNO algorithm is available from CRAN (cran.r-project.org/web/packages/tmod/index.html) and an online implementation can be found at http://tmod.online/. The datasets analyzed in this study are widely available in the KEGGdzPathwaysGEO and KEGGandMetacoreDzPathwaysGEO R packages and the GEO repository. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
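The "modified Fisher P-value integration" mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly described form of the CERNO statistic, F = -2 Σ ln(r_i / N), where r_i is the rank of each gene-set member in an ordered list of N genes, compared against a chi-square distribution with 2k degrees of freedom for a set with k ranked members. The function names are hypothetical and this is not the tmod implementation.

```python
import math


def chi2_sf_even_df(x, k):
    """Survival function of a chi-square variable with 2k degrees of freedom.

    For even degrees of freedom there is a closed form:
    exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    """
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def cerno_test(ranked_genes, gene_set):
    """Return (F, p) for one gene set against a ranked gene list.

    ranked_genes: gene identifiers ordered from most to least interesting
                  (assumed input format, not taken from the paper).
    gene_set:     set of gene identifiers.
    """
    n = len(ranked_genes)
    # 1-based ranks of the gene-set members that appear in the ranked list
    ranks = [i + 1 for i, g in enumerate(ranked_genes) if g in gene_set]
    if not ranks:
        return float("nan"), float("nan")
    f_stat = -2.0 * sum(math.log(r / n) for r in ranks)
    return f_stat, chi2_sf_even_df(f_stat, len(ranks))


# Hypothetical example: 100 genes named g1..g100, already ranked.
ranked = [f"g{i}" for i in range(1, 101)]
f_stat, p_value = cerno_test(ranked, {"g1", "g2", "g3"})
```

Because the statistic depends only on ranks, any gene-level ordering (for example by P-value or effect size) can be supplied, which is consistent with the abstract's remark that the method is robust to the ranking metric.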
Affiliation(s)
- Joanna Zyla
- Data Mining Group, Faculty of Automatic Control, Electronic and Computer Science, Institute of Automatic Control, Silesian University of Technology, Gliwice, Poland
- Department of Immunology, Max Planck Institute for Infection Biology, Berlin, Germany
| | - Michal Marczyk
- Data Mining Group, Faculty of Automatic Control, Electronic and Computer Science, Institute of Automatic Control, Silesian University of Technology, Gliwice, Poland
- Yale School of Medicine, Yale Cancer Center, New Haven, CT 06510, USA
| | - Teresa Domaszewska
- Department of Immunology, Max Planck Institute for Infection Biology, Berlin, Germany
| | - Stefan H E Kaufmann
- Department of Immunology, Max Planck Institute for Infection Biology, Berlin, Germany
| | - Joanna Polanska
- Data Mining Group, Faculty of Automatic Control, Electronic and Computer Science, Institute of Automatic Control, Silesian University of Technology, Gliwice, Poland
| | - January Weiner
- Department of Immunology, Max Planck Institute for Infection Biology, Berlin, Germany
|
30
|
Mubeen S, Hoyt CT, Gemünd A, Hofmann-Apitius M, Fröhlich H, Domingo-Fernández D. The Impact of Pathway Database Choice on Statistical Enrichment Analysis and Predictive Modeling. Front Genet 2019; 10:1203. [PMID: 31824580 PMCID: PMC6883970 DOI: 10.3389/fgene.2019.01203] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 08/23/2019] [Accepted: 10/30/2019] [Indexed: 02/04/2023] Open
Abstract
Pathway-centric approaches are widely used to interpret and contextualize -omics data. However, databases contain different representations of the same biological pathway, which may lead to different results of statistical enrichment analysis and predictive models in the context of precision medicine. We have performed an in-depth benchmarking of the impact of pathway database choice on statistical enrichment analysis and predictive modeling. We analyzed five cancer datasets using three major pathway databases and developed an approach to merge several databases into a single integrative one: MPath. Our results show that equivalent pathways from different databases yield disparate results in statistical enrichment analysis. Moreover, we observed a significant dataset-dependent impact on the performance of machine learning models on different prediction tasks. In some cases, MPath significantly improved prediction performance and also reduced the variance of prediction performances. Furthermore, MPath yielded more consistent and biologically plausible results in statistical enrichment analyses. In summary, this benchmarking study demonstrates that pathway database choice can influence the results of statistical enrichment analysis and predictive modeling. Therefore, we recommend the use of multiple pathway databases or integrative ones.
Affiliation(s)
- Sarah Mubeen
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
| | - Charles Tapley Hoyt
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
| | - André Gemünd
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
| | - Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
| | - Holger Fröhlich
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
| | - Daniel Domingo-Fernández
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
|
31
|
Ma J, Shojaie A, Michailidis G. A comparative study of topology-based pathway enrichment analysis methods. BMC Bioinformatics 2019; 20:546. [PMID: 31684881 PMCID: PMC6829999 DOI: 10.1186/s12859-019-3146-1] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 09/17/2018] [Accepted: 10/02/2019] [Indexed: 02/01/2023] Open
Abstract
BACKGROUND Pathway enrichment is extensively used in the analysis of omics data for gaining biological insights into the functional roles of pre-defined subsets of genes, proteins and metabolites. A large number of methods have been proposed in the literature for this task. The vast majority of these methods use as input the expression levels of the biomolecules under study together with their membership in pathways of interest. The latest generation of pathway enrichment methods also leverages information on the topology of the underlying pathways, which, as evidence from their evaluation reveals, leads to improved sensitivity and specificity. Nevertheless, a systematic empirical comparison of such methods is still lacking, making selection of the most suitable method for a specific experimental setting challenging. This comparative study of nine network-based methods for pathway enrichment analysis aims to provide a systematic evaluation of their performance based on three real data sets with different numbers of features (genes/metabolites) and samples. RESULTS The findings highlight both methodological and empirical differences across the nine methods. In particular, certain methods assess pathway enrichment due to differences both across expression levels and in the strength of the interconnectedness of the members of the pathway, while others only leverage differential expression levels. In the more challenging setting involving a metabolomics data set, the results show that methods that utilize both pieces of information (with NetGSA being a prototypical one) exhibit superior statistical power in detecting pathway enrichment. CONCLUSION The analysis reveals that a number of methods perform equally well when testing large pathways, which is the case with genomic data. On the other hand, NetGSA, which takes into consideration both differential expression of the biomolecules in the pathway and changes in the topology, exhibits superior performance when testing small pathways, which is usually the case for metabolomics data.
Affiliation(s)
- Jing Ma
- Texas A&M University, Department of Statistics, College Station, 77840 USA
- Fred Hutchinson Cancer Research Center, Public Health Sciences Division, Seattle, 98107 USA
| | - Ali Shojaie
- University of Washington, Department of Biostatistics, Seattle, 98105 USA
|
32
|
Mora A. Gene set analysis methods for the functional interpretation of non-mRNA data—Genomic range and ncRNA data. Brief Bioinform 2019; 21:1495-1508. [DOI: 10.1093/bib/bbz090] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/20/2019] [Revised: 05/30/2019] [Accepted: 06/28/2019] [Indexed: 12/31/2022] Open
Abstract
Gene set analysis (GSA) is one of the methods of choice for analyzing the results of current omics studies; however, it has been mainly developed to analyze mRNA (microarray, RNA-Seq) data. The following review includes an update regarding general methods and resources for GSA and then emphasizes GSA methods and tools for non-mRNA omics datasets, specifically genomic range data (ChIP-Seq, SNP and methylation) and ncRNA data (miRNAs, lncRNAs and others). In the end, the state of the GSA field for non-mRNA datasets is discussed, and some current challenges and trends are highlighted, especially the use of network approaches to face complexity issues.
Affiliation(s)
- Antonio Mora
- Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health - Chinese Academy of Sciences
|
33
|
Nguyen TM, Shafi A, Nguyen T, Draghici S. Identifying significantly impacted pathways: a comprehensive review and assessment. Genome Biol 2019; 20:203. [PMID: 31597578 PMCID: PMC6784345 DOI: 10.1186/s13059-019-1790-4] [Citation(s) in RCA: 96] [Impact Index Per Article: 19.2] [Received: 02/15/2019] [Accepted: 08/13/2019] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND Many high-throughput experiments compare two phenotypes such as disease vs. healthy, with the goal of understanding the underlying biological phenomena characterizing the given phenotype. Because of the importance of this type of analysis, more than 70 pathway analysis methods have been proposed so far. These can be categorized into two main categories: non-topology-based (non-TB) and topology-based (TB). Although some review papers discuss this topic from different aspects, there is no systematic, large-scale assessment of such methods. Furthermore, the majority of the pathway analysis approaches rely on the assumption of uniformity of p values under the null hypothesis, which is often not true. RESULTS This article presents the most comprehensive comparative study on pathway analysis methods available to date. We compare the actual performance of 13 widely used pathway analysis methods in over 1085 analyses. These comparisons were performed using 2601 samples from 75 human disease data sets and 121 samples from 11 knockout mouse data sets. In addition, we investigate the extent to which each method is biased under the null hypothesis. Together, these data and results constitute a reliable benchmark against which future pathway analysis methods could and should be tested. CONCLUSION Overall, the result shows that no method is perfect. In general, TB methods appear to perform better than non-TB methods. This is somewhat expected since the TB methods take into consideration the structure of the pathway which is meant to describe the underlying phenomena. We also discover that most, if not all, listed approaches are biased and can produce skewed results under the null.
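The assumption that p values are uniform under the null hypothesis can be checked empirically. The sketch below is an illustration rather than the procedure used in the paper: given p values that a method produced on data with no true signal (for example, after shuffling phenotype labels), it computes the Kolmogorov-Smirnov distance between their empirical distribution and Uniform(0, 1). The function name and the example inputs are hypothetical.

```python
import math


def ks_distance_from_uniform(pvalues):
    """Kolmogorov-Smirnov distance between p values and Uniform(0, 1).

    Returns the largest gap between the empirical CDF of the p values
    and the uniform CDF F(p) = p. Values near 0 suggest an unbiased
    method; large values suggest p values skewed toward 0 or 1.
    """
    ps = sorted(pvalues)
    n = len(ps)
    if n == 0:
        raise ValueError("need at least one p value")
    d = 0.0
    for i, p in enumerate(ps, start=1):
        # Gap just after the step (i/n - p) and just before it (p - (i-1)/n)
        d = max(d, i / n - p, p - (i - 1) / n)
    return d


# Hypothetical null p values for one pathway across repeated null runs.
null_pvalues = [0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99, 1.0]
distance = ks_distance_from_uniform(null_pvalues)

# Rough large-sample 5% critical value for the one-sample KS test.
threshold = 1.36 / math.sqrt(len(null_pvalues))
is_biased = distance > threshold
```

A method whose null p values cluster near 0 would report spurious enrichment, while clustering near 1 would hide true signal, which is the kind of skew the abstract refers to.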
Affiliation(s)
- Tuan-Minh Nguyen
- Department of Computer Science, Wayne State University, Detroit, 48202 USA
| | - Adib Shafi
- Department of Computer Science, Wayne State University, Detroit, 48202 USA
| | - Tin Nguyen
- Department of Computer Science and Engineering, University of Nevada, Reno, 89557 USA
| | - Sorin Draghici
- Department of Computer Science, Wayne State University, Detroit, 48202 USA
- Department of Obstetrics and Gynecology, Wayne State University, Detroit, 48202 USA
|
34
|
Nguyen T, Mitrea C, Draghici S. Network-Based Approaches for Pathway Level Analysis. Curr Protoc Bioinformatics 2019; 61:8.25.1-8.25.24. [PMID: 30040185 DOI: 10.1002/cpbi.42] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 11/10/2022]
Abstract
Identification of impacted pathways is an important problem because it allows us to gain insights into the underlying biology beyond the detection of differentially expressed genes. In the past decade, a plethora of methods have been developed for this purpose. The last generation of pathway analysis methods are designed to take into account various aspects of pathway topology in order to increase the accuracy of the findings. Here, we cover 34 such topology-based pathway analysis methods published in the past 13 years. We compare these methods on categories related to implementation, availability, input format, graph models, and statistical approaches used to compute pathway level statistics and statistical significance. We also discuss a number of critical challenges that need to be addressed, arising both in methodology and pathway representation, including inconsistent terminology, data format, lack of meaningful benchmarks, and, more importantly, a systematic bias that is present in most existing methods. © 2018 by John Wiley & Sons, Inc.
Affiliation(s)
- Tin Nguyen
- Department of Computer Science and Engineering, University of Nevada, Reno, Nevada
| | - Cristina Mitrea
- Department of Computer Science, Wayne State University, Detroit, Michigan
| | - Sorin Draghici
- Department of Computer Science, Wayne State University, Detroit, Michigan; Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan
| |
Collapse
|
35
|
Tian S, Wang C, Wang B. Incorporating Pathway Information into Feature Selection towards Better Performed Gene Signatures. Biomed Res Int 2019; 2019:2497509. [PMID: 31073522 PMCID: PMC6470448 DOI: 10.1155/2019/2497509] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/23/2018] [Accepted: 03/07/2019] [Indexed: 12/29/2022]
Abstract
To analyze gene expression data with sophisticated grouping structures and to extract hidden patterns from such data, feature selection is of critical importance. It is well known that genes do not function in isolation but rather work together within various metabolic, regulatory, and signaling pathways. If the biological knowledge contained within these pathways is taken into account, the resulting method is a pathway-based algorithm. Studies have demonstrated that a pathway-based method usually outperforms its gene-based counterpart, in which no biological knowledge is considered. In this article, pathway-based feature selection is first divided into three major categories: pathway-level selection, bilevel selection, and pathway-guided gene selection. Treating bilevel selection methods as a special case of the pathway-guided gene selection process, we discuss pathway-guided gene selection methods in detail, along with the importance of penalization in such methods. Last, we point out the potential use of pathway-guided gene selection in one active research avenue, namely the analysis of longitudinal gene expression data. We believe this article provides valuable insights for computational biologists and biostatisticians so that they can make biology more computable.
Affiliation(s)
- Suyan Tian
- Division of Clinical Research, The First Hospital of Jilin University, 71 Xinmin Street, Changchun, Jilin 130021, China
- Chi Wang
- Department of Biostatistics, Markey Cancer Center, The University of Kentucky, 800 Rose St., Lexington, KY 40536, USA
- Bing Wang
- School of Life Science, Jilin University, 2699 Qianjin Street, Changchun, Jilin 130012, China
|
36
|
Mansoori F, Rahgozar M, Kavousi K. FoPA: identifying perturbed signaling pathways in clinical conditions using formal methods. BMC Bioinformatics 2019; 20:92. [PMID: 30808299 PMCID: PMC6390332 DOI: 10.1186/s12859-019-2635-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 05/09/2018] [Accepted: 01/17/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Accurate identification of perturbed signaling pathways based on differentially expressed genes between sample groups is one of the key factors in the understanding of diseases and druggable targets. Most pathway analysis methods prioritize impacted signaling pathways by incorporating pathway topology using simple graph-based models. Despite their relative success, these models are limited in describing all types of dependencies and interactions that exist in biological pathways. RESULTS In this work, we propose a new approach based on the formal modeling of signaling pathways. Signaling pathways are formally modeled, and then model checking tools are applied to find the likelihood of perturbation for each pathway in a given condition. By adopting formal methods, various complex interactions among biological parts are modeled, which can contribute to reducing the false-positive rate of the proposed approach. We have developed a tool named Formal model checking based pathway analysis (FoPA) based on this approach. FoPA is compared with three well-known pathway analysis methods (PADOG, CePa, and SPIA) on a benchmark of 36 GEO datasets from various diseases by applying the target pathway technique. This validation technique eliminates the need for possibly biased human assessments of results. In cases where there is no a priori knowledge of all relevant pathways, simulated false inputs (permuted class labels and decoy pathways) are chosen as a set of negative controls to test the false-positive rate of the methods. Finally, to further evaluate the efficiency of FoPA, it is applied to a list of autism-related genes. CONCLUSIONS The results obtained by the target pathway technique demonstrate that FoPA prioritizes target pathways as well as PADOG does and better than CePa and SPIA. The false-positive rate of finding significant pathways using FoPA is also lower than that of the other compared methods, and FoPA can detect more consistent relevant pathways than other methods. The results of FoPA on autism-related genes highlight the role of the "Renin-angiotensin system" pathway. This pathway has been thought to have a pivotal role in some neurodegenerative diseases, while little attention has been paid so far to its impact on autism development.
Affiliation(s)
- Fatemeh Mansoori
- Database Research Group, Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
- Maseud Rahgozar
- Database Research Group, Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
- Kaveh Kavousi
- Complex Biological Systems and Bioinformatics Lab (CBB), Bioinformatics department, University of Tehran, Tehran, Iran
|
37
|
Rosario FJ, Gupta MB, Myatt L, Powell TL, Glenn JP, Cox L, Jansson T. Mechanistic Target of Rapamycin Complex 1 Promotes the Expression of Genes Encoding Electron Transport Chain Proteins and Stimulates Oxidative Phosphorylation in Primary Human Trophoblast Cells by Regulating Mitochondrial Biogenesis. Sci Rep 2019; 9:246. [PMID: 30670706 PMCID: PMC6343003 DOI: 10.1038/s41598-018-36265-8] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 09/30/2016] [Accepted: 11/13/2018] [Indexed: 01/06/2023] Open
Abstract
Trophoblast oxidative phosphorylation provides energy for active transport and protein synthesis, which are critical placental functions influencing fetal growth and long-term health. The molecular mechanisms regulating trophoblast mitochondrial oxidative phosphorylation are largely unknown. We hypothesized that mechanistic Target of Rapamycin Complex 1 (mTORC1) is a positive regulator of key genes encoding Electron Transport Chain (ETC) proteins and stimulates oxidative phosphorylation in trophoblast and that ETC protein expression is down-regulated in placentas of infants with intrauterine growth restriction (IUGR). We silenced raptor (mTORC1 inhibition), rictor (mTORC2 inhibition) or DEPTOR (mTORC1/2 activation) in cultured term primary human trophoblast (PHT) cells. mTORC1 inhibition caused a coordinated down-regulation of 18 genes encoding ETC proteins representing all ETC complexes. Inhibition of mTORC1, but not mTORC2, decreased protein expression of ETC complexes I–IV, mitochondrial basal, ATP coupled and maximal respiration, reserve capacity and proton leak, whereas activation of mTORC1 had the opposite effects. Moreover, placental protein expression of ETC complexes was decreased and positively correlated to mTOR signaling activity in IUGR. By controlling trophoblast ATP production, mTORC1 links nutrient and O2 availability and growth factor signaling to placental function and fetal growth. Reduced placental mTOR activity may impair mitochondrial respiration and contribute to placental insufficiency in IUGR pregnancies.
Affiliation(s)
- Fredrick J Rosario
- Division of Reproductive Sciences, Department of Obstetrics and Gynecology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Madhulika B Gupta
- Children's Health Research Institute and Department of Pediatrics and Biochemistry, University of Western Ontario, London, Ontario, N6A 5C1, Canada
- Leslie Myatt
- Department of Obstetrics and Gynecology, Oregon Health and Science University, Portland, USA
- Theresa L Powell
- Division of Reproductive Sciences, Department of Obstetrics and Gynecology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Section of Neonatology, Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Jeremy P Glenn
- Department of Genetics, Southwest National Primate Research Center, Texas Biomedical Research Institute, San Antonio, TX, USA
- Laura Cox
- Department of Genetics, Southwest National Primate Research Center, Texas Biomedical Research Institute, San Antonio, TX, USA
- Department of Internal Medicine, Section of Molecular Medicine and Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Thomas Jansson
- Division of Reproductive Sciences, Department of Obstetrics and Gynecology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
|
38
|
Mubeen S, Hoyt CT, Gemünd A, Hofmann-Apitius M, Fröhlich H, Domingo-Fernández D. The Impact of Pathway Database Choice on Statistical Enrichment Analysis and Predictive Modeling. Front Genet 2019. [PMID: 31824580 DOI: 10.3389/fgene.2019.01203] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 05/06/2023] Open
Abstract
Pathway-centric approaches are widely used to interpret and contextualize -omics data. However, databases contain different representations of the same biological pathway, which may lead to different results of statistical enrichment analysis and predictive models in the context of precision medicine. We have performed an in-depth benchmarking of the impact of pathway database choice on statistical enrichment analysis and predictive modeling. We analyzed five cancer datasets using three major pathway databases and developed an approach to merge several databases into a single integrative one: MPath. Our results show that equivalent pathways from different databases yield disparate results in statistical enrichment analysis. Moreover, we observed a significant dataset-dependent impact on the performance of machine learning models on different prediction tasks. In some cases, MPath significantly improved prediction performance and also reduced the variance of prediction performances. Furthermore, MPath yielded more consistent and biologically plausible results in statistical enrichment analyses. In summary, this benchmarking study demonstrates that pathway database choice can influence the results of statistical enrichment analysis and predictive modeling. Therefore, we recommend the use of multiple pathway databases or integrative ones.
Affiliation(s)
- Sarah Mubeen
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Charles Tapley Hoyt
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- André Gemünd
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Holger Fröhlich
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Daniel Domingo-Fernández
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
|
39
|
Ren J, Wang B, Li J. Integrating proteomic and phosphoproteomic data for pathway analysis in breast cancer. BMC Syst Biol 2018; 12:130. [PMID: 30577793 PMCID: PMC6302460 DOI: 10.1186/s12918-018-0646-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 01/22/2023]
Abstract
Background As the protein is the basic unit of cell function and biological pathways, shotgun proteomics, the large-scale analysis of proteins, is contributing greatly to our understanding of disease mechanisms. Proteomics studies can detect changes in both protein expression and modification. With the release of large-scale cancer proteome studies, integrating the acquired proteomic and phosphoproteomic data into a more comprehensive pathway analysis has become feasible but remains challenging. Integrative pathway analysis at the proteome level provides systematic insight into the signaling network adaptations in the development of cancer. Results Here we integrated proteomic and phosphoproteomic data to perform pathway prioritization in breast cancer. We manually collected and curated well-known breast cancer-related pathways from the literature as target pathways (TPs), or positive controls, for method evaluation. Three different strategies, namely hypergeometric test-based over-representation analysis, Kolmogorov-Smirnov (K-S) test-based gene set analysis, and topology-based pathway analysis, were applied and evaluated in integrating protein expression and phosphorylation. For comparison, we also assessed the ranking performance of each strategy using protein expression or protein phosphorylation information individually. Target pathways were ranked higher with the data integration than with proteomic or phosphoproteomic data individually. In the comparison of pathway analysis strategies, the topology-based method outperformed the others. The subtypes of breast cancer, which consist of Luminal A, Luminal B, Basal and HER2-enriched, vary greatly in prognosis and require distinct treatment. Therefore, we applied topology-based pathway analysis integrating protein expression and phosphorylation profiles to the four subtypes of breast cancer. The results showed that TPs were enriched in all subtypes but their ranks were significantly different among the subtypes. For instance, the p53 pathway ranked top in the Basal-like breast cancer subtype, but not in the HER2-enriched type. The Focal adhesion pathway ranked higher in HER2- subtypes than in HER2+ subtypes. The results were consistent with previous research. Conclusions The results demonstrate that the network topology-based method is more powerful when integrating proteomic and phosphoproteomic data in pathway analysis of proteomics studies. This integrative strategy can also be used to rank the specific pathways for the disease subtypes.
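The hypergeometric test-based over-representation analysis named in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `hypergeom_enrichment_p` and the example counts are hypothetical and do not come from the paper.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= k).

    N: number of proteins/genes in the background
    K: background members annotated to the pathway
    n: size of the query list (e.g. differentially expressed proteins)
    k: query members annotated to the pathway
    """
    total = comb(N, n)        # all ways to draw the query list
    upper = min(K, n)         # largest possible overlap
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(max(k, 0), upper + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts chosen only for illustration.
    p = hypergeom_enrichment_p(k=2, n=5, K=4, N=20)
    print(f"over-representation p-value: {p:.4f}")
```

A small p-value indicates that the overlap between the query list and the pathway is larger than expected by chance. In practice the p-values across pathways would also be corrected for multiple testing.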
Affiliation(s)
- Jie Ren
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Bo Wang
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Jing Li
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
|
40
|
Domingo-Fernández D, Hoyt CT, Bobis-Álvarez C, Marín-Llaó J, Hofmann-Apitius M. ComPath: an ecosystem for exploring, analyzing, and curating mappings across pathway databases. NPJ Syst Biol Appl 2018; 5:3. [PMID: 30564458 PMCID: PMC6292919 DOI: 10.1038/s41540-018-0078-8] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 07/05/2018] [Revised: 10/31/2018] [Accepted: 11/02/2018] [Indexed: 11/09/2022] Open
Abstract
Although pathways are widely used for the analysis and representation of biological systems, their lack of clear boundaries, their dispersion across numerous databases, and the lack of interoperability impede the evaluation of the coverage, agreements, and discrepancies between them. Here, we present ComPath, an ecosystem that supports curation of pathway mappings between databases and fosters the exploration of pathway knowledge through several novel visualizations. We have curated mappings between three of the major pathway databases and present a case study focusing on Parkinson’s disease that illustrates how ComPath can generate new biological insights by identifying pathway modules, clusters, and cross-talks with these mappings. The ComPath source code and resources are available at https://github.com/ComPath and the web application can be accessed at https://compath.scai.fraunhofer.de/.
Affiliation(s)
- Daniel Domingo-Fernández
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, 53754 Sankt Augustin, Germany
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, 53115 Bonn, Germany
- Charles Tapley Hoyt
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, 53754 Sankt Augustin, Germany
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, 53115 Bonn, Germany
- Carlos Bobis-Álvarez
- Faculty of Medicine and Health Sciences, University of Oviedo, 33006 Oviedo, Spain
- Josep Marín-Llaó
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, 53754 Sankt Augustin, Germany
- Rovira i Virgili University, 43003 Tarragona, Spain
- Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, 53754 Sankt Augustin, Germany
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, 53115 Bonn, Germany
|
41
|
Andrejeva D, Kugler JM, Nguyen HT, Malmendal A, Holm ML, Toft BG, Loya AC, Cohen SM. Metabolic control of PPAR activity by aldehyde dehydrogenase regulates invasive cell behavior and predicts survival in hepatocellular and renal clear cell carcinoma. BMC Cancer 2018; 18:1180. [PMID: 30486822 PMCID: PMC6264057 DOI: 10.1186/s12885-018-5061-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 08/29/2018] [Accepted: 11/07/2018] [Indexed: 01/16/2023] Open
Abstract
Background Changes in cellular metabolism are now recognized as potential drivers of cancer development, rather than as secondary consequences of disease. Here, we explore the mechanism by which metabolic changes dependent on aldehyde dehydrogenase impact cancer development. Methods ALDH7A1 was identified as a potential cancer gene using a Drosophila in vivo metastasis model. The role of the human ortholog was examined using RNA interference in cell-based assays of cell migration and invasion. 1H-NMR metabolite profiling was used to identify metabolic changes in ALDH7A1-depleted cells. Publicly available cancer gene expression data were interrogated to identify a gene-expression signature associated with depletion of ALDH7A1. Computational pathway and gene set enrichment analysis was used to identify signaling pathways and cellular processes that were correlated with reduced ALDH7A1 expression in cancer. A variety of statistical tests used to evaluate these analyses are described in detail in the methods section. Immunohistochemistry was used to assess ALDH7A1 expression in tissue samples from cancer patients. Results Depletion of ALDH7A1 increased cellular migration and invasiveness in vitro. Depletion of ALDH7A1 led to reduced levels of metabolites identified as ligands for Peroxisome proliferator-activated receptor (PPARα). Analysis of publicly available cancer gene expression data revealed that ALDH7A1 mRNA levels were reduced in many human cancers, and that this correlated with poor survival in kidney and liver cancer patients. Using pathway and gene set enrichment analysis, we establish a correlation between low ALDH7A1 levels, reduced PPAR signaling and reduced patient survival. Metabolic profiling showed that endogenous PPARα ligands were reduced in ALDH7A1-depleted cells. ALDH7A1-depletion led to reduced PPAR transcriptional activity. Treatment with a PPARα agonist restored normal cellular behavior. Low ALDH7A1 protein levels correlated with poor clinical outcome in hepatocellular and renal clear cell carcinoma patients. Conclusions We provide evidence that low ALDH7A1 expression is a useful prognostic marker of poor clinical outcome for hepatocellular and renal clear cell carcinomas and hypothesize that patients with low ALDH7A1 might benefit from therapeutic approaches addressing PPARα activity.
Affiliation(s)
- Diana Andrejeva
- Department of Cellular and Molecular Medicine, University of Copenhagen, Blegdamsvej 3, DK-2200, Copenhagen N, Denmark
- Jan-Michael Kugler
- Department of Cellular and Molecular Medicine, University of Copenhagen, Blegdamsvej 3, DK-2200, Copenhagen N, Denmark
- Hung Thanh Nguyen
- Department of Cellular and Molecular Medicine, University of Copenhagen, Blegdamsvej 3, DK-2200, Copenhagen N, Denmark
- Anders Malmendal
- Department of Cellular and Molecular Medicine, University of Copenhagen, Blegdamsvej 3, DK-2200, Copenhagen N, Denmark
- Mette Lind Holm
- Department of Urology, Rigshospitalet, Blegdamsvej 9, DK-2100, Copenhagen Ø, Denmark
- Anand C Loya
- Department of Pathology, Rigshospitalet, Blegdamsvej 9, DK-2100, Copenhagen Ø, Denmark
- Stephen M Cohen
- Department of Cellular and Molecular Medicine, University of Copenhagen, Blegdamsvej 3, DK-2200, Copenhagen N, Denmark
|
42
|
Lim S, Lee S, Jung I, Rhee S, Kim S. Comprehensive and critical evaluation of individualized pathway activity measurement tools on pan-cancer data. Brief Bioinform 2018; 21:36-46. [PMID: 30462155 DOI: 10.1093/bib/bby097] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 06/26/2018] [Revised: 08/20/2018] [Accepted: 09/09/2018] [Indexed: 12/11/2022] Open
Abstract
Motivation: Biological pathways are extensively used for the analysis of transcriptome data to characterize biological mechanisms underlying various phenotypes. There are a number of computational tools that summarize transcriptome data at the pathway level. However, there is no comparative study on how well these tools produce useful information at the cohort level, enabling comparison of many samples or patients. Results: In this study, we systematically compared and evaluated 13 different pathway activity inference tools based on 5 comparison criteria using a pan-cancer data set. This study has two major contributions. First, our study provides a comprehensive survey of the computational techniques used by existing pathway activity inference tools. The tools use different strategies and impose different requirements on data: input transformation, use of labels, necessity of cohort-level input data, use of gene relations and scoring metric. Second, we performed extensive evaluations of the performance of these tools. Because different tools use different methods to map samples to the pathway dimension, the tools are evaluated at the pathway level using five comparison criteria. Starting from measuring how well a tool maintains the characteristics of the original gene expression values, robustness was also investigated by adding noise to the gene expression data. Classification tasks on three clinical variables (tumor versus normal, survival and cancer subtypes) were performed to evaluate the utility of the tools for clinical applications. In addition, the inferred activity values were compared between the tools to see how similar they are, along with the scoring schemes they use.
Affiliation(s)
- Sangsoo Lim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea
- Sangseon Lee
- Department of Computer Science and Engineering, Seoul National University, Seoul, Korea
- Inuk Jung
- Bioinformatics Institute, Seoul National University, Seoul, Korea
- Sungmin Rhee
- Department of Computer Science and Engineering, Seoul National University, Seoul, Korea
- Sun Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea
- Department of Computer Science and Engineering, Seoul National University, Seoul, Korea
- Bioinformatics Institute, Seoul National University, Seoul, Korea
|
43
|
CGPS: A machine learning-based approach integrating multiple gene set analysis tools for better prioritization of biologically relevant pathways. J Genet Genomics 2018; 45:489-504. [PMID: 30292791 DOI: 10.1016/j.jgg.2018.08.002] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Received: 02/26/2018] [Revised: 08/11/2018] [Accepted: 08/13/2018] [Indexed: 12/20/2022]
Abstract
Gene set enrichment (GSE) analyses play an important role in the interpretation of large-scale transcriptome datasets. Multiple GSE tools can be integrated into a single method as obtaining optimal results is challenging due to the plethora of GSE tools and their discrepant performances. Several existing ensemble methods lead to different scores in sorting pathways as integrated results; furthermore, it is difficult for users to choose a single ensemble score to obtain optimal final results. Here, we develop an ensemble method using a machine learning approach called Combined Gene set analysis incorporating Prioritization and Sensitivity (CGPS) that integrates the results provided by nine prominent GSE tools into a single ensemble score (R score) to sort pathways as integrated results. Moreover, to the best of our knowledge, CGPS is the first GSE ensemble method built based on a priori knowledge of pathways and phenotypes. Compared with 10 widely used individual methods and five types of ensemble scores from two ensemble methods, we demonstrate that sorting pathways based on the R score can better prioritize relevant pathways, as established by an evaluation of 120 simulated datasets and 45 real datasets. Additionally, CGPS is applied to expression data involving the drug panobinostat, which is an anticancer treatment against multiple myeloma. The results identify cell processes associated with cancer, such as the p53 signaling pathway (hsa04115); by contrast, according to two ensemble methods (EnrichmentBrowser and EGSEA), this pathway has a rank higher than 20, which may cause users to miss the pathway in their analyses. We show that this method, which is based on a priori knowledge, can capture valuable biological information from numerous types of gene set collections, such as KEGG pathways, GO terms, Reactome, and BioCarta. CGPS is publicly available as a standalone source code at ftp://ftp.cbi.pku.edu.cn/pub/CGPS_download/cgps-1.0.0.tar.gz.
|
44
|
Abstract
High throughput techniques such as RNA-seq or microarray analysis have proven to be invaluable for characterizing global transcriptional gene activity changes due to external stimuli or diseases. Differential gene expression analysis (DGEA) is the first step in the course of data interpretation, typically producing lists of dozens to thousands of differentially expressed genes. To further guide the interpretation of these lists, different pathway analysis approaches have been developed. These tools typically rely on the classification of genes into sets of genes, such as pathways, based on the interactions between the genes and their function in a common biological process. Regardless of technical differences, these methods do not properly account for cross talk between different pathways, and most of the methods rely on a binary separation into differentially expressed and unaffected genes based on an arbitrarily set p-value cut-off. To overcome this limitation, we developed a novel approach to identify concertedly modulated sub-graphs in the global cell signaling network, based on the DGEA results of all genes tested. To this end, expression patterns of genes are integrated according to the topology of their interactions, potentially allowing one to read the flow of information and identify the effectors. The described software, named Modulated Sub-graph Finder (MSF), is freely available at https://github.com/Modulated-Subgraph-Finder/MSF.
Affiliation(s)
- Mariam R Farman
- Institute for Theoretical Chemistry, Theoretical Biochemistry Group, University of Vienna, Vienna, 1090, Austria
- Ivo L Hofacker
- Institute for Theoretical Chemistry, Theoretical Biochemistry Group, University of Vienna, Vienna, 1090, Austria
- Fabian Amman
- Institute for Theoretical Chemistry, Theoretical Biochemistry Group, University of Vienna, Vienna, 1090, Austria
- Department of Chromosome Biology, Max F. Perutz Laboratories, University of Vienna, Vienna, 1030, Austria
|
45
|
Fröhlich H, Balling R, Beerenwinkel N, Kohlbacher O, Kumar S, Lengauer T, Maathuis MH, Moreau Y, Murphy SA, Przytycka TM, Rebhan M, Röst H, Schuppert A, Schwab M, Spang R, Stekhoven D, Sun J, Weber A, Ziemek D, Zupan B. From hype to reality: data science enabling personalized medicine. BMC Med 2018; 16:150. [PMID: 30145981 PMCID: PMC6109989 DOI: 10.1186/s12916-018-1122-7] [Citation(s) in RCA: 187] [Impact Index Per Article: 31.2] [Received: 03/28/2018] [Accepted: 07/09/2018] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Personalized, precision, P4, or stratified medicine is understood as a medical approach in which patients are stratified based on their disease subtype, risk, prognosis, or treatment response using specialized diagnostic tests. The key idea is to base medical decisions on individual patient characteristics, including molecular and behavioral biomarkers, rather than on population averages. Personalized medicine is deeply connected to and dependent on data science, specifically machine learning (often named Artificial Intelligence in the mainstream media). While in recent years there has been a lot of enthusiasm about the potential of 'big data' and machine learning-based solutions, there exist only a few examples that impact current clinical practice. The lack of impact on clinical practice can largely be attributed to insufficient performance of predictive models, difficulties in interpreting complex model predictions, and lack of validation via prospective clinical trials that demonstrate a clear benefit compared to the standard of care. In this paper, we review the potential of state-of-the-art data science approaches for personalized medicine, discuss open challenges, and highlight directions that may help to overcome them in the future. CONCLUSIONS There is a need for an interdisciplinary effort, including data scientists, physicians, patient advocates, regulatory agencies, and health insurance organizations. Partially unrealistic expectations and concerns about data science-based solutions need to be better managed. In parallel, computational methods must advance further to provide direct benefit to clinical practice.
Affiliation(s)
- Holger Fröhlich
  - UCB Biosciences GmbH, Alfred-Nobel-Str. 10, 40789 Monheim, Germany
  - University of Bonn, Bonn-Aachen International Center for IT, Endenicher Allee 19c, 53115 Bonn, Germany
- Rudi Balling
  - University of Luxembourg, 6 avenue du Swing, 4367 Belvaux, Luxembourg
- Niko Beerenwinkel
  - Department of Biosciences and Engineering, ETH Zurich, Mattenstr. 26, 4058 Basel, Switzerland
- Oliver Kohlbacher
  - University of Tübingen, WSI/ZBIT, Sand 14, 72076 Tübingen, Germany
  - Max Planck Institute for Developmental Biology, Max-Planck-Ring 5, 72076 Tübingen, Germany
  - Quantitative Biology Center, University of Tübingen, Auf der Morgenstelle 8, 72076 Tübingen, Germany
  - Institute for Translational Bioinformatics, University Medical Center Tübingen, Sand 14, 72076 Tübingen, Germany
- Santosh Kumar
  - Department of Computer Science, University of Memphis, 2222 Dunn Hall, Memphis, TN 38152 USA
- Thomas Lengauer
  - Max-Planck-Institute for Informatics, 66123 Saarbrücken, Germany
- Marloes H. Maathuis
  - ETH Zurich, Seminar für Statistik, Rämistrasse 101, 8092 Zurich, Switzerland
- Yves Moreau
  - University of Leuven, ESAT, Kasteelpark Arenberg 10, 3001 Leuven, Belgium
- Susan A. Murphy
  - Harvard University, Science Center 400 Suite, Oxford Street, Cambridge, MA 02138-2901 USA
- Teresa M. Przytycka
  - National Center for Biotechnology Information, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894-6075 USA
- Michael Rebhan
  - Novartis Institutes for Biomedical Research, 4056 Basel, Switzerland
- Hannes Röst
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, ON M5S 3E1 Canada
- Andreas Schuppert
  - RWTH Aachen, Joint Research Center for Computational Biomedicine, Pauwelsstrasse 19, 52074 Aachen, Germany
- Matthias Schwab
  - Dr. Margarete Fischer-Bosch Institute of Clinical Pharmacology, Auerbachstrasse 112, 70376 Stuttgart, Germany
  - University of Tübingen, Departments of Clinical Pharmacology and of Pharmacy and Biochemistry, Tübingen, Germany
- Rainer Spang
  - University of Regensburg, Institute of Functional Genomics, Am BioPark 9, 93053 Regensburg, Germany
- Daniel Stekhoven
  - ETH Zurich, NEXUS Personalized Health Technol., Otto-Stern-Weg 7, 8093 Zurich, Switzerland
- Jimeng Sun
  - Georgia Tech University, 801 Atlantic Drive, Atlanta, GA 30332-0280 USA
- Andreas Weber
  - Institute for Computer Science, University of Bonn, Endenicher Allee 19a, 53115 Bonn, Germany
- Daniel Ziemek
  - Pfizer, Worldwide Research and Development, Linkstraße 10, 10785 Berlin, Germany
- Blaz Zupan
  - Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, SI-1000 Ljubljana, Slovenia
|
46
|
Pathway and Network Analysis of Differentially Expressed Genes in Transcriptomes. Methods Mol Biol 2018. [PMID: 29508288 DOI: 10.1007/978-1-4939-7710-9_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2023]
Abstract
In recent years, transcriptome sequencing has become very popular, encompassing a wide variety of applications from simple mRNA profiling to discovery and analysis of the entire transcriptome. One of the most common aims of transcriptome sequencing is to identify genes that are differentially expressed (DE) between two or more biological conditions, and to infer associated pathways and gene networks from expression profiles. It can provide avenues for further systematic investigation into potential biologic mechanisms. Gene Set (GS) enrichment analysis is a popular approach to identify pathways or sets of genes that are significantly enriched in the context of differentially expressed genes. However, the approach considers a pathway as a simple gene collection disregarding knowledge of gene or protein interactions. In contrast, topology-based methods integrate the topological structure of a pathway and gene network into the analysis. To provide a panoramic view of such approaches, this chapter demonstrates several recent computational workflows, including gene set enrichment and topology-based methods, for analysis of the DE pathways and gene networks from transcriptome-wide sequencing data.
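As an illustration of the gene set enrichment idea described in this abstract, the following is a minimal sketch of an over-representation test based on the hypergeometric distribution. It is not taken from the chapter: the function names `hypergeom_sf` and `over_representation` and the toy gene identifiers are hypothetical, and the chapter's own workflows may rely on different statistics. Like the approach the abstract criticizes, this test treats a pathway as a plain collection of genes and ignores its topology.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) for X ~ Hypergeometric(N genes, K in the set, n drawn)."""
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probability mass over all overlaps at least as large as k.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def over_representation(de_genes, gene_set, universe):
    """Return (overlap size, one-sided p-value) for one gene set.

    Genes outside the universe are ignored (an assumption of this sketch).
    """
    universe = set(universe)
    de = set(de_genes) & universe
    gs = set(gene_set) & universe
    overlap = len(de & gs)
    p_value = hypergeom_sf(overlap, len(universe), len(gs), len(de))
    return overlap, p_value


# Toy data (hypothetical): 20 measured genes, a 5-gene pathway, 4 DE genes.
universe = [f"g{i}" for i in range(20)]
pathway = ["g0", "g1", "g2", "g3", "g4"]
de_genes = ["g0", "g1", "g2", "g10"]

overlap, p_value = over_representation(de_genes, pathway, universe)
print(overlap, round(p_value, 4))
```

In practice the p-values for many gene sets would also be corrected for multiple testing, which this sketch omits.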
|
47
|
Ihnatova I, Popovici V, Budinska E. A critical comparison of topology-based pathway analysis methods. PLoS One 2018; 13:e0191154. [PMID: 29370226 PMCID: PMC5784953 DOI: 10.1371/journal.pone.0191154] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Received: 03/12/2017] [Accepted: 12/29/2017] [Indexed: 11/18/2022] Open
Abstract
One of the aims of high-throughput gene/protein profiling experiments is the identification of biological processes altered between two or more conditions. Pathway analysis is an umbrella term for a multitude of computational approaches used for this purpose. While in the beginning pathway analysis relied on enrichment-based approaches, a newer generation of methods is now available, exploiting pathway topologies in addition to gene/protein expression levels. However, little effort has been invested in their critical assessment with respect to their performance in different experimental setups. Here, we assessed the performance of seven representative methods identifying differentially expressed pathways between two groups of interest based on gene expression data with prior knowledge of pathway topologies: SPIA, PRS, CePa, TAPPA, TopologyGSA, Clipper and DEGraph. We performed a number of controlled experiments that investigated their sensitivity to sample and pathway size, threshold-based filtering of differentially expressed genes, ability to detect target pathways, ability to exploit the topological information and the sensitivity to different pre-processing strategies. We also verified type I error rates and described the influence of overexpression of single genes, gene sets and topological motifs of various sizes on the detection of a pathway as differentially expressed. The results of our experiments demonstrate a wide variability of the tested methods. We provide a set of recommendations for an informed selection of the proper method for a given data analysis task.
Affiliation(s)
- Ivana Ihnatova
  - RECETOX, Faculty of Science, Masarykova Univerzita, Brno, Czech Republic
  - Institute of Biostatistics and Analyses, Faculty of Medicine, Masarykova Univerzita, Brno, Czech Republic
- Vlad Popovici
  - RECETOX, Faculty of Science, Masarykova Univerzita, Brno, Czech Republic
- Eva Budinska
  - RECETOX, Faculty of Science, Masarykova Univerzita, Brno, Czech Republic
  - Institute of Biostatistics and Analyses, Faculty of Medicine, Masarykova Univerzita, Brno, Czech Republic
|
48
|
Gonzalez-Vicente A, Hopfer U, Garvin JL. Developing Tools for Analysis of Renal Genomic Data: An Invitation to Participate. J Am Soc Nephrol 2017; 28:3438-3440. [PMID: 28982694 DOI: 10.1681/asn.2017070811] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/03/2022] Open
Affiliation(s)
- Agustin Gonzalez-Vicente
  - Department of Physiology and Biophysics, School of Medicine, Case Western Reserve University, Cleveland, Ohio
- Ulrich Hopfer
  - Department of Physiology and Biophysics, School of Medicine, Case Western Reserve University, Cleveland, Ohio
- Jeffrey L Garvin
  - Department of Physiology and Biophysics, School of Medicine, Case Western Reserve University, Cleveland, Ohio
|
49
|
Bayerlová M, Menck K, Klemm F, Wolff A, Pukrop T, Binder C, Beißbarth T, Bleckmann A. Ror2 Signaling and Its Relevance in Breast Cancer Progression. Front Oncol 2017; 7:135. [PMID: 28695110 PMCID: PMC5483589 DOI: 10.3389/fonc.2017.00135] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 01/26/2017] [Accepted: 06/07/2017] [Indexed: 12/31/2022] Open
Abstract
Breast cancer is a heterogeneous disease and has been classified into five molecular subtypes based on gene expression profiles. Signaling processes linked to different breast cancer molecular subtypes and different clinical outcomes are still poorly understood. Aberrant regulation of Wnt signaling has been implicated in breast cancer progression. In particular Ror1/2 receptors and several other members of the non-canonical Wnt signaling pathway were associated with aggressive breast cancer behavior. However, Wnt signals are mediated via multiple complex pathways, and it is clinically important to determine which particular Wnt cascades, including their domains and targets, are deregulated in poor prognosis breast cancer. To investigate activation and outcome of the Ror2-dependent non-canonical Wnt signaling pathway, we overexpressed the Ror2 receptor in MCF-7 and MDA-MB231 breast cancer cells, stimulated the cells with its ligand Wnt5a, and we knocked-down Ror1 in MDA-MB231 cells. We measured the invasive capacity of perturbed cells to assess phenotypic changes, and mRNA was profiled to quantify gene expression changes. Differentially expressed genes were integrated into a literature-based non-canonical Wnt signaling network. The results were further used in the analysis of an independent dataset of breast cancer patients with metastasis-free survival annotation. Overexpression of the Ror2 receptor, stimulation with Wnt5a, as well as the combination of both perturbations enhanced invasiveness of MCF-7 cells. The expression-responsive targets of Ror2 overexpression in MCF-7 induced a Ror2/Wnt module of the non-canonical Wnt signaling pathway. These targets alter regulation of other pathways involved in cell remodeling processing and cell metabolism. Furthermore, the genes of the Ror2/Wnt module were assessed as a gene signature in patient gene expression data and showed an association with clinical outcome. 
In summary, results of this study indicate a role of a newly defined Ror2/Wnt module in breast cancer progression and present a link between Ror2 expression and increased cell invasiveness.
Affiliation(s)
- Michaela Bayerlová
  - Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
- Kerstin Menck
  - Department of Hematology and Medical Oncology, University Medical Center Göttingen, Göttingen, Germany
- Florian Klemm
  - Department of Hematology and Medical Oncology, University Medical Center Göttingen, Göttingen, Germany
- Alexander Wolff
  - Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
- Tobias Pukrop
  - Department of Hematology and Medical Oncology, University Medical Center Göttingen, Göttingen, Germany
  - Clinic for Internal Medicine III, Hematology and Medical Oncology, University Regensburg, Regensburg, Germany
- Claudia Binder
  - Department of Hematology and Medical Oncology, University Medical Center Göttingen, Göttingen, Germany
- Tim Beißbarth
  - Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
- Annalen Bleckmann
  - Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
  - Department of Hematology and Medical Oncology, University Medical Center Göttingen, Göttingen, Germany
|
50
|
Alhamdoosh M, Ng M, Wilson NJ, Sheridan JM, Huynh H, Wilson MJ, Ritchie ME. Combining multiple tools outperforms individual methods in gene set enrichment analyses. Bioinformatics 2017; 33:414-424. [PMID: 27694195 PMCID: PMC5408797 DOI: 10.1093/bioinformatics/btw623] [Citation(s) in RCA: 88] [Impact Index Per Article: 12.6] [Received: 03/10/2016] [Accepted: 09/23/2016] [Indexed: 12/22/2022] Open
Abstract
Motivation: Gene set enrichment (GSE) analysis allows researchers to efficiently extract biological insight from long lists of differentially expressed genes by interrogating them at a systems level. In recent years, there has been a proliferation of GSE analysis methods and hence it has become increasingly difficult for researchers to select an optimal GSE tool based on their particular dataset. Moreover, the majority of GSE analysis methods do not allow researchers to simultaneously compare gene set level results between multiple experimental conditions. Results: The ensemble of genes set enrichment analyses (EGSEA) is a method developed for RNA-sequencing data that combines results from twelve algorithms and calculates collective gene set scores to improve the biological relevance of the highest ranked gene sets. EGSEA’s gene set database contains around 25 000 gene sets from sixteen collections. It has multiple visualization capabilities that allow researchers to view gene sets at various levels of granularity. EGSEA has been tested on simulated data and on a number of human and mouse datasets and, based on biologists’ feedback, consistently outperforms the individual tools that have been combined. Our evaluation demonstrates the superiority of the ensemble approach for GSE analysis, and its utility to effectively and efficiently extrapolate biological functions and potential involvement in disease processes from lists of differentially regulated genes. Availability and Implementation: EGSEA is available as an R package at http://www.bioconductor.org/packages/EGSEA/. The gene sets collections are available in the R package EGSEAdata from http://www.bioconductor.org/packages/EGSEAdata/. Supplementary information: Supplementary data are available at Bioinformatics online.
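The ensemble idea in this abstract, combining per-method results into a collective gene set score, can be sketched with a simple average-rank aggregation. This is only an illustration under stated assumptions: it is not EGSEA's implementation or API, average rank is just one plausible way to combine results, and the names `rank_gene_sets` and `average_rank`, the method labels, and the p-values are all hypothetical.

```python
from statistics import mean


def rank_gene_sets(pvalues):
    """Map each gene set to a 1-based rank (smallest p-value first, ties by name)."""
    ordered = sorted(pvalues, key=lambda gs: (pvalues[gs], gs))
    return {gs: i + 1 for i, gs in enumerate(ordered)}


def average_rank(results_by_method):
    """Combine per-method p-values into one ordering by mean rank.

    Only gene sets scored by every method are kept (an assumption of this sketch).
    """
    common = set.intersection(*(set(p) for p in results_by_method.values()))
    per_method = [
        rank_gene_sets({gs: p[gs] for gs in common})
        for p in results_by_method.values()
    ]
    avg = {gs: mean(ranks[gs] for ranks in per_method) for gs in common}
    # Lower average rank means stronger combined evidence.
    return sorted(avg.items(), key=lambda item: (item[1], item[0]))


# Hypothetical p-values from three enrichment methods for three gene sets.
results = {
    "method_a": {"set1": 0.01, "set2": 0.20, "set3": 0.05},
    "method_b": {"set1": 0.03, "set2": 0.04, "set3": 0.50},
    "method_c": {"set1": 0.20, "set2": 0.10, "set3": 0.90},
}

combined = average_rank(results)
for gene_set, score in combined:
    print(gene_set, round(score, 2))
```

Rank-based aggregation avoids assuming that p-values from different methods are on a comparable scale, which is one reason it is a common choice for this kind of sketch.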
Affiliation(s)
- Milica Ng
  - CSL Limited, Bio21 Institute, Parkville, Australia
- Julie M Sheridan
  - ACRF Stem Cells and Cancer Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Australia
- Huy Huynh
  - CSL Limited, Bio21 Institute, Parkville, Australia
- Matthew E Ritchie
  - Molecular Medicine Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
  - School of Mathematics and Statistics, The University of Melbourne, Parkville, Australia
|